LegalBench
Legal-reasoning task suite (originated by Stanford CodeX), run independently by Vals AI and reported as overall accuracy across tasks.
- Domain: Law
- Metric: %
- Orientation: Higher is better
- Results: 24
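The metric above is described only as "overall accuracy across tasks". As a rough illustration of one way such a figure could be computed, the sketch below takes an unweighted mean of per-task accuracies. This aggregation is an assumption, not the documented Vals AI method, and the task names and counts are hypothetical.

```python
from statistics import mean


def task_accuracy(correct: int, total: int) -> float:
    """Fraction of items answered correctly for a single task."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def overall_accuracy(results: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of per-task accuracies, as a percentage.

    Assumption: every task counts equally, regardless of its size.
    """
    if not results:
        raise ValueError("no task results")
    return 100.0 * mean(task_accuracy(c, t) for c, t in results.values())


# Hypothetical tasks as (correct, total) pairs, for illustration only.
example = {
    "task_a": (45, 50),  # 90%
    "task_b": (30, 40),  # 75%
    "task_c": (9, 10),   # 90%
}

print(f"{overall_accuracy(example):.1f}%")  # 85.0%
```

A size-weighted average (total correct over total items) would give 84.0% on the same hypothetical data, so the choice of aggregation can shift scores by more than the gaps between adjacent rows in the ranking.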
Ranking
| # | Model | Developer | Score | Source | Status |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro | Google DeepMind | 87.4% | Vals AI, LegalBench (3rd-party) | unverified |
| 2 | Gemini 3 Pro | Google DeepMind | 87% | Vals AI, LegalBench (3rd-party) | unverified |
| 3 | Gemini 3 Flash | Google DeepMind | 86.9% | Vals AI, LegalBench (3rd-party) | unverified |
| 4 | GPT-5.5 | OpenAI | 86.5% | Vals AI, LegalBench (3rd-party) | unverified |
| 5 | GPT-5.4 | OpenAI | 86% | Vals AI, LegalBench (3rd-party) | unverified |
| 6 | Claude Opus 4.7 | Anthropic | 85.3% | Vals AI, LegalBench (3rd-party) | unverified |
| 7 | Claude Opus 4.6 | Anthropic | 85.3% | Vals AI, LegalBench (3rd-party) | unverified |
| 8 | Qwen3.7 Max | Alibaba Qwen | 84.9% | Vals AI, LegalBench (3rd-party) | unverified |
| 9 | Kimi K2.6 | Moonshot AI | 84.7% | Vals AI, LegalBench (3rd-party) | unverified |
| 10 | Grok 4.3 | xAI | 84.5% | Vals AI, LegalBench (3rd-party) | unverified |
| 11 | GLM-5.1 | Z.ai (Zhipu) | 84.4% | Vals AI, LegalBench (3rd-party) | unverified |
| 12 | Muse Spark | Meta AI | 84.2% | Vals AI, LegalBench (3rd-party) | unverified |
| 13 | Claude Opus 4.8 | Anthropic | 83.6% | Vals AI, LegalBench (3rd-party) | unverified |
| 14 | Gemini 3.5 Flash | Google DeepMind | 83.6% | Vals AI, LegalBench (3rd-party) | unverified |
| 15 | GPT-5.2 | OpenAI | 82.8% | Vals AI, LegalBench (3rd-party) | unverified |
| 16 | Claude Sonnet 4.6 | Anthropic | 82.1% | Vals AI, LegalBench (3rd-party) | unverified |
| 17 | Claude Haiku 4.5 | Anthropic | 81.2% | Vals AI, LegalBench (3rd-party) | unverified |
| 18 | DeepSeek V4-Pro | DeepSeek | 80.3% | Vals AI, LegalBench (3rd-party) | unverified |
| 19 | Kimi K2 Thinking | Moonshot AI | 80.2% | Vals AI, LegalBench (3rd-party) | unverified |
| 20 | Mistral Large 3 | Mistral AI | 79.1% | Vals AI, LegalBench (3rd-party) | unverified |
| 21 | Llama 4 Maverick | Meta AI | 77.8% | Vals AI, LegalBench (3rd-party) | unverified |
| 22 | Grok 4.20 | xAI | 77.7% | Vals AI, LegalBench (3rd-party) | unverified |
| 23 | DeepSeek V3.2 | DeepSeek | 76.1% | Vals AI, LegalBench (3rd-party) | unverified |
| 24 | Llama 4 Scout | Meta AI | 72% | Vals AI, LegalBench (3rd-party) | unverified |
