Tax & Accounting · higher is better
TaxEval
Vals AI TaxEval v2 consists of more than 1,500 expert-written tax questions and is scored on overall accuracy. The evaluation is independent and run in-house.
- Domain: Tax & Accounting
- Metric: Accuracy (%)
- Orientation: Higher is better
- Results: 25
Ranking
| # | Model | Developer | Score | Source | Status |
|---|---|---|---|---|---|
| 1 | Muse Spark | Meta AI | 77.7% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 2 | Claude Sonnet 4.6 | Anthropic | 77.1% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 3 | Claude Opus 4.6 | Anthropic | 76.0% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 4 | GPT-5.2 | OpenAI | 75.8% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 5 | Claude Opus 4.8 | Anthropic | 75.6% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 6 | Claude Opus 4.7 | Anthropic | 75.3% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 7 | Qwen3.7 Max | Alibaba Qwen | 75.3% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 8 | GPT-5.5 | OpenAI | 75.0% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 9 | Kimi K2.6 | Moonshot AI | 74.7% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 10 | Gemini 3.5 Flash | Google DeepMind | 74.4% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 11 | Grok 4.20 | xAI | 74.1% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 12 | GPT-5.4 | OpenAI | 74.0% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 13 | Gemini 3 Flash | Google DeepMind | 73.9% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 14 | Mistral Large 3 | Mistral AI | 73.1% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 15 | Gemini 3.1 Pro | Google DeepMind | 72.9% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 16 | Gemini 3 Pro | Google DeepMind | 72.6% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 17 | DeepSeek V4-Pro | DeepSeek | 72.1% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 18 | Kimi K2 Thinking | Moonshot AI | 71.7% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 19 | Qwen3.6-27B | Alibaba Qwen | 71.3% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 20 | GLM-5.1 | Z.ai (Zhipu) | 71.2% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 21 | Grok 4.3 | xAI | 70.8% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 22 | DeepSeek V3.2 | DeepSeek | 68.2% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 23 | Claude Haiku 4.5 | Anthropic | 67.5% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 24 | Llama 4 Maverick | Meta AI | 66.6% | Vals AI TaxEval v2 (3rd-party) | unverified |
| 25 | Llama 4 Scout | Meta AI | 55.2% | Vals AI TaxEval v2 (3rd-party) | unverified |
