MMLU-Pro
A harder, cleaned-up successor to MMLU, spanning 14 subject areas with up to 10 answer options per question and a larger share of reasoning-heavy items.
- Domain: Knowledge
- Metric: %
- Orientation: Higher is better
- Results: 10
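
The "%" metric above is presumably plain accuracy: the share of questions answered correctly. A minimal sketch of that calculation follows, assuming every item has exactly ten options (so random guessing scores about 10%). The function name `accuracy_percent` and the sample answers are hypothetical and not taken from any vendor's evaluation harness.

```python
def accuracy_percent(predictions, answers):
    """Return the percentage of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Assumption: ten answer options (A-J) per question.
NUM_OPTIONS = 10
chance_baseline = 100.0 / NUM_OPTIONS  # expected score from random guessing

# Hypothetical toy data: three of the five predictions are correct.
predictions = ["A", "C", "J", "B", "D"]
answers = ["A", "C", "F", "B", "E"]

print(f"accuracy: {accuracy_percent(predictions, answers):.1f}%")
print(f"chance baseline: {chance_baseline:.1f}%")
```

Because the orientation is "higher is better", a score is only informative relative to that baseline and to the other entries in the ranking.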
Ranking
| # | Model (developer) | Score | Source (type) | Status |
|---|---|---|---|---|
| 1 | DeepSeek V4-Pro (DeepSeek) | 87.5% | DeepSeek: V4-Pro model card (vendor) | unverified |
| 2 | Nemotron 3 Ultra (NVIDIA) | 86.8% | NVIDIA: Nemotron 3 Ultra model card (vendor) | unverified |
| 3 | Qwen3.6-27B (Alibaba Qwen) | 86.2% | Alibaba: Qwen3.6-27B model card (vendor) | unverified |
| 4 | DeepSeek V4-Flash (DeepSeek) | 86.2% | DeepSeek: V4-Flash model card (vendor) | unverified |
| 5 | Qwen3.6-35B-A3B (Alibaba Qwen) | 85.2% | Alibaba: Qwen3.6-35B-A3B model card (vendor) | unverified |
| 6 | DeepSeek V3.2 (DeepSeek) | 85% | DeepSeek: V3.2 technical report (vendor) | unverified |
| 7 | Kimi K2 Thinking (Moonshot AI) | 84.6% | Moonshot: Kimi K2 Thinking model card (vendor) | unverified |
| 8 | Nova 2 Pro (Amazon) | 81.6% | Amazon: Nova 2 technical report (vendor) | unverified |
| 9 | Llama 4 Maverick (Meta AI) | 80.5% | Meta: Llama 4 (vendor) | unverified |
| 10 | Llama 4 Scout (Meta AI) | 74.3% | Meta: Llama 4 (vendor) | unverified |
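
Two entries in the table share a score of 86.2% yet hold separate positions (3 and 4). As an illustration only, the sketch below applies standard competition ranking, where equal scores share a rank and the next rank is skipped. This is an assumed alternative, not necessarily how the page orders its rows, and `Result` and `rank_with_ties` are hypothetical names. The scores are copied from the first five rows above.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float  # reported score in percent


def rank_with_ties(results):
    """Sort by score (higher is better); equal scores share a rank (1, 2, 3, 3, 5)."""
    ordered = sorted(results, key=lambda r: r.score, reverse=True)
    ranked = []
    for position, result in enumerate(ordered, start=1):
        if ranked and ranked[-1][1].score == result.score:
            rank = ranked[-1][0]  # tie: reuse the previous entry's rank
        else:
            rank = position
        ranked.append((rank, result))
    return ranked


results = [
    Result("DeepSeek V4-Pro", 87.5),
    Result("Nemotron 3 Ultra", 86.8),
    Result("Qwen3.6-27B", 86.2),
    Result("DeepSeek V4-Flash", 86.2),
    Result("Qwen3.6-35B-A3B", 85.2),
]

for rank, result in rank_with_ties(results):
    print(f"{rank:>2}  {result.model:<20} {result.score:.1f}%")
```

Since every score here is vendor-reported and unverified, differences of a few tenths of a point between adjacent rows should be read with caution.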
