Benchmarks
Human Preference (higher is better)
LMArena Elo
Crowd-sourced pairwise preference rating computed from blind head-to-head chats (LMArena, formerly LMSYS). The scale is unbounded; ~1000 is the historical anchor.
- Domain: Human Preference
- Metric: Elo
- Orientation: Higher is better
- Results: 25
Ranking
| # | Model | Developer | Score | Source | Status |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Anthropic | 1,498 | LMArena (arena.ai), 3rd-party | unverified |
| 2 | Claude Opus 4.7 | Anthropic | 1,493 | LMArena (arena.ai), 3rd-party | unverified |
| 3 | Gemini 3.1 Pro | Google DeepMind | 1,488 | LMArena (arena.ai), 3rd-party | unverified |
| 4 | Gemini 3 Pro | Google DeepMind | 1,486 | LMArena (arena.ai), 3rd-party | unverified |
| 5 | Gemini 3.5 Flash | Google DeepMind | 1,477 | LMArena (arena.ai), 3rd-party | unverified |
| 6 | GLM-5.1 | Z.ai (Zhipu) | 1,475 | LMArena (arena.ai), 3rd-party | unverified |
| 7 | Qwen3.7 Max | Alibaba Qwen | 1,474 | LMArena (arena.ai), 3rd-party | unverified |
| 8 | GPT-5.5 | OpenAI | 1,474 | LMArena (arena.ai), 3rd-party | unverified |
| 9 | Gemini 3 Flash | Google DeepMind | 1,473 | LMArena (arena.ai), 3rd-party | unverified |
| 10 | Grok 4.20 | xAI | 1,473 | LMArena (arena.ai), 3rd-party | unverified |
| 11 | Claude Sonnet 4.6 | Anthropic | 1,471 | LMArena (arena.ai), 3rd-party | unverified |
| 12 | GPT-5.4 | OpenAI | 1,467 | LMArena (arena.ai), 3rd-party | unverified |
| 13 | Kimi K2.6 | Moonshot AI | 1,462 | LMArena (arena.ai), 3rd-party | unverified |
| 14 | DeepSeek V4-Pro | DeepSeek | 1,457 | LMArena (arena.ai), 3rd-party | unverified |
| 15 | Doubao Seed 2.0 Pro | ByteDance | 1,455 | LMArena (arena.ai), 3rd-party | unverified |
| 16 | MiniMax M3 | MiniMax | 1,449 | LMArena (arena.ai), 3rd-party | unverified |
| 17 | Grok 4.3 | xAI | 1,446 | LMArena (arena.ai), 3rd-party | unverified |
| 18 | Gemini 2.5 Pro | Google DeepMind | 1,446 | LMArena (arena.ai), 3rd-party | unverified |
| 19 | Kimi K2 Thinking | Moonshot AI | 1,444 | LMArena (arena.ai), 3rd-party | unverified |
| 20 | DeepSeek V3.2 | DeepSeek | 1,437 | LMArena (arena.ai), 3rd-party | unverified |
| 21 | GPT-5.2 | OpenAI | 1,435 | LMArena (arena.ai), 3rd-party | unverified |
| 22 | DeepSeek V4-Flash | DeepSeek | 1,433 | LMArena (arena.ai), 3rd-party | unverified |
| 23 | Nemotron 3 Ultra | NVIDIA | 1,422 | LMArena (arena.ai), 3rd-party | unverified |
| 24 | Mistral Large 3 | Mistral AI | 1,418 | LMArena (arena.ai), 3rd-party | unverified |
| 25 | Claude Haiku 4.5 | Anthropic | 1,411 | LMArena (arena.ai), 3rd-party | unverified |
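
To give a rough sense of what a gap between two scores in this ranking means, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the conventional Elo logistic curve (base 10, 400-point scale); this page does not state how the leaderboard fits its ratings, so treat the curve, the function name `expected_win_probability`, and the `scale` parameter as illustrative assumptions rather than the leaderboard's actual method.

```python
def expected_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Chance that A is preferred over B under a standard Elo logistic curve.

    Assumption: base-10 logistic with a 400-point scale (conventional Elo),
    not necessarily the exact model behind the published scores.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores copied from the ranking table (rows 1 and 25).
top_score = 1498      # Claude Opus 4.6
bottom_score = 1411   # Claude Haiku 4.5

p_top_over_bottom = expected_win_probability(top_score, bottom_score)
p_equal = expected_win_probability(1474, 1474)  # rows 7 and 8 share a score

print(f"Rank 1 vs rank 25: {p_top_over_bottom:.2f}")
print(f"Tied scores:       {p_equal:.2f}")
```

Under this assumption, the 87-point spread between the first and last rows corresponds to the top model being preferred in roughly 62% of pairwise comparisons, and the single-digit gaps between neighboring rows are close to a coin flip.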
