
LMArena Elo

Crowd-sourced pairwise preference rating from blind head-to-head chats (LMArena, formerly LMSYS). Unbounded; ~1000 is the historical anchor.
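To give a feel for what a gap on this scale means, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the classic Elo logistic curve (base 10, scale 400); the page does not state the exact fitting procedure LMArena uses, so the function name `expected_win_probability` and the `scale` parameter are illustrative assumptions rather than LMArena's own method.

```python
def expected_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that model A is preferred over model B.

    Assumption: the classic Elo logistic curve with base 10 and a
    400-point scale. This is an illustrative convention, not a
    confirmed description of how the leaderboard is computed.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Highest and lowest scores in the ranking on this page.
    top, bottom = 1498, 1411
    p = expected_win_probability(top, bottom)
    print(f"Expected preference rate for the higher-rated model: {p:.3f}")
```

Under these assumptions, the 87-point spread between the first and last entries corresponds to the higher-rated model being preferred in roughly 62% of direct comparisons, and the few-point gaps between adjacent entries correspond to close to a coin flip.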

Domain: Human Preference
Metric: Elo
Orientation: Higher is better
Results: 25

Ranking

#   Model                Developer        Score  Source              Status
1   Claude Opus 4.6      Anthropic        1,498  LMArena (arena.ai)  3rd-party, unverified
2   Claude Opus 4.7      Anthropic        1,493  LMArena (arena.ai)  3rd-party, unverified
3   Gemini 3.1 Pro       Google DeepMind  1,488  LMArena (arena.ai)  3rd-party, unverified
4   Gemini 3 Pro         Google DeepMind  1,486  LMArena (arena.ai)  3rd-party, unverified
5   Gemini 3.5 Flash     Google DeepMind  1,477  LMArena (arena.ai)  3rd-party, unverified
6   GLM-5.1              Z.ai (Zhipu)     1,475  LMArena (arena.ai)  3rd-party, unverified
7   Qwen3.7 Max          Alibaba Qwen     1,474  LMArena (arena.ai)  3rd-party, unverified
8   GPT-5.5              OpenAI           1,474  LMArena (arena.ai)  3rd-party, unverified
9   Gemini 3 Flash       Google DeepMind  1,473  LMArena (arena.ai)  3rd-party, unverified
10  Grok 4.20            xAI              1,473  LMArena (arena.ai)  3rd-party, unverified
11  Claude Sonnet 4.6    Anthropic        1,471  LMArena (arena.ai)  3rd-party, unverified
12  GPT-5.4              OpenAI           1,467  LMArena (arena.ai)  3rd-party, unverified
13  Kimi K2.6            Moonshot AI      1,462  LMArena (arena.ai)  3rd-party, unverified
14  DeepSeek V4-Pro      DeepSeek         1,457  LMArena (arena.ai)  3rd-party, unverified
15  Doubao Seed 2.0 Pro  ByteDance        1,455  LMArena (arena.ai)  3rd-party, unverified
16  MiniMax M3           MiniMax          1,449  LMArena (arena.ai)  3rd-party, unverified
17  Grok 4.3             xAI              1,446  LMArena (arena.ai)  3rd-party, unverified
18  Gemini 2.5 Pro       Google DeepMind  1,446  LMArena (arena.ai)  3rd-party, unverified
19  Kimi K2 Thinking     Moonshot AI      1,444  LMArena (arena.ai)  3rd-party, unverified
20  DeepSeek V3.2        DeepSeek         1,437  LMArena (arena.ai)  3rd-party, unverified
21  GPT-5.2              OpenAI           1,435  LMArena (arena.ai)  3rd-party, unverified
22  DeepSeek V4-Flash    DeepSeek         1,433  LMArena (arena.ai)  3rd-party, unverified
23  Nemotron 3 Ultra     NVIDIA           1,422  LMArena (arena.ai)  3rd-party, unverified
24  Mistral Large 3      Mistral AI       1,418  LMArena (arena.ai)  3rd-party, unverified
25  Claude Haiku 4.5     Anthropic        1,411  LMArena (arena.ai)  3rd-party, unverified