Crosshair

GPQA Diamond

Graduate-level, Google-proof science questions (physics, chemistry, biology) written by domain experts to resist web lookup.

Domain: Reasoning
Metric: %
Orientation: Higher is better
Results: 23

Ranking

| # | Model | Developer | Score | Source | Source type | Status |
|---|-------|-----------|-------|--------|-------------|--------|
| 1 | Gemini 3.1 Pro | Google DeepMind | 94.3% | Google DeepMind - Gemini 3.1 Pro model card | vendor | unverified |
| 2 | Claude Opus 4.7 | Anthropic | 94.2% | Anthropic - Claude Opus 4.7 | vendor | unverified |
| 3 | Claude Opus 4.8 | Anthropic | 93.6% | Anthropic - Claude Opus 4.8 | vendor | unverified |
| 4 | GPT-5.5 | OpenAI | 93.6% | OpenAI - GPT-5.5 | vendor | unverified |
| 5 | Qwen3.7 Max | Alibaba Qwen | 92.4% | Qwen - Qwen3.7 Max | vendor | unverified |
| 6 | GPT-5.2 | OpenAI | 92.4% | llm-stats - GPT-5.2 (vendor-reported) | 3rd-party | unverified |
| 7 | Gemini 3 Pro | Google DeepMind | 91.9% | Google - Gemini 3 Pro | vendor | unverified |
| 8 | Claude Opus 4.6 | Anthropic | 91.3% | Anthropic - Claude Opus 4.6 | vendor | unverified |
| 9 | Kimi K2.6 | Moonshot AI | 90.5% | Moonshot - Kimi K2.6 model card | vendor | unverified |
| 10 | Gemini 3 Flash | Google DeepMind | 90.4% | Google - Gemini 3 Flash | vendor | unverified |
| 11 | DeepSeek V4-Pro | DeepSeek | 90.1% | DeepSeek - V4-Pro model card | vendor | unverified |
| 12 | DeepSeek V4-Flash | DeepSeek | 88.1% | DeepSeek - V4-Flash model card | vendor | unverified |
| 13 | Qwen3.6-27B | Alibaba Qwen | 87.8% | Alibaba - Qwen3.6-27B model card | vendor | unverified |
| 14 | Nemotron 3 Ultra | NVIDIA | 87% | NVIDIA - Nemotron 3 Ultra model card | vendor | unverified |
| 15 | Gemini 2.5 Pro | Google DeepMind | 86.4% | Google DeepMind - Gemini 2.5 Pro model card | vendor | unverified |
| 16 | GLM-5.1 | Z.ai (Zhipu) | 86.2% | Zhipu / Z.ai - GLM-5.1 model card | vendor | unverified |
| 17 | Qwen3.6-35B-A3B | Alibaba Qwen | 86% | Alibaba - Qwen3.6-35B-A3B model card | vendor | unverified |
| 18 | Kimi K2 Thinking | Moonshot AI | 84.5% | Moonshot - Kimi K2 Thinking model card | vendor | unverified |
| 19 | DeepSeek V3.2 | DeepSeek | 82.4% | DeepSeek - V3.2 technical report | vendor | unverified |
| 20 | Nova 2 Pro | Amazon | 81.4% | Amazon - Nova 2 technical report | vendor | unverified |
| 21 | Llama 4 Maverick | Meta AI | 69.8% | Meta - Llama 4 | vendor | unverified |
| 22 | Llama 4 Scout | Meta AI | 57.2% | Meta - Llama 4 | vendor | unverified |
| 23 | Mistral Large 3 | Mistral AI | 43.9% | Artificial Analysis - Mistral Large 3 | 3rd-party | unverified |
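
Because the orientation is "higher is better", the ordering in the ranking can be reproduced by sorting rows by score in descending order. The following is a minimal sketch using four rows from the ranking. The `rank` function and the tuple layout are hypothetical names chosen for illustration, and the tie-breaking rule (equal scores keep their input order) is an assumption, since the page does not state how ties such as the two 93.6% entries are resolved.

```python
# Illustrative subset of the ranking: (model, score in percent).
rows = [
    ("Claude Opus 4.8", 93.6),
    ("Gemini 3.1 Pro", 94.3),
    ("GPT-5.5", 93.6),
    ("Claude Opus 4.7", 94.2),
]


def rank(rows, higher_is_better=True):
    """Return (position, model, score) tuples ordered by score.

    sorted() is stable, so rows with equal scores keep their input order.
    That tie-break is an assumption, not a documented rule of the page.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=higher_is_better)
    return [(i, model, score) for i, (model, score) in enumerate(ordered, start=1)]


ranking = rank(rows)
for position, model, score in ranking:
    print(f"{position}. {model}: {score}%")
```

For a benchmark where lower values are better, the same function could be called with `higher_is_better=False`.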