GPQA Diamond

Graduate-level, Google-proof science questions (physics, chemistry, biology) written by domain experts to resist web lookup.

Domain: Reasoning
Metric: Accuracy (%)
Orientation: Higher is better
Results: 23

Ranking

#	Model	Organization	Score	Source	Source type	Status
1	Gemini 3.1 Pro	Google DeepMind	94.3%	Google DeepMind: Gemini 3.1 Pro model card	vendor	unverified
2	Claude Opus 4.7	Anthropic	94.2%	Anthropic: Claude Opus 4.7	vendor	unverified
3	Claude Opus 4.8	Anthropic	93.6%	Anthropic: Claude Opus 4.8	vendor	unverified
4	GPT-5.5	OpenAI	93.6%	OpenAI: GPT-5.5	vendor	unverified
5	Qwen3.7 Max	Alibaba Qwen	92.4%	Qwen: Qwen3.7 Max	vendor	unverified
6	GPT-5.2	OpenAI	92.4%	llm-stats: GPT-5.2 (vendor-reported)	3rd-party	unverified
7	Gemini 3 Pro	Google DeepMind	91.9%	Google: Gemini 3 Pro	vendor	unverified
8	Claude Opus 4.6	Anthropic	91.3%	Anthropic: Claude Opus 4.6	vendor	unverified
9	Kimi K2.6	Moonshot AI	90.5%	Moonshot: Kimi K2.6 model card	vendor	unverified
10	Gemini 3 Flash	Google DeepMind	90.4%	Google: Gemini 3 Flash	vendor	unverified
11	DeepSeek V4-Pro	DeepSeek	90.1%	DeepSeek: V4-Pro model card	vendor	unverified
12	DeepSeek V4-Flash	DeepSeek	88.1%	DeepSeek: V4-Flash model card	vendor	unverified
13	Qwen3.6-27B	Alibaba Qwen	87.8%	Alibaba: Qwen3.6-27B model card	vendor	unverified
14	Nemotron 3 Ultra	NVIDIA	87%	NVIDIA: Nemotron 3 Ultra model card	vendor	unverified
15	Gemini 2.5 Pro	Google DeepMind	86.4%	Google DeepMind: Gemini 2.5 Pro model card	vendor	unverified
16	GLM-5.1	Z.ai (Zhipu)	86.2%	Zhipu / Z.ai: GLM-5.1 model card	vendor	unverified
17	Qwen3.6-35B-A3B	Alibaba Qwen	86%	Alibaba: Qwen3.6-35B-A3B model card	vendor	unverified
18	Kimi K2 Thinking	Moonshot AI	84.5%	Moonshot: Kimi K2 Thinking model card	vendor	unverified
19	DeepSeek V3.2	DeepSeek	82.4%	DeepSeek: V3.2 technical report	vendor	unverified
20	Nova 2 Pro	Amazon	81.4%	Amazon: Nova 2 technical report	vendor	unverified
21	Llama 4 Maverick	Meta AI	69.8%	Meta: Llama 4	vendor	unverified
22	Llama 4 Scout	Meta AI	57.2%	Meta: Llama 4	vendor	unverified
23	Mistral Large 3	Mistral AI	43.9%	Artificial Analysis: Mistral Large 3	3rd-party	unverified