Crosshair Intelligence

All benchmarks

leaderboard view · CHI = mean of the benchmarks below (normalized 0–100, direction-aware)
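
The page does not spell out how scores are normalized or how missing results are treated, so the following is only a minimal sketch of one way a direction-aware mean could be computed. It assumes min-max scaling across models, assumes unreported results are skipped rather than counted as zero, and uses hypothetical function names and toy numbers; it is not expected to reproduce the CHI values in the table.

```python
from statistics import mean
from typing import Dict, FrozenSet, Optional

# model -> benchmark -> raw score (None = not reported)
Scores = Dict[str, Dict[str, Optional[float]]]


def normalize(raw: Scores, lower_is_better: FrozenSet[str] = frozenset()) -> Scores:
    """Scale each benchmark to 0-100 across models (min-max is an assumption)."""
    benchmarks = {name for row in raw.values() for name in row}
    out: Scores = {model: {} for model in raw}
    for name in benchmarks:
        present = [row[name] for row in raw.values() if row.get(name) is not None]
        if not present:
            for model in raw:
                out[model][name] = None
            continue
        lo, hi = min(present), max(present)
        for model, row in raw.items():
            value = row.get(name)
            if value is None:
                out[model][name] = None
            elif hi == lo:
                out[model][name] = 100.0  # assumption: a tie scores full marks
            else:
                scaled = 100.0 * (value - lo) / (hi - lo)
                # Direction-aware: flip metrics where a lower raw value is better.
                out[model][name] = 100.0 - scaled if name in lower_is_better else scaled
    return out


def composite(raw: Scores, lower_is_better: FrozenSet[str] = frozenset()) -> Dict[str, Optional[float]]:
    """Mean of the normalized benchmarks each model actually reports."""
    result: Dict[str, Optional[float]] = {}
    for model, row in normalize(raw, lower_is_better).items():
        values = [v for v in row.values() if v is not None]
        result[model] = round(mean(values), 1) if values else None
    return result


# Toy data, not taken from the leaderboard.
toy = {
    "model_a": {"accuracy": 90.0, "error_rate": 5.0},
    "model_b": {"accuracy": 70.0, "error_rate": 15.0},
    "model_c": {"accuracy": 80.0, "error_rate": None},
}
toy_scores = composite(toy, lower_is_better=frozenset({"error_rate"}))
print(toy_scores)
```

Skipping unreported results means a model with few reported benchmarks is averaged over fewer columns, which is one reason sparse rows in a table like this are hard to compare directly with fully populated ones.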

Reasoning, coding, knowledge, and multimodal understanding.

#	Model (license)	CHI	GPQA Diamond	Humanity's Last Exam	SWE-bench Verified	LiveCodeBench	MMLU-Pro	LMArena Elo	AA Intelligence Index	Corporate Finance	LegalBench	TaxEval	Medical Coding
1	Claude Opus 4.7 Proprietary	81.2	94.2%	46.9%	87.6%	—	—	1,493	57	66.1%	85.3%	75.3%	54.9%
2	Gemini 3.1 Pro Proprietary	78.4	94.3%	44.4%	80.6%	—	—	1,488	57	64.5%	87.4%	72.9%	59.1%
3	Claude Opus 4.8 Proprietary	76.9	93.6%	49.8%	88.6%	—	—	—	61	66.7%	83.6%	75.6%	53.2%
4	Claude Opus 4.6 Proprietary	75.5	91.3%	40%	80.8%	—	—	1,498	53	67%	85.3%	76%	49.1%
5	Qwen3.7 Max Proprietary	74.9	92.4%	41.4%	80.4%	91.6%	—	1,474	57	63.7%	84.9%	75.3%	38.8%
6	GPT-5.5 Proprietary	73.1	93.6%	41.4%	—	—	—	1,474	60	68.4%	86.5%	75%	49.1%
7	DeepSeek V4-Pro Open weights	72.0	90.1%	37.7%	80.6%	93.5%	87.5%	1,457	52	61.4%	80.3%	72.1%	40.5%
8	Kimi K2.6 Open weights	71.7	90.5%	34.7%	80.2%	89.6%	—	1,462	54	66.7%	84.7%	74.7%	40.1%
9	Gemini 3 Pro Proprietary	70.3	91.9%	37.5%	76.2%	—	—	1,486	48	63.7%	87%	72.6%	52.2%
10	Gemini 3 Flash Proprietary	68.2	90.4%	33.7%	78%	—	—	1,473	35	66.4%	86.9%	73.9%	55.9%
11	GPT-5.2 Proprietary	65.1	92.4%	34.5%	80%	—	—	1,435	51	65.9%	82.8%	75.8%	49.7%
12	Gemini 3.5 Flash Proprietary	64.9	—	40.2%	—	—	—	1,477	55	64.7%	83.6%	74.4%	55.8%
13	Muse Spark Proprietary	61.7	—	39.9%	—	—	—	—	52	65.1%	84.2%	77.7%	51.3%
14	GLM-5.1 Open weights	60.9	86.2%	31%	—	—	—	1,475	51	64.5%	84.4%	71.2%	41.6%
15	GPT-5.4 Proprietary	60.5	—	—	—	—	—	1,467	57	65.3%	86%	74%	41.3%
16	Qwen3.6-27B Open weights	60.1	87.8%	24%	77.2%	83.9%	86.2%	—	46	62.3%	—	71.3%	—
17	Claude Sonnet 4.6 Proprietary	59.1	—	—	79.6%	—	—	1,471	44	65.3%	82.1%	77.1%	—
18	DeepSeek V4-Flash Open weights	58.0	88.1%	34.8%	79%	91.6%	86.2%	1,433	47	—	—	—	—
19	Grok 4.3 Proprietary	55.9	—	—	—	—	—	1,446	53	68.5%	84.5%	70.8%	38.1%
20	Kimi K2 Thinking Open weights	55.9	84.5%	23.9%	71.3%	83.1%	84.6%	1,444	41	60.6%	80.2%	71.7%	—
21	Nemotron 3 Ultra Open weights	52.4	87%	26.7%	71.9%	89%	86.8%	1,422	48	—	—	—	—
22	Grok 4.20 Proprietary	51.8	—	—	—	—	—	1,473	49	63.7%	77.7%	74.1%	32.2%
23	Qwen3.6-35B-A3B Open weights	50.3	86%	21.4%	73.4%	80.4%	85.2%	—	43	—	—	—	—
24	DeepSeek V3.2 Open weights	46.4	82.4%	25.1%	73.1%	83.3%	85%	1,437	32	51%	76.1%	68.2%	—
25	Gemini 2.5 Pro Proprietary	44.6	86.4%	21.6%	59.6%	69%	—	1,446	35	60.8%	—	—	50.6%
26	Nova 2 Pro Proprietary	42.2	81.4%	—	61.5%	74.6%	81.6%	—	23	—	—	—	—
27	Claude Haiku 4.5 Proprietary	40.7	—	—	73.3%	—	—	1,411	31	60.6%	81.2%	67.5%	32.7%
28	Mistral Large 3 Open weights	38.0	43.9%	—	—	—	—	1,418	23	61%	79.1%	73.1%	—
29	Llama 4 Maverick Open weights	34.8	69.8%	—	—	43.4%	80.5%	—	18	49.7%	77.8%	66.6%	36.5%
30	Llama 4 Scout Open weights	13.3	57.2%	—	—	32.8%	74.3%	—	14	46.8%	72%	55.2%	23.3%
31	Doubao Seed 2.0 Pro Proprietary	—	—	54.2%	—	—	—	1,455	—	—	—	—	—
32	MiniMax M3 Proprietary	—	—	—	—	—	—	1,449	55	—	—	—	—

Normalized 0–100 (direction-aware). All figures are sourced but unverified.