All benchmarks

leaderboard view · CHI = mean of the benchmarks below (normalized 0–100, direction-aware)

Reasoning, coding, knowledge, and multimodal understanding.

# · Model · Developer · License · CHI · Benchmark scores
1 · Claude Opus 4.7 · Anthropic · Proprietary · CHI 77.3 · 94.2% · 46.9% · 87.6% · 1,493 · 57 · 66.1% · 85.3% · 75.3% · 54.9%
2 · Claude Opus 4.8 · Anthropic · Proprietary · CHI 76.4 · 93.6% · 49.8% · 88.6% · 61 · 66.7% · 83.6% · 75.6% · 53.2%
3 · Gemini 3.1 Pro · Google DeepMind · Proprietary · CHI 75.9 · 94.3% · 44.4% · 80.6% · 1,488 · 57 · 64.5% · 87.4% · 72.9% · 59.1%
4 · Claude Opus 4.6 · Anthropic · Proprietary · CHI 74.7 · 91.3% · 40% · 80.8% · 1,498 · 53 · 67% · 85.3% · 76% · 49.1%
5 · Qwen3.7 Max · Alibaba Qwen · Proprietary · CHI 73.2 · 92.4% · 41.4% · 80.4% · 91.6% · 1,474 · 57 · 63.7% · 84.9% · 75.3% · 38.8%
6 · GPT-5.5 · OpenAI · Proprietary · CHI 73.0 · 93.6% · 41.4% · 1,474 · 60 · 68.4% · 86.5% · 75% · 49.1%
7 · CHI 72.8 · 79.6% · 1,471 · 44 · 65.3% · 82.1% · 77.1%
8 · Gemini 3 Pro · Google DeepMind · Proprietary · CHI 71.1 · 91.9% · 37.5% · 76.2% · 1,486 · 48 · 63.7% · 87% · 72.6% · 52.2%
9 · DeepSeek V4-Pro · DeepSeek · Open weights · CHI 70.7 · 90.1% · 37.7% · 80.6% · 93.5% · 87.5% · 1,457 · 52 · 61.4% · 80.3% · 72.1% · 40.5%
10 · Kimi K2.6 · Moonshot AI · Open weights · CHI 70.5 · 90.5% · 34.7% · 80.2% · 89.6% · 1,462 · 54 · 66.7% · 84.7% · 74.7% · 40.1%
11 · GPT-5.4 · OpenAI · Proprietary · CHI 70.4 · 1,467 · 57 · 65.3% · 86% · 74% · 41.3%
12 · Qwen3.6-27B · Alibaba Qwen · Open weights · CHI 70.1 · 87.8% · 24% · 77.2% · 83.9% · 86.2% · 46 · 62.3% · 71.3%
13 · Gemini 3.5 Flash · Google DeepMind · Proprietary · CHI 68.8 · 40.2% · 1,477 · 55 · 64.7% · 83.6% · 74.4% · 55.8%
14 · Qwen3.6-35B-A3B · Alibaba Qwen · Open weights · CHI 68.0 · 86% · 21.4% · 73.4% · 80.4% · 85.2% · 43
15 · DeepSeek V4-Flash · DeepSeek · Open weights · CHI 67.9 · 88.1% · 34.8% · 79% · 91.6% · 86.2% · 1,433 · 47
16 · Gemini 3 Flash · Google DeepMind · Proprietary · CHI 66.8 · 90.4% · 33.7% · 78% · 1,473 · 35 · 66.4% · 86.9% · 73.9% · 55.9%
17 · Muse Spark · Meta AI · Proprietary · CHI 66.5 · 39.9% · 52 · 65.1% · 84.2% · 77.7% · 51.3%
18 · GLM-5.1 · Z.ai (Zhipu) · Open weights · CHI 66.4 · 86.2% · 31% · 1,475 · 51 · 64.5% · 84.4% · 71.2% · 41.6%
19 · Grok 4.20 · xAI · Proprietary · CHI 65.6 · 1,473 · 49 · 63.7% · 77.7% · 74.1% · 32.2%
20 · Kimi K2 Thinking · Moonshot AI · Open weights · CHI 65.5 · 84.5% · 23.9% · 71.3% · 83.1% · 84.6% · 1,444 · 41 · 60.6% · 80.2% · 71.7%
21 · GPT-5.2 · OpenAI · Proprietary · CHI 65.3 · 92.4% · 34.5% · 80% · 1,435 · 51 · 65.9% · 82.8% · 75.8% · 49.7%
22 · Grok 4.3 · xAI · Proprietary · CHI 64.2 · 1,446 · 53 · 68.5% · 84.5% · 70.8% · 38.1%
23 · Nemotron 3 Ultra · NVIDIA · Open weights · CHI 63.8 · 87% · 26.7% · 71.9% · 89% · 86.8% · 1,422 · 48
24 · Nova 2 Pro · Amazon · Proprietary · CHI 63.6 · 81.4% · 61.5% · 74.6% · 81.6% · 23
25 · DeepSeek V3.2 · DeepSeek · Open weights · CHI 61.2 · 82.4% · 25.1% · 73.1% · 83.3% · 85% · 1,437 · 32 · 51% · 76.1% · 68.2%
26 · Gemini 2.5 Pro · Google DeepMind · Proprietary · CHI 54.1 · 86.4% · 21.6% · 59.6% · 69% · 1,446 · 35 · 60.8% · 50.6%
27 · Llama 4 Maverick · Meta AI · Open weights · CHI 54.1 · 69.8% · 43.4% · 80.5% · 18 · 49.7% · 77.8% · 66.6% · 36.5%
28 · Claude Haiku 4.5 · Anthropic · Proprietary · CHI 50.2 · 73.3% · 1,411 · 31 · 60.6% · 81.2% · 67.5% · 32.7%
29 · Mistral Large 3 · Mistral AI · Open weights · CHI 47.4 · 43.9% · 1,418 · 23 · 61% · 79.1% · 73.1%
30 · Llama 4 Scout · Meta AI · Open weights · CHI 45.2 · 57.2% · 32.8% · 74.3% · 14 · 46.8% · 72% · 55.2% · 23.3%
31 · 54.2% · 1,455
32 · MiniMax M3 · M · Proprietary · 1,449 · 55
Category best · Normalized 0–100 (direction-aware) · All figures are sourced but unverified.
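
The page describes CHI as the mean of the benchmarks, each normalized to 0–100 in a direction-aware way. The following is a minimal sketch of one way such a composite could be computed. It is not the site's actual method. The min-max scaling, the per-benchmark bounds, the rule of skipping benchmarks with no reported score, and every benchmark name and value in the example are assumptions made for illustration.

```python
from statistics import mean


def normalize(value, lo, hi, higher_is_better=True):
    """Min-max scale a raw score to 0-100 (assumed scheme).

    When lower raw values are better, the scaled score is flipped so that
    100 always means "best" regardless of the benchmark's direction.
    """
    if hi == lo:
        raise ValueError("lo and hi must differ")
    scaled = (value - lo) / (hi - lo) * 100.0
    scaled = max(0.0, min(100.0, scaled))  # clamp to the 0-100 range
    return scaled if higher_is_better else 100.0 - scaled


def composite(scores, specs):
    """Mean of normalized scores over benchmarks that have a value.

    scores: benchmark name -> raw score (missing or None means unreported)
    specs:  benchmark name -> (lo, hi, higher_is_better)
    Returns None when no benchmark has a reported score.
    """
    normalized = [
        normalize(scores[name], *specs[name])
        for name in specs
        if scores.get(name) is not None
    ]
    return mean(normalized) if normalized else None


# Hypothetical benchmarks and bounds, not taken from the leaderboard.
example_specs = {
    "accuracy_pct": (0.0, 100.0, True),
    "arena_rating": (1000.0, 1600.0, True),
    "error_rate_pct": (0.0, 100.0, False),
}
example_scores = {
    "accuracy_pct": 90.0,
    "arena_rating": 1450.0,
    "error_rate_pct": 15.0,
}
print(composite(example_scores, example_specs))
```

Under these assumptions, a model with fewer reported benchmarks is averaged over fewer terms, so composites for rows with missing cells are not directly comparable to those for fully populated rows.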