
Medical Coding

Vals AI MedCode — accuracy of ICD-10-CM diagnosis coding for the medical billing process. Independent, expert-built dataset.
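
As a rough illustration of what an accuracy percentage for diagnosis coding can mean, here is a minimal sketch. It assumes one reference ICD-10-CM code per case and exact-match scoring after light normalization. The page does not describe MedCode's actual scoring rule, so the function names, the normalization step, and the sample cases below are all hypothetical.

```python
def normalize(code: str) -> str:
    """Uppercase and drop the dot so 'e11.9' and 'E119' compare equal (an assumed convention)."""
    return code.strip().upper().replace(".", "")


def accuracy_percent(predicted: list[str], reference: list[str]) -> float:
    """Share of cases, in percent, where the predicted code exactly matches the reference code."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one case is required")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return 100.0 * hits / len(reference)


# Made-up cases for illustration only; not taken from the MedCode dataset.
predicted = ["E11.9", "I10", "J45.909", "M54.5"]
reference = ["E11.9", "I10", "J45.901", "M54.50"]
print(f"{accuracy_percent(predicted, reference):.1f}%")
```

Under exact matching, a code that is close but not identical (such as `M54.5` versus `M54.50`) counts as a miss, which is one reason scores on this kind of task can sit well below 100%.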

Domain: Medicine
Metric: %
Orientation: Higher is better
Results: 21

Ranking

| # | Model | Developer | Score | Source | Status |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro | Google DeepMind | 59.1% | Vals AI MedCode (3rd-party) | unverified |
| 2 | Gemini 3 Flash | Google DeepMind | 55.9% | Vals AI MedCode (3rd-party) | unverified |
| 3 | Gemini 3.5 Flash | Google DeepMind | 55.8% | Vals AI MedCode (3rd-party) | unverified |
| 4 | Claude Opus 4.7 | Anthropic | 54.9% | Vals AI MedCode (3rd-party) | unverified |
| 5 | Claude Opus 4.8 | Anthropic | 53.2% | Vals AI MedCode (3rd-party) | unverified |
| 6 | Gemini 3 Pro | Google DeepMind | 52.2% | Vals AI MedCode (3rd-party) | unverified |
| 7 | Muse Spark | Meta AI | 51.3% | Vals AI MedCode (3rd-party) | unverified |
| 8 | Gemini 2.5 Pro | Google DeepMind | 50.6% | Vals AI MedCode (3rd-party) | unverified |
| 9 | GPT-5.2 | OpenAI | 49.7% | Vals AI MedCode (3rd-party) | unverified |
| 10 | Claude Opus 4.6 | Anthropic | 49.1% | Vals AI MedCode (3rd-party) | unverified |
| 11 | GPT-5.5 | OpenAI | 49.1% | Vals AI MedCode (3rd-party) | unverified |
| 12 | GLM-5.1 | Z.ai (Zhipu) | 41.6% | Vals AI MedCode (3rd-party) | unverified |
| 13 | GPT-5.4 | OpenAI | 41.3% | Vals AI MedCode (3rd-party) | unverified |
| 14 | DeepSeek V4-Pro | DeepSeek | 40.5% | Vals AI MedCode (3rd-party) | unverified |
| 15 | Kimi K2.6 | Moonshot AI | 40.1% | Vals AI MedCode (3rd-party) | unverified |
| 16 | Qwen3.7 Max | Alibaba Qwen | 38.8% | Vals AI MedCode (3rd-party) | unverified |
| 17 | Grok 4.3 | xAI | 38.1% | Vals AI MedCode (3rd-party) | unverified |
| 18 | Llama 4 Maverick | Meta AI | 36.5% | Vals AI MedCode (3rd-party) | unverified |
| 19 | Claude Haiku 4.5 | Anthropic | 32.7% | Vals AI MedCode (3rd-party) | unverified |
| 20 | Grok 4.20 | xAI | 32.2% | Vals AI MedCode (3rd-party) | unverified |
| 21 | Llama 4 Scout | Meta AI | 23.3% | Vals AI MedCode (3rd-party) | unverified |