Medical Coding
Vals AI MedCode measures the accuracy of ICD-10-CM diagnosis coding in the medical billing process, using an independent, expert-built dataset.
- Domain: Medicine
- Metric: accuracy (%)
- Orientation: Higher is better
- Results: 21
Ranking
| # | Model | Developer | Score | Source | Status |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro | Google DeepMind | 59.1% | Vals AI MedCode (3rd-party) | unverified |
| 2 | Gemini 3 Flash | Google DeepMind | 55.9% | Vals AI MedCode (3rd-party) | unverified |
| 3 | Gemini 3.5 Flash | Google DeepMind | 55.8% | Vals AI MedCode (3rd-party) | unverified |
| 4 | Claude Opus 4.7 | Anthropic | 54.9% | Vals AI MedCode (3rd-party) | unverified |
| 5 | Claude Opus 4.8 | Anthropic | 53.2% | Vals AI MedCode (3rd-party) | unverified |
| 6 | Gemini 3 Pro | Google DeepMind | 52.2% | Vals AI MedCode (3rd-party) | unverified |
| 7 | Muse Spark | Meta AI | 51.3% | Vals AI MedCode (3rd-party) | unverified |
| 8 | Gemini 2.5 Pro | Google DeepMind | 50.6% | Vals AI MedCode (3rd-party) | unverified |
| 9 | GPT-5.2 | OpenAI | 49.7% | Vals AI MedCode (3rd-party) | unverified |
| 10 | Claude Opus 4.6 | Anthropic | 49.1% | Vals AI MedCode (3rd-party) | unverified |
| 11 | GPT-5.5 | OpenAI | 49.1% | Vals AI MedCode (3rd-party) | unverified |
| 12 | GLM-5.1 | Z.ai (Zhipu) | 41.6% | Vals AI MedCode (3rd-party) | unverified |
| 13 | GPT-5.4 | OpenAI | 41.3% | Vals AI MedCode (3rd-party) | unverified |
| 14 | DeepSeek V4-Pro | DeepSeek | 40.5% | Vals AI MedCode (3rd-party) | unverified |
| 15 | Kimi K2.6 | Moonshot AI | 40.1% | Vals AI MedCode (3rd-party) | unverified |
| 16 | Qwen3.7 Max | Alibaba Qwen | 38.8% | Vals AI MedCode (3rd-party) | unverified |
| 17 | Grok 4.3 | xAI | 38.1% | Vals AI MedCode (3rd-party) | unverified |
| 18 | Llama 4 Maverick | Meta AI | 36.5% | Vals AI MedCode (3rd-party) | unverified |
| 19 | Claude Haiku 4.5 | Anthropic | 32.7% | Vals AI MedCode (3rd-party) | unverified |
| 20 | Grok 4.20 | xAI | 32.2% | Vals AI MedCode (3rd-party) | unverified |
| 21 | Llama 4 Scout | Meta AI | 23.3% | Vals AI MedCode (3rd-party) | unverified |
