Language Models · Proprietary

Gemini 3.1 Pro

Google's most capable Gemini (preview); tops several reasoning benchmarks.

text · image · audio · video · code

Crosshair Index

78.4

#2 of 32 · Language Models

Token pricing

USD per 1M tokens · cache read = cached input (hit), cache write = caching surcharge · official list pricing (June 2026).
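As a rough illustration of how per-1M-token list prices turn into a per-request cost, here is a minimal sketch. The `Rates` class, the `request_cost` function, and every rate value are hypothetical placeholders, not Gemini 3.1 Pro's actual pricing. It also assumes that cached input (a hit) is billed at the cache read rate instead of the normal input rate, and that the cache write surcharge is charged only on the tokens written to the cache; the provider's real billing rules may differ.

```python
from dataclasses import dataclass

TOKENS_PER_PRICING_UNIT = 1_000_000  # list prices are quoted per 1M tokens


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. All fields are illustrative, not real list prices."""
    input: float        # uncached input tokens
    output: float       # generated tokens
    cache_read: float   # cached input tokens (cache hit)
    cache_write: float  # surcharge for tokens written to the cache


def request_cost(rates: Rates, *, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    """Return the estimated USD cost of one request under the assumptions above."""
    total_micro_usd = (
        input_tokens * rates.input
        + output_tokens * rates.output
        + cached_tokens * rates.cache_read
        + cache_write_tokens * rates.cache_write
    )
    return total_micro_usd / TOKENS_PER_PRICING_UNIT


# Hypothetical placeholder rates, chosen only to make the arithmetic easy to follow.
example_rates = Rates(input=2.0, output=10.0, cache_read=0.5, cache_write=1.0)

example_cost = request_cost(
    example_rates,
    input_tokens=100_000,       # 100k * $2.00 / 1M  = $0.20
    output_tokens=20_000,       # 20k  * $10.00 / 1M = $0.20
    cached_tokens=400_000,      # 400k * $0.50 / 1M  = $0.20
    cache_write_tokens=50_000,  # 50k  * $1.00 / 1M  = $0.05
)
print(f"estimated cost: ${example_cost:.2f}")
```

Substituting the official list prices for the placeholder rates gives a comparable estimate for a real workload, provided the billing assumptions hold.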

Professional-domain strengths, composed from the benchmarks relevant to each field.

Clinical knowledge and diagnostic reasoning, including the medical coding accuracy and science depth that real practice demands.

Benchmark	Category	Score	Source	Source type	Status
GPQA Diamond	Reasoning	94.3% (best)	Google DeepMind — Gemini 3.1 Pro model card	vendor	unverified
Humanity's Last Exam	Frontier	44.4%	Google DeepMind — Gemini 3.1 Pro model card	vendor	unverified
SWE-bench Verified	Agentic Coding	80.6%	Google DeepMind — Gemini 3.1 Pro model card	vendor	unverified
LiveCodeBench	Coding	—	—	—	not evaluated
MMLU-Pro	Knowledge	—	—	—	not evaluated
LMArena Elo	Human Preference	1,488	LMArena (arena.ai)	3rd-party	unverified
AA Intelligence Index	Composite	57	Artificial Analysis	3rd-party	unverified
Corporate Finance	Finance	64.5%	Vals AI — CorpFin v2	3rd-party	unverified
LegalBench	Law	87.4% (best)	Vals AI — LegalBench	3rd-party	unverified
TaxEval	Tax & Accounting	72.9%	Vals AI — TaxEval v2	3rd-party	unverified
Medical Coding	Medicine	59.1% (best)	Vals AI — MedCode	3rd-party	unverified