Language Models · Open weights

Kimi K2 Thinking

Moonshot AI's long-horizon reasoning mixture-of-experts (MoE) model, released November 2025, with 1T total and 32B active parameters; released under a Modified MIT license.

Crosshair Index: 65.5 (#20 of 32 · Language Models)
Provider: Moonshot AI
Released: 2025-11-06
Parameters: 1T total (32B active)
Context: 262,144 tokens

Token pricing

Input: $0.60 /1M
Output: $2.50 /1M
Cache read: $0.15 /1M

Prices in USD per 1M tokens · cache read = cached input tokens (cache hits) · cache write = caching surcharge (no cache-write price is listed for this model) · official list pricing as of June 2026.
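A rough per-request cost can be worked out from these list prices. The sketch below is an illustration, not the provider's billing logic: it assumes that cache-hit input tokens are billed at the cache-read rate instead of the input rate and that no cache-write surcharge applies, since none is listed. The function name `estimate_cost_usd` is hypothetical.

```python
# Hypothetical cost estimate from the list prices above (USD per 1M tokens).
# Assumption: cache-hit input tokens are billed at the cache-read rate instead
# of the input rate, and no cache-write surcharge applies (none is listed).
INPUT_PER_M = 0.60
OUTPUT_PER_M = 2.50
CACHE_READ_PER_M = 0.15


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one request's cost; cached_tokens is the cache-hit share of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total_per_m = (
        uncached * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total_per_m / 1_000_000


# Example: 200k input tokens (150k of them cache hits) and 20k output tokens.
print(f"${estimate_cost_usd(200_000, 20_000, cached_tokens=150_000):.4f}")  # prints $0.1025
```

In this example the 20k output tokens cost more than all 200k input tokens combined, which is typical when output is priced about four times higher than input.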

Industry skill web

Professional-domain strengths, each composed from the benchmarks relevant to that field.

Axes: Software · IB / Finance · Law · Medicine · Research · Consulting · Accounting

Medicine: 85 (skill score)

Clinical knowledge and diagnostic reasoning, including the medical coding accuracy and science depth that real practice demands.
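The page does not say how benchmark results are combined into an axis score. As a minimal sketch only, assuming an unweighted mean over evaluated benchmarks and an illustrative benchmark-to-axis mapping (both hypothetical, not the site's documented method), this shows one way an unevaluated benchmark such as Medical Coding could be skipped rather than counted as zero:

```python
from typing import Optional


def axis_score(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Hypothetical axis score: unweighted mean of evaluated benchmark scores.

    Benchmarks marked "not evaluated" are represented as None and skipped.
    Returns None when no benchmark for the axis has been evaluated.
    """
    evaluated = [s for s in scores.values() if s is not None]
    if not evaluated:
        return None  # no evidence for this axis
    return sum(evaluated) / len(evaluated)


# Assumed mapping of scorecard results to the Medicine axis (illustrative only).
medicine = {
    "GPQA Diamond": 84.5,    # science depth
    "MMLU-Pro": 84.6,        # knowledge
    "Medical Coding": None,  # not evaluated
}
print(axis_score(medicine))
```

Skipping missing benchmarks keeps an axis from being penalized for gaps in coverage, at the cost of resting that axis on fewer results.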

Scorecard

Benchmark | Category | Score | Source | Source type | Status
GPQA Diamond | Reasoning | 84.5% | Moonshot (Kimi K2 Thinking model card) | vendor | unverified
Humanity's Last Exam | Frontier | 23.9% | Moonshot (Kimi K2 Thinking model card) | vendor | unverified
SWE-bench Verified | Agentic Coding | 71.3% | Moonshot (Kimi K2 Thinking model card) | vendor | unverified
LiveCodeBench | Coding | 83.1% | Moonshot (Kimi K2 Thinking model card) | vendor | unverified
MMLU-Pro | Knowledge | 84.6% | Moonshot (Kimi K2 Thinking model card) | vendor | unverified
LMArena Elo | Human Preference | 1,444 | LMArena (arena.ai) | 3rd-party | unverified
AA Intelligence Index | Composite | 41 | Artificial Analysis | 3rd-party | unverified
Corporate Finance | Finance | 60.6% | Vals AI (CorpFin v2) | 3rd-party | unverified
LegalBench | Law | 80.2% | Vals AI (LegalBench) | 3rd-party | unverified
TaxEval | Tax & Accounting | 71.7% | Vals AI (TaxEval v2) | 3rd-party | unverified
Medical Coding | Medicine | not evaluated | n/a | n/a | n/a
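Unlike the percentage scores, the LMArena figure is a rating, and only rating differences carry meaning. The sketch below uses the standard Elo expected-score formula and assumes LMArena's scale follows the conventional base-10, 400-point logistic form. The 1,400-rated opponent is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Listed rating (1,444) against a hypothetical model rated 1,400.
print(round(elo_expected_score(1444, 1400), 3))  # prints 0.563
```

Under this assumption, a 44-point gap corresponds to being preferred in roughly 56% of head-to-head comparisons, which is a modest edge.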