Language Models · Open weights
Kimi K2 Thinking
Moonshot AI's long-horizon reasoning mixture-of-experts (MoE) model, released November 2025 (1T total parameters, 32B active), under a Modified MIT license.
Crosshair Index: 65.5 (#20 of 32 · Language Models)
- Provider: Moonshot AI
- Released: 2025-11-06
- Parameters: 1000B
- Context: 262,144 tokens
Token pricing
- Input: $0.60 /1M
- Output: $2.50 /1M
- Cache read: $0.15 /1M
USD per 1M tokens · cache read = cached input (hit), cache write = caching surcharge · official list pricing (June 2026).
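As a rough illustration of how these list prices turn into a per-request cost, here is a minimal Python sketch. The function name `request_cost` and its parameters are hypothetical, not part of any provider API. It assumes that cached input tokens are billed at the cache read rate instead of the input rate and that no cache write surcharge applies, since none is listed for this model.

```python
# List prices from the card above, in USD per 1M tokens.
PRICE_PER_MILLION = {"input": 0.60, "output": 2.50, "cache_read": 0.15}


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    cached_tokens is the portion of input_tokens served from cache.
    Assumption: cached tokens are billed at the cache read rate and the
    rest at the normal input rate; no cache write surcharge is charged.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


# 100,000 input tokens (80,000 of them cache hits) and 5,000 output tokens.
print(f"${request_cost(100_000, 5_000, cached_tokens=80_000):.4f}")
```

Under these assumptions the same request without any cache hits would cost more, because all 100,000 input tokens would be billed at the full input rate.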
Industry skill web
Professional-domain strengths, composed from the benchmarks relevant to each field.
Medicine skill: 85
Clinical knowledge and diagnostic reasoning, including the medical coding accuracy and science depth that real practice demands.
Scorecard
| Benchmark | Category | Score | Source | Source type | Status |
|---|---|---|---|---|---|
| GPQA Diamond | Reasoning | 84.5% | Moonshot, Kimi K2 Thinking model card | vendor | unverified |
| Humanity's Last Exam | Frontier | 23.9% | Moonshot, Kimi K2 Thinking model card | vendor | unverified |
| SWE-bench Verified | Agentic Coding | 71.3% | Moonshot, Kimi K2 Thinking model card | vendor | unverified |
| LiveCodeBench | Coding | 83.1% | Moonshot, Kimi K2 Thinking model card | vendor | unverified |
| MMLU-Pro | Knowledge | 84.6% | Moonshot, Kimi K2 Thinking model card | vendor | unverified |
| LMArena Elo | Human Preference | 1,444 | LMArena (arena.ai) | 3rd-party | unverified |
| AA Intelligence Index | Composite | 41 | Artificial Analysis | 3rd-party | unverified |
| Corporate Finance | Finance | 60.6% | Vals AI, CorpFin v2 | 3rd-party | unverified |
| LegalBench | Law | 80.2% | Vals AI, LegalBench | 3rd-party | unverified |
| TaxEval | Tax & Accounting | 71.7% | Vals AI, TaxEval v2 | 3rd-party | unverified |
| Medical Coding | Medicine | n/a | | | not evaluated |
