Language Models · Open weights

Kimi K2 Thinking

Moonshot's November 2025 long-horizon reasoning mixture-of-experts (MoE) model: 1T total parameters, 32B active, released under a Modified MIT license.

Crosshair Index

55.9

#20 of 32 · Language Models

Token pricing

USD per 1M tokens · cache read = cached input (hit), cache write = caching surcharge · official list pricing (June 2026).
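
As a rough illustration of how the four price columns combine into a per-request cost, here is a minimal sketch. The `Pricing` class, the `request_cost` function, and every number in the usage example are hypothetical placeholders, not this model's list prices. The sketch also assumes that tokens written to the cache are billed at the normal input rate plus the cache-write surcharge, which may differ from a given provider's billing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens (hypothetical container, values supplied by the caller)."""
    input: float        # uncached input tokens
    output: float       # generated tokens
    cache_read: float   # cached input tokens (cache hit)
    cache_write: float  # surcharge for tokens written to the cache


def request_cost(p: Pricing, uncached_input: int, cached_input: int,
                 cache_written: int, output: int) -> float:
    """Return the cost of one request in USD.

    Assumption: cache_written is the subset of uncached_input that is also
    stored in the cache, so it pays the input price plus the surcharge.
    """
    total = (
        uncached_input * p.input
        + cached_input * p.cache_read
        + cache_written * p.cache_write
        + output * p.output
    )
    return total / 1_000_000


# Placeholder prices for illustration only.
example = Pricing(input=1.0, output=4.0, cache_read=0.25, cache_write=0.5)
cost = request_cost(example, uncached_input=1_000_000, cached_input=2_000_000,
                    cache_written=0, output=500_000)
```

With real list prices substituted for the placeholders, the same function gives a quick estimate of how much a high cache-hit rate reduces input spend.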

Professional-domain strengths, composed from the benchmarks relevant to each field.

Financial analysis over filings and credit agreements — valuation math, document QA, and the quantitative reasoning behind deals.
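
The page does not state how an axis is composed from its benchmarks. The sketch below assumes the simplest possible rule, an unweighted mean over the benchmarks mapped to a field, skipping any that were not evaluated. The function name `axis_score` and the choice of which benchmarks feed the finance axis are illustrative assumptions, not the site's actual method.

```python
from statistics import mean
from typing import Optional


def axis_score(scores: dict[str, Optional[float]],
               relevant: list[str]) -> Optional[float]:
    """Unweighted mean of the evaluated benchmarks on one axis (assumed rule)."""
    values = [scores[name] for name in relevant if scores.get(name) is not None]
    return mean(values) if values else None


# Scores in percent, taken from the benchmark table on this page.
scores = {
    "Corporate Finance": 60.6,
    "TaxEval": 71.7,
    "LegalBench": 80.2,
    "Medical Coding": None,  # not evaluated
}

# Hypothetical mapping: which benchmarks count toward the finance axis.
finance = axis_score(scores, ["Corporate Finance", "TaxEval"])
medicine = axis_score(scores, ["Medical Coding"])
```

Returning `None` for an axis with no evaluated benchmarks keeps a missing result distinct from a score of zero.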

Benchmark	Category	Score	Source	Source type	Status
GPQA Diamond	Reasoning	84.5%	Moonshot, Kimi K2 Thinking model card	vendor	unverified
Humanity's Last Exam	Frontier	23.9%	Moonshot, Kimi K2 Thinking model card	vendor	unverified
SWE-bench Verified	Agentic Coding	71.3%	Moonshot, Kimi K2 Thinking model card	vendor	unverified
LiveCodeBench	Coding	83.1%	Moonshot, Kimi K2 Thinking model card	vendor	unverified
MMLU-Pro	Knowledge	84.6%	Moonshot, Kimi K2 Thinking model card	vendor	unverified
LMArena Elo	Human Preference	1,444	LMArena (arena.ai)	3rd-party	unverified
AA Intelligence Index	Composite	41	Artificial Analysis	3rd-party	unverified
Corporate Finance	Finance	60.6%	Vals AI, CorpFin v2	3rd-party	unverified
LegalBench	Law	80.2%	Vals AI, LegalBench	3rd-party	unverified
TaxEval	Tax & Accounting	71.7%	Vals AI, TaxEval v2	3rd-party	unverified
Medical Coding	Medicine	not evaluated