Models
Every system on the board, language and world models alike. Select one for its full scorecard and sources.
Language Models
· 32Claude Opus 4.8
Anthropic
Anthropic's frontier model and the current intelligence leader; excels at agentic coding and long-horizon reasoning.
Claude Sonnet 4.6
Anthropic
Balanced speed/quality workhorse of the Claude 4 family.
Claude Haiku 4.5
Anthropic
Fast, low-cost Claude tier with near-frontier coding.
GPT-5.5
OpenAI
OpenAI's flagship multimodal reasoning model (figures shown for the high reasoning-effort config).
GPT-5.2
OpenAI
Prior OpenAI flagship; still available but scheduled for API retirement in Aug 2026.
Gemini 3.1 Pro
Google DeepMind
Google's most capable Gemini (preview); tops several reasoning benchmarks.
Gemini 3 Pro
Google DeepMind
Google DeepMind's frontier multimodal, long-context model.
Gemini 3 Flash
Google DeepMind
Fast, low-cost tier of the Gemini 3 line (preview).
DeepSeek V4-Pro
DeepSeek
Open-weights MoE flagship (1.6T total / 49B active) with built-in reasoning modes (MIT).
DeepSeek V4-Flash
DeepSeek
Efficient open-weights V4 tier (284B total / 13B active), MIT-licensed.
Kimi K2.6
Moonshot AI
Moonshot's trillion-param MoE (32B active); strong agentic coding (Modified MIT).
GLM-5.1
Z.ai (Zhipu)
Zhipu/Z.ai open-weights flagship (754B total / 40B active), MIT.
Qwen3.6-27B
Alibaba Qwen
Alibaba's dense open-weights Qwen3.6 (Apache-2.0); multimodal.
Llama 4 Maverick
Meta AI
Meta's open-weights MoE (~17B active); the current open Llama flagship.
Mistral Large 3
Mistral AI
Mistral's open-weights MoE flagship (41B active, Apache-2.0); a non-reasoning model.
Grok 4.3
xAI
xAI's current flagship; ranks high on the Artificial Analysis Index (xAI publishes few classic benchmarks).
Claude Opus 4.7
Anthropic
Anthropic's prior Opus flagship (superseded by 4.8); still strong on agentic coding.
Claude Opus 4.6
Anthropic
Earlier Opus 4.6 frontier model; still generally available.
GPT-5.4
OpenAI
OpenAI's GPT-5.4 — the tier between GPT-5.2 and GPT-5.5; reasoning + tool use.
Gemini 3.5 Flash
Google DeepMind
Google's current fast/default Gemini (GA); strong agentic performance.
Gemini 2.5 Pro
Google DeepMind
Prior-generation Gemini flagship; widely used, slated for deprecation.
Grok 4.20
xAI
xAI Grok 4.20 (reasoning); multi-agent 'council' architecture, very long context.
DeepSeek V3.2
DeepSeek
Open-weights MoE (671B total / 37B active), MIT; predecessor to V4.
Kimi K2 Thinking
Moonshot AI
Moonshot's Nov-2025 long-horizon reasoning MoE (1T total / 32B active), Modified MIT.
Qwen3.7 Max
Alibaba Qwen
Alibaba's proprietary Qwen3.7 Max flagship; agentic-focused.
Qwen3.6-35B-A3B
Alibaba Qwen
Open-weights hybrid-MoE Qwen3.6 (35B total / 3B active), Apache-2.0.
Muse Spark
Meta AI
Meta Superintelligence Labs' first proprietary frontier model (free via Meta AI).
Llama 4 Scout
Meta AI
Open-weights MoE (~17B active / 109B total) with very long context.
Nova 2 Pro
Amazon
Amazon's Nova 2 Pro flagship on Bedrock; hybrid reasoning, multimodal.
Nemotron 3 Ultra
NVIDIA
NVIDIA's open-weights hybrid Mamba-MoE (550B total / 55B active), OpenMDW license.
MiniMax M3
MiniMax
MiniMax's M3 flagship; tops the Artificial Analysis Intelligence Index (open weights pending).
Doubao Seed 2.0 Pro
ByteDance
ByteDance's Doubao Seed 2.0 Pro; multimodal flagship served via Volcano Engine.
World Models
emerging· 5V-JEPA 2
Meta AI
Self-supervised video joint-embedding predictive architecture; learns world dynamics for prediction and robot planning.
Genie 3
Google DeepMind
Foundation world model that generates interactive, controllable environments in real time.
Cosmos Predict
NVIDIA
World Foundation Models for physical AI — future-frame prediction for robotics and autonomous systems.
Large World Model
World Labs
Spatially-grounded generative world model producing persistent, explorable 3D scenes.
Kona
Logical Intelligence
Logical Intelligence's world model — tracked here; standardized results aren't available yet.
