Industry · 3 benchmarks

Corporate Law

Legal reasoning — issue spotting, rule application, and contract analysis — plus the broad knowledge a generalist counsel needs.

The Corporate Law score is the mean of a model’s normalized 0–100 scores across the 3 benchmarks below. Normalization is direction-aware, so lower-is-better metrics are inverted. This is the same figure the leaderboard’s industry view ranks by.
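The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual code: the min-max normalization, the per-benchmark bounds, the function names `normalize` and `industry_score`, and the raw values in the usage lines are all assumptions made for the example.

```python
from statistics import mean


def normalize(value, lo, hi, higher_is_better=True):
    """Map a raw metric onto 0-100 (assumed min-max scaling).

    Lower-is-better metrics are inverted so that 100 is always best.
    """
    if hi == lo:
        raise ValueError("lo and hi must differ")
    scaled = (value - lo) * 100.0 / (hi - lo)
    scaled = min(100.0, max(0.0, scaled))  # clamp to the 0-100 range
    return scaled if higher_is_better else 100.0 - scaled


def industry_score(raw_scores, specs):
    """Mean of a model's normalized scores across the listed benchmarks.

    specs maps benchmark name -> (lo, hi, higher_is_better).
    """
    return mean(normalize(raw_scores[name], *specs[name]) for name in specs)


# Hypothetical bounds: all three benchmarks report accuracy in percent.
SPECS = {
    "LegalBench": (0.0, 100.0, True),
    "MMLU-Pro": (0.0, 100.0, True),
    "Humanity's Last Exam": (0.0, 100.0, True),
}

# Made-up raw scores for illustration only, not real results.
example_raw = {
    "LegalBench": 80.0,
    "MMLU-Pro": 85.0,
    "Humanity's Last Exam": 30.0,
}
example_score = industry_score(example_raw, SPECS)
```

All three benchmarks here are higher-is-better percentages, so the inversion branch is not exercised by the example data. It would apply to any metric such as an error rate, where `normalize(25.0, 0.0, 100.0, higher_is_better=False)` yields 75.0 under these assumptions.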

Leaders

Gemini 3.1 Pro leads this industry with a score of 70.0.

Gemini 3.1 Pro: 70.0

Claude Opus 4.7: 68.0

DeepSeek V4-Pro: 67.9

Claude Opus 4.8: 67.3

GPT-5.5: 65.0

Gemini 3 Pro: 62.2

Qwen3.7 Max: 61.6

Claude Opus 4.6: 61.0

Doubao Seed 2.0 Pro: 60.0

Muse Spark: 58.5

Gemini 3 Flash: 58.1

Gemini 3.5 Flash: 57.5

DeepSeek V4-Flash: 57.0

GPT-5.4: 57.0

Kimi K2.6: 54.3

Grok 4.3: 53.7

Nemotron 3 Ultra: 50.3

GPT-5.2: 50.0

GLM-5.1: 49.9

Claude Sonnet 4.6: 48.5

Claude Haiku 4.5: 46.6

Kimi K2 Thinking: 46.3

Qwen3.6-27B: 46.0

Nova 2 Pro: 45.1

Mistral Large 3: 42.0

Llama 4 Maverick: 41.5

Qwen3.6-35B-A3B: 40.9

DeepSeek V3.2: 39.7

Grok 4.20: 39.0

Gemini 2.5 Pro: 26.9

Llama 4 Scout: 13.3

Benchmarks in this score

Each model’s scores on these are normalized and averaged to produce the industry score above.

Law · %

LegalBench

Legal-reasoning task suite (originated by Stanford CodeX), run independently by Vals AI and reported as overall accuracy across tasks.

Knowledge · %

MMLU-Pro

A harder, cleaned-up successor to MMLU spanning 14 subject areas, with 10-way multiple choice and reasoning-heavy items.

Frontier · %

Humanity's Last Exam

A broad, extremely difficult exam across math, humanities, and science designed to remain unsaturated by frontier models. Reported here without external tools.