Industry · 3 benchmarks
Corporate Law
Legal reasoning — issue spotting, rule application, and contract analysis — plus the broad knowledge a generalist counsel needs.
The Corporate Law score is the mean of a model’s normalized 0–100 scores across the 3 benchmarks below. Normalization is direction-aware, so lower-is-better metrics are inverted before averaging. This is the same figure the leaderboard’s industry view ranks by.
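The averaging described above can be sketched as follows. This is only an illustration, not the leaderboard's actual code: the page does not say how scores are normalized, so the min-max scaling against fixed bounds, the `Metric` class, the `normalize` and `industry_score` helpers, and the raw scores in the example are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Metric:
    """Hypothetical description of one benchmark's raw scale (an assumption)."""
    name: str
    lo: float = 0.0                # assumed lowest possible raw value
    hi: float = 100.0              # assumed highest possible raw value
    lower_is_better: bool = False  # True for error-style metrics


def normalize(raw: float, metric: Metric) -> float:
    """Map a raw value onto 0-100, inverting lower-is-better metrics."""
    span = metric.hi - metric.lo
    if span <= 0:
        raise ValueError(f"invalid bounds for {metric.name}")
    scaled = 100.0 * (raw - metric.lo) / span
    scaled = min(100.0, max(0.0, scaled))  # clamp to the 0-100 range
    return 100.0 - scaled if metric.lower_is_better else scaled


def industry_score(raw_scores: dict[str, float], metrics: list[Metric]) -> float:
    """Unweighted mean of the normalized scores across the given benchmarks."""
    return mean(normalize(raw_scores[m.name], m) for m in metrics)


# The three benchmarks in this score all report a percentage where higher is better.
METRICS = [
    Metric("LegalBench"),
    Metric("MMLU-Pro"),
    Metric("Humanity's Last Exam"),
]

# Made-up raw scores for a hypothetical model, for illustration only.
example = {"LegalBench": 80.0, "MMLU-Pro": 70.0, "Humanity's Last Exam": 30.0}
score = industry_score(example, METRICS)
```

With all three metrics already on a 0–100 higher-is-better scale, this sketch reduces to a plain average of the three percentages. The inversion branch only matters if a lower-is-better metric is added to the set.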
Leaders
GPT-5.4 leads this industry with a score of 86.0.
Benchmarks in this score
Each model’s scores on these are normalized and averaged to produce the industry score above.
Law (%)
LegalBench
Legal-reasoning task suite (originated by Stanford CodeX), run independently by Vals AI and reported as overall accuracy across tasks.
Knowledge (%)
MMLU-Pro
A harder, cleaned-up successor to MMLU spanning 14 subject areas, with 10-way multiple choice and reasoning-heavy items.
Frontier (%)
Humanity's Last Exam
A broad, extremely difficult exam across math, humanities, and science designed to remain unsaturated by frontier models. Reported here without external tools.
