Industry · 2 benchmarks

Software Engineering

Shipping working code against real repositories: bug fixes, feature patches, and competitive programming under tests.

The Software Engineering score is the mean of a model’s normalized 0–100 scores across the 2 benchmarks below. Normalization is direction-aware, so lower-is-better metrics are inverted before averaging. This is the same figure the leaderboard’s industry view ranks by.
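The scoring rule above can be illustrated with a minimal Python sketch. The page does not say how raw results are normalized, so this sketch assumes min-max scaling to 0–100 across the models being compared, with the scale flipped for lower-is-better metrics. The benchmark names, model names, and raw values are hypothetical.

```python
from statistics import mean

# Hypothetical raw results: benchmark -> {model: raw score}.
# All names and numbers here are made up for illustration.
RAW = {
    "bench_a": {"model_x": 72.0, "model_y": 60.0, "model_z": 48.0},  # higher is better
    "bench_b": {"model_x": 2.0, "model_y": 1.5, "model_z": 4.5},     # lower is better
}
LOWER_IS_BETTER = {"bench_b"}


def normalize(scores: dict[str, float], lower_is_better: bool) -> dict[str, float]:
    """Assumed min-max scaling to 0-100, inverted when lower is better."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        # Every model tied: give all of them the top score.
        return {model: 100.0 for model in scores}
    out = {}
    for model, raw in scores.items():
        norm = 100.0 * (raw - lo) / span
        # Direction-aware: flip the scale so that higher always means better.
        out[model] = 100.0 - norm if lower_is_better else norm
    return out


def industry_scores(raw: dict[str, dict[str, float]],
                    lower_is_better: set[str]) -> dict[str, float]:
    """Mean of each model's normalized scores across the benchmarks."""
    normalized = {b: normalize(s, b in lower_is_better) for b, s in raw.items()}
    models = set().union(*(s.keys() for s in normalized.values()))
    # Assumption: a model missing a benchmark is averaged over the ones it has.
    return {
        m: mean(normalized[b][m] for b in normalized if m in normalized[b])
        for m in models
    }


if __name__ == "__main__":
    ranked = sorted(industry_scores(RAW, LOWER_IS_BETTER).items(),
                    key=lambda kv: kv[1], reverse=True)
    for model, score in ranked:
        print(f"{model}: {score:.1f}")
    # model_x: 91.7
    # model_y: 75.0
    # model_z: 0.0
```

Inverting after scaling keeps every benchmark on the same "higher is better" 0–100 footing, so a plain arithmetic mean is meaningful. A different normalization (for example, fixed per-benchmark bounds instead of the observed min and max) would change the numbers but not the overall shape of the calculation.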

Leaders

Claude Opus 4.8 leads this industry with a score of 88.6.

Benchmarks in this score

Each model’s scores on these benchmarks are normalized and averaged to produce the industry score above.