We’ve evaluated GPT 5.1 Codex Max on our coding benchmarks. It boasts a +9.5% boost on VibeCodeBench (#3), a +1% gain on SWE Bench (#4), and a slight regression on Terminal Bench.
This is not the fastest model in the shed. It’s 3x slower on SWE Bench and 10x slower on VibeCodeBench (VCB). This is a model for when you need the absolute best, not when you need it quickly.
Correlation isn’t causation…
…but the more suffixes we tack onto these models, the longer they seem to take. Excited to test 5.25 Codex Max Ultra High Supreme Edition.
The model was run through OpenAI’s API with reasoning set to high and verbosity set to medium. Results on IOI and LCB will be released soon!
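For anyone wanting to reproduce the settings, here is a rough sketch of what the request might look like. It only builds the payload and does not call the API. The model identifier `gpt-5.1-codex-max`, the helper name `build_request`, and the sample prompt are assumptions for illustration, not our actual harness.

```python
import json


def build_request(prompt: str) -> dict:
    """Assemble keyword arguments for a Responses-style API call (illustrative only)."""
    return {
        # Assumed model identifier; check the provider's model list before using it.
        "model": "gpt-5.1-codex-max",
        "input": prompt,
        # Settings named in the post: reasoning high, verbosity medium.
        "reasoning": {"effort": "high"},
        "text": {"verbosity": "medium"},
    }


# Hypothetical prompt, purely for demonstration.
request = build_request("Fix the failing test in utils.py")
print(json.dumps(request, indent=2))
```

With the official `openai` Python package, a payload like this would typically be passed as `client.responses.create(**request)`, though exact parameter support depends on the model and SDK version.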