Agentic Coding: SWE-Bench Pro

Software / App

A benchmark for evaluating AI models on agentic coding tasks. On this benchmark, GPT-5.5 scores below both Opus 4.7 and Mythos Preview.
