SWE-Bench
Tool / Product
Mentioned in 2 videos
The original academic coding benchmark, created by a lab at Princeton, of which SWE-Bench Verified is a cleaned-up version.
Videos Mentioning SWE-Bench

The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals
Latent Space
The original academic coding benchmark, created by a lab at Princeton, of which SWE-Bench Verified is a cleaned-up version.

GPT 4.1: The New OpenAI Workhorse
Latent Space
An evaluation benchmark measuring AI models' ability to complete software engineering tasks, on which GPT-4.1 showed significant improvements.