SWEBench
Study / Research
A software engineering benchmark, cited to illustrate that a high benchmark score (such as Gemini's on SWEBench) does not always translate into practical utility.
Mentioned in 1 video