Honeycomb

Software / App

A high-scoring agent on the full SWE-bench dataset. Before executing actions such as running bash commands, it first attempts to reproduce the bug.
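This entry does not describe how Honeycomb is implemented. The Python sketch below only illustrates the general idea of a reproduce-first workflow. Its assumptions are: a reproduction command signals the bug by exiting with a non-zero status, fix actions are plain callables, and the names `bug_reproduces` and `reproduce_then_fix` are hypothetical.

```python
import subprocess
from typing import Callable, Sequence


def bug_reproduces(repro_cmd: Sequence[str]) -> bool:
    """Run a (hypothetical) reproduction command.

    Assumption: a non-zero exit status means the bug was observed.
    """
    result = subprocess.run(list(repro_cmd), capture_output=True, text=True)
    return result.returncode != 0


def reproduce_then_fix(
    repro_cmd: Sequence[str],
    fix_actions: Sequence[Callable[[], None]],
) -> str:
    """Illustrative reproduce-first loop, not Honeycomb's actual method."""
    # Step 1: try to reproduce the bug before taking any other action.
    if not bug_reproduces(repro_cmd):
        return "not-reproduced"
    # Step 2: only after reproduction succeeds, run the edit/shell actions.
    for action in fix_actions:
        action()
    # Step 3: rerun the reproduction to check whether the fix worked.
    return "still-failing" if bug_reproduces(repro_cmd) else "fixed"
```

Putting reproduction first gives the agent a concrete failing check it can rerun after each change, rather than editing code without evidence that the reported bug exists.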

Mentioned in 2 videos