Honeycomb

Software / App | Mentioned in 2 videos

An agent that scores highly on the full SWE-Bench dataset. It first attempts to reproduce the reported bug, and only then takes further actions such as running bash commands.
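The reproduce-first idea can be illustrated with a minimal Python sketch. This is not Honeycomb's actual implementation, which the description does not detail. The helper name `reproduce_then_act`, the use of a non-zero exit code as the sign that the bug was reproduced, and the stop-on-first-failure rule are all assumptions made for illustration.

```python
import subprocess
import sys
from typing import Dict, Sequence


def run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing output, without raising on a non-zero exit."""
    return subprocess.run(list(cmd), capture_output=True, text=True)


def reproduce_then_act(
    repro_cmd: Sequence[str],
    actions: Sequence[Sequence[str]],
) -> Dict[str, object]:
    """Hypothetical helper: run `actions` only after `repro_cmd` fails.

    Assumption: a non-zero exit code from `repro_cmd` means the bug
    was reproduced. A real agent would likely inspect output as well.
    """
    repro = run(repro_cmd)
    if repro.returncode == 0:
        # The bug was not observed, so no further actions are taken.
        return {"reproduced": False, "actions_run": 0}

    executed = 0
    for action in actions:
        result = run(action)
        executed += 1
        if result.returncode != 0:
            # Assumed policy: stop at the first failing action.
            break
    return {"reproduced": True, "actions_run": executed}


if __name__ == "__main__":
    # Stand-in commands: one that exits with status 1, one that succeeds.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    passing = [sys.executable, "-c", "pass"]
    print(reproduce_then_act(failing, [passing, passing]))
```

In practice the reproduction command would be something like a failing test invocation, and the follow-up actions would be chosen by the agent rather than fixed in advance.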