Honeycomb
Software / App
An agent that scores highly on the full SWE-Bench dataset. It first attempts to reproduce a bug before taking actions such as running bash commands.
Mentioned in 2 videos
Videos Mentioning Honeycomb
![[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu](https://i.ytimg.com/vi/ULcwHlxfSkQ/maxresdefault.jpg)
[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu
Latent Space
An agent that scores highly on the full SWE-Bench dataset. It first attempts to reproduce a bug before taking actions such as running bash commands.

Production AI Engineering starts with Evals
Latent Space
An observability platform that built its own very wide column store. The speaker agrees with that decision, given the lack of accessible solutions for semi-structured data comparable to Snowflake's VARIANT type.