MLE-bench

Software / App

A machine learning engineering benchmark featured in the Deep Research or GPT-4o system card, used to measure progress toward model self-improvement.

Mentioned in 1 video