Humanity's Last Exam

Software / App

A benchmark designed to test AI models on a broad, difficult set of expert-level questions, with the implication that passing it would signify AGI.

Mentioned in 2 videos