Gaia benchmark
Software / App
A benchmark for AI agents that was challenging when it was created a year earlier; agent results on it have improved significantly in recent months.
Mentioned in 2 videos
Videos Mentioning Gaia benchmark

smol agents are all you need
Latent Space
General AI Assistants (GAIA) benchmark, designed to evaluate the generality and capability of AI agents on real-world tasks.

AI Dev 25 | Panel Discussion: Building AI Application in 2025
DeepLearningAI
Cited as a benchmark that was challenging for agents when it was created a year earlier, and on which agents have made significant progress in recent months.