Gaia benchmark
Software / App
Mentioned in 2 videos
A benchmark created a year earlier that was challenging for agents at the time; results on it show significant progress in recent months.
Videos Mentioning Gaia benchmark

smol agents are all you need
Latent Space
General AI Assistants benchmark, designed to evaluate the generality and capability of AI agents on real-world tasks.

AI Dev 25 | Panel Discussion: Building AI Application in 2025
DeepLearningAI
A benchmark created a year earlier that was challenging for agents at the time; results on it show significant progress in recent months.