GAIA
Concept
A benchmark designed to assess AI agents on reasoning, multimodal handling, web browsing, and tool use. Manus scored exceptionally high on this benchmark, approaching human-level performance.
Mentioned in 1 video