GAIA

Concept

A benchmark designed to assess AI agents on reasoning, multimodal handling, web browsing, and tool use. Manus scored exceptionally high on this benchmark, approaching human performance.

Mentioned in 1 video