evals

Concept

Short for "evaluations": tests used to assess agent performance and to identify failure modes such as hallucination and output formatting issues. The speaker aims to make evals a source of joy rather than a pain point by using MCP servers.
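As a rough illustration of the two failure modes named above, here is a minimal sketch of an eval loop in Python. It is not the speaker's setup and does not involve MCP servers. The names `stub_agent`, `CASES`, `check_format`, and `run_evals` are hypothetical, the agent is a stand-in that returns canned strings, and "hallucination" is approximated very crudely as an answer outside a known-good set.

```python
import json

def check_format(output: str) -> bool:
    """Return True if output is a JSON object with a string 'answer' field."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and isinstance(data.get("answer"), str)

def run_evals(agent, cases):
    """Run each case through the agent and record which checks failed."""
    results = []
    for case in cases:
        output = agent(case["prompt"])
        failures = []
        if not check_format(output):
            # Output formatting issue: not the JSON shape we asked for.
            failures.append("format")
        elif json.loads(output)["answer"] not in case["allowed_answers"]:
            # Crude hallucination proxy: answer is not in the known-good set.
            failures.append("hallucination")
        results.append({"prompt": case["prompt"], "failures": failures})
    return results

def stub_agent(prompt: str) -> str:
    """Hypothetical agent: returns canned outputs instead of calling a model."""
    canned = {
        "capital of France?": '{"answer": "Paris"}',
        "capital of Australia?": '{"answer": "Sydney"}',  # wrong on purpose
        "capital of Japan?": "Tokyo",  # not JSON on purpose
    }
    return canned[prompt]

# Hypothetical eval cases: a prompt plus the set of acceptable answers.
CASES = [
    {"prompt": "capital of France?", "allowed_answers": {"Paris"}},
    {"prompt": "capital of Australia?", "allowed_answers": {"Canberra"}},
    {"prompt": "capital of Japan?", "allowed_answers": {"Tokyo"}},
]

if __name__ == "__main__":
    for result in run_evals(stub_agent, CASES):
        status = "PASS" if not result["failures"] else "FAIL " + ",".join(result["failures"])
        print(result["prompt"], "->", status)
```

In this sketch the format check runs first, so an output that is not valid JSON is reported as a formatting failure and is never scored for correctness. A real eval suite would replace the stub with calls to the agent under test and would likely need a stronger hallucination check than set membership.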

Mentioned in 1 video