evals
Concept
Short for evaluations: tests that assess agent performance and surface failure modes such as hallucination and output-formatting errors. The speaker aims to make evals a source of joy rather than a pain by using MCP servers.
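As a rough illustration of the two failure modes named above, here is a minimal Python sketch of an eval harness. Everything in it is hypothetical and not taken from the video: the function names, the JSON output shape with "answer" and "sources" keys, and the use of an allowed-source list as a crude stand-in for hallucination detection. It does not involve MCP servers.

```python
from __future__ import annotations

import json


def check_format(output: str, required_keys: set[str]) -> bool:
    """Return True if the output is a JSON object containing every required key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


def check_grounded(output: str, allowed_sources: set[str]) -> bool:
    """Crude hallucination proxy (an assumption of this sketch):
    every source the agent cites must be in the allowed set."""
    parsed = json.loads(output)
    return set(parsed.get("sources", [])).issubset(allowed_sources)


def run_evals(cases: list[dict]) -> list[dict]:
    """Score each case on format first; grounding is only checked if the format is valid."""
    results = []
    for case in cases:
        out = case["agent_output"]
        format_ok = check_format(out, {"answer", "sources"})
        grounded = format_ok and check_grounded(out, case["allowed_sources"])
        results.append(
            {"name": case["name"], "format_ok": format_ok, "grounded": grounded}
        )
    return results


# Hypothetical recorded agent outputs, one per failure mode plus a passing case.
CASES = [
    {
        "name": "well_formed",
        "agent_output": '{"answer": "42", "sources": ["doc1"]}',
        "allowed_sources": {"doc1", "doc2"},
    },
    {
        "name": "bad_format",
        "agent_output": "The answer is 42",
        "allowed_sources": {"doc1"},
    },
    {
        "name": "invented_source",
        "agent_output": '{"answer": "42", "sources": ["doc9"]}',
        "allowed_sources": {"doc1"},
    },
]

if __name__ == "__main__":
    for row in run_evals(CASES):
        print(row)
```

Keeping each check as a small pure function means a failing case points at one specific failure mode rather than a single blended score.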
Mentioned in 1 video
