SWE-bench

Study / Research

A benchmark score published by Anthropic showing that agent performance improves from 70% to 80% when sampling techniques are used.

Mentioned in 1 video