
SWE-bench

Study / Research · Mentioned in 1 video

A benchmark score published by Anthropic showing that sampling techniques raised agent performance from 70% to 80%.