Sweet Bench

Concept · Mentioned in 2 videos

A benchmark for evaluating the reasoning capabilities of language models. Fine-tuning a model on reasoning data led it to outperform OpenAI's o1 on this benchmark.