Web of Lies
Study / Research
A reasoning benchmark on which models can score 100% when trained specifically on that task, which suggests the score may reflect brittle, task-specific learning rather than general reasoning ability.
Mentioned in 1 video