Web of Lies
Study / Research
Mentioned in 1 video
A reasoning benchmark on which a model can score 100% when trained on that specific task, suggesting that high scores may reflect task-specific training rather than robust general reasoning.