Let's Verify

Study / Research | Mentioned in 1 video

An earlier paper from 2021 that identified a problem in how models are rewarded: a solution can earn credit for reaching the correct answer even when the reasoning behind it is flawed.