Let's Verify
Study / Research | Mentioned in 1 video
An earlier paper from 2021 that identified the problem of models being rewarded for correct solutions reached through flawed reasoning.