Let's Verify

Study / Research

An earlier paper, from 2021, that identified a problem in training models: rewarding solutions with correct final answers even when those answers were reached through flawed reasoning.

Mentioned in 1 video