Let's Verify Step by Step
Study / Research
A paper and approach that focuses on verifying individual reasoning steps rather than just the final answer, a key predecessor to o1's training.
Mentioned in 2 videos
Videos Mentioning Let's Verify Step by Step

The Origin and Future of RLHF: the secret ingredient for ChatGPT - with Nathan Lambert
Latent Space
A paper by OpenAI that rewards each step in chain-of-thought reasoning and uses those step-level scores for best-of-N sampling, giving a more specific reward signal than scoring only the final answer.

o1 - What is Going On? Why o1 is a 3rd Paradigm of Model + 10 Things You Might Not Know
AI Explained
A paper and approach that focuses on verifying individual reasoning steps rather than just the final answer, a key predecessor to o1's training.
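
The descriptions above mention rewarding each reasoning step and picking among sampled solutions with best-of-N. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the paper's implementation: `toy_step_scorer` is a hypothetical stand-in for a trained step-level reward model, and combining step scores by taking their product is assumed here as one reasonable aggregation rule.

```python
from math import prod
from typing import Callable, List, Sequence


def solution_score(step_scores: Sequence[float]) -> float:
    """Combine per-step correctness scores into one solution score.

    Assumption: the product of step scores, so a single doubtful step
    pulls the whole solution down.
    """
    return prod(step_scores)


def best_of_n(
    candidates: Sequence[List[str]],
    step_scorer: Callable[[str], float],
) -> List[str]:
    """Return the candidate solution with the highest combined step score.

    Each candidate is a list of reasoning steps (strings).
    """
    return max(
        candidates,
        key=lambda steps: solution_score([step_scorer(s) for s in steps]),
    )


def toy_step_scorer(step: str) -> float:
    """Hypothetical scorer: a real system would query a learned reward model."""
    return 0.2 if "WRONG" in step else 0.9


if __name__ == "__main__":
    samples = [
        ["2 + 3 = 5", "WRONG: 5 * 2 = 12", "answer: 12"],
        ["2 + 3 = 5", "5 * 2 = 10", "answer: 10"],
    ]
    print(best_of_n(samples, toy_step_scorer))
```

An outcome-only verifier would look at just the final line of each sample. Scoring every step lets the selection penalize a solution whose intermediate reasoning is flawed, which is the contrast the descriptions above draw.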