A DeepMind reinforcement learning milestone, mentioned as part of the history of RL scaling.
Mentioned in 1 video
Dwarkesh Clips