PPO
Software / App
Proximal Policy Optimization, a reinforcement learning algorithm developed by John Schulman and colleagues at OpenAI. OpenAI scaled it up significantly for its Dota 2 project, where emergent behaviors appeared at larger scales.
Mentioned in 2 videos
Videos Mentioning PPO

Greg Brockman: OpenAI and AGI | Lex Fridman Podcast #17
Lex Fridman
Proximal Policy Optimization, a reinforcement learning algorithm developed by John Schulman and colleagues at OpenAI. OpenAI scaled it up significantly for its Dota 2 project, where emergent behaviors appeared at larger scales.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 15: Mid/Post-Training
Stanford Online
Proximal Policy Optimization, a reinforcement learning algorithm that aims to improve stability by using a clipping heuristic to discourage large policy changes. It's a key algorithm in RLHF.
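
To make the clipping heuristic concrete, here is a minimal sketch of the clipped surrogate objective in plain Python. It is an illustration under stated assumptions, not code from either video: the function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are chosen here for the example, and advantages are assumed to be precomputed.

```python
import math

def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    new_logps / old_logps: log-probabilities of the sampled actions under the
    current policy and the policy that collected the data (assumed inputs).
    advantages: precomputed advantage estimates, one per sample.
    clip_eps: clipping range; 0.2 is an assumed default for this sketch.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms, so
        # moving the ratio outside the interval earns no extra objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Ratio of 2 with a positive advantage: the gain is capped at 1 + clip_eps.
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

Because the minimum is taken with the clipped term, the policy gains nothing from pushing the ratio far beyond the clipping range in the direction the advantage favors, which is how large policy changes are discouraged.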