PPO

Software / App

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm developed by John Schulman and colleagues at OpenAI. OpenAI scaled it up significantly for its Dota 2 project, where emergent behaviors appeared at larger scales.

Mentioned in 2 videos

Videos Mentioning PPO

Greg Brockman: OpenAI and AGI | Lex Fridman Podcast #17

Lex Fridman

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm developed by John Schulman and colleagues at OpenAI. OpenAI scaled it up significantly for its Dota 2 project, where emergent behaviors appeared at larger scales.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 15: Mid/Post-Training

Stanford Online

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm that improves training stability by clipping its objective, which discourages large changes to the policy in a single update. It is a key algorithm in reinforcement learning from human feedback (RLHF).
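
The clipping heuristic mentioned above can be sketched in a few lines. This is a minimal illustration of the clipped surrogate objective from the PPO paper, not code taken from either video. The function name `ppo_clip_objective`, its parameter names, and the default `clip_eps=0.2` are assumptions chosen for this example.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names here are illustrative.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(ln - lo)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two terms, so the objective
        # gives no extra credit for moving the ratio outside the interval.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# The ratio is e (about 2.72) with a positive advantage, so the clipped
# term 1.2 * 1.0 is the smaller one and is used.
print(ppo_clip_objective([1.0], [0.0], [1.0]))
```

Because the minimum is taken, a large ratio paired with a negative advantage is not clipped: the objective still penalizes that update in full, and only updates that would look like an improvement are capped.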