DPO
Concept
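Direct Preference Optimization (DPO), a preference-based fine-tuning method for language models that is often grouped with RL algorithms, mentioned in the context of its complexity and in comparison to GRPO (Group Relative Policy Optimization).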
Mentioned in 2 videos
Videos Mentioning DPO

⚡️Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect
Latent Space
DPO is compared with GRPO, which is described as 'DPO on steroids' and as offering advantages in online learning and batch processing.

Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)
Latent Space
Direct Preference Optimization is mentioned in the context of its complexity and in comparison to GRPO.