PO
Concept
Mentioned in 2 videos
Policy Optimization; traditional training approach using a large teacher model to critique data.
Videos Mentioning PO

New DeepSeek Research - The Future Is Here!
Two Minute Papers
Policy Optimization; traditional training approach using a large teacher model to critique data.

⚡️Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect
Latent Space
An older reinforcement learning algorithm, mentioned as the basis for RLHF and contrasted with GRPO in a discussion of memory efficiency and gradient syncing.