DPO

Concept

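Direct Preference Optimization, a preference-tuning method for language models that trains directly on pairs of preferred and rejected responses, without the separate reward model and sampling loop used by RL algorithms. Mentioned in the context of its complexity and comparison to GRPO.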
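
As a rough illustration of why DPO is often described as simpler than RL-style methods, the per-example loss can be sketched in a few lines. This is a minimal sketch of the standard DPO objective, not code from the videos: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration, and the inputs are assumed to be summed log-probabilities of each full response under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative sketch; names are hypothetical).

    Each argument is the log-probability of a whole response, either the
    preferred ("chosen") or dispreferred ("rejected") one, under the policy
    being trained or under the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of the two margins; beta controls how strongly the
    # policy is pushed away from the reference.
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)), written as a numerically stable softplus(-logits).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls below that as the policy shifts probability toward the chosen response relative to the reference. In practice the log-probabilities would come from batched model outputs rather than plain floats.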

Mentioned in 2 videos