D
DPO
Concept · Mentioned in 1 video
Direct Preference Optimization (DPO), a preference-based fine-tuning algorithm usually grouped with RL methods for language models, mentioned in the context of its complexity and how it compares to GRPO.
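To give a rough sense of the complexity in question, here is a minimal sketch of the standard per-pair DPO loss. It is not taken from the video. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are assumed to be summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the policy being trained and under a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities (illustrative sketch).

    All names and the default beta are assumptions made for this example.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Policy shifted toward the chosen response: the loss falls below log(2).
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

The loss needs only preference pairs and a reference model. Nothing is sampled and no reward model is queried inside the training loop, which is the usual basis for calling DPO simpler than sampling-based methods such as GRPO.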