D

DPO

Concept · Mentioned in 1 video

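Direct Preference Optimization, a method for aligning language models on pairwise preference data. It is often grouped with RL algorithms, but it optimizes the policy directly on preferred/dispreferred response pairs without training a separate reward model or running an RL sampling loop. It is mentioned in the context of its complexity and in comparison to GRPO (Group Relative Policy Optimization).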
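The core of DPO can be illustrated with a minimal sketch of its per-pair loss. This is not taken from the video. It assumes the standard formulation: measure how much the trainable policy raises the log-probability of the preferred ("chosen") response relative to a frozen reference model, subtract the same quantity for the dispreferred ("rejected") response, scale by a temperature beta, and apply a negative log-sigmoid. The function names and the default beta = 0.1 are illustrative assumptions, not a specific library's API.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for large positive or negative z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Hypothetical helper: DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Loss shrinks as the policy favors the chosen response more than the reference does.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


def batch_dpo_loss(pairs, beta: float = 0.1) -> float:
    """Hypothetical helper: mean DPO loss over (pc, pr, rc, rr) tuples."""
    losses = [dpo_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Policy identical to reference: margin is 0, so loss is log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy shifts probability toward the chosen response: loss drops below log(2).
    print(dpo_loss(-9.0, -11.0, -10.0, -10.0))
```

Because the loss only needs log-probabilities of fixed response pairs, no generation or reward scoring happens during training. GRPO, by contrast, samples a group of completions per prompt and normalizes their rewards within the group, which is one reason the two are often compared on complexity.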