Direct Preference Optimization

Concept

An RL-free approach that trains a model directly on human preference data, so that it favors the response people preferred over the one they rejected, without requiring a separate reward model.
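To make the definition above concrete, here is a minimal sketch of the per-example loss as DPO is commonly formulated: the negative log-sigmoid of a scaled difference between the policy-versus-reference log-probability ratios for the preferred and rejected responses. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration, not taken from the source. The inputs are assumed to be summed token log-probabilities of each full response.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative sketch; names and beta are assumptions).

    Each argument is the log-probability of a whole response under either
    the policy being trained or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Implicit reward margin between the preferred and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Policy raises the preferred response and lowers the rejected one: loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

No reward model appears anywhere in the computation; the preference signal enters only through which response is labeled as chosen and which as rejected. In practice the same expression would be evaluated on batched tensors in an autodiff framework so that gradients flow into the policy's log-probabilities.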

Mentioned in 1 video

Videos Mentioning Direct Preference Optimization

Stanford Online
