Direct Preference Optimization (DPO)

Concept

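A technique for aligning language models with human preferences. Related to RLHF, it trains the model directly on pairs of preferred and rejected responses, without fitting a separate reward model or running a reinforcement learning loop.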
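
As a rough illustration, the per-pair DPO loss can be sketched in plain Python. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions made for this example and do not come from the source. The inputs are assumed to be summed log-probabilities of each full response under the model being trained (the policy) and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss (illustrative sketch; names and default beta are assumptions).

    Each argument is the log-probability of a whole response:
    "chosen" is the human-preferred response, "rejected" the other one.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # Loss is -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference the margin is zero and the loss is log 2. Shifting probability toward the chosen response and away from the rejected one lowers the loss, which is how preferences are learned without an explicit reward model.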

Mentioned in 1 video