D
Direct Preference Optimization (DPO)
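Tool / Product
A technique related to RLHF for aligning language models with human preferences; it trains directly on pairs of preferred and rejected responses instead of fitting a separate reward model.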
Mentioned in 1 video