D

Direct Preference Optimization (DPO)

Tool / Product

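A technique for aligning language models with human preferences. An alternative to RLHF, it trains the model directly on pairs of preferred and rejected responses, without a separate reward model or reinforcement learning loop.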

Mentioned in 1 video