Direct Preference Optimization (DPO)
