R

RLHF (Reinforcement Learning from Human Feedback)

Study / Research

Core training paradigm discussed; RLHF variants, including RLVR (reinforcement learning with verifiable rewards), are highlighted in scaling discussions.

Mentioned in 1 video