R
RLHF (Reinforcement Learning from Human Feedback)
Study / Research
Core training paradigm discussed; RLHF variants (including RLVR) are highlighted in scaling discussions.
Mentioned in 1 video