RLHF (Reinforcement Learning from Human Feedback)

Study / Research · Mentioned in 1 video

Discussed as a core training paradigm; RLHF variants, including RLVR (reinforcement learning with verifiable rewards), are highlighted in the scaling discussions.