RLHF (Reinforcement Learning from Human Feedback)
Study / Research
Mentioned in 1 video
Core training paradigm discussed; RLHF variants (including RLVR) are highlighted in scaling discussions.