Reinforcement Learning from Human Feedback (RLHF)
Concept
A training technique that aligns AI models with human preferences: human feedback (typically rankings of model outputs) is used to train a reward model, which then guides optimization of the AI's policy.
Mentioned in 1 video
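The reward model at the heart of RLHF is commonly trained on pairwise human preferences. A minimal sketch of the standard Bradley-Terry preference loss, assuming the reward model has already scored a preferred and a rejected response (function and variable names here are illustrative, not from the source):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Training the reward model to minimize this loss pushes it to score
    the human-preferred response higher than the rejected one. The
    resulting reward model then guides the policy during RL fine-tuning.
    """
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# If the reward model already ranks the preferred response higher,
# the loss is small; if it ranks it lower, the loss is large.
low = preference_loss(2.0, 0.5)
high = preference_loss(0.5, 2.0)
```

In practice the rewards come from a neural network scoring full (prompt, response) pairs, but the loss structure is the same.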