Reinforcement Learning from Human Feedback (RLHF)
Concept
A training technique that aligns AI models with human preferences: human feedback (typically rankings of model outputs) is used to train a reward model, which then guides optimization of the AI's policy.
Mentioned in 1 video
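The reward model at the heart of RLHF is commonly trained on pairwise human preferences. A minimal sketch of the standard Bradley-Terry preference loss, assuming the reward model has already scored a preferred and a rejected response (function and variable names here are illustrative, not from the source):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Training the reward model to minimize this loss pushes it to score
    the human-preferred response higher than the rejected one. The
    resulting reward model then guides the policy during RL fine-tuning.
    """
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# If the reward model already ranks the preferred response higher,
# the loss is small; if it ranks it lower, the loss is large.
low = preference_loss(2.0, 0.5)
high = preference_loss(0.5, 2.0)
```

In practice the rewards come from a neural network scoring full (prompt, response) pairs, but the loss structure is the same.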