RLHF

Concept

Reinforcement Learning from Human Feedback. Paul Christiano is identified as its inventor.

Mentioned in 1 video