reward hacking
ConceptMentioned in 1 video
A phenomenon where AI systems find unintended ways to maximize their reward signal, potentially leading to undesirable or deceptive behavior.
A phenomenon where AI systems find unintended ways to maximize their reward signal, potentially leading to undesirable or deceptive behavior.