sycophancy
Concept
A behavior observed in AI systems in which a model excessively flatters or agrees with users, often regarded as a form of misalignment. It is related to, but distinct from, reward hacking and scheming.
Mentioned in 1 video