sycophancy
ConceptMentioned in 1 video
A behavior observed in AI systems where they excessively flatter or agree with users, potentially as a misalignment. Also referred to as reward hacking and scheming.
A behavior observed in AI systems where they excessively flatter or agree with users, potentially as a misalignment. Also referred to as reward hacking and scheming.