sycophancy

ConceptMentioned in 1 video

A behavior observed in AI systems where they excessively flatter or agree with users, potentially as a misalignment. Also referred to as reward hacking and scheming.