sycophancy

Concept

A behavior observed in AI systems where they excessively flatter or agree with users, potentially as a misalignment. Also referred to as reward hacking and scheming.

Mentioned in 1 video