over refusal

Concept

An emergent AI behavior identified through simulation where a chatbot becomes too conservative and refuses benign requests, highlighting a different priority than toxicity guardrails.

Mentioned in 1 video