Over-refusal
Concept
An emergent AI behavior, identified through simulation, in which a chatbot becomes overly conservative and refuses benign requests. It points to a different priority than the one toxicity guardrails address.
Mentioned in 1 video