o

over refusal

ConceptMentioned in 1 video

An emergent AI behavior identified through simulation where a chatbot becomes too conservative and refuses benign requests, highlighting a different priority than toxicity guardrails.