o
over refusal
ConceptMentioned in 1 video
An emergent AI behavior identified through simulation where a chatbot becomes too conservative and refuses benign requests, highlighting a different priority than toxicity guardrails.
An emergent AI behavior identified through simulation where a chatbot becomes too conservative and refuses benign requests, highlighting a different priority than toxicity guardrails.