Deliberative Alignment

Concept (mentioned in 1 video)

A paper released by OpenAI describing how models use reasoning techniques to refuse harmful requests without over-refusing benign ones, a key aspect of AI safety.