Deliberative Alignment
Concept
A paper released by OpenAI discussing how reasoning techniques are used models to refuse harmful requests without over-refusing benign ones, a key aspect of AI safety.
Mentioned in 1 video
A paper released by OpenAI discussing how reasoning techniques are used models to refuse harmful requests without over-refusing benign ones, a key aspect of AI safety.