Deliberative Alignment
Concept · Mentioned in 1 video
A paper released by OpenAI discussing how models use reasoning techniques to refuse harmful requests without over-refusing benign ones, a key aspect of AI safety.