Subliminal Learning

Concept | Mentioned in 1 video

A phenomenon in which a model picks up hidden biases even though it was never explicitly trained on biased data. It has been observed when training on distilled data, that is, on outputs generated by another model. Discussed as a worrying area where interpretability is needed.