SAPE

Concept (mentioned in 1 video)

An interpretability technique that decomposes a model's internal representation into interpretable primitives, such as concept features that fire for specific entities.