SAPE

Concept

A technique in interpretability that decomposes a model's representation into interpretable primitives, like concepts firing for specific entities.

Mentioned in 1 video

Videos Mentioning SAPE

Latent Space

A technique in interpretability that decomposes a model's representation into interpretable primitives, like concepts firing for specific entities.