S
SAPE
ConceptMentioned in 1 video
A technique in interpretability that decomposes a model's representation into interpretable primitives, like concepts firing for specific entities.
A technique in interpretability that decomposes a model's representation into interpretable primitives, like concepts firing for specific entities.