Sparse Autoencoders
Software / App
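A technique used in mechanistic interpretability to identify features within a language model. A sparse autoencoder is trained to reconstruct the outputs of an MLP layer through a hidden layer that is wider than the input (the expansion factor) and whose activations are pushed to be mostly zero (the sparsity constraint).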
Mentioned in 1 video
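The description above can be made concrete with a small sketch of the forward pass and training loss. This is a minimal illustration, not a reference implementation: the ReLU encoder, the L1 sparsity penalty, and the names `make_sae`, `encode`, `decode`, `sae_loss`, and `l1_coeff` are all assumptions chosen for the example, and the weights are random rather than trained. It uses plain Python lists to stay dependency-free; a real setup would likely use a tensor library.

```python
import random


def make_sae(d_in, expansion_factor, seed=0):
    """Randomly initialise an untrained sparse autoencoder (hypothetical helper)."""
    rng = random.Random(seed)
    d_hidden = d_in * expansion_factor  # hidden layer is wider than the input
    return {
        "w_enc": [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_hidden)],
        "b_enc": [0.0] * d_hidden,
        "w_dec": [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_in)],
        "b_dec": [0.0] * d_in,
    }


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(sae, x):
    """Map an MLP output vector to non-negative feature activations (ReLU, assumed)."""
    pre = matvec(sae["w_enc"], x)
    return [max(0.0, p + b) for p, b in zip(pre, sae["b_enc"])]


def decode(sae, features):
    """Reconstruct the MLP output vector from the feature activations."""
    return [r + b for r, b in zip(matvec(sae["w_dec"], features), sae["b_dec"])]


def sae_loss(sae, x, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 sparsity penalty (assumed form)."""
    features = encode(sae, x)
    x_hat = decode(sae, features)
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(features)  # equals the L1 norm because activations are >= 0
    return reconstruction + l1_coeff * sparsity


# Stand-in for one MLP output vector; real activations would come from the model.
mlp_out = [0.5, -1.0, 0.25, 2.0]
sae = make_sae(d_in=4, expansion_factor=8)
print(len(encode(sae, mlp_out)))  # number of candidate features: 4 * 8
print(sae_loss(sae, mlp_out))
```

Training would minimise `sae_loss` over many MLP output vectors. The expansion factor sets how many candidate features are available, and `l1_coeff` trades reconstruction accuracy against how few of them are active for any one input.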