Sparse Autoencoders

Software / App

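A technique used in mechanistic interpretability to identify features within a language model. A sparse autoencoder is trained to reconstruct MLP layer outputs through a hidden layer that is wider than the input by an expansion factor, with a sparsity penalty so that only a few hidden units are active for any given input.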
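To make the roles of the sparsity penalty and the expansion factor concrete, here is a minimal sketch of a sparse autoencoder's forward pass and loss in NumPy. It is an illustration under stated assumptions, not the method from the video: the names init_sae, sae_forward, sae_loss and l1_coeff are hypothetical, the ReLU encoder with an L1 penalty is one common formulation among several, and random vectors stand in for real MLP activations.

```python
import numpy as np

def init_sae(d_mlp, expansion_factor, seed=0):
    """Create parameters for an autoencoder whose hidden layer is
    expansion_factor times wider than the MLP output it reconstructs."""
    rng = np.random.default_rng(seed)
    d_hidden = d_mlp * expansion_factor
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(d_mlp, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, 0.1, size=(d_hidden, d_mlp)),
        "b_dec": np.zeros(d_mlp),
    }

def sae_forward(params, x):
    """Encode activations x of shape (batch, d_mlp) into non-negative
    feature activations, then decode them back to shape (batch, d_mlp)."""
    features = np.maximum(0.0, x @ params["W_enc"] + params["b_enc"])  # ReLU
    recon = features @ params["W_dec"] + params["b_dec"]
    return features, recon

def sae_loss(params, x, l1_coeff):
    """Reconstruction error plus an L1 penalty on the feature activations.
    The L1 term is what pushes most features toward zero (sparsity)."""
    features, recon = sae_forward(params, x)
    mse = np.mean(np.sum((recon - x) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(features), axis=-1))
    return mse + l1_coeff * l1

# Assumption: random vectors stand in for MLP outputs collected from a model.
params = init_sae(d_mlp=8, expansion_factor=4)
x = np.random.default_rng(1).normal(size=(5, 8))
features, recon = sae_forward(params, x)
loss = sae_loss(params, x, l1_coeff=1e-3)
```

Training would minimize this loss over many recorded activations with a gradient-based optimizer. A larger l1_coeff trades reconstruction accuracy for sparser features, and a larger expansion factor gives the dictionary more features to assign.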

Mentioned in 1 video

Videos Mentioning Sparse Autoencoders

LLM Asia Paper Club Survey Round

Latent Space
