Sparse Autoencoders
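A technique used in mechanistic interpretability to identify features within a language model. An autoencoder is trained to reconstruct the outputs of an MLP layer through a wider hidden layer, whose width is the MLP output dimension multiplied by an expansion factor. A sparsity penalty keeps only a few hidden units active for any given input, and those hidden units are read as candidate features.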
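The sketch below shows how the expansion factor and sparsity penalty fit together in one forward pass and loss computation. It is a minimal illustration, not a reference implementation. It assumes a ReLU encoder, a linear decoder, a mean-squared reconstruction error, and an L1 penalty on the hidden activations, which is a common choice that this entry does not confirm. The class name SparseAutoencoder and the parameters expansion_factor and l1_coeff are hypothetical. Weights are random and the training loop (gradient descent over many recorded MLP outputs) is omitted.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


class SparseAutoencoder:
    """Hypothetical minimal SAE: forward pass and loss only, untrained random weights."""

    def __init__(self, d_model, expansion_factor, l1_coeff=1e-3, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        # Overcomplete dictionary: hidden width = expansion factor * MLP output width
        self.d_hidden = d_model * expansion_factor
        self.l1_coeff = l1_coeff
        scale = 1.0 / d_model ** 0.5
        # Encoder weights: d_hidden rows of length d_model
        self.W_enc = [[rng.gauss(0.0, scale) for _ in range(d_model)]
                      for _ in range(self.d_hidden)]
        self.b_enc = [0.0] * self.d_hidden
        # Decoder weights: d_model rows of length d_hidden, initialised as the encoder transpose
        self.W_dec = [[self.W_enc[j][i] for j in range(self.d_hidden)]
                      for i in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Subtract the decoder bias, project up, then ReLU gives non-negative feature activations
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        return [relu(sum(w * c for w, c in zip(row, centered)) + b)
                for row, b in zip(self.W_enc, self.b_enc)]

    def decode(self, f):
        # Reconstruct the MLP output as a weighted sum of decoder directions
        return [sum(w * fj for w, fj in zip(row, f)) + b
                for row, b in zip(self.W_dec, self.b_dec)]

    def loss(self, x):
        f = self.encode(x)
        x_hat = self.decode(f)
        # Reconstruction error plus an (assumed) L1 sparsity penalty on activations
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / self.d_model
        l1 = sum(abs(v) for v in f)
        return mse + self.l1_coeff * l1, f, x_hat


sae = SparseAutoencoder(d_model=8, expansion_factor=4)
rng = random.Random(1)
mlp_out = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for one MLP output vector
total, features, recon = sae.loss(mlp_out)
active = sum(1 for v in features if v > 0.0)
print(f"hidden width={len(features)}, active features={active}, loss={total:.4f}")
```

Training would lower this loss over many MLP outputs. Raising l1_coeff trades reconstruction quality for fewer active features per input, while a larger expansion_factor gives the dictionary more candidate features to choose from.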