Mechanistic Interpretability

Concept

A field of AI research that seeks to explain how AI models, particularly neural networks, reach their outputs by reverse-engineering the internal computations and mechanisms that drive their decision-making.
