Adaptive Layer Normalization

Concept

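A technique used in Diffusion Transformers (DiT) to inject conditioning signals, such as the diffusion timestep and class or text conditions, into the network. Instead of using fixed learned normalization parameters, the model derives gate, shift, and scale factors from the conditioning input and uses them to modulate the normalized token embeddings.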
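The modulation can be illustrated with a minimal NumPy sketch. It is not taken from any particular implementation: the function names (`layer_norm`, `adaln_modulate`), the single linear projection (`w`, `b`) that produces the three factors, and the `1 + scale` convention are assumptions chosen for illustration, and `cond` stands in for a combined timestep and condition embedding.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize each token over its feature axis, with no learned affine parameters."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def adaln_modulate(x, cond, w, b, sublayer=lambda h: h):
    """Apply one adaptive-layer-norm step (hypothetical helper).

    x:    (tokens, dim) token embeddings
    cond: (cond_dim,) conditioning vector (assumed: timestep + condition embedding)
    w, b: projection of shape (cond_dim, 3 * dim) and bias of shape (3 * dim,)
    sublayer: stand-in for attention or an MLP; identity by default
    """
    # Derive shift, scale, and gate from the conditioning vector.
    shift, scale, gate = np.split(cond @ w + b, 3)
    # Scale and shift the normalized tokens, then gate the residual branch.
    h = layer_norm(x) * (1.0 + scale) + shift
    return x + gate * sublayer(h)

rng = np.random.default_rng(0)
tokens, dim, cond_dim = 4, 8, 16
x = rng.normal(size=(tokens, dim))
cond = rng.normal(size=cond_dim)

# With a zero-initialized projection the gate is zero, so the block is the identity.
w_zero = np.zeros((cond_dim, 3 * dim))
b_zero = np.zeros(3 * dim)
out_zero = adaln_modulate(x, cond, w_zero, b_zero)

# With a non-zero projection the conditioning changes the output.
w = rng.normal(scale=0.02, size=(cond_dim, 3 * dim))
out = adaln_modulate(x, cond, w, b_zero)
```

In this sketch the scale and shift adjust the normalized tokens, while the gate controls how much of the modulated branch is added back to the residual stream. Initializing the projection to zero makes the block start as an identity mapping.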
