Multimodal Diffusion Transformer

Software / App

An architecture introduced in the Stable Diffusion 3 paper for image generation. It applies joint attention across text and image tokens while keeping separate weights for each modality, addressing limitations of the original DiT (Diffusion Transformer).
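The joint attention idea can be illustrated with a minimal sketch. This is not the Stable Diffusion 3 implementation. It assumes a single attention head, tiny random weights, and plain Python lists, and it omits the timestep modulation, output projections, and MLP layers of a full block. All names (`joint_attention`, `text_w`, `image_w`, `random_matrix`) are hypothetical and chosen for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or token embeddings."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def joint_attention(text_tokens, image_tokens, text_w, image_w):
    """Single-head joint attention over two modalities (illustrative only)."""
    # Each modality is projected with its OWN weights...
    # ...and the projected sequences are concatenated (text first, then image).
    q = matmul(text_tokens, text_w["q"]) + matmul(image_tokens, image_w["q"])
    k = matmul(text_tokens, text_w["k"]) + matmul(image_tokens, image_w["k"])
    v = matmul(text_tokens, text_w["v"]) + matmul(image_tokens, image_w["v"])

    # One attention operation over the combined sequence, so every text token
    # can attend to every image token and vice versa.
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    out = matmul(weights, v)

    # Split the result back into per-modality streams.
    n_text = len(text_tokens)
    return out[:n_text], out[n_text:], weights


rng = random.Random(0)
dim = 4
text_tokens = random_matrix(3, dim, rng)   # 3 text tokens
image_tokens = random_matrix(5, dim, rng)  # 5 image patch tokens
text_w = {name: random_matrix(dim, dim, rng) for name in "qkv"}
image_w = {name: random_matrix(dim, dim, rng) for name in "qkv"}

text_out, image_out, attn = joint_attention(text_tokens, image_tokens, text_w, image_w)
print(len(text_out), len(image_out), len(attn[0]))  # 3 5 8
```

Each row of `attn` spans all 8 tokens, which is what distinguishes joint attention from running attention separately per modality: the two streams exchange information inside the attention step while their projection weights stay distinct.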

Mentioned in 1 video