Multimodal Diffusion Transformer

Concept

A variant of the Diffusion Transformer that treats the input text as a modality in its own right: text tokens are fed into the model directly, rather than entering only as an afterthought through modulation.

Mentioned in 2 videos

Videos Mentioning Multimodal Diffusion Transformer

Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 6 - Model Training

Stanford Online

A variant of the Diffusion Transformer that treats the input text as a modality in its own right: text tokens are fed into the model directly, rather than entering only as an afterthought through modulation.

Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 8 - Trending Topics

Stanford Online

An advanced Diffusion Transformer architecture that includes the text condition in a joint attention mechanism alongside the image tokens, with the aim of improving image generation.
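
The joint attention idea described above can be illustrated with a minimal NumPy sketch. It is not taken from the lectures: the function names (`joint_attention`, `init_params`), the single attention head, the per-modality projection matrices, and the random inputs are all assumptions chosen for illustration, and modulation, output projections, and MLP layers are left out. What it shows is the core step only: text and image tokens are projected separately, concatenated into one sequence, and attended over together.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, dim):
    # Hypothetical parameter layout: one set of q/k/v matrices per modality.
    return {
        modality: {
            name: rng.standard_normal((dim, dim)) / np.sqrt(dim)
            for name in ("q", "k", "v")
        }
        for modality in ("text", "image")
    }


def joint_attention(text_tokens, image_tokens, params):
    # Project each modality with its own weights, then concatenate along
    # the sequence axis so a single attention runs over all tokens.
    def project(name):
        return np.concatenate(
            [
                text_tokens @ params["text"][name],
                image_tokens @ params["image"][name],
            ],
            axis=0,
        )

    q, k, v = project("q"), project("k"), project("v")
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    out = weights @ v

    # Split the joint sequence back into its text and image parts.
    n_text = text_tokens.shape[0]
    return out[:n_text], out[n_text:], weights


rng = np.random.default_rng(0)
dim = 8
text = rng.standard_normal((3, dim))   # 3 text tokens (illustrative)
image = rng.standard_normal((5, dim))  # 5 image tokens (illustrative)
params = init_params(rng, dim)

text_out, image_out, weights = joint_attention(text, image, params)
print(text_out.shape, image_out.shape, weights.shape)  # (3, 8) (5, 8) (8, 8)
```

In this sketch the block `weights[3:, :3]` holds the attention that image tokens pay to text tokens, and `weights[:3, 3:]` the reverse, so information flows in both directions within one attention operation.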