Multimodal Diffusion Transformer

Concept

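A variant of the Diffusion Transformer (DiT) that treats the input text as a modality in its own right. Text tokens are fed into the transformer directly, alongside the image tokens, rather than being applied only as an afterthought through modulation.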
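
As a rough illustration of what injecting text directly can look like, here is a minimal NumPy sketch of one single-head attention step in which text and image tokens form one joint sequence. It assumes each modality keeps its own projection weights and that the two streams only meet inside attention. The names `joint_attention`, `img_w`, and `txt_w` are hypothetical and do not come from any particular library, and the sketch leaves out the MLPs, normalization, and timestep conditioning of a full block.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(img_tokens, txt_tokens, img_w, txt_w):
    """Single-head attention over the concatenated text + image sequence.

    img_w and txt_w are hypothetical dicts holding separate "q", "k", "v"
    projection matrices for each modality (an assumption of this sketch).
    """
    # Project each stream with its own weights, then join the sequences.
    q = np.concatenate([txt_tokens @ txt_w["q"], img_tokens @ img_w["q"]], axis=0)
    k = np.concatenate([txt_tokens @ txt_w["k"], img_tokens @ img_w["k"]], axis=0)
    v = np.concatenate([txt_tokens @ txt_w["v"], img_tokens @ img_w["v"]], axis=0)

    # Every token, text or image, attends to every other token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v

    # Split the joint output back into the two streams.
    n_txt = txt_tokens.shape[0]
    return out[n_txt:], out[:n_txt]


# Toy usage with made-up sizes: 16 image tokens, 4 text tokens, width 8.
rng = np.random.default_rng(0)
dim = 8
img = rng.normal(size=(16, dim))
txt = rng.normal(size=(4, dim))
img_w = {name: rng.normal(size=(dim, dim)) for name in ("q", "k", "v")}
txt_w = {name: rng.normal(size=(dim, dim)) for name in ("q", "k", "v")}

img_out, txt_out = joint_attention(img, txt, img_w, txt_w)
```

Because the text tokens sit in the same attention sequence as the image tokens, changing the text changes the image-stream output directly, with no modulation layer in between.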
