MMDiT

Software / App

A text-conditioned diffusion model (Multimodal Diffusion Transformer): generation is guided by a text prompt. It cannot be reused as-is for image-to-text evaluation tasks.

Mentioned in 1 video