MMDIT head

Concept | Mentioned in 1 video

A multimodal diffusion transformer (MMDiT) head that improves feature mixing between vision and action features in vision-language-action (VLA) models, reportedly leading to significant performance boosts.
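As a rough illustration of what "feature mixing" can mean here, the sketch below assumes the head uses MMDiT-style joint attention: each modality keeps its own projection weights, and a single attention pass runs over the concatenated vision and action tokens. The function names, token counts, and dimensions are hypothetical and are not taken from the talk; this is a minimal single-head sketch in plain Python, not the model's actual implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Random projection matrix (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def joint_attention(vision, action, w_vision, w_action):
    """Single-head joint attention over vision and action tokens.

    Assumption: separate Q/K/V weights per modality, shared attention
    over the concatenated token sequence (the MMDiT-style pattern).
    """
    q_v, k_v, v_v = (matmul(vision, w_vision[name]) for name in ("q", "k", "v"))
    q_a, k_a, v_a = (matmul(action, w_action[name]) for name in ("q", "k", "v"))

    # Concatenate along the token axis so every token can attend to both modalities.
    q, k, v = q_v + q_a, k_v + k_a, v_v + v_a

    dim = len(q[0])
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    mixed = matmul(weights, v)

    # Split the mixed sequence back into its two streams.
    n_vision = len(vision)
    return mixed[:n_vision], mixed[n_vision:], weights


rng = random.Random(0)
dim = 4
vision_tokens = rand_matrix(3, dim, rng)  # 3 hypothetical vision tokens
action_tokens = rand_matrix(2, dim, rng)  # 2 hypothetical action tokens
w_vision = {name: rand_matrix(dim, dim, rng) for name in ("q", "k", "v")}
w_action = {name: rand_matrix(dim, dim, rng) for name in ("q", "k", "v")}

vision_out, action_out, attn = joint_attention(vision_tokens, action_tokens, w_vision, w_action)
```

Each row of `attn` spans all five tokens, so an action token's output is a weighted blend of both action and vision values; that cross-modal blend is the mixing the description refers to.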

Videos Mentioning MMDIT head

Stanford Robotics Seminar ENGR319 | Winter 2026 | 𝚿0: An Open Foundation Model

Stanford Online