MMEDIT head

Concept (mentioned in 1 video)

A proposed innovation for training the action expert in Vision-Language-Action (VLA) models, aimed at improving how visual and action features are mixed.