Vija

Software / App

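A video JEPA (Joint-Embedding Predictive Architecture) model that masks regions across consecutive frames and predicts the representations of the unseen regions. The prediction targets come from an EMA (exponential moving average) copy of the encoder, with a stop-gradient on the target branch, which regularizes training and helps prevent representation collapse.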
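The two mechanisms named above can be illustrated with a minimal, dependency-free sketch. It is not the model's actual implementation: the function names `ema_update` and `tube_mask`, the momentum value, and the choice to mask one identical block in every frame are all assumptions made for illustration.

```python
import random


def ema_update(target, online, momentum=0.99):
    """Hypothetical helper: move each target weight a small step toward
    the online weight. The target weights are changed only by this rule,
    never by backpropagation, which is the role of the stop-gradient."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


def tube_mask(num_frames, height, width, block, seed=0):
    """Hypothetical helper: hide the same square block in every frame,
    so the hidden region forms a 'tube' through consecutive frames.
    True marks a masked (unseen) position."""
    rng = random.Random(seed)
    top = rng.randrange(height - block + 1)
    left = rng.randrange(width - block + 1)
    frame = [
        [top <= r < top + block and left <= c < left + block for c in range(width)]
        for r in range(height)
    ]
    # Copy the same spatial mask for each frame in the clip.
    return [[row[:] for row in frame] for _ in range(num_frames)]


if __name__ == "__main__":
    mask = tube_mask(num_frames=4, height=6, width=6, block=2)
    hidden = sum(v for row in mask[0] for v in row)
    print("masked positions per frame:", hidden)
    print("updated target weights:", ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9))
```

In a real training loop, the online encoder would see only the unmasked positions, a predictor would estimate the target encoder's outputs at the masked positions, and `ema_update` would run after each optimizer step.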

Mentioned in 1 video