Joint Attention

Concept

A conditioning method in which image patch embeddings and text embeddings are combined into a single token sequence and passed through the same self-attention layer, so that every token can attend to tokens from both modalities.
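The definition above can be illustrated with a minimal single-head sketch in NumPy. It is not taken from the source: the function name `joint_attention`, the shared projection matrices `w_q`, `w_k`, `w_v`, and the toy shapes are all assumptions made for illustration, and real models may use separate per-modality projections, multiple heads, and positional information.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Hypothetical single-head joint attention.

    image_tokens: (n_img, d) image patch embeddings
    text_tokens:  (n_txt, d) text embeddings
    w_q, w_k, w_v: (d, d) projections shared by both modalities
                   (an assumption of this sketch)
    """
    n_img = image_tokens.shape[0]

    # Concatenate both modalities into one sequence.
    tokens = np.concatenate([image_tokens, text_tokens], axis=0)

    # One set of queries, keys, and values for the whole sequence.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v

    # Scaled dot-product attention over the joint sequence: each row
    # holds one token's weights over all image AND text tokens.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    out = weights @ v

    # Split the result back into the two modalities.
    return out[:n_img], out[n_img:], weights


rng = np.random.default_rng(0)
d = 8
image_tokens = rng.normal(size=(4, d))  # 4 image patches (toy data)
text_tokens = rng.normal(size=(3, d))   # 3 text tokens (toy data)
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

img_out, txt_out, weights = joint_attention(
    image_tokens, text_tokens, w_q, w_k, w_v
)
print(img_out.shape, txt_out.shape, weights.shape)  # (4, 8) (3, 8) (7, 7)
```

The 7 x 7 weight matrix is the point of the example: image rows carry weights over text columns and vice versa, which is what distinguishes this from cross-attention, where only one modality queries the other.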
