Cross Attention

Concept

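A method for injecting conditioning information, such as a text prompt, into an image model. Image patch embeddings act as the queries and attend to the text embeddings, which supply the keys and values. The resulting attention weights measure how relevant each text token is to each patch, and each patch is updated with the correspondingly weighted mix of text values.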
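As a rough illustration of the definition above, here is a minimal single-head sketch in NumPy using standard scaled dot-product attention. The function name `cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all dimensions are assumptions chosen for the example and are not taken from any particular model; real implementations typically add multiple heads, an output projection, and a residual connection.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_patches, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative sketch).

    image_patches: (num_patches, d_image)  -> source of queries
    text_tokens:   (num_tokens,  d_text)   -> source of keys and values
    """
    q = image_patches @ w_q                    # (num_patches, d_attn)
    k = text_tokens @ w_k                      # (num_tokens,  d_attn)
    v = text_tokens @ w_v                      # (num_tokens,  d_attn)
    # Relevance of every text token to every image patch.
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (num_patches, num_tokens)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    # Each patch receives a weighted mix of the text values.
    return weights @ v, weights

# Hypothetical sizes and random inputs, purely for demonstration.
rng = np.random.default_rng(0)
num_patches, num_tokens = 16, 5
d_image, d_text, d_attn = 32, 24, 8

image_patches = rng.normal(size=(num_patches, d_image))
text_tokens = rng.normal(size=(num_tokens, d_text))
w_q = rng.normal(size=(d_image, d_attn))
w_k = rng.normal(size=(d_text, d_attn))
w_v = rng.normal(size=(d_text, d_attn))

out, weights = cross_attention(image_patches, text_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

The output has one row per image patch, while the attention weights have one column per text token. That asymmetry is what distinguishes cross-attention from self-attention, where queries, keys, and values all come from the same sequence.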

Mentioned in 1 video