LLaVA

Software / App

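A family of open-source vision-language models that project image embeddings from a vision encoder into the input embedding space of a language model. LLaVA 1.5 works on single images; later releases such as LLaVA-NeXT and LLaVA-OneVision add support for multiple images and video.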
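
The idea of injecting image embeddings into a language model can be illustrated with a toy sketch. This is not LLaVA's actual code: the dimensions, the random values, and the helper names `project` and `build_input_sequence` are all hypothetical, and a real model would use a trained vision encoder and a trained projector. It only shows the general shape of the technique: per-patch vision features are mapped to the language model's embedding width and spliced into the sequence of text token embeddings.

```python
import random


def project(patch_features, weight, bias):
    """Linear projection from vision feature width to LM embedding width.

    weight holds one row per output dimension; each row has one entry
    per input (vision) dimension.
    """
    return [
        [sum(x * w for x, w in zip(patch, row)) + b for row, b in zip(weight, bias)]
        for patch in patch_features
    ]


def build_input_sequence(text_embeddings, image_embeddings, image_position):
    """Splice the projected image embeddings into the text embedding sequence."""
    return (
        text_embeddings[:image_position]
        + image_embeddings
        + text_embeddings[image_position:]
    )


# Toy sizes, chosen only for illustration (assumptions, not LLaVA's values).
VISION_DIM, LM_DIM = 4, 6
NUM_PATCHES, NUM_TEXT_TOKENS = 3, 5

rng = random.Random(0)

# Stand-ins for vision encoder outputs and text token embeddings.
patch_features = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
text_embeddings = [[rng.gauss(0, 1) for _ in range(LM_DIM)] for _ in range(NUM_TEXT_TOKENS)]

# Stand-in for a trained projector (random weights here).
weight = [[rng.gauss(0, 0.1) for _ in range(VISION_DIM)] for _ in range(LM_DIM)]
bias = [0.0] * LM_DIM

image_embeddings = project(patch_features, weight, bias)
sequence = build_input_sequence(text_embeddings, image_embeddings, image_position=1)

# The language model would now see 5 text + 3 image positions, all of width 6.
print(len(sequence), len(sequence[0]))  # prints "8 6"
```

After projection the language model treats image positions like any other token embedding, which is why the same mechanism can in principle carry several images or video frames by adding more positions.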

Mentioned in 1 video