Flamingo

Software / App

A DeepMind model that adds vision capabilities to a language model. It is built by freezing Chinchilla's weights and adding new visual components, which enables dialogue about images.

Mentioned in 3 videos

Videos Mentioning Flamingo

Oriol Vinyals: Deep Learning and Artificial General Intelligence | Lex Fridman Podcast #306

Lex Fridman

A DeepMind model that adds vision capabilities to a language model. It is built by freezing Chinchilla's weights and adding new visual components, which enables dialogue about images.

Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 7 - Evaluation

Stanford Online

A multimodal LLM developed by DeepMind (part of Google) that uses cross-attention: text tokens act as queries and the encoded image supplies the keys and values, which lets the text tokens attend to the image.
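
The cross-attention idea in this description can be illustrated with a minimal sketch. It is not Flamingo's actual implementation: it assumes a single attention head, omits the learned query/key/value projections, and uses made-up toy vectors. The names `cross_attention`, `text_queries`, `image_keys`, and `image_values` are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(text_queries, image_keys, image_values):
    """Single-head cross-attention: each text query attends over image features.

    text_queries: list of T vectors of length d   (from text tokens)
    image_keys:   list of N vectors of length d   (from the encoded image)
    image_values: list of N vectors of length d_v (from the encoded image)
    Returns a list of T vectors of length d_v.
    """
    d = len(image_keys[0])
    d_v = len(image_values[0])
    outputs = []
    for q in text_queries:
        # Scaled dot-product score between this text token and every image feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in image_keys]
        weights = softmax(scores)
        # Weighted sum of image values: the image information this token picks up.
        out = [sum(w * v[j] for w, v in zip(weights, image_values))
               for j in range(d_v)]
        outputs.append(out)
    return outputs

# Toy inputs (assumed for illustration): 2 text tokens, 3 image features, dimension 2.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
image_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attended = cross_attention(text_queries, image_keys, image_values)
```

Each row of `attended` is a weighted average of the image values, with weights determined by how strongly that text token's query matches each image key. The output has one row per text token, so it can be added back into the text stream.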

Why AI Agents Don't Actually Understand You — Danielle Perszyk, Amazon AGI Lab

Latent Space

Mentioned as an example of prior research on voice models with full-duplex, end-to-end capabilities, predating the current industry focus on real-time interaction.