CLIP
An open-source model that early Replicate users combined with GANs to build text-to-image models.
Videos Mentioning CLIP

The Agent Reasoning Interface: Claude, ChatGPT Canvas, Tasks, Operator — with Karina Nguyen, OpenAI
Latent Space
Karina used CLIP for fashion recommendation search in early prototypes before joining Anthropic.

AI Engineering for Art - with comfyanonymous
Latent Space
The text encoder commonly used in Stable Diffusion models; it converts prompt tokens into embedding vectors.

High Agency Pydantic over VC Backed Frameworks — with Jason Liu of Instructor
Latent Space
Used in conjunction with GPT-3 embeddings and Fe for a similarity search system at Stitch Fix.
![Best of 2024 in Vision [LS Live @ NeurIPS]](https://i.ytimg.com/vi/76EL7YVAwVo/maxresdefault.jpg)
Best of 2024 in Vision [LS Live @ NeurIPS]
Latent Space
A model used as a vision encoder. Its contrastive training objective is hypothesized to be one reason why LLMs that rely on it struggle with fine-grained visual details.
![[Paper Club] Embeddings in 2024: OpenAI, Nomic Embed, Jina Embed, cde-small-v1 - with swyx](https://i.ytimg.com/vi/VIqXNRsRRQo/maxresdefault.jpg)
[Paper Club] Embeddings in 2024: OpenAI, Nomic Embed, Jina Embed, cde-small-v1 - with swyx
Latent Space
A multimodal model that integrates vision and text embeddings. The speaker highlights its utility and gives qualitative examples comparing its performance to OpenAI's CLIP.
![[Paper Club] Molmo + Pixmo + Whisper 3 Turbo - with Vibhu Sapra, Nathan Lambert, Amgadoz](https://i.ytimg.com/vi/8BN9CdIYaqc/maxresdefault.jpg)
[Paper Club] Molmo + Pixmo + Whisper 3 Turbo - with Vibhu Sapra, Nathan Lambert, Amgadoz
Latent Space
A multimodal model developed by OpenAI and trained on a massive dataset of images and text. Some models use it as a vision encoder; its training data is proprietary.

Wojciech Zaremba: OpenAI Codex, GPT-3, Robotics, and the Future of AI | Lex Fridman Podcast #215
Lex Fridman
An OpenAI model that can identify images from textual descriptions. It is noted as powerful, though its power is less immediately obvious than DALL-E's.

Ben Firshman CEO of Replicate on Building Community, Open Source, and Navigating the AI Industry
AssemblyAI
An open-source model that early Replicate users combined with GANs to build text-to-image models.