vLLM
Software / App
open-source software for large language model inference
Mentioned in 4 videos
Videos Mentioning vLLM

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)
Latent Space
A popular LLM inference engine, often compared to SGLang and Triton. It is known for its performance but is sometimes criticized for a messy codebase that can be difficult to extend.

A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate
Latent Space
An inference server that Replicate uses to serve language models with optimized performance.

⚡️ Google's Open AI Strategy — Omar Sanseviero, Google DeepMind
Latent Space
An open-source partner that collaborates with Google to support Gemma models, including integration with Android Studio.

Stanford CS25: Transformers United V6 I Serving Transformers: Lessons from the Trenches
Stanford Online
A widely adopted open-source inference engine with an enterprise orientation.