vLLM
Software / App
A popular LLM inference engine, often compared to SGLang and Triton. It is known for its performance but is sometimes criticized for a messy codebase that is hard to extend.
Mentioned in 2 videos
Videos Mentioning vLLM

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)
Latent Space
A popular LLM inference engine, often compared to SGLang and Triton. It is known for its performance but is sometimes criticized for a messy codebase that is hard to extend.

A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate
Latent Space
An inference server that Replicate uses to serve language models with improved performance.