vLLM
Software / App
A popular LLM inference engine, often compared with SGLang and Triton. It is known for its performance but is sometimes criticized for a codebase that can be messy and hard to extend.
Mentioned in 2 videos
Videos Mentioning vLLM

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)
Latent Space
A popular LLM inference engine, often compared with SGLang and Triton. It is known for its performance but is sometimes criticized for a codebase that can be messy and hard to extend.

A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate
Latent Space
An inference server that Replicate uses to serve language models with optimized performance.