speculative decoding
A technique used to make language model generation faster by having a smaller model predict draft tokens that a larger model then verifies. Cursor uses 'speculative edits' as a variant.
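The draft-and-verify loop can be sketched in Python with toy stand-in models (the `draft_model` and `target_model` functions below are hypothetical next-token rules, not real LLMs). With greedy decoding, verification guarantees the speculative output is identical to what the target model alone would produce; the speedup comes from the target checking several draft tokens in one batched pass instead of generating them one by one.

```python
def draft_model(ctx):
    # Cheap stand-in draft model: always predicts the next token in a 0-9 cycle.
    return (ctx[-1] + 1) % 10

def target_model(ctx):
    # Stand-in target model: same cycle, but wraps back to 0 after token 7,
    # so it occasionally disagrees with the draft.
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10

def plain_decode(prompt, n_tokens):
    # Baseline: one target-model call per generated token.
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_model(out))
    return out[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(draft_model(out + draft))
        # 2. Target model verifies each position (one batched pass in practice):
        #    accept the longest prefix where the target agrees with the draft.
        i = 0
        while i < k and target_model(out + draft[:i]) == draft[i]:
            i += 1
        out.extend(draft[:i])
        if i < k:
            # On the first disagreement, take the target's own token instead.
            out.append(target_model(out))
    return out[len(prompt):][:n_tokens]
```

Running both decoders from the same prompt yields identical token sequences, but the speculative loop needs far fewer sequential target-model rounds whenever the draft's guesses are usually accepted.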
Videos mentioning speculative decoding

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)
Latent Space
A technique for speeding up inference by using a draft model to predict tokens, which are then verified by the larger target model. Support for variations exists in SGLang and other frameworks.

Why Compound AI + Open Source will beat Closed AI — with Lin Qiao, CEO of Fireworks AI
Latent Space
A technique Fireworks AI uses to improve inference speed, mentioned in connection with achieving 1000 tokens per second and with its implementation in the Fire Optimizer.

Cursor Team: Future of Programming with AI | Lex Fridman Podcast #447
Lex Fridman
A technique used to make language model generation faster by having a smaller model predict draft tokens that a larger model then verifies. Cursor uses 'speculative edits' as a variant.