speculative decoding
A technique used to make language model generation faster by having a smaller model predict draft tokens that a larger model then verifies. Cursor uses 'speculative edits' as a variant.
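The draft-and-verify loop can be sketched in Python with toy stand-in models (the `draft_model` and `target_model` functions below are hypothetical next-token rules, not real LLMs). With greedy decoding, verification guarantees the speculative output is identical to what the target model alone would produce; the speedup comes from the target checking several draft tokens in one batched pass instead of generating them one by one.

```python
def draft_model(ctx):
    # Cheap stand-in draft model: always predicts the next token in a 0-9 cycle.
    return (ctx[-1] + 1) % 10

def target_model(ctx):
    # Stand-in target model: same cycle, but wraps back to 0 after token 7,
    # so it occasionally disagrees with the draft.
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10

def plain_decode(prompt, n_tokens):
    # Baseline: one target-model call per generated token.
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_model(out))
    return out[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(draft_model(out + draft))
        # 2. Target model verifies each position (one batched pass in practice):
        #    accept the longest prefix where the target agrees with the draft.
        i = 0
        while i < k and target_model(out + draft[:i]) == draft[i]:
            i += 1
        out.extend(draft[:i])
        if i < k:
            # On the first disagreement, take the target's own token instead.
            out.append(target_model(out))
    return out[len(prompt):][:n_tokens]
```

Running both decoders from the same prompt yields identical token sequences, but the speculative loop needs far fewer sequential target-model rounds whenever the draft's guesses are usually accepted.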
Videos mentioning speculative decoding

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)
Latent Space
A technique for speeding up inference by using a draft model to predict tokens, which are then verified by the larger target model. Support for variations exists in SGLang and other frameworks.

Why Compound AI + Open Source will beat Closed AI — with Lin Qiao, CEO of Fireworks AI
Latent Space
A technique Fireworks AI uses to improve inference speed, mentioned in connection with achieving 1000 tokens per second and with its implementation in the Fire Optimizer.

Cursor Team: Future of Programming with AI | Lex Fridman Podcast #447
Lex Fridman
A technique used to make language model generation faster by having a smaller model predict draft tokens that a larger model then verifies. Cursor uses 'speculative edits' as a variant.