speculative decoding

Concept

A technique for speeding up language model generation: a smaller, faster draft model proposes several tokens ahead, and a larger model verifies them, keeping the ones it agrees with. Cursor uses a variant it calls 'speculative edits'.
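
The draft-then-verify loop can be sketched in a few lines. This is a minimal illustration of the greedy form only, not Cursor's implementation or any library's API. The names `draft_model`, `target_model`, `speculative_decode`, and the parameter `k` are hypothetical, and both "models" are toy functions standing in for real networks. Sampling-based versions replace the exact-match check with a probabilistic acceptance rule.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical large model: next token is the sum of the last two, mod 10.
    return (prefix[-1] + prefix[-2]) % 10


def draft_model(prefix: List[int]) -> int:
    # Hypothetical small model: agrees with the target except after a 5.
    if prefix[-1] == 5:
        return 0
    return (prefix[-1] + prefix[-2]) % 10


def greedy_decode(prompt: List[int], model: Model, max_new_tokens: int) -> List[int]:
    # Baseline: one call to the model per generated token.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    prompt: List[int],
    draft: Model,
    target: Model,
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens, one after another.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed position. A real system
        #    scores all positions in one batched forward pass; this toy
        #    version calls the target once per position for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token verified
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        tokens.extend(accepted)
    return tokens


if __name__ == "__main__":
    prompt = [1, 1]
    print(speculative_decode(prompt, draft_model, target_model, max_new_tokens=20))
    print(greedy_decode(prompt, target_model, max_new_tokens=20))
```

In this greedy form the output matches what the larger model would have produced by itself; the drafts only change how many tokens are confirmed per verification step. Real implementations usually also take one extra token from the larger model when every draft token is accepted, which this sketch omits.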

Mentioned in 3 videos