Speculative Sampling

Concept

A lossless method for speeding up autoregressive inference: a cheaper draft model proposes several tokens, and a larger target model verifies them in parallel, accepting or resampling each one so that the output follows the target model's distribution exactly.
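
The draft-then-verify loop can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a reference implementation. The names `speculative_sample`, `draft_model`, `target_model`, and `k` are hypothetical: each "model" is assumed to be a function that takes a token list and returns a list of next-token probabilities over the same small vocabulary. A real system would score all drafted positions in one batched forward pass of the target model; the sketch calls it once per position for readability.

```python
import random


def speculative_sample(prefix, draft_model, target_model, k, rng):
    """Run one draft-then-verify round and return the newly produced tokens.

    draft_model / target_model are hypothetical callables: they take a list
    of token ids and return next-token probabilities (a list summing to 1).
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft_model(ctx)
        tok = rng.choices(range(len(q)), weights=q)[0]
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. The target model scores every drafted position, plus one extra.
    #    (Assumption: done in a single parallel pass in a real system.)
    p_dists = [target_model(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Verify left to right; stop at the first rejection.
    out = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        # Accept the drafted token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # On rejection, resample from the normalized residual max(0, p - q).
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out

    # 4. Every draft was accepted: take one bonus token from the target.
    p_last = p_dists[k]
    out.append(rng.choices(range(len(p_last)), weights=p_last)[0])
    return out
```

Each round yields between 1 and k + 1 tokens. The accept-or-resample rule is what makes the method lossless: a token is kept with probability min(1, p/q), and a rejected position is redrawn from the part of the target distribution the draft under-covered, so the combined result is distributed as if sampled from the target model directly.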
