Speculative Sampling
Concept
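A lossless method for speeding up inference: a cheaper draft model proposes several tokens, which the larger target model then verifies in a single parallel pass. "Lossless" here means the accept/reject rule leaves the output distributed exactly as if every token had been sampled from the target model alone.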
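The draft-then-verify loop described above can be sketched in a few lines of Python. This is a minimal toy illustration, not a production implementation: the names `speculative_step`, `toy_draft`, and `toy_target` are hypothetical, the "models" are fixed probability tables over a three-token vocabulary, and the target distributions are computed in a plain loop where a real system would obtain them from one batched forward pass. The acceptance rule used is the standard one for this technique: accept a drafted token x with probability min(1, p(x)/q(x)), and on rejection resample from the renormalized residual max(0, p - q).

```python
import random


def sample(probs, rng):
    """Draw an index from a categorical distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(prefix, draft_probs, target_probs, k, rng):
    """Draft k tokens, verify them against the target, return 1 to k+1 new tokens."""
    # 1. Draft phase: the cheap model proposes k tokens one at a time.
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. Verify phase: target distributions at all k+1 positions.
    #    (Assumption: a real system gets these from a single parallel pass.)
    p_dists = [target_probs(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept or reject each drafted token, left to right.
    out = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q) and stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(residual, rng))
        return out

    # Every draft was accepted: take one extra token from the target for free.
    out.append(sample(p_dists[k], rng))
    return out


# Hypothetical toy models: context-independent distributions over 3 tokens.
def toy_draft(ctx):
    return [0.6, 0.3, 0.1]


def toy_target(ctx):
    return [0.3, 0.5, 0.2]


if __name__ == "__main__":
    rng = random.Random(0)
    print(speculative_step([], toy_draft, toy_target, k=4, rng=rng))
```

Each call emits at least one token and at most k+1, so the speedup depends on how often the draft model agrees with the target. Because rejected positions are resampled from the residual, the emitted tokens follow the target distribution regardless of how poor the draft model is; a poor draft only lowers the acceptance rate.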