Strong-to-weak distillation

Concept

A training technique used by Qwen's developers in which a larger, stronger model serves as a teacher for training a smaller, weaker model.
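
As a rough illustration of the general idea, the sketch below computes a common distillation objective: the KL divergence between a teacher's and a student's output distributions over the same candidate tokens. This is a generic textbook formulation, not a description of Qwen's actual training procedure. The function names (softmax, distillation_loss), the temperature parameter, and the example logits are all hypothetical, chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over one token position (illustrative only)."""
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for a three-token vocabulary.
teacher = [1.0, 2.0, 3.0]
student = [3.0, 2.0, 1.0]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student's distribution matches the teacher's and grows as the two diverge, so minimizing it pulls the smaller model's predictions toward those of the larger one.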

Mentioned in 1 video