Self-distillation

Concept

A form of knowledge distillation in which the student model is the same size as the teacher, rather than smaller. Surprisingly, training the student on the teacher's outputs can improve the loss further.
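
As a rough illustration of the mechanics only, here is a minimal sketch in which a student with the same shape as its teacher is trained on the teacher's softened outputs. Everything specific here is an assumption made for the example and not taken from the source: the tiny linear softmax model, the KL-divergence objective, the temperature, the learning rate, and the hypothetical helper names `distill` and `mean_kl`. It shows how a same-size student is fit to a teacher; it does not attempt to reproduce the loss improvement described above.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x):
    """Toy linear model (an assumption): one weight row per class, returns logits."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill(teacher, inputs, temperature=2.0, lr=0.5, epochs=200, seed=0):
    """Train a student of the SAME shape as the teacher on the teacher's soft targets."""
    rng = random.Random(seed)
    # Same number of rows and columns as the teacher: the "self" in self-distillation.
    student = [[rng.uniform(-0.1, 0.1) for _ in row] for row in teacher]
    for _ in range(epochs):
        for x in inputs:
            p = softmax(forward(teacher, x), temperature)  # teacher soft targets
            q = softmax(forward(student, x), temperature)  # student predictions
            for c, row in enumerate(student):
                # Gradient of KL(p || q) w.r.t. student logit c is (q_c - p_c) / T.
                g = (q[c] - p[c]) / temperature
                for j in range(len(row)):
                    row[j] -= lr * g * x[j]
    return student


def mean_kl(teacher, student, inputs, temperature=2.0):
    """Average KL between teacher and student soft outputs over the inputs."""
    total = 0.0
    for x in inputs:
        p = softmax(forward(teacher, x), temperature)
        q = softmax(forward(student, x), temperature)
        total += kl_divergence(p, q)
    return total / len(inputs)


if __name__ == "__main__":
    teacher = [[2.0, -1.0], [-1.0, 2.0]]            # hypothetical trained teacher
    inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical unlabeled inputs
    student = distill(teacher, inputs)
    print("mean KL to teacher:", mean_kl(teacher, student, inputs))
```

In practice the student would typically be a fresh copy of a full network trained on real data, often with the ground-truth label loss mixed in alongside the teacher's soft targets; the toy model above leaves that out for brevity.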

Mentioned in 1 video