T5

Software / App | Mentioned in 1 video

A base model with 250 million parameters, used in an example to illustrate how model capacity can affect the performance of different distillation methods.