T5
T5 (Text-to-Text Transfer Transformer) is a family of encoder-decoder models from Google that frames every NLP task as text-to-text generation. Across the talks below it comes up as a pre-training backbone, a multitask-learning example, and a distillation baseline.
Videos Mentioning T5

Answer.ai & AI Magic with Jeremy Howard
Latent Space
A pre-trained encoder backbone suggested for fine-tuning, as part of a discussion on encoder-decoder architectures.
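As a rough illustration of that idea, here is a minimal sketch of using T5's pre-trained encoder as a backbone for a downstream classifier with the Hugging Face transformers library. The classifier head, pooling choice, and label count are illustrative assumptions, not details from the talk.

```python
# Minimal sketch: T5's pre-trained encoder as a fine-tuning backbone.
# The classification head, mean-pooling, and num_labels are illustrative
# assumptions, not from the talk.
import torch.nn as nn
from transformers import T5EncoderModel, T5Tokenizer

class T5EncoderClassifier(nn.Module):
    def __init__(self, name: str = "t5-small", num_labels: int = 2):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(name)  # encoder only, no decoder
        self.head = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mean-pool over non-padding tokens, then classify.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1)
        return self.head(pooled)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5EncoderClassifier()
batch = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
logits = model(batch.input_ids, batch.attention_mask)  # shape: (1, num_labels)
```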

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka
Latent Space
An early general-purpose model from Google Brain, mentioned by Yi Tay as an example of the shift toward universal foundation models that began before the general public caught on.

Breaking down the OG GPT Paper by Alec Radford
Latent Space
Mentioned as an example of a model that uses token-based input transformations for multitask learning.

Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit
Latent Space
A text-to-text transfer transformer model, mentioned as one of the models Elicit used in its early development and continues to use.
[Paper Club] BERT: Bidirectional Encoder Representations from Transformers
Latent Space
A text-to-text transfer transformer model, mentioned briefly at the start of the session in connection with routing.

A Comprehensive Overview of Large Language Models - Latent Space Paper Club
Latent Space
A text-to-text transfer transformer model that frames all NLP tasks as text generation.
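As a minimal sketch of that text-to-text framing: the same T5 checkpoint handles different tasks purely through input prefixes. The prefixes below are the ones used in the original T5 setup, and the code uses the public t5-small checkpoint via Hugging Face transformers.

```python
# Minimal sketch: T5 frames different NLP tasks as the same text-in,
# text-out problem, distinguished only by a task prefix.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tasks = [
    "translate English to German: The house is wonderful.",
    "summarize: T5 casts every NLP task as text generation, so "
    "translation, summarization, and classification share one interface.",
]

for prompt in tasks:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```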

The Magic of LLM Distillation — Rishabh Agarwal, Google DeepMind
Latent Space
A base model of 250 million parameters used in an example to illustrate how model capacity can affect the performance of different distillation methods.
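To make the capacity point concrete, here is a minimal sketch of generic logit-based knowledge distillation in PyTorch: the classic softened KL-divergence loss, not necessarily the specific methods compared in the talk. The temperature, shapes, and loss weighting are illustrative assumptions.

```python
# Minimal sketch: generic logit-based knowledge distillation.
# Classic KL-divergence formulation; temperature and shapes are
# illustrative, not the specific methods compared in the talk.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over temperature-softened distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # batchmean + t^2 keeps the gradient scale comparable to a hard-label loss.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy shapes: (batch, vocab); 32128 is T5's logit dimension. In practice
# the student (e.g. a 250M-parameter base model) produces its logits from
# a forward pass, while the teacher's logits come from a larger model.
student_logits = torch.randn(4, 32128)
teacher_logits = torch.randn(4, 32128)
loss = distillation_loss(student_logits, teacher_logits)
```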