Dropout
A regularization technique for neural networks that reduces overfitting by randomly disabling a fraction of activations during training, preventing the network from relying too heavily on any single neuron.
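The definition above can be sketched as a small "inverted dropout" function. This is an illustrative implementation, not taken from any of the videos below: each activation is zeroed with probability `p` during training, and the survivors are scaled by `1/(1-p)` so the expected activation is unchanged; at inference time the input passes through untouched.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout (illustrative sketch).

    During training, zero each activation with probability p and scale
    the survivors by 1/(1-p) so the expected value is unchanged.
    At inference time, return the input unmodified.
    """
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

# Training mode: roughly a fraction p of activations are zeroed,
# and the rest are scaled up to 2.0 (for p = 0.5).
acts = np.ones((2, 4))
print(dropout(acts, p=0.5, rng=np.random.default_rng(0)))

# Evaluation mode: activations pass through unchanged.
print(dropout(acts, p=0.5, training=False))
```

The inverted-scaling variant is the common modern formulation; it keeps inference cheap because no rescaling is needed at test time.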
Videos Mentioning Dropout

Let's build GPT: from scratch, in code, spelled out.
Andrej Karpathy
Regularization technique used in the scaled-up model to reduce overfitting (randomly disables activations during training).

Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
Lex Fridman
A regularization technique for neural networks that prevents overfitting, cited as an example of a research breakthrough that did not require multiple GPUs.

Deep Learning Basics: Introduction and Overview
Lex Fridman
A regularization technique where random neurons are ignored during training to prevent overfitting, mentioned as an idea used in AlexNet.

Deep Learning for Computer Vision (Andrej Karpathy, OpenAI)
Lex Fridman
A regularization technique that randomly sets a proportion of neurons to zero during training to prevent overfitting.

Foundations of Deep Learning (Hugo Larochelle, Twitter)
Lex Fridman
A regularization technique where hidden units are randomly ignored during training to prevent co-adaptation and overfitting.