SGD
Software / App
Stochastic gradient descent, the baseline optimizer used in the examples; compared with Adam/RMSprop when discussing practical training stability.
Mentioned in 4 videos
Videos Mentioning SGD

Building makemore Part 3: Activations & Gradients, BatchNorm
Andrej Karpathy
Stochastic gradient descent, the baseline optimizer used in the examples; compared with Adam/RMSprop when discussing practical training stability.

Breaking down the OG GPT Paper by Alec Radford
Latent Space
Mentioned as an example of an optimizer relevant to word embeddings.

MIT 6.S094: Recurrent Neural Networks for Steering Through Time
Lex Fridman
Stochastic Gradient Descent, a common optimization algorithm for training neural networks, discussed as the vanilla approach that can still find good solutions despite non-convex loss landscapes.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 2: PyTorch (einops)
Stanford Online
Stochastic Gradient Descent, mentioned as a foundational optimization algorithm from which Adagrad and Adam evolved.
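Across these videos SGD is the common baseline: each parameter is nudged against its gradient by a fixed learning rate, with no per-parameter adaptation (the feature Adagrad and Adam later added). A minimal sketch of that vanilla update rule, using an illustrative toy loss (the function names and learning rate here are assumptions, not from any of the videos):

```python
def sgd_step(params, grads, lr=0.1):
    """One vanilla SGD update in place: theta <- theta - lr * grad."""
    for i in range(len(params)):
        params[i] -= lr * grads[i]
    return params

# Toy example: minimize f(x) = x^2, whose gradient is 2x.
x = [1.0]
for _ in range(50):
    x = sgd_step(x, [2 * x[0]], lr=0.1)
# x[0] shrinks toward the minimum at 0 with each step.
```

Adaptive optimizers like Adam keep the same basic loop but rescale each gradient by running statistics, which is what makes them less sensitive to the choice of learning rate.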