SGD
Concept / Algorithm
Stochastic gradient descent (SGD), the baseline optimizer used in the examples; compared with Adam/RMSprop when discussing practical stability.
Mentioned in 3 videos
Videos Mentioning SGD

Building makemore Part 3: Activations & Gradients, BatchNorm
Andrej Karpathy
Used as the baseline optimizer in the training examples; compared with Adam/RMSprop when discussing practical stability.

Breaking down the OG GPT Paper by Alec Radford
Latent Space
Mentioned as an example of an optimizer relevant to word embeddings.

MIT 6.S094: Recurrent Neural Networks for Steering Through Time
Lex Fridman
Stochastic gradient descent, a common optimization algorithm for training neural networks; discussed as the vanilla approach that can still find good solutions despite non-convex loss landscapes.
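
The vanilla update rule these videos refer to can be sketched in a few lines. This is a minimal illustration of the SGD step (parameter minus learning rate times gradient); the function name, parameter values, and example loss are assumptions for illustration, not taken from any of the videos above.

```python
def sgd_step(params, grads, lr=0.01):
    """One vanilla SGD update: theta <- theta - lr * grad (no momentum)."""
    return [p - lr * g for p, g in zip(params, grads)]

# Hypothetical example: minimize f(x, y) = x**2 + y**2, whose gradient
# at (1.0, 2.0) is (2.0, 4.0). With lr=0.5, one step lands at the minimum.
params = [1.0, 2.0]
grads = [2.0, 4.0]
print(sgd_step(params, grads, lr=0.5))  # -> [0.0, 0.0]
```

Adam and RMSprop, mentioned in the comparisons above, differ by rescaling each gradient component with running statistics rather than using a single fixed learning rate.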