SGD
Software / App
Stochastic gradient descent, the baseline optimizer used in the examples; compared with Adam/RMSprop when discussing practical training stability.
Mentioned in 4 videos
Videos Mentioning SGD

Building makemore Part 3: Activations & Gradients, BatchNorm
Andrej Karpathy
Stochastic gradient descent, the baseline optimizer used in the examples; compared with Adam/RMSprop when discussing practical training stability.

Breaking down the OG GPT Paper by Alec Radford
Latent Space
Mentioned as an example of an optimizer relevant to word embeddings.

MIT 6.S094: Recurrent Neural Networks for Steering Through Time
Lex Fridman
Stochastic Gradient Descent, a common optimization algorithm for training neural networks, discussed as the vanilla approach that can still find good solutions despite non-convex loss landscapes.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 2: PyTorch (einops)
Stanford Online
Stochastic Gradient Descent, mentioned as a foundational optimization algorithm from which Adagrad and Adam evolved.
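Across these videos SGD is the common baseline: each parameter is nudged against its gradient by a fixed learning rate, with no per-parameter adaptation (the feature Adagrad and Adam later added). A minimal sketch of that vanilla update rule, using an illustrative toy loss (the function names and learning rate here are assumptions, not from any of the videos):

```python
def sgd_step(params, grads, lr=0.1):
    """One vanilla SGD update in place: theta <- theta - lr * grad."""
    for i in range(len(params)):
        params[i] -= lr * grads[i]
    return params

# Toy example: minimize f(x) = x^2, whose gradient is 2x.
x = [1.0]
for _ in range(50):
    x = sgd_step(x, [2 * x[0]], lr=0.1)
# x[0] shrinks toward the minimum at 0 with each step.
```

Adaptive optimizers like Adam keep the same basic loop but rescale each gradient by running statistics, which is what makes them less sensitive to the choice of learning rate.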