DDP
Software / App
Naive data parallelism: a full copy of the model is replicated on every GPU, and each replica processes a different slice of the batch. Used as the baseline for comparison with the ZeRO stages.
Mentioned in 2 videos
Videos Mentioning DDP

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 8: Parallelism
Stanford Online
Naive data parallelism, used as a baseline for comparison with the ZeRO stages and characterized by replicating a full model copy on each GPU.

Stanford CS25: Transformers United V6 I The Ultra-Scale Talk: Scaling Training to Thousands of GPUs
Stanford Online
DistributedDataParallel (DDP), a PyTorch class for data parallelism that automatically synchronizes gradients across the model replicas.
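
Both descriptions above come down to the same loop: every worker holds a full model copy, computes gradients on its own data shard, and the gradients are averaged across workers before an identical update is applied everywhere. The following is a minimal framework-free sketch of that idea, not PyTorch's implementation. The one-parameter model, the function names (local_gradient, all_reduce_mean, data_parallel_step), and the data are all hypothetical and chosen only for illustration.

```python
# Toy simulation of naive data parallelism (no GPUs, no PyTorch).
# Model: y = w * x with mean squared error loss. All names are illustrative.

def local_gradient(w, shard):
    """Gradient of the mean squared error w.r.t. w on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the same average."""
    avg = sum(values) / len(values)
    return [avg] * len(values)

def data_parallel_step(replicas, shards, lr):
    """One step: local gradients, synchronization, identical update."""
    grads = [local_gradient(w, shard) for w, shard in zip(replicas, shards)]
    synced = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(replicas, synced)]

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # samples of y = 2x
shards = [data[:2], data[2:]]   # equal-sized shards, one per worker
replicas = [0.0, 0.0]           # each worker starts with the same copy of w

for _ in range(50):
    replicas = data_parallel_step(replicas, shards, lr=0.05)
```

Because the shards are the same size, the averaged gradient equals the gradient over the full batch, so the replicas stay identical after every step. In this sketch the synchronization is a plain Python average; a real DDP setup performs it with a collective communication operation across processes.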