Data Parallelism
Concept
A distributed training technique in which the full model is replicated on each of several devices and the training data is split across them; each replica computes gradients on its own shard, and the gradients are averaged (e.g., via an all-reduce) so every copy of the model stays in sync.
Mentioned in 2 videos
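
A minimal sketch of this setup using PyTorch's DistributedDataParallel; the toy linear model, dataset, and hyperparameters are illustrative assumptions, not drawn from either video:

```python
# Data parallelism sketch: every process holds a full model replica,
# trains on a disjoint shard of the data, and DDP averages gradients across replicas.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # Rank and world size come from the launcher (e.g. torchrun); assumes one GPU per process.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # Every rank holds a complete replica of the (toy) model.
    model = torch.nn.Linear(128, 10).cuda(rank)
    model = DDP(model, device_ids=[rank])

    # DistributedSampler splits the dataset so each rank sees a disjoint shard.
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for x, y in loader:
        x, y = x.cuda(rank), y.cuda(rank)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        # backward() triggers the all-reduce, so every replica ends up with the averaged gradients.
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, say, torchrun --nproc_per_node=4 train.py, each of the four processes trains a full model replica on a disjoint quarter of each epoch's data while staying synchronized through the gradient all-reduce.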
Videos Mentioning Data Parallelism

A Comprehensive Overview of Large Language Models - Latent Space Paper Club
Latent Space
A distributed training technique where the model is replicated across multiple devices and the data is split among them.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 7: Parallelism
Stanford Online
A distributed training technique where the model's parameters are replicated across multiple devices, and the data is sharded among them.
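
As a quick check on the description above, the sketch below simulates data parallelism on a single machine: a batch is split into equal shards, per-shard gradients of a simple least-squares objective are computed, and averaging them (the role the all-reduce plays in real frameworks) reproduces the full-batch gradient. The model, data, and shard count here are made-up illustrations.

```python
# Toy single-machine illustration: averaging per-shard gradients over equal-sized shards
# gives the same result as the full-batch gradient on one device.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 8)), rng.normal(size=64)
w = np.zeros(8)

def grad(Xs, ys, w):
    # Gradient of the mean squared error 0.5 * mean((Xs @ w - ys)**2) with respect to w.
    return Xs.T @ (Xs @ w - ys) / len(ys)

num_devices = 4
shards = zip(np.array_split(X, num_devices), np.array_split(y, num_devices))
per_device_grads = [grad(Xs, ys, w) for Xs, ys in shards]

# "All-reduce": average the per-device gradients so every replica takes the same update step.
avg_grad = np.mean(per_device_grads, axis=0)
assert np.allclose(avg_grad, grad(X, y, w))  # matches the single-device full-batch gradient
```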