Megatron
NVIDIA's library for large-scale language model training, referenced in the talks below in discussions of parallelism strategies and training infrastructure.
Videos Mentioning Megatron
![[Paper Club] Weight Streaming on Wafer-Scale Clusters (w/ Sarah Chieng of Cerebras)](https://i.ytimg.com/vi/eNKe04apEaE/maxresdefault.jpg)
[Paper Club] Weight Streaming on Wafer-Scale Clusters (w/ Sarah Chieng of Cerebras)
Latent Space
Mentioned as a system that utilizes data parallelism for training, contrasted with Cerebras' weight streaming approach.
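
For context on the contrast this talk draws, here is a minimal sketch of the data-parallel pattern, assuming a toy linear model and simulated in-process workers rather than real GPUs. Each worker replicates the weights and sees only a shard of the data; Cerebras' weight streaming inverts this by streaming weights through the compute instead of replicating them.

```python
# A toy data-parallel step: every worker holds a full weight replica,
# computes gradients on its own shard of the batch, and the gradients
# are averaged (an all-reduce) so all replicas apply the same update.
# The linear model and in-process "workers" are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
num_workers, batch, dim = 4, 32, 8

w = rng.normal(size=dim)                        # replicated weights
x = rng.normal(size=(num_workers, batch, dim))  # one data shard per worker
y = x @ rng.normal(size=dim)                    # synthetic targets

local_grads = []
for r in range(num_workers):                      # each "worker" in turn
    err = x[r] @ w - y[r]                         # forward pass on its shard
    local_grads.append(2 * x[r].T @ err / batch)  # local MSE gradient

grad = np.mean(local_grads, axis=0)  # all-reduce: average the gradients
w -= 0.01 * grad                     # identical update on every replica
```
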
![[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu](https://i.ytimg.com/vi/ULcwHlxfSkQ/maxresdefault.jpg)
[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu
Latent Space
A topic for the next Paper Club meeting, potentially related to distillation or Mixture of Experts (MoE) models, to be presented by Ethan from NVIDIA.
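
Since Mixture of Experts is floated here only as a candidate topic, the following is just a minimal sketch of the routing idea; the gate, experts, and shapes are illustrative, not any particular paper's design.

```python
# A toy Mixture-of-Experts layer: a learned gate scores experts per token,
# each token is dispatched to its top-k experts, and the expert outputs are
# combined weighted by the gate probabilities. Real implementations usually
# renormalize over the selected experts and balance the expert load.
import numpy as np

rng = np.random.default_rng(0)
tokens, dim, n_experts, k = 6, 4, 3, 2

x = rng.normal(size=(tokens, dim))
gate_w = rng.normal(size=(dim, n_experts))
experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]

logits = x @ gate_w
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax gate
topk = np.argsort(probs, axis=1)[:, -k:]  # top-k expert ids per token

out = np.zeros_like(x)
for t in range(tokens):
    for e in topk[t]:
        out[t] += probs[t, e] * (x[t] @ experts[e])  # gate-weighted expert output
```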

CI/CD Breaks at AI Speed: Tangle, Graphite Stacks, Pro-Model PR Review — Mikhail Parakhin, Shopify
Latent Space
A model developed in collaboration between Microsoft and NVIDIA, used in an early implementation of Sydney.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 8: Parallelism
Stanford Online
NVIDIA's parallelism library and associated guidance document, used as a reference for best practices in parallelism strategies.

Stanford CS25: Transformers United V6 I The Ultra-Scale Talk: Scaling Training to Thousands of GPUs
Stanford Online
A library for large-scale deep learning model training, mentioned in the context of combining parallelism strategies and how they are implemented.
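
Both Stanford talks point to Megatron-LM for how parallelism strategies compose in practice. As a minimal sketch of its best-known piece, tensor parallelism, the snippet below splits a linear layer's weight matrix column-wise across simulated devices; this illustrates the idea, not Megatron's actual implementation.

```python
# A toy tensor-parallel linear layer: the weight matrix is partitioned
# column-wise across devices, each device computes its slice of the output
# from the full input, and the slices are concatenated (an all-gather).
# In-process arrays stand in for devices; Megatron-LM does this with CUDA
# kernels and NCCL collectives, and layers data and pipeline parallelism
# on top.
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out, n_dev = 4, 8, 16, 2

x = rng.normal(size=(batch, d_in))   # full activations, replicated on each device
w = rng.normal(size=(d_in, d_out))   # the logical full weight matrix
shards = np.split(w, n_dev, axis=1)  # one column shard per device

partials = [x @ shard for shard in shards]  # each device's output slice
out = np.concatenate(partials, axis=1)      # all-gather the slices

assert np.allclose(out, x @ w)  # matches the unpartitioned layer
```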