Megatron
NVIDIA's library for large-scale language model training, referenced in the talks below in discussions of parallelism strategies and training infrastructure.
Videos Mentioning Megatron
![[Paper Club] Weight Streaming on Wafer-Scale Clusters (w/ Sarah Chieng of Cerebras)](https://i.ytimg.com/vi/eNKe04apEaE/maxresdefault.jpg)
[Paper Club] Weight Streaming on Wafer-Scale Clusters (w/ Sarah Chieng of Cerebras)
Latent Space
Mentioned as a system that utilizes data parallelism for training, contrasted with Cerebras' weight streaming approach.
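
For context on the contrast this talk draws, here is a minimal sketch of the data-parallel pattern, assuming a toy linear model and simulated in-process workers rather than real GPUs. Each worker replicates the weights and sees only a shard of the data; Cerebras' weight streaming inverts this by streaming weights through the compute instead of replicating them.

```python
# A toy data-parallel step: every worker holds a full weight replica,
# computes gradients on its own shard of the batch, and the gradients
# are averaged (an all-reduce) so all replicas apply the same update.
# The linear model and in-process "workers" are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
num_workers, batch, dim = 4, 32, 8

w = rng.normal(size=dim)                        # replicated weights
x = rng.normal(size=(num_workers, batch, dim))  # one data shard per worker
y = x @ rng.normal(size=dim)                    # synthetic targets

local_grads = []
for r in range(num_workers):                      # each "worker" in turn
    err = x[r] @ w - y[r]                         # forward pass on its shard
    local_grads.append(2 * x[r].T @ err / batch)  # local MSE gradient

grad = np.mean(local_grads, axis=0)  # all-reduce: average the gradients
w -= 0.01 * grad                     # identical update on every replica
```
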
![[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu](https://i.ytimg.com/vi/ULcwHlxfSkQ/maxresdefault.jpg)
[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu
Latent Space
A topic for the next Paper Club meeting, potentially related to distillation or Mixture of Experts (MoE) models, to be presented by Ethan from NVIDIA.
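
Since Mixture of Experts is floated here only as a candidate topic, the following is just a minimal sketch of the routing idea; the gate, experts, and shapes are illustrative, not any particular paper's design.

```python
# A toy Mixture-of-Experts layer: a learned gate scores experts per token,
# each token is dispatched to its top-k experts, and the expert outputs are
# combined weighted by the gate probabilities. Real implementations usually
# renormalize over the selected experts and balance the expert load.
import numpy as np

rng = np.random.default_rng(0)
tokens, dim, n_experts, k = 6, 4, 3, 2

x = rng.normal(size=(tokens, dim))
gate_w = rng.normal(size=(dim, n_experts))
experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]

logits = x @ gate_w
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax gate
topk = np.argsort(probs, axis=1)[:, -k:]  # top-k expert ids per token

out = np.zeros_like(x)
for t in range(tokens):
    for e in topk[t]:
        out[t] += probs[t, e] * (x[t] @ experts[e])  # gate-weighted expert output
```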

CI/CD Breaks at AI Speed: Tangle, Graphite Stacks, Pro-Model PR Review — Mikhail Parakhin, Shopify
Latent Space
A model developed in collaboration between Microsoft and NVIDIA, used in an early implementation of Sydney.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 8: Parallelism
Stanford Online
NVIDIA's parallelism library and associated guidance document, used as a reference for best practices in parallelism strategies.

Stanford CS25: Transformers United V6 I The Ultra-Scale Talk: Scaling Training to Thousands of GPUs
Stanford Online
A library for large-scale deep learning model training, mentioned in the context of combining parallelism strategies and how they are implemented.
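
Both Stanford talks point to Megatron-LM for how parallelism strategies compose in practice. As a minimal sketch of its best-known piece, tensor parallelism, the snippet below splits a linear layer's weight matrix column-wise across simulated devices; this illustrates the idea, not Megatron's actual implementation.

```python
# A toy tensor-parallel linear layer: the weight matrix is partitioned
# column-wise across devices, each device computes its slice of the output
# from the full input, and the slices are concatenated (an all-gather).
# In-process arrays stand in for devices; Megatron-LM does this with CUDA
# kernels and NCCL collectives, and layers data and pipeline parallelism
# on top.
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out, n_dev = 4, 8, 16, 2

x = rng.normal(size=(batch, d_in))   # full activations, replicated on each device
w = rng.normal(size=(d_in, d_out))   # the logical full weight matrix
shards = np.split(w, n_dev, axis=1)  # one column shard per device

partials = [x @ shard for shard in shards]  # each device's output slice
out = np.concatenate(partials, axis=1)      # all-gather the slices

assert np.allclose(out, x @ w)  # matches the unpartitioned layer
```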