Megatron
Software / App
Mentioned as a system that uses data parallelism for training, in contrast to Cerebras' weight-streaming approach.
Mentioned in 2 videos
Videos Mentioning Megatron
![[Paper Club] Weight Streaming on Wafer-Scale Clusters (w/ Sarah Chieng of Cerebras)](https://i.ytimg.com/vi/eNKe04apEaE/maxresdefault.jpg)
[Paper Club] Weight Streaming on Wafer-Scale Clusters (w/ Sarah Chieng of Cerebras)
Latent Space
Mentioned as a system that uses data parallelism for training, in contrast to Cerebras' weight-streaming approach.
![[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu](https://i.ytimg.com/vi/ULcwHlxfSkQ/maxresdefault.jpg)
[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu
Latent Space
A topic for the next Paper Club meeting, potentially related to distillation or Mixture of Experts (MoE) models, to be presented by Ethan from Nvidia.