DeepSpeed
Software / App
Microsoft's open-source deep learning optimization library for large-scale model training, discussed in contexts ranging from data parallelism (contrasted with Cerebras' weight streaming approach) to Mixture-of-Experts variants.
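As a rough illustration of the data-parallel pattern contrasted with weight streaming in the videos below (a simplified sketch in plain Python, not DeepSpeed's actual API): each worker holds a full copy of the weights, computes gradients on its own shard of the batch, and the gradients are averaged across workers before a single synchronized update.

```python
# Minimal sketch of data parallelism: N workers hold identical weights,
# each computes a gradient on its own data shard, and gradients are
# averaged (an "all-reduce") before one shared update.
# Illustrative only -- real systems run workers on separate devices.

def local_gradient(weights, shard):
    # Toy loss: mean squared error of a 1-D linear model y = w * x.
    w = weights[0]
    grad = 0.0
    for x, y in shard:
        grad += 2 * (w * x - y) * x
    return [grad / len(shard)]

def all_reduce_mean(grads_per_worker):
    # Average corresponding gradient entries across all workers.
    n = len(grads_per_worker)
    return [sum(g[i] for g in grads_per_worker) / n
            for i in range(len(grads_per_worker[0]))]

def data_parallel_step(weights, shards, lr=0.1):
    # In practice the per-worker gradients are computed concurrently.
    grads = [local_gradient(weights, shard) for shard in shards]
    avg = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, avg)]

# Two workers, each with its own shard of (x, y) pairs for y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
weights = [0.0]
for _ in range(50):
    weights = data_parallel_step(weights, shards)
print(round(weights[0], 2))  # converges toward 3.0
```

Weight streaming inverts this layout: instead of replicating all weights on every worker and moving gradients, the weights themselves are streamed to the compute, which is the contrast the Cerebras discussion draws.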
Mentioned in 3 videos
Videos Mentioning DeepSpeed

State of the Art: Training 70B LLMs on 10,000 H100 clusters
Latent Space
An open-source optimization library used to accelerate large-scale model training, whose working examples serve as useful references when tuning training runs.
![[Paper Club] Weight Streaming on Wafer-Scale Clusters (w/ Sarah Chieng of Cerebras)](https://i.ytimg.com/vi/eNKe04apEaE/maxresdefault.jpg)
[Paper Club] Weight Streaming on Wafer-Scale Clusters (w/ Sarah Chieng of Cerebras)
Latent Space
Mentioned as a system from Microsoft that uses data parallelism, contrasted with Cerebras' weight streaming approach.
![[Paper Club] Upcycling Large Language Models into Mixture of Experts](https://i.ytimg.com/vi/e_mkhFkKPEk/maxresdefault.jpg)
[Paper Club] Upcycling Large Language Models into Mixture of Experts
Latent Space
A deep learning optimization library that offers a Mixture-of-Experts variant (DeepSpeed-MoE V2) supporting a large number of experts.