Qwen 3

Software / App

A mixture-of-experts (MoE) model that was initially slower to train than Mixtral, but became feasible to train once new kernels and other optimizations were added.

Mentioned in 5 videos

Videos Mentioning Qwen 3

⚡ Open Model Pretraining Masterclass — Elie Bakouch, HuggingFace SmolLM 3, FineWeb, FinePDF

Latent Space

A mixture-of-experts (MoE) model that was initially slower to train than Mixtral, but became feasible to train once new kernels and other optimizations were added.

The Utility of Interpretability — Emmanuel Amiesen

Latent Space

An AI model that Vivu suggested for an experiment testing whether chain-of-thought faithfulness is already present in base models or appears only after fine-tuning; Emmanuel offered a $100 bet on the outcome.

OpenAI vs. Deepseek vs. Qwen: Comparing Open Source LLM Architectures

Y Combinator

A family of models developed by Alibaba Cloud, with both dense and mixture-of-experts (MoE) variants and benchmark scores rivaling leading open-source models.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 4: Attention Alternatives

Stanford Online

A model that Nemotron 3 is compared against, with the two showing competitive performance.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 13: Data (Sources, Datasets)

Stanford Online

Trained on 36 trillion tokens; cited as a point of scale comparison with the Nemotron datasets and other pretraining corpora.