Qwen 3
Software / App
An MoE model that was initially slower to train than Mixtral, but became feasible to train with new kernels and optimizations.
Mentioned in 3 videos
Videos Mentioning Qwen 3

⚡ Open Model Pretraining Masterclass — Elie Bakouch, HuggingFace SmolLM 3, FineWeb, FinePDF
Latent Space
An MoE model that was initially slower to train than Mixtral, but became feasible to train with new kernels and optimizations.

The Utility of Interpretability — Emmanuel Amiesen
Latent Space
An AI model suggested by Vivu for an experiment to test whether chain-of-thought faithfulness behavior is present in base models or emerges only after fine-tuning, with Emmanuel offering a $100 bet on the outcome.

OpenAI vs. Deepseek vs. Qwen: Comparing Open Source LLM Architectures
Y Combinator
A family of models developed by Alibaba Cloud, featuring both dense and mixture-of-experts variants, with benchmark scores rivaling leading open-source models.