Qwen 3

Software / App

A mixture-of-experts (MoE) model that, without optimizations, was initially slower to train than Mixtral, but became feasible to train with new kernels and optimizations.

Mentioned in 3 videos