Qwen 3
Software / App
Mentioned in 1 video
A mixture-of-experts (MoE) model that, without optimizations, was initially slower to train than Mixtral, but became feasible to train with new kernels and other optimizations.