
Qwen 3

Software / App · Mentioned in 1 video

A mixture-of-experts (MoE) model that, without optimizations, was initially slower to train than Mixtral; a new kernel and further optimizations made training it feasible.