MobileLLM

Study / Research

A paper from Meta that studies language models with fewer than 1 billion parameters. It finds that, at this scale, depth matters more than width, and that grouped-query attention (GQA) helps.
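
To make the two findings concrete, here is a rough parameter-counting sketch. It is not code from the paper: the function names, the 4x feed-forward multiplier, and every layer count and width below are illustrative assumptions. It shows that a deep, thin stack and a shallow, wide stack can have the same weight budget, and that GQA shrinks the key/value projections by sharing them across query heads.

```python
def block_params(d_model, n_heads, n_kv_heads, ffn_mult=4):
    """Approximate weight count of one decoder block (hypothetical helper).

    Biases, norms, and embeddings are ignored; the feed-forward layer is
    assumed to be a plain two-matrix MLP with hidden size ffn_mult * d_model.
    """
    head_dim = d_model // n_heads
    q_and_out = 2 * d_model * d_model              # query and output projections
    k_and_v = 2 * d_model * n_kv_heads * head_dim  # smaller when n_kv_heads < n_heads (GQA)
    ffn = 2 * d_model * (ffn_mult * d_model)       # up and down projections
    return q_and_out + k_and_v + ffn


def model_params(n_layers, d_model, n_heads, n_kv_heads):
    """Total block weights for a stack of identical layers (hypothetical helper)."""
    return n_layers * block_params(d_model, n_heads, n_kv_heads)


# Illustrative configurations only, not the ones used in the paper.
deep_thin = model_params(n_layers=24, d_model=512, n_heads=8, n_kv_heads=8)
shallow_wide = model_params(n_layers=6, d_model=1024, n_heads=16, n_kv_heads=16)
print(deep_thin, shallow_wide)  # same budget, different shape

# Standard multi-head attention vs GQA with 2 shared key/value heads.
mha_block = block_params(d_model=512, n_heads=8, n_kv_heads=8)
gqa_block = block_params(d_model=512, n_heads=8, n_kv_heads=2)
print(mha_block, gqa_block)
```

The count only shows that the two shapes cost the same number of weights. Which shape performs better is an empirical question, and that is what the paper reports on.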

Mentioned in 1 video