MobileLLM

Study / Research

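A paper by Meta that studies language models with fewer than 1 billion parameters. It finds that at this scale depth matters more than width (deep, thin models outperform shallow, wide ones of similar size) and that grouped-query attention (GQA) helps.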
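To make the depth-versus-width and GQA points concrete, the following is a minimal parameter-counting sketch. It assumes a plain transformer block with no biases and a two-matrix feed-forward layer of hidden size 4 × d_model; the function names and both configurations are hypothetical illustrations, not figures taken from the paper. It only shows how a parameter budget can be spent on more, narrower layers, and how sharing key/value heads frees parameters.

```python
def layer_params(d_model, n_heads, n_kv_heads, ffn_mult=4):
    """Approximate weight count of one transformer block (assumed layout, no biases)."""
    head_dim = d_model // n_heads
    q_proj = d_model * d_model                        # query projection
    kv_proj = 2 * d_model * (n_kv_heads * head_dim)   # key + value projections (shrunk by GQA)
    o_proj = d_model * d_model                        # attention output projection
    ffn = 2 * d_model * (ffn_mult * d_model)          # up + down feed-forward matrices
    return q_proj + kv_proj + o_proj + ffn


def model_params(n_layers, d_model, n_heads, n_kv_heads):
    """Total block weights, ignoring embeddings and normalization layers."""
    return n_layers * layer_params(d_model, n_heads, n_kv_heads)


# Hypothetical shallow-and-wide model with standard multi-head attention.
wide = model_params(n_layers=12, d_model=768, n_heads=12, n_kv_heads=12)

# Hypothetical deep-and-thin model with GQA (9 query heads share 3 key/value heads).
deep = model_params(n_layers=24, d_model=576, n_heads=9, n_kv_heads=3)

# Parameters saved per layer by GQA at d_model=576, relative to full multi-head attention.
gqa_saving = layer_params(576, 9, 9) - layer_params(576, 9, 3)

print(f"wide/shallow: {wide:,}")
print(f"deep/thin:    {deep:,}")
print(f"GQA saving per layer: {gqa_saving:,}")
```

Under these assumptions the two configurations have the same block parameter count, so the deep, thin model gets twice as many layers for the same budget. Whether that improves quality is the empirical question the paper addresses; this count says nothing about accuracy.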