MobileLLM
Study / Research
Mentioned in 1 video
A paper by Meta that studies language models with fewer than 1 billion parameters. It finds that, at this scale, depth matters more than width, and that grouped-query attention (GQA) helps.
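As a rough illustration of why GQA is attractive for small models, the sketch below counts attention projection weights when several query heads share one key/value head. The function names and the sizes (d_model=512, 8 query heads, 2 key/value heads) are hypothetical values chosen for the example, not figures from the paper, and biases are assumed to be absent.

```python
def attention_params(d_model: int, n_heads: int, n_kv_heads: int) -> int:
    """Count Q, K, V and output projection weights for one attention layer.

    Assumes no bias terms and head_dim = d_model // n_heads.
    Setting n_kv_heads == n_heads gives ordinary multi-head attention.
    """
    if d_model % n_heads != 0 or n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must divide d_model, and n_kv_heads must divide n_heads")
    head_dim = d_model // n_heads
    q_proj = d_model * n_heads * head_dim
    k_proj = d_model * n_kv_heads * head_dim  # fewer K heads under GQA
    v_proj = d_model * n_kv_heads * head_dim  # fewer V heads under GQA
    o_proj = n_heads * head_dim * d_model
    return q_proj + k_proj + v_proj + o_proj


def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Return the index of the shared K/V head that a given query head reads from."""
    group_size = n_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    mha = attention_params(d_model=512, n_heads=8, n_kv_heads=8)
    gqa = attention_params(d_model=512, n_heads=8, n_kv_heads=2)
    print("MHA weights:", mha)
    print("GQA weights:", gqa)
    print("query head -> kv head:",
          [kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

Under these assumptions, the weights saved in the key and value projections could be spent elsewhere, for example on additional layers. That is one way to read the two findings together, though the paper itself should be consulted for the actual configurations.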