DataComp
Study / Research
An open effort led by Ludwig Schmidt and his students to curate Common Crawl data, serving as a benchmark for data quality.
Mentioned in 2 videos
Videos Mentioning DataComp

Better Data is All You Need — Ari Morcos, Datology
Latent Space
An open effort led by Ludwig Schmidt and his students to curate Common Crawl data, serving as a benchmark for data quality.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 13: Data (Sources, Datasets)
Stanford Online
A project that aims to standardize data pipeline methods, releasing an unfiltered pool (DCLM-Pool) and filtered datasets produced with model-based quality filtering.