Dolma
Software / App
A dataset from AI2 (the Allen Institute for AI) that combines processed Common Crawl, Stack Exchange, C4, and other sources, and uses model-based filtering to select for quality.
Mentioned in 1 video