Dolma

Software / App

A dataset from AI2 (the Allen Institute for AI) that combines processed Common Crawl, Stack Exchange, C4, and other sources, using model-based filtering to select for quality.
