C4 dataset
Concept
The Colossal Clean Crawled Corpus (C4) is a large, cleaned English text dataset derived from Common Crawl web scrapes. It was introduced alongside the T5 model and is often used for pre-training LLMs.
Mentioned in 1 video