C
Common Crawl / C4
Tool / Product
Mentioned in 1 video
Common Crawl is a large public web-crawl corpus, and C4 is a cleaned dataset derived from it; both are often used in data mixtures for LLM pretraining. Discussed in the training-data section.