Dataset used for broad internet-scale pretraining of language models
Mentioned in 1 video
Dwarkesh Clips