F

FineWeb

Tool / Product
Mentioned in 2 videos

A web-text dataset curated by Hugging Face by filtering Common Crawl data, used as an example of a pretraining corpus (~44 TB after filtering).