FineWeb-Edu
Software / App
A dataset created by filtering FineWeb for highly educational content, using Llama 3 to generate annotations and a classifier trained on those annotations.
Mentioned in 2 videos
Videos Mentioning FineWeb-Edu
![Best of 2024: Synthetic Data / Smol Models, Loubna Ben Allal, HuggingFace [LS Live! @ NeurIPS 2024]](https://i.ytimg.com/vi/AjmdDy7Rzx0/maxresdefault.jpg)
Best of 2024: Synthetic Data / Smol Models, Loubna Ben Allal, HuggingFace [LS Live! @ NeurIPS 2024]
Latent Space
A dataset created by filtering FineWeb for highly educational content, using Llama 3 to generate annotations and a classifier trained on those annotations.

Stanford CS25: Transformers United V6 I From Next-Token Prediction to Next-Generation Intelligence
Stanford Online
A popular approach for classifying web crawl data quality based on educational content.