CCNet
Software / App
A data pipeline developed by Facebook for building high-quality text datasets, particularly for low-resource languages. It deduplicates the text, identifies the language of each document, and uses a language model to score document quality by how closely the text resembles Wikipedia.
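The three steps named above (deduplication, language identification, language-model scoring) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CCNet's code: the function names, the `STOPWORDS` table, and the `UnigramModel` class are hypothetical, and the stopword lookup and add-one-smoothed unigram model are toy stand-ins for a trained language identifier and a language model trained on Wikipedia text.

```python
import hashlib
import math
from collections import Counter


def dedup_paragraphs(docs):
    """Drop paragraphs whose normalized text already appeared earlier."""
    seen = set()
    result = []
    for doc in docs:
        kept = []
        for para in doc.split("\n"):
            # Normalize case and whitespace so trivial variants collide.
            norm = " ".join(para.lower().split())
            if not norm:
                continue
            digest = hashlib.sha1(norm.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                kept.append(para.strip())
        if kept:
            result.append("\n".join(kept))
    return result


# Hypothetical stopword table: a toy stand-in for a trained classifier.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "in"},
    "fr": {"le", "la", "et", "est", "de"},
}


def identify_language(text):
    """Return the language whose stopwords occur most often, else 'unknown'."""
    tokens = text.lower().split()
    counts = {
        lang: sum(tok in words for tok in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


class UnigramModel:
    """Add-one-smoothed unigram model built from a reference text."""

    def __init__(self, reference_text):
        tokens = reference_text.lower().split()
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab = len(self.counts) + 1  # one extra slot for unseen words

    def perplexity(self, text):
        """Lower perplexity means the text looks more like the reference."""
        tokens = text.lower().split()
        if not tokens:
            return float("inf")
        log_prob = sum(
            math.log((self.counts[tok] + 1) / (self.total + self.vocab))
            for tok in tokens
        )
        return math.exp(-log_prob / len(tokens))
```

In a pipeline of this shape, documents would be deduplicated first, routed by detected language, and then kept or dropped by comparing their perplexity against a per-language threshold. The threshold itself is a tuning choice that this sketch does not attempt to set.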
Mentioned in 1 video
