CCNet

Software / App

A data pipeline developed by Facebook for creating high-quality datasets, particularly for low-resource languages. It deduplicates documents, identifies their language, and uses a language model to score each document's quality by how closely it resembles Wikipedia.
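
The three stages named in the description (deduplication, language identification, and language-model quality scoring) can be illustrated with a minimal Python sketch. This is not CCNet's code. The stopword lookup stands in for a trained language identifier, the add-one smoothed unigram model stands in for a language model trained on Wikipedia, and every function name, the stopword lists, and the perplexity threshold are hypothetical choices made for illustration.

```python
import hashlib
import math
from collections import Counter


def normalize(text):
    # Lowercase and collapse whitespace so trivially different copies hash the same.
    return " ".join(text.lower().split())


def deduplicate(docs):
    # Keep only the first occurrence of each normalized document.
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


# Hypothetical stopword lists standing in for a trained language identifier.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "fr": {"le", "la", "et", "est", "de"},
}


def identify_language(doc):
    # Pick the language whose stopwords overlap most with the document.
    tokens = set(normalize(doc).split())
    return max(STOPWORDS, key=lambda lang: len(tokens & STOPWORDS[lang]))


class UnigramModel:
    # Add-one smoothed unigram model: a toy stand-in for a real language model.
    def __init__(self, reference_docs):
        self.counts = Counter(
            tok for doc in reference_docs for tok in normalize(doc).split()
        )
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen tokens

    def perplexity(self, doc):
        # Lower perplexity means the document looks more like the reference text.
        tokens = normalize(doc).split()
        if not tokens:
            return float("inf")
        log_prob = sum(
            math.log((self.counts[tok] + 1) / (self.total + self.vocab))
            for tok in tokens
        )
        return math.exp(-log_prob / len(tokens))


def build_dataset(docs, reference_docs, lang, max_perplexity):
    # Deduplicate, keep one language, then keep documents scoring under the threshold.
    model = UnigramModel(reference_docs)
    return [
        doc
        for doc in deduplicate(docs)
        if identify_language(doc) == lang and model.perplexity(doc) <= max_perplexity
    ]
```

In this sketch, reference_docs plays the role of the Wikipedia text: a document built from words that are common in the reference receives a low perplexity and is kept, while text in another language or full of unseen tokens is filtered out.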
