DCLM
Study / Research
A data curation project showing that human experts could not predict the filtering criteria chosen by its automated system, which highlights the limits of human judgment in data curation.
Mentioned in 2 videos
Videos Mentioning DCLM
Best of 2024: Synthetic Data / Smol Models, Loubna Ben Allal, HuggingFace [LS Live! @ NeurIPS 2024]
Latent Space
A dataset built by filtering with a classifier trained on OpenHermes data, intended to provide high-quality, information-dense text for LLM training.

Better Data is All You Need — Ari Morcos, Datology
Latent Space
A data curation project showing that human experts could not predict the filtering criteria chosen by its automated system, which highlights the limits of human judgment in data curation.