RefinedWeb

Software / App

A dataset built on the premise that web data alone is sufficient for training language models. A loosely filtered version of it was used as negative examples when training the quality classifier in DCLM.

Mentioned in 1 video