RedPajama V2

Dataset

An updated version of the RedPajama dataset with 30 trillion tokens and an emphasis on data quality through modular filtering.
