RedPajama V2
Dataset
An updated version of the RedPajama dataset with 30 trillion tokens and an emphasis on data quality through modular filtering.