RedPajama V2

An updated version of the RedPajama dataset, with roughly 30 trillion tokens and an emphasis on data quality: documents come with precomputed quality signals, so filtering is modular and left to the user.