The Pile

Software / App

A diverse, openly available text dataset created by EleutherAI for training language models, drawn from sources including Common Crawl, PubMed, arXiv, GitHub, and books.

Mentioned in 1 video