The Pile
Concept
Mentioned in 2 videos
A common training corpus for AI models, used in combination with The Stack.
Videos Mentioning The Pile

The "Normsky" architecture for AI coding agents — with Beyang Liu + Steve Yegge of SourceGraph
Latent Space
A common training corpus for AI models, used in combination with The Stack.

FlashAttention-2: Making Transformers 800% faster AND exact
Latent Space
A large, diverse, open-source dataset widely used for language modeling.