A large dataset of text collected from the internet, containing billions of tokens and used to train models such as GloVe.
Lex Fridman