Byte Pair Encoding (BPE)

Concept

Tokenization algorithm (merging common byte/byte-pair sequences) used to build large vocabularies (~100k tokens).

Mentioned in 1 video