CL100k (GPT-4 tokenizer)

Tool / Product | Mentioned in 1 video

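The tokenizer used for GPT-4, with a vocabulary of roughly 100k tokens (named `cl100k_base` in OpenAI's tiktoken library). The video compares it to the GPT-2 tokenizer and shows that it encodes whitespace more efficiently, so indented text such as source code takes fewer tokens.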
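To see why whitespace handling matters for code, here is a minimal sketch in Python. It does not use either real tokenizer: the two counting functions are hypothetical toy schemes, one that spends a token on every space and one that merges a run of spaces into a single token. They only illustrate the direction of the effect the video describes, not actual GPT-2 or GPT-4 token counts.

```python
import re

# Toy illustration only: neither function reproduces a real tokenizer.
# Both treat each run of non-whitespace characters and each newline as one token.

def count_tokens_space_per_token(text: str) -> int:
    """Hypothetical scheme: every single space is its own token."""
    return len(re.findall(r" |[^\s]+|\n", text))

def count_tokens_merged_spaces(text: str) -> int:
    """Hypothetical scheme: a run of consecutive spaces is one token."""
    return len(re.findall(r" +|[^\s]+|\n", text))

# A small indented snippet, as found in typical Python source.
snippet = "def f(x):\n    if x:\n        return 1\n"

per_space = count_tokens_space_per_token(snippet)
merged = count_tokens_merged_spaces(snippet)

print(f"one token per space: {per_space}")  # 24
print(f"merged space runs:   {merged}")     # 14
```

In this toy setup the indentation alone accounts for the whole difference: the deeper the nesting, the more tokens the per-space scheme spends on whitespace. Real counts depend on the actual vocabularies and will differ from these numbers.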