CL100k (GPT-4 tokenizer)
Tool / Product
Mentioned in 1 video
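The tokenizer used for GPT-4, with a vocabulary of roughly 100k tokens (it is named cl100k_base in OpenAI's tiktoken library). The video compares it with the GPT-2 tokenizer and shows that it handles whitespace more efficiently, so indented text such as source code is encoded in fewer tokens.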
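To make the whitespace point concrete, here is a toy sketch in Python. It does not run either real tokenizer. It only contrasts two simplified counting schemes: one where every space is its own token, and one where a run of spaces collapses into a single token. The function names and the sample snippet are hypothetical and chosen for illustration; actual token counts from the GPT-2 and GPT-4 tokenizers will differ.

```python
import re


def count_tokens_space_per_token(text: str) -> int:
    """Toy scheme A (assumption): each space character is a separate token."""
    # Matches a single space, a newline, or a run of other characters.
    return len(re.findall(r" |\n|[^ \n]+", text))


def count_tokens_merged_spaces(text: str) -> int:
    """Toy scheme B (assumption): a run of spaces is one token."""
    # Same as scheme A, except consecutive spaces are matched as one unit.
    return len(re.findall(r" +|\n|[^ \n]+", text))


# Hypothetical sample: a small Python function with 4-space indentation.
snippet = (
    "def f(x):\n"
    "    if x:\n"
    "        return 1\n"
)

per_space = count_tokens_space_per_token(snippet)
merged = count_tokens_merged_spaces(snippet)

print("one token per space:", per_space)
print("merged space runs:  ", merged)
```

In this toy setup the gap between the two counts grows with indentation depth, which is the same effect the video points to when comparing the two tokenizers on code.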