YaRN scaling
Concept
A technique for scaling rotary positional embeddings (RoPE), applied during pre-training of GPT OSS to extend the context window to 131,072 tokens.
Mentioned in 1 video
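The core idea can be sketched numerically. YaRN's "NTK-by-parts" rule leaves high-frequency RoPE dimensions (many rotations over the original context) untouched, interpolates low-frequency dimensions by the context-scale factor, and blends in between. The parameter values below (original context, scale, ramp bounds) are illustrative defaults, not GPT OSS's actual configuration:

```python
import numpy as np

def yarn_inv_freq(dim=64, base=10000.0, orig_ctx=4096, scale=32.0,
                  beta_fast=32.0, beta_slow=1.0):
    """Sketch of YaRN 'NTK-by-parts' scaling of RoPE inverse frequencies.

    Dimensions that complete many rotations over orig_ctx keep their
    frequency; slowly rotating dimensions are interpolated by 1/scale;
    dimensions in between are blended linearly.
    """
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # standard RoPE theta_d
    rotations = orig_ctx * inv_freq / (2 * np.pi)      # rotations over orig_ctx
    # ramp = 0 -> keep original freq; ramp = 1 -> full interpolation by 1/scale
    ramp = np.clip((beta_fast - rotations) / (beta_fast - beta_slow), 0.0, 1.0)
    return inv_freq * (1 - ramp) + (inv_freq / scale) * ramp

def yarn_attn_scale(scale=32.0):
    """Attention logit temperature from the YaRN paper: sqrt(0.1*ln(s) + 1)."""
    return (0.1 * np.log(scale) + 1.0) ** 0.5
```

With these defaults, the fastest-rotating dimension is left unchanged while the slowest is divided by the full scale factor, which is what lets the model generalize to positions far beyond its original training length.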