Radix Attention


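A prefix caching technique in SGLang (where it is written RadixAttention) that keeps cached KV entries in a radix tree so that requests sharing a prefix can reuse them. It uses a block size of 1 token, so a shared prefix can be reused down to the exact token where two requests diverge. Frameworks that cache in larger blocks, such as 32 tokens, can only reuse whole blocks, which can lower their cache hit rate.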
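The effect of block size on prefix reuse can be illustrated with a small sketch. This is not SGLang code: the helper names `shared_prefix_len` and `reusable_tokens` are hypothetical, and the model assumes that a cache can only reuse whole blocks of the shared prefix.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reusable_tokens(cached, request, block_size):
    """Tokens of `request` served from cache when only whole blocks are reusable.

    Assumption for illustration: a partially matching block cannot be reused.
    """
    shared = shared_prefix_len(cached, request)
    return (shared // block_size) * block_size


cached = list(range(100))               # token ids of an earlier request
request = list(range(70)) + [999] * 30  # same first 70 tokens, then diverges

print(reusable_tokens(cached, request, block_size=1))   # 70
print(reusable_tokens(cached, request, block_size=32))  # 64
```

Under these assumptions, a block size of 1 reuses all 70 shared tokens, while a block size of 32 reuses only the two complete blocks (64 tokens) and recomputes the remaining 6.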