Radix Attention
Concept · Mentioned in 1 video
A prefix caching technique supported by SGLang. It uses a block size of 1 token, which can give higher cache hit rates than frameworks that use larger block sizes such as 32, since with larger blocks typically only whole matched blocks of a shared prefix can be reused.
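The effect of block size on prefix reuse can be illustrated with a small sketch. This is a simplified model and not SGLang's implementation: it assumes a cache can reuse only whole blocks of the matched prefix, and the function names `shared_prefix_len` and `reusable_tokens` are hypothetical, chosen for illustration.

```python
def shared_prefix_len(cached, request):
    """Count the leading tokens that two token sequences have in common."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n


def reusable_tokens(cached, request, block_size):
    """Tokens reusable from the cache, assuming only whole blocks can be reused.

    Assumption (simplified model): a partially matched block is discarded,
    so the matched prefix is rounded down to a multiple of block_size.
    """
    matched = shared_prefix_len(cached, request)
    return (matched // block_size) * block_size


# A cached 100-token sequence and a new request sharing its first 70 tokens.
cached = list(range(100))
request = list(range(70)) + [999] * 30

fine = reusable_tokens(cached, request, block_size=1)
coarse = reusable_tokens(cached, request, block_size=32)
print(fine, coarse)
```

Under this model, a 70-token shared prefix is fully reused at block size 1, while block size 32 reuses only 64 tokens (two whole blocks). A shared prefix shorter than 32 tokens would not be reused at all at block size 32.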