VM

Software / App

An inference engine mentioned as having an experimental chunked prefill feature and as using paged attention to allocate its KV cache.
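To make the two terms concrete, here is a minimal toy sketch in Python. It is not this engine's code or API: the names `PagedKVAllocator`, `append_tokens`, and `chunk_prefill` are hypothetical, and the block and chunk sizes are arbitrary. It assumes the usual meaning of the terms: paged KV cache allocation hands out fixed-size blocks on demand and records them in a per-sequence block table, and chunked prefill processes a long prompt in bounded pieces rather than in one pass.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVAllocator:
    """Toy allocator that hands out fixed-size KV cache blocks on demand."""
    num_blocks: int
    block_size: int  # tokens per block (arbitrary, chosen by the caller)
    free_blocks: list[int] = field(init=False)
    block_tables: dict[str, list[int]] = field(default_factory=dict)
    seq_lens: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every physical block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_tokens(self, seq_id: str, n_tokens: int) -> list[int]:
        """Reserve cache space for n_tokens more tokens of a sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        new_len = self.seq_lens.get(seq_id, 0) + n_tokens
        # Blocks required for the new length (ceil division) minus those held.
        needed = -(-new_len // self.block_size) - len(table)
        if needed > len(self.free_blocks):
            raise MemoryError("out of KV cache blocks")
        for _ in range(needed):
            # Blocks need not be contiguous; the table maps logical to physical.
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = new_len
        return table

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


def chunk_prefill(prompt_len: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split a prompt into [start, end) token ranges of at most chunk_size."""
    return [
        (start, min(start + chunk_size, prompt_len))
        for start in range(0, prompt_len, chunk_size)
    ]
```

In this sketch, a 40-token prompt with a chunk size of 16 is prefilled in three steps, and each step asks the allocator only for the blocks the new tokens need, so cache memory grows with the sequence instead of being reserved up front.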

Mentioned in 1 video