vLLM

Software / App
Mentioned in 1 video

An inference engine, mentioned as offering chunked prefill as an experimental feature and as using paged attention to allocate the KV cache.
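
To make the two ideas concrete, here is a minimal toy sketch in Python. It is not the engine's actual implementation or API: the names `PagedKVCache`, `append_tokens`, `chunked_prefill`, and the block and chunk sizes are all hypothetical, chosen for illustration. It assumes that paged allocation means handing out fixed-size KV cache blocks on demand through a per-sequence block table, and that chunked prefill means processing a long prompt in bounded chunks rather than in one pass.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block allocator (hypothetical): each sequence owns a list of fixed-size blocks."""
    num_blocks: int
    block_size: int  # tokens per block; an assumed value picked by the caller
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> list of block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored so far

    def __post_init__(self):
        # Every block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_tokens(self, seq_id, n_tokens):
        """Reserve room for n_tokens more tokens, allocating blocks only when needed."""
        table = self.block_tables.setdefault(seq_id, [])
        new_len = self.seq_lens.get(seq_id, 0) + n_tokens
        needed = -(-new_len // self.block_size) - len(table)  # ceiling division
        if needed > len(self.free_blocks):
            raise MemoryError("not enough free KV cache blocks")
        for _ in range(needed):
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = new_len

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


def chunked_prefill(cache, seq_id, prompt_len, chunk_size):
    """Process a prompt in chunks of at most chunk_size tokens; return the chunk sizes."""
    chunks = []
    done = 0
    while done < prompt_len:
        step = min(chunk_size, prompt_len - done)
        cache.append_tokens(seq_id, step)  # blocks are claimed chunk by chunk
        chunks.append(step)
        done += step
    return chunks


# Usage: a 40-token prompt, 16-token chunks, 16-token blocks.
cache = PagedKVCache(num_blocks=8, block_size=16)
chunks = chunked_prefill(cache, "req-0", prompt_len=40, chunk_size=16)
blocks_used = len(cache.block_tables["req-0"])
```

In this sketch, memory is claimed one block at a time as the prompt is consumed, so a sequence never reserves more than one partially filled block. A real engine would also interleave prefill chunks with decode steps from other requests, which this toy loop does not model.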