TensorRT-LLM

Software / App
Mentioned in 1 video

NVIDIA's open-source framework for LLM inference, used by Perplexity to optimize its LLaMA-based models at the kernel level for high throughput and low latency.