TensorRT-LLM

Software / App

An open-source NVIDIA library for optimizing large language model inference on NVIDIA GPUs. Perplexity uses it to optimize its LLaMA-based models at the kernel level for high throughput and low latency.
