TensorRT-LLM
Software / App
An open-source inference framework developed by NVIDIA, used by Perplexity to optimize its LLaMA-based models at the kernel level for high throughput and low latency.
Mentioned in 1 video