TensorRT-LLM
Software / App
Mentioned in 1 video
An inference-optimization framework developed by NVIDIA, used by Perplexity to optimize its LLaMA-based models at the kernel level for high throughput and low latency.