TensorRT
Software / App
Mentioned as an inference solution that can be integrated with Llama Stack.
Mentioned in 4 videos
Videos Mentioning TensorRT
[Paper Club] Writing in the Margins: Chunked Prefill KV Caching for Long Context Retrieval
Latent Space
Mentioned as an example of an inference engine framework.

A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate
Latent Space
An NVIDIA library used by Replicate for optimizing and deploying deep learning models, particularly for inference.

Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators | Lex Fridman Podcast #21
Lex Fridman
NVIDIA's SDK for high-performance deep learning inference, discussed as a hardware-specific compiler that integrates with MLIR.

AI Dev 25 | Amit Sangani: Unlock the Power of Open Source with Llama
DeepLearningAI
Mentioned as an inference solution that can be integrated with Llama Stack.