tensorrt-llm

Originally from ovachiever/droid-tings

SKILL.md

TensorRT-LLM

NVIDIA's open-source library for optimizing LLM inference with state-of-the-art performance on NVIDIA GPUs.

When to use TensorRT-LLM

Use TensorRT-LLM when:

Deploying on NVIDIA GPUs (A100, H100, GB200)
Needing maximum throughput (24,000+ tokens/sec on Llama 3)
Requiring low latency for real-time applications
Working with quantized models (FP8, INT4, FP4)
Scaling across multiple GPUs or nodes

Use vLLM instead when:

Needing a simpler setup and a Python-first API
Wanting PagedAttention without TensorRT compilation
Working with AMD GPUs or other non-NVIDIA hardware
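
The two checklists above can be read as a small decision rule. The following is a minimal sketch of that rule, not part of either library: the `Workload` fields and the `suggest_engine` helper are hypothetical names invented for illustration, and the tie-break (a performance requirement outweighs a preference for simple setup) is an assumption rather than something the guidance states.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a deployment; field names are illustrative."""
    on_nvidia_gpu: bool
    needs_max_throughput: bool = False
    needs_low_latency: bool = False
    uses_quantization: bool = False        # FP8, INT4, or FP4
    multi_gpu_or_multi_node: bool = False
    prefers_simple_setup: bool = False     # Python-first API, no compilation step


def suggest_engine(w: Workload) -> str:
    """Return "TensorRT-LLM" or "vLLM" following the checklists above."""
    # Non-NVIDIA hardware (for example AMD GPUs) points to vLLM.
    if not w.on_nvidia_gpu:
        return "vLLM"

    # Any of these requirements is listed as a reason to use TensorRT-LLM.
    has_performance_reason = (
        w.needs_max_throughput
        or w.needs_low_latency
        or w.uses_quantization
        or w.multi_gpu_or_multi_node
    )

    # Assumption: a preference for simple setup only wins when no
    # performance requirement is present.
    if w.prefers_simple_setup and not has_performance_reason:
        return "vLLM"
    return "TensorRT-LLM"


if __name__ == "__main__":
    print(suggest_engine(Workload(on_nvidia_gpu=True, uses_quantization=True)))
```

Real deployments usually weigh more factors than these flags (model support, team experience, operational tooling), so treat the helper as a way to make the checklist explicit rather than as a recommendation engine.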

Installs: 88
Repository: zechenzhangagi/…h-skills
GitHub Stars: 9.4K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub (Pass)

tensorrt-llm — zechenzhangagi/ai-research-skills