high-performance-inference

High-Performance Inference

Optimize LLM inference for production with vLLM 0.14.x, quantization, and speculative decoding.

vLLM 0.14.0 (Jan ): PyTorch 2.9.0, CUDA 12.9, AttentionConfig API, Python 3.12+ recommended.

Repository: yonatangross/orchestkit

GitHub Stars: 182

First Seen: Jan 22, 2026