High-Performance Inference

Optimize LLM inference for production with vLLM 0.14.x, quantization, and speculative decoding.

vLLM 0.14.0 (Jan 2026): PyTorch 2.9.0, CUDA 12.9, AttentionConfig API, Python 3.12+ recommended.

Repository: yonatangross/skillforge-claude-plugin

GitHub Stars: 188

First Seen: Jan 21, 2026