serving-llms-vllm


vLLM - High-Performance LLM Serving

Quick start

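vLLM reports up to 24x higher throughput than Hugging Face Transformers. The gain comes from PagedAttention (a block-based KV cache that avoids reserving one large contiguous region per request) and continuous batching (scheduling prefill and decode requests together instead of waiting for a whole batch to finish).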
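To make the block-based KV cache idea concrete, here is a minimal toy sketch in plain Python. It is not vLLM's implementation or API: the class `ToyBlockAllocator`, the `BLOCK_SIZE` value, and the request names are all hypothetical, chosen only to show how sequences can draw fixed-size blocks from a shared pool as they grow.

```python
# Toy illustration only: not vLLM's implementation or API.
BLOCK_SIZE = 16  # tokens per KV-cache block (assumed value for this sketch)


class ToyBlockAllocator:
    """Hands out fixed-size cache blocks from a shared free pool."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # sequence id -> list of block ids
        self.num_tokens = {}    # sequence id -> tokens stored so far

    def append_token(self, seq_id: str) -> None:
        """Reserve cache space for one more token of a sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        count = self.num_tokens.get(seq_id, 0)
        # A new block is needed only when every owned block is full.
        if count == len(table) * BLOCK_SIZE:
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            table.append(self.free_blocks.pop())
        self.num_tokens[seq_id] = count + 1

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.num_tokens.pop(seq_id, None)


allocator = ToyBlockAllocator(num_blocks=8)
for _ in range(20):
    allocator.append_token("request-a")
for _ in range(5):
    allocator.append_token("request-b")

blocks_a = len(allocator.block_tables["request-a"])  # 20 tokens span 2 blocks
blocks_b = len(allocator.block_tables["request-b"])  # 5 tokens fit in 1 block
allocator.free("request-a")  # blocks become reusable by other requests
```

Because each sequence holds only the blocks it has actually filled, memory wasted per request is bounded by one partially filled block, which is what lets more requests share the GPU at once.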

Installation:

pip install vllm

Basic offline inference:

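from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

# generate() returns one RequestOutput per prompt
outputs = llm.generate(["Explain PagedAttention in one sentence."], sampling)
print(outputs[0].outputs[0].text)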
First Seen
Jan 20, 2026
serving-llms-vllm — ovachiever/droid-tings