serving-llms-vllm

Originally from ovachiever/droid-tings

vLLM - High-Performance LLM Serving

Quick start

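The vLLM authors report up to 24x higher throughput than Hugging Face Transformers. The gains come from two mechanisms. PagedAttention stores the KV cache in fixed-size blocks that are allocated on demand, much like virtual-memory pages, so little memory is wasted on reservations. Continuous batching admits new requests into the running batch at every scheduling step instead of waiting for the whole batch to finish, so prefill (prompt processing) and decode (token-by-token generation) work share the GPU.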
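The two mechanisms can be made concrete with a toy, pure-Python scheduler. This is a minimal sketch under stated assumptions, not vLLM's implementation. BLOCK_SIZE, BlockAllocator, Sequence, step and simulate are hypothetical names. Each step emits exactly one token per running sequence, and running out of blocks simply raises MemoryError where a real scheduler would preempt or swap. The sketch shows KV-cache memory being handed out block by block through a per-sequence block table, and new requests joining the batch as soon as a slot frees up.

```python
from collections import deque
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value for this sketch)


class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks from a shared free list."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            # A real scheduler would preempt or swap a sequence here; the toy just fails.
            raise MemoryError("out of KV-cache blocks")
        return self.free.pop()

    def release(self, blocks: list[int]) -> None:
        self.free.extend(blocks)


@dataclass
class Sequence:
    seq_id: int
    prompt_len: int
    max_new_tokens: int
    num_tokens: int = 0  # prompt + generated tokens held in the cache (simplified)
    generated: int = 0
    block_table: list[int] = field(default_factory=list)  # logical block i -> physical block id

    def ensure_capacity(self, allocator: BlockAllocator) -> None:
        # Grow the block table only when the cached tokens spill into a new block,
        # so no memory is reserved up front for the sequence's maximum length.
        needed = -(-self.num_tokens // BLOCK_SIZE)  # ceiling division
        while len(self.block_table) < needed:
            self.block_table.append(allocator.allocate())


def step(waiting: deque, running: list, allocator: BlockAllocator, max_batch: int):
    """One scheduler iteration; returns (admitted ids, finished ids)."""
    admitted = []
    # Continuous batching: fill free slots now instead of waiting for the batch to drain.
    while waiting and len(running) < max_batch:
        seq = waiting.popleft()
        seq.num_tokens = seq.prompt_len  # prefill caches the whole prompt
        running.append(seq)
        admitted.append(seq.seq_id)

    # Every running sequence (fresh prefill or ongoing decode) emits one token this step.
    for seq in running:
        seq.num_tokens += 1
        seq.generated += 1
        seq.ensure_capacity(allocator)

    # Retire finished sequences and return their blocks to the pool immediately.
    finished, still_running = [], []
    for seq in running:
        if seq.generated >= seq.max_new_tokens:
            allocator.release(seq.block_table)
            seq.block_table = []
            finished.append(seq.seq_id)
        else:
            still_running.append(seq)
    running[:] = still_running
    return admitted, finished


def simulate(requests: list, num_blocks: int = 64, max_batch: int = 4):
    """Run steps until every request has finished; returns (per-step log, allocator)."""
    allocator = BlockAllocator(num_blocks)
    waiting, running, log = deque(requests), [], []
    while waiting or running:
        log.append(step(waiting, running, allocator, max_batch))
    return log, allocator
```

For example, with max_batch=2, a one-token request that finishes in the first step frees its slot and its block, and a waiting request joins on the very next step while the longer request keeps decoding.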

Installation:

pip install vllm

Basic offline inference:

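from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # loads weights and reserves GPU memory for the paged KV cache
sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain PagedAttention in one sentence."], sampling)  # one RequestOutput per prompt
for output in outputs:
    print(output.outputs[0].text)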
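The quick-start example uses SamplingParams(temperature=0.7, max_tokens=256). As a rough, framework-free illustration of what those two settings control (not vLLM's sampler), the sketch below divides logits by the temperature before the softmax, falls back to greedy decoding at temperature 0 (vLLM documents temperature 0 as greedy sampling), and stops after max_tokens generated tokens. The next_logits callback and eos_id argument are hypothetical stand-ins for a model and its end-of-sequence token.

```python
import math
import random
from typing import Callable, List, Optional


def temperature_softmax(logits: List[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature, stabilised by subtracting the maximum."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(
    next_logits: Callable[[List[int]], List[float]],  # hypothetical model: context -> logits
    prompt: List[int],
    temperature: float,
    max_tokens: int,
    eos_id: Optional[int] = None,  # hypothetical end-of-sequence token id
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    context = list(prompt)
    output: List[int] = []
    for _ in range(max_tokens):  # max_tokens caps generated tokens, not prompt + output
        logits = next_logits(context)
        if temperature == 0:
            token = max(range(len(logits)), key=lambda i: logits[i])  # greedy: pick the top logit
        else:
            probs = temperature_softmax(logits, temperature)
            token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        output.append(token)
        context.append(token)
        if token == eos_id:  # stop early at end-of-sequence
            break
    return output
```

Lower temperatures sharpen the distribution toward the highest-scoring token; higher temperatures flatten it and make sampling more varied.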
serving-llms-vllm — zechenzhangagi/ai-research-skills