vLLM Bench Serve
Benchmark vLLM or any OpenAI-compatible serving endpoint using the vllm bench serve CLI. It measures throughput, latency (time to first token, TTFT; time per output token, TPOT), and goodput under a configurable request load.
Reference: vLLM Bench Serve Documentation
Prerequisites
- A vLLM server (or any other OpenAI-compatible API endpoint) already serving a model; see the example below
- A Python environment with vLLM installed for the benchmark client (the vllm bench serve command ships with vLLM)
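If no server is running yet, one can be started with vLLM's own serve command. The model name below is only an example; substitute whichever model you intend to benchmark:
# example model; substitute the model you want to benchmark
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000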
Quick Start
Basic benchmark against a local vLLM server (defaults: random dataset, 1000 prompts). Replace <your-served-model> with the model name the server reports:
vllm bench serve \
  --backend openai-chat \
  --endpoint /v1/chat/completions \
  --model <your-served-model>
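To shape the request load and score goodput, the same command accepts rate and SLO flags. A sketch, assuming the flag spellings of recent vLLM releases (verify with vllm bench serve --help); the input/output lengths, rate, and thresholds below are illustrative only:
# illustrative values; tune lengths, rate, and SLO thresholds to your workload
vllm bench serve \
  --backend openai-chat \
  --endpoint /v1/chat/completions \
  --model <your-served-model> \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 128 \
  --num-prompts 500 \
  --request-rate 10 \
  --max-concurrency 64 \
  --goodput ttft:2000 tpot:50 \
  --save-result
Here --request-rate sets the mean arrival rate in requests per second (the default of inf sends all requests at once), --goodput counts a request as good only if its TTFT and TPOT stay under the given millisecond thresholds, and --save-result writes the collected metrics to a JSON file. A remote OpenAI-compatible endpoint can be targeted with --base-url instead of the default local host and port.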