vllm-studio-backend
Installation
SKILL.md
vLLM Studio Backend Architecture
Overview
This skill explains how the backend is wired: controller runtime, OpenAI-compatible proxy, Pi-mono agent loop, LiteLLM gateway, and inference process management.
When To Use
- Modifying controller routes or run streaming.
- Debugging OpenAI-compatible endpoint behavior.
- Updating Pi-mono agent runtime or tool execution.
- Understanding how inference + LiteLLM fit together.
Quick Start
- Read
references/backend-architecture.mdfor the component map and data flow. - Read
references/openai-compat.mdfor/v1/modelsand/v1/chat/completionsbehavior. - Read
references/backend-commands.mdfor useful run/debug commands.