# vLLM Studio Backend Architecture

## Overview
This skill explains how the backend is wired together: the controller runtime, the OpenAI-compatible proxy, the Pi-mono agent loop, the LiteLLM gateway, and inference process management.
## When To Use
- Modifying controller routes or run streaming.
- Debugging OpenAI-compatible endpoint behavior.
- Updating Pi-mono agent runtime or tool execution.
- Understanding how inference + LiteLLM fit together.
## Quick Start
- Read `references/backend-architecture.md` for the component map and data flow.
- Read `references/openai-compat.md` for `/v1/models` and `/v1/chat/completions` behavior.
- Read `references/backend-commands.md` for useful run/debug commands.
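As a quick smoke test of the OpenAI-compatible surface described above, a request against `/v1/chat/completions` can be sketched with the standard library alone. The base URL and model name here are assumptions, not values from this skill; substitute whatever your deployment uses.

```python
import json
import urllib.request

# Assumed local base URL for the OpenAI-compatible proxy;
# adjust host/port to match your deployment.
BASE_URL = "http://localhost:8000"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("my-model", "Hello!")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
# Send with urllib.request.urlopen(req) once the backend is running.
```

The same payload shape works for any OpenAI-compatible server, which is the point of keeping these endpoints stable.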
## Core Guarantees
- Keep OpenAI-compatible endpoints stable (`/v1/models`, `/v1/chat/completions`).
- The `/chat` UI uses the controller run stream (`/chats/:id/turn`) and the Pi-mono runtime.
- Tool execution happens server-side (MCP + AgentFS + plan tools).
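The controller run stream at `/chats/:id/turn` delivers the turn as a stream of events. Assuming SSE-style `data:` lines carrying JSON (this skill does not specify the actual event schema, so the field names below are illustrative), a minimal parser looks like this:

```python
import json
from typing import Iterator


def parse_sse_events(lines: Iterator[str]) -> Iterator[dict]:
    """Yield JSON payloads from SSE-style 'data:' lines.

    Assumes each event is a single 'data: {...}' line; the real
    /chats/:id/turn event schema may differ from this sketch.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":  # common OpenAI-style stream terminator
                return
            yield json.loads(data)


# Hypothetical stream fragment for demonstration only.
sample = [
    'data: {"type": "token", "text": "Hel"}',
    "",
    'data: {"type": "token", "text": "lo"}',
    "",
    "data: [DONE]",
]
events = list(parse_sse_events(iter(sample)))
print("".join(e["text"] for e in events))  # prints "Hello"
```

Because tool execution happens server-side, a client consuming this stream only ever renders events; it never runs MCP, AgentFS, or plan tools itself.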
## References

- `references/backend-architecture.md`
- `references/openai-compat.md`
- `references/backend-commands.md`
Repository: 0xsero/vllm-studio (GitHub)