Multi-Tenant LLM Hosting
Host many teams or customers on shared inference infrastructure without sacrificing security, performance, or cost governance.
Isolation Model
- Strong tenant identity on every request
- Per-tenant API keys and scoped model access
- Namespace or workload isolation for high-risk tenants
- Strict data retention and log partitioning controls
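The first two points can be sketched as a small key registry. This is a minimal illustration, not a gateway's actual API: `TenantKey`, `KeyRegistry`, and the model names are hypothetical, and a production system would use a dedicated secret store. Each key resolves to exactly one tenant and carries the set of models that tenant may call.

```python
import hashlib
from dataclasses import dataclass


def hash_key(raw_key: str) -> str:
    """Store and compare only a SHA-256 digest, never the raw key."""
    return hashlib.sha256(raw_key.encode()).hexdigest()


@dataclass(frozen=True)
class TenantKey:  # hypothetical record type
    tenant_id: str
    key_hash: str
    allowed_models: frozenset


class KeyRegistry:  # hypothetical in-memory registry
    def __init__(self) -> None:
        self._by_hash = {}

    def register(self, tenant_id: str, raw_key: str, allowed_models) -> None:
        record = TenantKey(tenant_id, hash_key(raw_key), frozenset(allowed_models))
        self._by_hash[record.key_hash] = record

    def authorize(self, raw_key: str, model: str) -> str:
        """Return the tenant ID for this key, or raise PermissionError."""
        record = self._by_hash.get(hash_key(raw_key))
        if record is None:
            raise PermissionError("unknown API key")
        if model not in record.allowed_models:
            raise PermissionError(
                f"tenant {record.tenant_id} may not use model {model}"
            )
        return record.tenant_id
```

The tenant ID returned by `authorize` is what downstream components (rate limiter, cache, billing) would key on, so identity is established once at the edge.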
Noisy-Neighbor Controls
- Per-tenant requests-per-minute (RPM) and tokens-per-minute (TPM) limits
- Concurrency caps and queue isolation
- Fair scheduling with weighted priority classes
- Backpressure and graceful degradation policies
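One common way to enforce per-tenant RPM/TPM limits is a pair of token buckets per tenant. The sketch below assumes that approach; `Bucket` and `TenantLimiter` are hypothetical names, and the clock is passed in explicitly so the behavior is deterministic. A request is admitted only if both the request bucket and the token bucket have room, so one tenant exhausting its quota does not affect another.

```python
class Bucket:
    """Token bucket: holds up to `capacity`, refills at `rate` units/second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.level = capacity
        self.updated = now

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.level = min(self.capacity, self.level + elapsed * self.rate)
        self.updated = now


class TenantLimiter:  # hypothetical; one RPM bucket and one TPM bucket per tenant
    def __init__(self, rpm: int, tpm: int) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self._buckets = {}

    def allow(self, tenant_id: str, tokens: int, now: float) -> bool:
        if tenant_id not in self._buckets:
            self._buckets[tenant_id] = (
                Bucket(self.rpm, self.rpm / 60, now),
                Bucket(self.tpm, self.tpm / 60, now),
            )
        requests, token_budget = self._buckets[tenant_id]
        requests.refill(now)
        token_budget.refill(now)
        # Check both limits before deducting so a rejection consumes nothing.
        if requests.level < 1 or token_budget.level < tokens:
            return False
        requests.level -= 1
        token_budget.level -= tokens
        return True
```

In a real deployment `now` would come from `time.monotonic()`, and a rejected request would typically map to an HTTP 429 or a per-tenant queue rather than a bare `False`.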
Billing and Chargeback
Track per tenant:
- prompt/completion/cached tokens,
- model type and route,
- latency and success rate,
- cost with markup or internal transfer pricing.
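As a rough illustration of the token and cost fields above, the following sketch aggregates usage records into a per-tenant charge. The price table, model name, and `Usage` record are all hypothetical; real prices and the way cached tokens are counted depend on the provider. `Decimal` is used so currency arithmetic stays exact.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical prices in USD per 1M tokens; substitute real rates.
PRICES = {
    "small-model": {
        "prompt": Decimal("0.50"),
        "cached": Decimal("0.05"),
        "completion": Decimal("1.50"),
    },
}


@dataclass(frozen=True)
class Usage:  # hypothetical usage record emitted per request
    tenant_id: str
    model: str
    prompt_tokens: int      # uncached prompt tokens
    cached_tokens: int      # prompt tokens served from cache
    completion_tokens: int


def cost(u: Usage, markup: Decimal = Decimal("0")) -> Decimal:
    """Base cost of one request, plus a fractional markup (0.10 = 10%)."""
    p = PRICES[u.model]
    base = (
        u.prompt_tokens * p["prompt"]
        + u.cached_tokens * p["cached"]
        + u.completion_tokens * p["completion"]
    ) / Decimal(1_000_000)
    return base * (1 + markup)


def chargeback(records, markup: Decimal = Decimal("0")) -> dict:
    """Sum marked-up cost per tenant."""
    totals = defaultdict(Decimal)
    for u in records:
        totals[u.tenant_id] += cost(u, markup)
    return dict(totals)
```

For internal transfer pricing, `markup` could be zero or negative; for external customers it would carry the margin.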
Security Baseline
- Encrypt data in transit and at rest.
- Prevent cross-tenant cache leakage: never serve one tenant's cached prompts or responses to another.
- Restrict debug data access by role.
- Audit all privileged administrative actions.
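One way to rule out cross-tenant cache leakage is to make the tenant ID part of every cache key, so a lookup by one tenant can never match an entry written by another. The sketch below assumes a simple response cache; `TenantScopedCache` is a hypothetical name, and the same idea applies to prefix or KV caches if the serving stack exposes a key or salt.

```python
import hashlib


class TenantScopedCache:  # hypothetical in-memory response cache
    def __init__(self) -> None:
        self._store = {}

    @staticmethod
    def _key(tenant_id: str, model: str, prompt: str) -> str:
        h = hashlib.sha256()
        for part in (tenant_id, model, prompt):
            data = part.encode()
            # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
        return h.hexdigest()

    def get(self, tenant_id: str, model: str, prompt: str):
        return self._store.get(self._key(tenant_id, model, prompt))

    def put(self, tenant_id: str, model: str, prompt: str, response: str) -> None:
        self._store[self._key(tenant_id, model, prompt)] = response
```

The trade-off is a lower hit rate, since identical prompts from different tenants no longer share entries; that is the intended cost of isolation.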
Operational Runbook
- Onboard the tenant with a policy template.
- Issue a virtual key and quota profile.
- Validate observability and billing tags.
- Run tenant-specific load and safety tests.
- Enable production traffic with canary limits.
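The runbook steps could be automated roughly as follows. This is a sketch under assumptions: the template name, quota numbers, required tag names, and the 10% canary default are all hypothetical, and the load and safety test step is left outside the code. A new tenant starts on a reduced canary quota and is promoted to the full template quota later.

```python
import secrets
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QuotaProfile:
    rpm: int
    tpm: int
    max_concurrency: int


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    virtual_key: str
    quota: QuotaProfile
    tags: dict
    stage: str  # "canary" or "production"


# Hypothetical policy templates and tag requirements.
TEMPLATES = {"standard": QuotaProfile(rpm=600, tpm=200_000, max_concurrency=16)}
REQUIRED_TAGS = ("tenant_id", "cost_center")


def onboard(tenant_id: str, template: str, cost_center: str,
            canary_percent: int = 10) -> Tenant:
    """Create a tenant on a canary quota derived from a policy template."""
    full = TEMPLATES[template]
    canary = QuotaProfile(
        rpm=max(1, full.rpm * canary_percent // 100),
        tpm=max(1, full.tpm * canary_percent // 100),
        max_concurrency=max(1, full.max_concurrency * canary_percent // 100),
    )
    tags = {"tenant_id": tenant_id, "cost_center": cost_center}
    missing = [t for t in REQUIRED_TAGS if not tags.get(t)]
    if missing:
        raise ValueError(f"missing billing/observability tags: {missing}")
    key = "vk-" + secrets.token_urlsafe(24)  # random virtual key
    return Tenant(tenant_id, key, canary, tags, stage="canary")


def promote(tenant: Tenant, template: str) -> Tenant:
    """Lift canary limits once load and safety tests have passed."""
    return replace(tenant, quota=TEMPLATES[template], stage="production")
```

Keeping `Tenant` immutable means promotion produces a new record, which is convenient for auditing the change.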
Related Skills
- llm-gateway - Key management and traffic routing
- llm-cost-optimization - Cost controls and optimization tactics
- zero-trust - Identity-centric network and access patterns
Repository
bagelhole/devop…t-skills