runpod-deployment
- Serverless Workers - Scale-to-zero handlers with pay-per-second billing
- vLLM Endpoints - OpenAI-compatible LLM serving with 2-3x higher throughput
- Pod Management - Dedicated GPU instances for development/training
- Cost Optimization - GPU selection, spot instances, budget controls
Key deliverables:
- Production-ready serverless handlers with streaming
- vLLM deployment with OpenAI API compatibility
- Cost-optimized GPU selection for any model size
- Health monitoring and auto-scaling configuration
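As a rough illustration of the GPU-selection deliverable, the sketch below estimates the VRAM a model needs from its parameter count and picks the cheapest option that fits. It is a minimal sketch under stated assumptions, not the skill's actual method: `GpuOption`, `estimate_vram_gb`, `cheapest_fit`, the 1.2 overhead factor, and the catalog entries (generic names, placeholder prices) are all hypothetical and should be replaced with current RunPod GPU types and pricing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GpuOption:
    """One GPU choice. All fields are illustrative, not real RunPod data."""
    name: str
    vram_gb: int
    usd_per_hour: float  # placeholder price, not a real quote


def estimate_vram_gb(params_billions: float,
                     bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Estimate VRAM in GB for inference.

    1 billion parameters at N bytes each is roughly N GB of weights.
    The overhead factor (an assumption) leaves headroom for the KV cache
    and activations; tune it for your context length and batch size.
    """
    return params_billions * bytes_per_param * overhead


def cheapest_fit(params_billions: float,
                 catalog: Sequence[GpuOption],
                 bytes_per_param: float = 2.0,
                 overhead: float = 1.2) -> Optional[GpuOption]:
    """Return the cheapest single GPU whose VRAM covers the estimate, or None."""
    needed = estimate_vram_gb(params_billions, bytes_per_param, overhead)
    candidates = [gpu for gpu in catalog if gpu.vram_gb >= needed]
    if not candidates:
        return None  # model needs multiple GPUs or a smaller precision
    return min(candidates, key=lambda gpu: gpu.usd_per_hour)


# Hypothetical catalog: generic names and made-up prices for illustration only.
CATALOG = [
    GpuOption("gpu-24gb", 24, 0.50),
    GpuOption("gpu-48gb", 48, 1.00),
    GpuOption("gpu-80gb", 80, 2.00),
]

if __name__ == "__main__":
    for size in (7, 13, 70):
        choice = cheapest_fit(size, CATALOG)
        label = choice.name if choice else "no single GPU fits"
        print(f"{size}B params (fp16): {label}")
```

Passing `bytes_per_param=0.5` approximates 4-bit quantized weights, which is often enough to move a model down one GPU tier. A `None` result signals that the model needs multi-GPU sharding or lower precision rather than a larger single card.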
<quick_start> Minimal Serverless Handler (v1.8.1):
import runpod