LLM Inference Batching Scheduler
This skill provides guidance for solving LLM inference batching and scheduling optimization problems, where requests must be grouped into batches while minimizing cost, padding waste, and latency.
Problem Understanding
Before implementation, thoroughly analyze the problem structure:
Constraint Analysis
- Identify all hard constraints. Extract exact limits for:
  - Maximum unique shapes allowed (e.g., ≤ 8 shapes across all buckets)
  - Latency thresholds (P95, P99)
  - Cost budget thresholds
  - Padding ratio limits
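One way to keep the extracted limits explicit is to record them in a single structure and check every candidate plan against it. The sketch below is a minimal illustration, not a prescribed implementation: the `HardConstraints` class, the `violations` helper, all field names, and the numeric values in the usage lines are hypothetical and should be replaced with whatever the actual problem statement specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardConstraints:
    """Hypothetical container for the limits extracted from the problem statement."""
    max_unique_shapes: int      # e.g. 8 shapes across all buckets
    max_p95_latency_ms: float   # P95 latency threshold
    max_p99_latency_ms: float   # P99 latency threshold
    max_cost: float             # cost budget threshold
    max_padding_ratio: float    # padding ratio limit


def violations(limits, *, unique_shapes, p95_ms, p99_ms, cost, padding_ratio):
    """Return the names of every hard constraint the measured plan breaks."""
    checks = {
        "unique_shapes": unique_shapes <= limits.max_unique_shapes,
        "p95_latency": p95_ms <= limits.max_p95_latency_ms,
        "p99_latency": p99_ms <= limits.max_p99_latency_ms,
        "cost": cost <= limits.max_cost,
        "padding_ratio": padding_ratio <= limits.max_padding_ratio,
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative values only; real limits come from the task description.
limits = HardConstraints(
    max_unique_shapes=8,
    max_p95_latency_ms=500.0,
    max_p99_latency_ms=900.0,
    max_cost=100.0,
    max_padding_ratio=0.10,
)
broken = violations(
    limits, unique_shapes=9, p95_ms=400.0, p99_ms=950.0, cost=50.0, padding_ratio=0.05
)
print(broken)  # ['unique_shapes', 'p99_latency']
```

Returning the full list of broken constraints, rather than stopping at the first failure, makes it easier to see which limits a candidate plan trades off against each other.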