LLM Inference Batching Scheduler

This skill provides guidance for solving LLM inference batching and scheduling optimization problems, where requests must be grouped into batches while minimizing cost, padding waste, and latency.

Problem Understanding

Before implementation, thoroughly analyze the problem structure:

Constraint Analysis

  1. Identify all hard constraints - Extract exact limits for:
    • Maximum unique shapes allowed (e.g., ≤ 8 shapes across all buckets)
    • Latency thresholds (P95, P99)
    • Cost budget thresholds
    • Padding ratio limits
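
One way to make the extracted limits concrete is to record them in a single structure and check every candidate plan against it. The following is a minimal sketch, not the skill's own implementation. All names in it (HardConstraints, check_plan, the field names) are hypothetical. It also assumes that a "shape" is a padded sequence length, that the padding ratio is padding tokens divided by total padded tokens, and that percentiles use the nearest-rank method. The actual problem statement may define any of these differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HardConstraints:
    # Hypothetical field names; fill in the exact limits from the problem statement.
    max_unique_shapes: int
    max_p95_latency_ms: float
    max_p99_latency_ms: float
    max_total_cost: float
    max_padding_ratio: float


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_plan(batches, latencies_ms, total_cost, limits):
    """Return the names of violated constraints (empty list means feasible).

    batches: list of (padded_len, [actual request lengths]) pairs.
    Assumption: a "shape" is the padded sequence length of a batch.
    """
    shapes = {padded_len for padded_len, _ in batches}
    real_tokens = sum(sum(lengths) for _, lengths in batches)
    padded_tokens = sum(padded_len * len(lengths) for padded_len, lengths in batches)
    # Assumption: padding ratio = padding tokens / total padded tokens.
    padding_ratio = (padded_tokens - real_tokens) / padded_tokens if padded_tokens else 0.0

    violations = []
    if len(shapes) > limits.max_unique_shapes:
        violations.append("unique_shapes")
    if percentile(latencies_ms, 95) > limits.max_p95_latency_ms:
        violations.append("p95_latency")
    if percentile(latencies_ms, 99) > limits.max_p99_latency_ms:
        violations.append("p99_latency")
    if total_cost > limits.max_total_cost:
        violations.append("cost")
    if padding_ratio > limits.max_padding_ratio:
        violations.append("padding_ratio")
    return violations


# Illustrative usage with made-up numbers.
example_limits = HardConstraints(
    max_unique_shapes=8,
    max_p95_latency_ms=150.0,
    max_p99_latency_ms=200.0,
    max_total_cost=50.0,
    max_padding_ratio=0.15,
)
example_batches = [(128, [100, 120, 128]), (256, [200, 256])]
example_latencies = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(check_plan(example_batches, example_latencies, 42.0, example_limits))
```

Returning the list of violated constraint names, rather than a single boolean, shows which limit a candidate plan breaks and therefore which part of the bucketing to adjust.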
Repository: letta-ai/skills
First seen: Jan 24, 2026