optimize-llm
Optimize LLM Command
Get quick, actionable recommendations for LLM serving optimization.
Usage
/sd:optimize-llm [focus]
Arguments
- focus (optional): Optimization priority
  - latency - Focus on reducing response time
  - cost - Focus on reducing inference costs
  - throughput - Focus on maximizing requests/second
  - If omitted: Provide balanced recommendations
Examples
/sd:optimize-llm
/sd:optimize-llm latency
/sd:optimize-llm cost
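As a rough illustration of how the optional focus argument could be normalized before any analysis runs, here is a minimal sketch. The helper name `parse_focus` and the fallback label `balanced` are assumptions made for this example and are not part of the plugin; only the three focus values come from the Arguments section.

```python
from typing import Optional

# Focus values taken from the Arguments section. "balanced" is a
# hypothetical label for the case where no focus is given.
VALID_FOCUS = ("latency", "cost", "throughput")
DEFAULT_FOCUS = "balanced"


def parse_focus(raw: Optional[str]) -> str:
    """Normalize the optional [focus] argument of /sd:optimize-llm."""
    if raw is None or not raw.strip():
        # No argument: fall back to balanced recommendations.
        return DEFAULT_FOCUS
    focus = raw.strip().lower()
    if focus not in VALID_FOCUS:
        raise ValueError(
            f"unknown focus {raw!r}; expected one of {', '.join(VALID_FOCUS)}"
        )
    return focus
```

Rejecting unknown values early keeps a typo such as `latancy` from silently producing balanced recommendations.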
Workflow
1. Gather Context
   - Search for LLM-related configuration files
   - Look for: model configs, serving configs, inference scripts
   - Identify current serving stack (vLLM, TGI, TensorRT-LLM, etc.)
2. Spawn LLM Optimization Advisor Agent
   Use the llm-optimization-advisor agent to analyze and provide recommendations. The agent specializes in:
   - Quantization strategies (INT8, INT4, FP16)
   - Batching optimization (continuous, dynamic)
   - KV cache optimization (PagedAttention)
   - Serving framework selection
   - Cost reduction strategies
3. Present Recommendations
   Display optimization opportunities organized by:
   - Quick Wins - Low effort, high impact changes
   - Medium Effort - Moderate changes with significant benefits
   - Advanced - Architectural changes for maximum performance
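The context-gathering step could be approximated with a simple file scan. The sketch below is one possible heuristic, not the plugin's actual detection logic: the marker strings in `FRAMEWORK_MARKERS`, the set of file suffixes, and the function name `detect_frameworks` are all assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical substrings that hint at a serving framework.
FRAMEWORK_MARKERS: Dict[str, List[str]] = {
    "vLLM": ["vllm"],
    "TGI": ["text-generation-inference", "text_generation_server"],
    "TensorRT-LLM": ["tensorrt_llm", "tensorrt-llm"],
}

# Hypothetical set of file types worth scanning for configs and scripts.
CANDIDATE_SUFFIXES = {".py", ".yaml", ".yml", ".json", ".toml", ".txt", ".sh"}


def detect_frameworks(root: str) -> Dict[str, List[str]]:
    """Map each detected framework to the sorted files that mention it."""
    hits: Dict[str, List[str]] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in CANDIDATE_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
        except OSError:
            continue  # Unreadable file: skip rather than abort the scan.
        for framework, markers in FRAMEWORK_MARKERS.items():
            if any(marker in text for marker in markers):
                hits.setdefault(framework, []).append(str(path))
    return {name: sorted(files) for name, files in hits.items()}
```

An empty result corresponds to the "unknown" framework case in the report, where the command would ask the user instead of guessing.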
Output Format
## LLM Optimization Report
### Current Setup
- Model: [detected or ask]
- Framework: [detected or unknown]
- Hardware: [detected or ask]
### Quick Wins
1. [Optimization] - [Expected impact]
2. ...
### Medium Effort Optimizations
1. [Optimization] - [Expected impact]
2. ...
### Advanced Optimizations
1. [Optimization] - [Expected impact]
2. ...
### Estimated Total Impact
- Latency: [X]% improvement
- Cost: [X]% reduction
- Throughput: [X]x increase
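To make the template concrete, the following sketch assembles a report with the same headings from structured data. The function `render_report` and its argument shapes are hypothetical; the impact strings are passed in by the caller because real estimates have to come from measurement, not from the template.

```python
from typing import Dict, List, Tuple

# (optimization, expected impact) pair, mirroring "[Optimization] - [Expected impact]".
Recommendation = Tuple[str, str]

TIER_HEADINGS = (
    "Quick Wins",
    "Medium Effort Optimizations",
    "Advanced Optimizations",
)


def render_report(
    setup: Dict[str, str],
    tiers: Dict[str, List[Recommendation]],
    totals: Dict[str, str],
) -> str:
    """Render the report layout shown above as a single string."""
    lines = ["## LLM Optimization Report", "### Current Setup"]
    for key in ("Model", "Framework", "Hardware"):
        # Missing entries are shown as "unknown" instead of being guessed.
        lines.append(f"- {key}: {setup.get(key, 'unknown')}")
    for heading in TIER_HEADINGS:
        lines.append(f"### {heading}")
        for index, (name, impact) in enumerate(tiers.get(heading, []), start=1):
            lines.append(f"{index}. {name} - {impact}")
    lines.append("### Estimated Total Impact")
    for metric, value in totals.items():
        lines.append(f"- {metric}: {value}")
    return "\n".join(lines)
```

Keeping the tier headings in one tuple means the order of sections always matches the template, whatever order the recommendations were produced in.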