# LLM Cost Optimizer

**Category:** Engineering | **Domain:** AI Cost Management
## Overview
The LLM Cost Optimizer skill provides tools for counting tokens, estimating costs across different LLM providers, and optimizing prompts to reduce token usage without sacrificing quality. Essential for teams managing LLM API budgets at scale.
## Quick Start

```bash
# Count tokens in a prompt file and estimate costs
python scripts/token_counter.py --file prompt.txt --models gpt-4o claude-sonnet

# Count tokens from stdin
echo "Hello world" | python scripts/token_counter.py --stdin --models all

# Analyze a prompt for optimization opportunities
python scripts/prompt_optimizer.py --file system_prompt.txt

# Optimize with target reduction
python scripts/prompt_optimizer.py --file prompt.txt --target-reduction 30
```
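For a self-contained illustration of the arithmetic these commands perform, here is a rough sketch using the common 4-characters-per-token heuristic. Exact counts require each provider's own tokenizer, and the prices below are placeholder examples, not current rates:

```python
# Rough token estimate: ~4 characters per token for English text.
# Real counters should use the provider's tokenizer; this heuristic
# is only for quick ballpark estimates.

def estimate_tokens(text: str) -> int:
    """Approximate token count using the 4-chars-per-token rule of thumb."""
    return max(1, round(len(text) / 4))

# Illustrative per-million-token input prices (NOT current rates).
PRICE_PER_MTOK = {"gpt-4o": 2.50, "claude-sonnet": 3.00}

def estimate_cost(text: str, model: str) -> float:
    """Estimated input cost in USD for one request under the heuristic count."""
    return estimate_tokens(text) / 1_000_000 * PRICE_PER_MTOK[model]

prompt = "Summarize the following support ticket in two sentences."
for model in ("gpt-4o", "claude-sonnet"):
    print(f"{model}: ~{estimate_tokens(prompt)} tokens, ${estimate_cost(prompt, model):.6f}")
```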
## Tools Overview

| Tool | Purpose | Key Flags |
|---|---|---|
| `token_counter.py` | Count tokens and estimate costs across models | `--file`, `--text`, `--stdin`, `--models` |
| `prompt_optimizer.py` | Analyze prompts for token reduction opportunities | `--file`, `--target-reduction`, `--format` |
## Workflows

### Cost Estimation for a New Project

- Collect sample prompts (system prompt + user messages)
- Run `token_counter.py` with target models
- Multiply per-request cost by expected daily volume
- Compare models on the cost-quality tradeoff
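The volume math in the steps above is simple multiplication; a minimal sketch, where the token counts, prices, and volume are all assumed example values:

```python
# Project monthly spend from per-request token counts.
# All prices and counts below are illustrative assumptions.

def monthly_cost(input_tokens: int, output_tokens: int,
                 price_in_per_mtok: float, price_out_per_mtok: float,
                 requests_per_day: int, days: int = 30) -> float:
    """Monthly cost in USD: per-request cost times request volume."""
    per_request = (input_tokens / 1_000_000 * price_in_per_mtok
                   + output_tokens / 1_000_000 * price_out_per_mtok)
    return per_request * requests_per_day * days

# Example: 1,200 input + 300 output tokens per request, 50k requests/day.
print(f"${monthly_cost(1200, 300, 2.50, 10.00, 50_000):.2f}")  # → $9000.00
```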
### Prompt Optimization Sprint

- Identify the highest-cost prompts from usage logs
- Run `prompt_optimizer.py` on each
- Apply suggested optimizations
- Re-count tokens to verify the reduction
- A/B test optimized vs. original for quality
## Reference Documentation

- LLM Pricing Guide: current pricing for major LLM providers and token estimation methods
## Common Patterns

### Token Reduction Techniques
- Remove redundant instructions and examples
- Use shorter variable names in few-shot examples
- Compress verbose system prompts
- Replace repeated context with references
- Use structured output formats (JSON) to reduce response tokens
- Batch multiple requests into single prompts where possible
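To make the first techniques concrete, here is a hypothetical before/after of the same instruction, quantified with a rough 4-chars-per-token estimate. Both prompts and the heuristic are illustrative examples, not output of `prompt_optimizer.py`:

```python
# Hypothetical verbose vs. compressed system prompt, with a rough
# 4-characters-per-token estimate to quantify the reduction.

verbose = (
    "You are a helpful assistant. Please make sure that you always answer "
    "the user's question. When you answer, please be sure to keep your "
    "answer short, and please also make sure that your answer is polite."
)
compressed = "Answer the user's question briefly and politely."

def estimate_tokens(text: str) -> int:
    """Approximate token count (~4 characters per token)."""
    return max(1, round(len(text) / 4))

before, after = estimate_tokens(verbose), estimate_tokens(compressed)
reduction = 100 * (before - after) / before
print(f"{before} -> {after} tokens ({reduction:.0f}% reduction)")
```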
### Cost-Effective Model Selection
- Use smaller models for classification/extraction tasks
- Reserve large models for complex reasoning
- Implement model routing based on query complexity
- Cache responses for identical or similar queries
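The last two points can be combined in a minimal routing-plus-cache sketch. The model names are placeholders, and the length-based complexity heuristic is a deliberate simplification (production routers typically use a classifier):

```python
import hashlib

# Naive router: short extraction-style queries go to a small model,
# longer ones to a large model. Model names are placeholders.
CACHE: dict[str, str] = {}

def route_model(query: str) -> str:
    """Pick a model tier from a crude complexity heuristic (query length)."""
    return "small-model" if len(query.split()) <= 20 else "large-model"

def cached_call(query: str, call_llm) -> str:
    """Return a cached response for identical queries, else call the LLM."""
    key = hashlib.sha256(query.encode()).hexdigest()
    if key not in CACHE:
        CACHE[key] = call_llm(route_model(query), query)
    return CACHE[key]

# Stub LLM call for demonstration.
calls = []
def fake_llm(model: str, query: str) -> str:
    calls.append(model)
    return f"[{model}] answer"

print(cached_call("Classify this ticket: refund request", fake_llm))
print(cached_call("Classify this ticket: refund request", fake_llm))  # served from cache
print(len(calls))  # → 1 (only one real LLM call was made)
```

Extending the cache key with the model name and a version tag for the system prompt keeps cached responses from surviving prompt changes.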