# LLM Cost Optimizer

**Category:** Engineering | **Domain:** AI Cost Management
## Overview
The LLM Cost Optimizer skill provides tools for counting tokens, estimating costs across different LLM providers, and optimizing prompts to reduce token usage without sacrificing quality. Essential for teams managing LLM API budgets at scale.
## Quick Start

```bash
# Count tokens in a prompt file and estimate costs
python scripts/token_counter.py --file prompt.txt --models gpt-4o claude-sonnet

# Count tokens from stdin
echo "Hello world" | python scripts/token_counter.py --stdin --models all

# Analyze a prompt for optimization opportunities
python scripts/prompt_optimizer.py --file system_prompt.txt

# Optimize with target reduction
python scripts/prompt_optimizer.py --file prompt.txt --target-reduction 30
```
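For a self-contained illustration of the arithmetic these commands perform, here is a rough sketch using the common 4-characters-per-token heuristic. Exact counts require each provider's own tokenizer, and the prices below are placeholder examples, not current rates:

```python
# Rough token estimate: ~4 characters per token for English text.
# Real counters should use the provider's tokenizer; this heuristic
# is only for quick ballpark estimates.

def estimate_tokens(text: str) -> int:
    """Approximate token count using the 4-chars-per-token rule of thumb."""
    return max(1, round(len(text) / 4))

# Illustrative per-million-token input prices (NOT current rates).
PRICE_PER_MTOK = {"gpt-4o": 2.50, "claude-sonnet": 3.00}

def estimate_cost(text: str, model: str) -> float:
    """Estimated input cost in USD for one request under the heuristic count."""
    return estimate_tokens(text) / 1_000_000 * PRICE_PER_MTOK[model]

prompt = "Summarize the following support ticket in two sentences."
for model in ("gpt-4o", "claude-sonnet"):
    print(f"{model}: ~{estimate_tokens(prompt)} tokens, ${estimate_cost(prompt, model):.6f}")
```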
## Tools Overview

| Tool | Purpose | Key Flags |
|---|---|---|
| `token_counter.py` | Count tokens and estimate costs across models | `--file`, `--text`, `--stdin`, `--models` |
| `prompt_optimizer.py` | Analyze prompts for token reduction opportunities | `--file`, `--target-reduction`, `--format` |
## Workflows

### Cost Estimation for a New Project

- Collect sample prompts (system prompt + user messages)
- Run `token_counter.py` with target models
- Multiply per-request cost by expected daily volume
- Compare models on the cost-quality tradeoff
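The volume math in the steps above is simple multiplication; a minimal sketch, where the token counts, prices, and volume are all assumed example values:

```python
# Project monthly spend from per-request token counts.
# All prices and counts below are illustrative assumptions.

def monthly_cost(input_tokens: int, output_tokens: int,
                 price_in_per_mtok: float, price_out_per_mtok: float,
                 requests_per_day: int, days: int = 30) -> float:
    """Monthly cost in USD: per-request cost times request volume."""
    per_request = (input_tokens / 1_000_000 * price_in_per_mtok
                   + output_tokens / 1_000_000 * price_out_per_mtok)
    return per_request * requests_per_day * days

# Example: 1,200 input + 300 output tokens per request, 50k requests/day.
print(f"${monthly_cost(1200, 300, 2.50, 10.00, 50_000):.2f}")  # → $9000.00
```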
### Prompt Optimization Sprint

- Identify the highest-cost prompts from usage logs
- Run `prompt_optimizer.py` on each
- Apply suggested optimizations
- Re-count tokens to verify the reduction
- A/B test optimized vs. original for quality
## Reference Documentation

- LLM Pricing Guide: current pricing for major LLM providers and token estimation methods
## Common Patterns

### Token Reduction Techniques
- Remove redundant instructions and examples
- Use shorter variable names in few-shot examples
- Compress verbose system prompts
- Replace repeated context with references
- Use structured output formats (JSON) to reduce response tokens
- Batch multiple requests into single prompts where possible
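To make the first techniques concrete, here is a hypothetical before/after of the same instruction, quantified with a rough 4-chars-per-token estimate. Both prompts and the heuristic are illustrative examples, not output of `prompt_optimizer.py`:

```python
# Hypothetical verbose vs. compressed system prompt, with a rough
# 4-characters-per-token estimate to quantify the reduction.

verbose = (
    "You are a helpful assistant. Please make sure that you always answer "
    "the user's question. When you answer, please be sure to keep your "
    "answer short, and please also make sure that your answer is polite."
)
compressed = "Answer the user's question briefly and politely."

def estimate_tokens(text: str) -> int:
    """Approximate token count (~4 characters per token)."""
    return max(1, round(len(text) / 4))

before, after = estimate_tokens(verbose), estimate_tokens(compressed)
reduction = 100 * (before - after) / before
print(f"{before} -> {after} tokens ({reduction:.0f}% reduction)")
```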
### Cost-Effective Model Selection
- Use smaller models for classification/extraction tasks
- Reserve large models for complex reasoning
- Implement model routing based on query complexity
- Cache responses for identical or similar queries
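The last two points can be combined in a minimal routing-plus-cache sketch. The model names are placeholders, and the length-based complexity heuristic is a deliberate simplification (production routers typically use a classifier):

```python
import hashlib

# Naive router: short extraction-style queries go to a small model,
# longer ones to a large model. Model names are placeholders.
CACHE: dict[str, str] = {}

def route_model(query: str) -> str:
    """Pick a model tier from a crude complexity heuristic (query length)."""
    return "small-model" if len(query.split()) <= 20 else "large-model"

def cached_call(query: str, call_llm) -> str:
    """Return a cached response for identical queries, else call the LLM."""
    key = hashlib.sha256(query.encode()).hexdigest()
    if key not in CACHE:
        CACHE[key] = call_llm(route_model(query), query)
    return CACHE[key]

# Stub LLM call for demonstration.
calls = []
def fake_llm(model: str, query: str) -> str:
    calls.append(model)
    return f"[{model}] answer"

print(cached_call("Classify this ticket: refund request", fake_llm))
print(cached_call("Classify this ticket: refund request", fake_llm))  # served from cache
print(len(calls))  # → 1 (only one real LLM call was made)
```

Extending the cache key with the model name and a version tag for the system prompt keeps cached responses from surviving prompt changes.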