claude-cost-optimization
Claude Cost Optimization
Overview
Cost optimization is critical for production Claude deployments. A single inefficiently-designed agent can cost hundreds or thousands of dollars monthly, while optimized implementations cost 10-90% less for identical functionality. This skill provides a comprehensive workflow for measuring, analyzing, and optimizing token costs.
Why This Matters:
- Token costs are typically the largest line item in a Claude deployment
- Small improvements compound over millions of API calls
- Context optimization alone saves 60-90% on long conversations
- Model + effort selection can reduce costs 5-10x for specific tasks
Key Savings Available:
- Effort parameter: 20-70% token reduction (same model, different reasoning depth)
- Context editing: 60-90% reduction on long conversations
- Tool optimization: 37-85% reduction with advanced tool patterns
- Prompt caching: 90% reduction on repeated content
- Model selection: 2-5x cost difference between models
When to Use This Skill
Use claude-cost-optimization when you need to:
- Track Token Costs: Understand exactly what your Claude implementation costs
- Identify Expensive Patterns: Find which operations consume the most tokens
- Measure ROI: Calculate the business value of your Claude integration
- Optimize for Production: Reduce costs before deploying expensive agents
- Analyze Cost Drivers: Break down costs by model, feature, endpoint, or time period
- Plan Budget: Forecast future costs based on growth projections
- Implement Optimizations: Apply proven techniques (caching, batching, context editing)
- Set Alerts: Monitor costs and get notified of anomalies or budget overruns
5-Step Optimization Workflow
Step 1: Measure Baseline Usage
Establish your current cost baseline before optimization.
What to Measure:
- Total monthly tokens (input + output)
- Cost breakdown by model
- Top 10 most expensive operations
- Average tokens per request
- Peak usage times and patterns
How to Measure (using Admin API):
```python
from anthropic import Anthropic

client = Anthropic()

# Pull daily usage metrics for the last 30 days via the Admin API
# (see references/usage-tracking.md for the full integration)
response = client.beta.admin.usage_metrics.list(
    limit=30,
    sort_by="date",
)

total_input_tokens = sum(m.input_tokens for m in response.data)
total_output_tokens = sum(m.output_tokens for m in response.data)

# Opus 4.5 pricing: $5/M input, $25/M output
total_cost = (total_input_tokens * 0.000005) + (total_output_tokens * 0.000025)
print(f"Monthly cost: ${total_cost:.2f}")
```
Where to Start: See references/usage-tracking.md for detailed Admin API integration
Step 2: Analyze Cost Drivers
Understand where your costs actually come from.
Identify Expensive Patterns:
- Which operations use the most tokens?
- Which models cost the most?
- Are you using caching effectively?
- Are context windows growing unnecessarily?
- Are you making redundant API calls?
Create Cost Breakdown (example):
- Agent reasoning loops: 45% of costs
- File analysis: 25% of costs
- Web search: 15% of costs
- Classification tasks: 10% of costs
- Other: 5% of costs
Key Metrics to Calculate:
- Cost per transaction
- Tokens per transaction
- Cost per business outcome
- Cost trend (week-over-week)
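If you log per-request usage yourself, a breakdown like the example above falls out of a simple aggregation. A minimal sketch, assuming a hypothetical `request_log` of dicts with `operation`, `input_tokens`, and `output_tokens` fields (substitute your own logging schema):
```python
from collections import defaultdict

# Opus 4.5 pricing (per token): $5/M input, $25/M output
INPUT_RATE, OUTPUT_RATE = 0.000005, 0.000025

def cost_breakdown(request_log):
    """Aggregate per-request usage records into a cost breakdown by operation."""
    totals = defaultdict(float)
    for record in request_log:
        cost = (record["input_tokens"] * INPUT_RATE
                + record["output_tokens"] * OUTPUT_RATE)
        totals[record["operation"]] += cost
    grand_total = sum(totals.values()) or 1.0
    # Print operations sorted by cost, most expensive first
    for operation, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{operation}: ${cost:.2f} ({cost / grand_total:.0%} of costs)")
```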
Step 3: Apply Optimizations
Apply targeted optimizations to your biggest cost drivers.
Effort Parameter (if using Opus 4.5):
- Complex reasoning: high effort (default)
- Balanced tasks: medium effort (20-40% savings)
- Simple classification: low effort (50-70% savings)
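For illustration, a minimal sketch of requesting lower effort on a simple task. The exact parameter name and placement are an assumption here (passed via `extra_body`, which the Python SDK forwards verbatim); confirm the request shape in claude-opus-4-5-guide before relying on it:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",  # alias; pin a dated snapshot in production
    max_tokens=64,
    messages=[{"role": "user", "content": "Is this email spam? ..."}],
    # ASSUMPTION: effort is a top-level request field with levels
    # "high" (default) / "medium" / "low" -- verify against the docs.
    extra_body={"effort": "low"},
)
print(response.content[0].text)
```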
Context Editing (for long conversations):
- Automatic tool result clearing (saves 60-90%)
- Client-side compaction (summarize or prune older turns before resending them)
- Memory tool integration (enables infinite conversations)
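Server-side context editing is covered in claude-context-management; client-side compaction can be as simple as rewriting the message list before each call. A minimal sketch (the function name, threshold, and placeholder text are all illustrative):
```python
def compact_messages(messages, keep_recent=5, placeholder="[tool result cleared]"):
    """Client-side compaction sketch: blank out tool results in older turns.

    Keeps the last `keep_recent` messages intact and replaces tool_result
    content in everything older, so long agent loops stop resending stale
    output on every request.
    """
    compacted = []
    for i, msg in enumerate(messages):
        is_old = i < len(messages) - keep_recent
        content = msg.get("content")
        if is_old and isinstance(content, list):
            content = [
                {**block, "content": placeholder}
                if block.get("type") == "tool_result" else block
                for block in content
            ]
        compacted.append({**msg, "content": content})
    return compacted
```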
Tool Optimization (for large tool sets):
- Tool search with deferred loading (supports 10K+ tools)
- Programmatic calling (37% token reduction on data processing)
- Tool examples (improve accuracy 72% → 90%)
Prompt Caching (for repeated content):
- Cache system prompts (90% cost reduction on cached portion)
- Cache repeated files/documents
- Cache tool definitions
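Caching a system prompt is a per-request change in the Messages API: mark the stable prefix with a `cache_control` breakpoint. A minimal sketch (`LONG_SYSTEM_PROMPT` is a placeholder for your own instructions; the cached prefix must meet the model's minimum cacheable length, e.g. ~1024 tokens on Sonnet):
```python
import anthropic

client = anthropic.Anthropic()

LONG_SYSTEM_PROMPT = "..."  # your own multi-KB rules/instructions

# Cached reads on the marked prefix are billed at a ~90% discount;
# the first request pays a small cache-write premium.
response = client.messages.create(
    model="claude-sonnet-4-5",  # alias; pin a dated snapshot in production
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Classify this ticket: ..."}],
)
```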
Model Selection:
- Opus 4.5: $5/M input, $25/M output (complex tasks)
- Sonnet 4.5: $3/M input, $15/M output (balanced performance/cost)
- Haiku 4.5: $0.80/M input, $4/M output (simple, high-volume tasks)
Step 4: Track Improvements
Monitor cost reductions and efficiency gains after optimizations.
Metrics to Track:
- Cost per transaction (before vs after)
- Total token reduction percentage
- Quality impact (did results improve or worsen?)
- Implementation difficulty and time
Measurement Period: Track for 1-2 weeks per optimization to see impact
Example Impact:
- Optimization: Client-side compaction on long research tasks
- Before: 450K tokens/request, $11.25/request
- After: 180K tokens/request, $4.50/request
- Savings: 60% cost reduction
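A tiny helper to standardize before/after comparisons like this one:
```python
def optimization_impact(before_tokens, after_tokens, before_cost, after_cost):
    """Summarize an optimization's effect as reduction percentages."""
    return {
        "token_reduction": 1 - after_tokens / before_tokens,
        "cost_reduction": 1 - after_cost / before_cost,
        "cost_saved_per_request": before_cost - after_cost,
    }

# Reproduces the example above
impact = optimization_impact(450_000, 180_000, 11.25, 4.50)
print(f"Token reduction: {impact['token_reduction']:.0%}")  # 60%
print(f"Cost reduction: {impact['cost_reduction']:.0%}")    # 60%
```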
Step 5: Report ROI
Calculate business value of your optimizations.
ROI Calculation:
Monthly Savings = (Baseline Daily Cost − Optimized Daily Cost) × 30
Implementation Cost = Implementation Hours × Cost per Hour ($100-300 typical eng rate)
Payback Period (months) = Implementation Cost / Monthly Savings
ROI Example:
- Monthly savings: $500
- Implementation: 8 hours @ $150/hour = $1,200
- Payback period: $1,200 / $500 = 2.4 months
- First-year ROI: ($6,000 − $1,200) / $1,200 = 400%
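The same arithmetic as a reusable function, reproducing the example above:
```python
def payback(monthly_savings, implementation_hours, hourly_rate):
    """Payback period (months) and first-year ROI for an optimization project."""
    implementation_cost = implementation_hours * hourly_rate
    payback_months = implementation_cost / monthly_savings
    first_year_roi = (monthly_savings * 12 - implementation_cost) / implementation_cost
    return payback_months, first_year_roi

months, roi = payback(monthly_savings=500, implementation_hours=8, hourly_rate=150)
print(f"Payback: {months:.1f} months, first-year ROI: {roi:.0%}")
# Payback: 2.4 months, first-year ROI: 400%
```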
Quick Start - Usage Tracking
Get started with Admin API cost tracking in 5 minutes:
```python
import anthropic

client = anthropic.Anthropic()

def get_monthly_costs():
    """Get the current month's token costs."""
    # limit=30 daily records sorted by date ~= the last 30 days of usage
    response = client.beta.admin.usage_metrics.list(
        limit=30,
        sort_by="date",
    )

    total_input = sum(m.input_tokens for m in response.data)
    total_output = sum(m.output_tokens for m in response.data)

    # Opus 4.5 pricing: $5/M input, $25/M output
    input_cost = total_input * 0.000005
    output_cost = total_output * 0.000025
    total_cost = input_cost + output_cost

    print("Last 30 days:")
    print(f"  Input tokens: {total_input:,}")
    print(f"  Output tokens: {total_output:,}")
    print(f"  Input cost: ${input_cost:.2f}")
    print(f"  Output cost: ${output_cost:.2f}")
    print(f"  Total cost: ${total_cost:.2f}")

    return {
        "input_tokens": total_input,
        "output_tokens": total_output,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": total_cost,
    }

# Run the function
costs = get_monthly_costs()
```
Quick Start - ROI Calculation
Calculate the business value of your Claude implementation:
```python
def calculate_roi(
    monthly_cost: float,
    monthly_transactions: int,
    cost_before_claude: float | None = None,
    quality_improvement: float = 1.0,
) -> dict:
    """Calculate ROI metrics for a Claude implementation."""
    cost_per_transaction = monthly_cost / monthly_transactions

    metrics = {
        "monthly_cost": monthly_cost,
        "monthly_transactions": monthly_transactions,
        "cost_per_transaction": cost_per_transaction,
    }

    # If you had costs before Claude (manual process, previous tool, etc.)
    if cost_before_claude:
        savings = cost_before_claude - monthly_cost
        # Savings relative to the previous process, as a percentage
        roi_percentage = (savings / cost_before_claude) * 100
        metrics["previous_cost"] = cost_before_claude
        metrics["monthly_savings"] = savings
        metrics["roi_percentage"] = roi_percentage

    # Account for quality improvements: better output is worth more per dollar
    effective_cost = monthly_cost / quality_improvement
    metrics["quality_adjusted_cost"] = effective_cost

    return metrics

# Example: research agent replacing manual research
result = calculate_roi(
    monthly_cost=500,           # Claude costs
    monthly_transactions=1000,  # Requests processed
    cost_before_claude=3000,    # Manual research was $3k/month
    quality_improvement=1.5,    # Claude results are 50% better
)

print(f"Cost per transaction: ${result['cost_per_transaction']:.4f}")
print(f"Monthly savings: ${result['monthly_savings']:.2f}")
print(f"ROI: {result['roi_percentage']:.0f}%")
```
Pricing Overview
Current Claude Model Pricing (as of November 2025):
| Model | Input | Output | Best For |
|---|---|---|---|
| Opus 4.5 | $5/M | $25/M | Complex reasoning, agents, coding |
| Sonnet 4.5 | $3/M | $15/M | Balanced performance/cost |
| Haiku 4.5 | $0.80/M | $4/M | Simple tasks, high volume |
Cost Impact of Optimization Techniques:
| Technique | Savings | Implementation Difficulty |
|---|---|---|
| Effort parameter (medium) | 20-40% | Easy (add 2 lines) |
| Effort parameter (low) | 50-70% | Easy (add 2 lines) |
| Context editing | 60-90% | Medium (requires setup) |
| Tool optimization | 37-85% | Medium (architecture change) |
| Prompt caching | 90% (cached portion) | Easy (add cache_control breakpoints) |
| Model selection | 50-75% | Hard (architecture change) |
Example Cost Comparison (1M transactions/month):
Scenario: Classification task
Opus 4.5, high effort:
- Input: 50M tokens @ $5/M = $250
- Output: 10M tokens @ $25/M = $250
- Total: $500/month
Opus 4.5, low effort:
- Input: 50M tokens @ $5/M = $250
- Output: 5M tokens @ $25/M = $125
- Total: $375/month (25% savings)
Haiku 4.5, high effort:
- Input: 50M tokens @ $0.80/M = $40
- Output: 10M tokens @ $4/M = $40
- Total: $80/month (84% savings)
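The comparison above generalizes to any volume; a small helper using the per-million rates from the pricing table:
```python
# $/token rates derived from the per-million pricing table above
PRICING = {
    "opus-4.5":   {"input": 5.00e-6, "output": 25.0e-6},
    "sonnet-4.5": {"input": 3.00e-6, "output": 15.0e-6},
    "haiku-4.5":  {"input": 0.80e-6, "output": 4.00e-6},
}

def monthly_cost(model, input_tokens, output_tokens):
    """Monthly cost in dollars for a given model and token volume."""
    rates = PRICING[model]
    return input_tokens * rates["input"] + output_tokens * rates["output"]

# Reproduce the classification scenario: 50M input / 10M output per month
for model in PRICING:
    print(f"{model}: ${monthly_cost(model, 50e6, 10e6):,.2f}/month")
# opus-4.5: $500.00, sonnet-4.5: $300.00, haiku-4.5: $80.00
```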
Optimization Decision Tree
```
START: Have high costs?
│
Q1: Do you know what's causing the costs?
├─ NO  → Go to Step 2: Analyze Cost Drivers
└─ YES → Q2: Have you tried the effort parameter (Opus 4.5)?
    ├─ NO  → Apply effort parameter (medium/low)
    │        Expect 20-70% savings, 2-4 hours implementation
    └─ YES → Q3: Do you have long conversations (>50K tokens)?
        ├─ YES → Implement context editing
        │        Expect 60-90% savings on long tasks
        └─ NO  → Q4: Do you have 10+ tools in your agents?
            ├─ YES → Implement tool search + deferred loading
            │        Expect 85% context savings
            └─ NO  → Q5: Can you cache repeated content?
                ├─ YES → Implement prompt caching
                │        Expect 90% savings on cached content
                └─ NO  → Consider model selection
                         Expect 2-5x cost reduction
```
Common Optimization Scenarios
Scenario 1: Reducing Agent Costs 50%
Before:
- Opus 4.5 with high effort
- Long reasoning loops
- All 20+ tools always loaded
- Monthly cost: $2,000
Optimizations (in order of impact):
- Effort Parameter: Switch to medium effort → 30% savings ($600)
- Tool Optimization: Use tool search + deferred loading → 20% savings ($400)
- Context Editing: Clear old tool results → 10% savings ($200)
- Total: 60% cost reduction to $800/month
Timeline: 20-30 hours implementation
Scenario 2: Reducing Classification Costs 80%
Before:
- Opus 4.5 high effort (overkill for classification)
- Simple yes/no decisions
- Monthly cost: $1,000
Optimizations:
- Model Selection: Switch to Haiku 4.5 → 80% savings ($800)
- Effort Parameter: Use low effort → Additional 20% on Haiku
- Prompt Caching: Cache classification rules → 90% savings on cache
- Total: 85% cost reduction to $150/month
Timeline: 5-10 hours implementation
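Note that when optimizations apply sequentially, each later one saves a fraction of the already-reduced cost, so percentages multiply rather than add. The sketch below shows Scenario 2's first two steps landing at 84%; the caching step closes the remaining gap to the quoted ~85%:
```python
from functools import reduce

def stacked_reduction(savings):
    """Combined cost reduction when optimizations apply sequentially.

    Each fraction in `savings` is applied to the cost remaining after
    the previous step, so the factors multiply.
    """
    remaining = reduce(lambda cost, s: cost * (1 - s), savings, 1.0)
    return 1 - remaining

# Scenario 2: Haiku switch (80%), then low effort (20% of what's left)
print(f"{stacked_reduction([0.80, 0.20]):.0%}")  # 84%
```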
Related Skills
For deeper dives into specific optimization areas, see:
- claude-opus-4-5-guide: Effort parameter trade-offs and Opus 4.5 capabilities
- claude-context-management: Context editing strategies (60-90% savings on long conversations)
- claude-advanced-tool-use: Tool optimization patterns (37-85% efficiency gains)
- anthropic-expert: Admin API reference and prompt caching basics
For complete optimization strategies, cost prediction models, and ROI measurement frameworks, see references/ directory.