claude-cost-optimization
Claude Cost Optimization
Overview
Cost optimization is critical for production Claude deployments. A single inefficiently-designed agent can cost hundreds or thousands of dollars monthly, while optimized implementations cost 10-90% less for identical functionality. This skill provides a comprehensive workflow for measuring, analyzing, and optimizing token costs.
Why This Matters:
- Token costs are typically the largest line item in a Claude deployment
- Small improvements compound over millions of API calls
- Context optimization alone saves 60-90% on long conversations
- Model + effort selection can reduce costs 5-10x for specific tasks
Key Savings Available:
- Effort parameter: 20-70% token reduction (same model, different reasoning depth)
- Context editing: 60-90% reduction on long conversations
- Tool optimization: 37-85% reduction with advanced tool patterns
- Prompt caching: 90% reduction on repeated content
- Model selection: 2-5x cost difference between models
When to Use This Skill
Use claude-cost-optimization when you need to:
- Track Token Costs: Understand exactly what your Claude implementation costs
- Identify Expensive Patterns: Find which operations consume the most tokens
- Measure ROI: Calculate the business value of your Claude integration
- Optimize for Production: Reduce costs before deploying expensive agents
- Analyze Cost Drivers: Break down costs by model, feature, endpoint, or time period
- Plan Budget: Forecast future costs based on growth projections
- Implement Optimizations: Apply proven techniques (caching, batching, context editing)
- Set Alerts: Monitor costs and get notified of anomalies or budget overruns
5-Step Optimization Workflow
Step 1: Measure Baseline Usage
Establish your current cost baseline before optimization.
What to Measure:
- Total monthly tokens (input + output)
- Cost breakdown by model
- Top 10 most expensive operations
- Average tokens per request
- Peak usage times and patterns
How to Measure (using Admin API):
```python
from anthropic import Anthropic

client = Anthropic()

# Pull daily usage metrics for the last 30 days via the Admin API
# (see references/usage-tracking.md for the full integration)
response = client.beta.admin.usage_metrics.list(
    limit=30,
    sort_by="date",
)

total_input_tokens = sum(m.input_tokens for m in response.data)
total_output_tokens = sum(m.output_tokens for m in response.data)

# Opus 4.5 pricing: $5/M input, $25/M output
total_cost = (total_input_tokens * 0.000005) + (total_output_tokens * 0.000025)
print(f"Monthly cost: ${total_cost:.2f}")
```
Where to Start: See references/usage-tracking.md for detailed Admin API integration
Step 2: Analyze Cost Drivers
Understand where your costs actually come from.
Identify Expensive Patterns:
- Which operations use the most tokens?
- Which models cost the most?
- Are you using caching effectively?
- Are context windows growing unnecessarily?
- Are you making redundant API calls?
Create Cost Breakdown (example):
- Agent reasoning loops: 45% of costs
- File analysis: 25% of costs
- Web search: 15% of costs
- Classification tasks: 10% of costs
- Other: 5% of costs
Key Metrics to Calculate:
- Cost per transaction
- Tokens per transaction
- Cost per business outcome
- Cost trend (week-over-week)
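If you log per-request usage yourself, a breakdown like the example above falls out of a simple aggregation. A minimal sketch, assuming a hypothetical `request_log` of dicts with `operation`, `input_tokens`, and `output_tokens` fields (substitute your own logging schema):
```python
from collections import defaultdict

# Opus 4.5 pricing (per token): $5/M input, $25/M output
INPUT_RATE, OUTPUT_RATE = 0.000005, 0.000025

def cost_breakdown(request_log):
    """Aggregate per-request usage records into a cost breakdown by operation."""
    totals = defaultdict(float)
    for record in request_log:
        cost = (record["input_tokens"] * INPUT_RATE
                + record["output_tokens"] * OUTPUT_RATE)
        totals[record["operation"]] += cost
    grand_total = sum(totals.values()) or 1.0
    # Print operations sorted by cost, most expensive first
    for operation, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{operation}: ${cost:.2f} ({cost / grand_total:.0%} of costs)")
```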
Step 3: Apply Optimizations
Apply targeted optimizations to your biggest cost drivers.
Effort Parameter (if using Opus 4.5):
- Complex reasoning: high effort (default)
- Balanced tasks: medium effort (20-40% savings)
- Simple classification: low effort (50-70% savings)
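For illustration, a minimal sketch of requesting lower effort on a simple task. The exact parameter name and placement are an assumption here (passed via `extra_body`, which the Python SDK forwards verbatim); confirm the request shape in claude-opus-4-5-guide before relying on it:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",  # alias; pin a dated snapshot in production
    max_tokens=64,
    messages=[{"role": "user", "content": "Is this email spam? ..."}],
    # ASSUMPTION: effort is a top-level request field with levels
    # "high" (default) / "medium" / "low" -- verify against the docs.
    extra_body={"effort": "low"},
)
print(response.content[0].text)
```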
Context Editing (for long conversations):
- Automatic tool result clearing (saves 60-90%)
- Client-side compaction (summarize or prune older turns before resending them)
- Memory tool integration (enables infinite conversations)
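Server-side context editing is covered in claude-context-management; client-side compaction can be as simple as rewriting the message list before each call. A minimal sketch (the function name, threshold, and placeholder text are all illustrative):
```python
def compact_messages(messages, keep_recent=5, placeholder="[tool result cleared]"):
    """Client-side compaction sketch: blank out tool results in older turns.

    Keeps the last `keep_recent` messages intact and replaces tool_result
    content in everything older, so long agent loops stop resending stale
    output on every request.
    """
    compacted = []
    for i, msg in enumerate(messages):
        is_old = i < len(messages) - keep_recent
        content = msg.get("content")
        if is_old and isinstance(content, list):
            content = [
                {**block, "content": placeholder}
                if block.get("type") == "tool_result" else block
                for block in content
            ]
        compacted.append({**msg, "content": content})
    return compacted
```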
Tool Optimization (for large tool sets):
- Tool search with deferred loading (supports 10K+ tools)
- Programmatic calling (37% token reduction on data processing)
- Tool examples (improve accuracy 72% → 90%)
Prompt Caching (for repeated content):
- Cache system prompts (90% cost reduction on cached portion)
- Cache repeated files/documents
- Cache tool definitions
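Caching a system prompt is a per-request change in the Messages API: mark the stable prefix with a `cache_control` breakpoint. A minimal sketch (`LONG_SYSTEM_PROMPT` is a placeholder for your own instructions; the cached prefix must meet the model's minimum cacheable length, e.g. ~1024 tokens on Sonnet):
```python
import anthropic

client = anthropic.Anthropic()

LONG_SYSTEM_PROMPT = "..."  # your own multi-KB rules/instructions

# Cached reads on the marked prefix are billed at a ~90% discount;
# the first request pays a small cache-write premium.
response = client.messages.create(
    model="claude-sonnet-4-5",  # alias; pin a dated snapshot in production
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Classify this ticket: ..."}],
)
```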
Model Selection:
- Opus 4.5: $5/M input, $25/M output (complex tasks)
- Sonnet 4.5: $3/M input, $15/M output (balanced performance/cost)
- Haiku 4.5: $0.80/M input, $4/M output (simple, high-volume tasks)
Step 4: Track Improvements
Monitor cost reductions and efficiency gains after optimizations.
Metrics to Track:
- Cost per transaction (before vs after)
- Total token reduction percentage
- Quality impact (did results improve or worsen?)
- Implementation difficulty and time
Measurement Period: Track for 1-2 weeks per optimization to see impact
Example Impact:
- Optimization: Client-side compaction on long research tasks
- Before: 450K tokens/request, $11.25/request
- After: 180K tokens/request, $4.50/request
- Savings: 60% cost reduction
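A tiny helper to standardize before/after comparisons like this one:
```python
def optimization_impact(before_tokens, after_tokens, before_cost, after_cost):
    """Summarize an optimization's effect as reduction percentages."""
    return {
        "token_reduction": 1 - after_tokens / before_tokens,
        "cost_reduction": 1 - after_cost / before_cost,
        "cost_saved_per_request": before_cost - after_cost,
    }

# Reproduces the example above
impact = optimization_impact(450_000, 180_000, 11.25, 4.50)
print(f"Token reduction: {impact['token_reduction']:.0%}")  # 60%
print(f"Cost reduction: {impact['cost_reduction']:.0%}")    # 60%
```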
Step 5: Report ROI
Calculate business value of your optimizations.
ROI Calculation:
Monthly Savings = (Baseline Daily Cost − Optimized Daily Cost) × 30
Implementation Cost = Implementation Hours × Cost per Hour ($100-300 typical eng rate)
Payback Period (months) = Implementation Cost / Monthly Savings
ROI Example:
- Monthly savings: $500
- Implementation: 8 hours @ $150/hour = $1,200
- Payback period: $1,200 / $500 = 2.4 months
- First-year ROI: ($6,000 − $1,200) / $1,200 = 400%
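The same arithmetic as a reusable function, reproducing the example above:
```python
def payback(monthly_savings, implementation_hours, hourly_rate):
    """Payback period (months) and first-year ROI for an optimization project."""
    implementation_cost = implementation_hours * hourly_rate
    payback_months = implementation_cost / monthly_savings
    first_year_roi = (monthly_savings * 12 - implementation_cost) / implementation_cost
    return payback_months, first_year_roi

months, roi = payback(monthly_savings=500, implementation_hours=8, hourly_rate=150)
print(f"Payback: {months:.1f} months, first-year ROI: {roi:.0%}")
# Payback: 2.4 months, first-year ROI: 400%
```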
Quick Start - Usage Tracking
Get started with Admin API cost tracking in 5 minutes:
```python
import anthropic

client = anthropic.Anthropic()

def get_monthly_costs():
    """Get the current month's token costs."""
    # limit=30 daily records sorted by date ~= the last 30 days of usage
    response = client.beta.admin.usage_metrics.list(
        limit=30,
        sort_by="date",
    )

    total_input = sum(m.input_tokens for m in response.data)
    total_output = sum(m.output_tokens for m in response.data)

    # Opus 4.5 pricing: $5/M input, $25/M output
    input_cost = total_input * 0.000005
    output_cost = total_output * 0.000025
    total_cost = input_cost + output_cost

    print("Last 30 days:")
    print(f"  Input tokens: {total_input:,}")
    print(f"  Output tokens: {total_output:,}")
    print(f"  Input cost: ${input_cost:.2f}")
    print(f"  Output cost: ${output_cost:.2f}")
    print(f"  Total cost: ${total_cost:.2f}")

    return {
        "input_tokens": total_input,
        "output_tokens": total_output,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": total_cost,
    }

# Run the function
costs = get_monthly_costs()
```
Quick Start - ROI Calculation
Calculate the business value of your Claude implementation:
```python
def calculate_roi(
    monthly_cost: float,
    monthly_transactions: int,
    cost_before_claude: float | None = None,
    quality_improvement: float = 1.0,
) -> dict:
    """Calculate ROI metrics for a Claude implementation."""
    cost_per_transaction = monthly_cost / monthly_transactions

    metrics = {
        "monthly_cost": monthly_cost,
        "monthly_transactions": monthly_transactions,
        "cost_per_transaction": cost_per_transaction,
    }

    # If you had costs before Claude (manual process, previous tool, etc.)
    if cost_before_claude:
        savings = cost_before_claude - monthly_cost
        # Savings relative to the previous process, as a percentage
        roi_percentage = (savings / cost_before_claude) * 100
        metrics["previous_cost"] = cost_before_claude
        metrics["monthly_savings"] = savings
        metrics["roi_percentage"] = roi_percentage

    # Account for quality improvements: better output is worth more per dollar
    effective_cost = monthly_cost / quality_improvement
    metrics["quality_adjusted_cost"] = effective_cost

    return metrics

# Example: research agent replacing manual research
result = calculate_roi(
    monthly_cost=500,           # Claude costs
    monthly_transactions=1000,  # Requests processed
    cost_before_claude=3000,    # Manual research was $3k/month
    quality_improvement=1.5,    # Claude results are 50% better
)

print(f"Cost per transaction: ${result['cost_per_transaction']:.4f}")
print(f"Monthly savings: ${result['monthly_savings']:.2f}")
print(f"ROI: {result['roi_percentage']:.0f}%")
```
Pricing Overview
Current Claude Model Pricing (as of November 2025):
| Model | Input | Output | Best For |
|---|---|---|---|
| Opus 4.5 | $5/M | $25/M | Complex reasoning, agents, coding |
| Sonnet 4.5 | $3/M | $15/M | Balanced performance/cost |
| Haiku 4.5 | $0.80/M | $4/M | Simple tasks, high volume |
Cost Impact of Optimization Techniques:
| Technique | Savings | Implementation Difficulty |
|---|---|---|
| Effort parameter (medium) | 20-40% | Easy (add 2 lines) |
| Effort parameter (low) | 50-70% | Easy (add 2 lines) |
| Context editing | 60-90% | Medium (requires setup) |
| Tool optimization | 37-85% | Medium (architecture change) |
| Prompt caching | 90% (cached portion) | Easy (add cache_control breakpoints) |
| Model selection | 50-75% | Hard (architecture change) |
Example Cost Comparison (1M transactions/month):
Scenario: Classification task
Opus 4.5, high effort:
- Input: 50M tokens @ $5/M = $250
- Output: 10M tokens @ $25/M = $250
- Total: $500/month
Opus 4.5, low effort:
- Input: 50M tokens @ $5/M = $250
- Output: 5M tokens @ $25/M = $125
- Total: $375/month (25% savings)
Haiku 4.5, high effort:
- Input: 50M tokens @ $0.80/M = $40
- Output: 10M tokens @ $4/M = $40
- Total: $80/month (84% savings)
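The comparison above generalizes to any volume; a small helper using the per-million rates from the pricing table:
```python
# $/token rates derived from the per-million pricing table above
PRICING = {
    "opus-4.5":   {"input": 5.00e-6, "output": 25.0e-6},
    "sonnet-4.5": {"input": 3.00e-6, "output": 15.0e-6},
    "haiku-4.5":  {"input": 0.80e-6, "output": 4.00e-6},
}

def monthly_cost(model, input_tokens, output_tokens):
    """Monthly cost in dollars for a given model and token volume."""
    rates = PRICING[model]
    return input_tokens * rates["input"] + output_tokens * rates["output"]

# Reproduce the classification scenario: 50M input / 10M output per month
for model in PRICING:
    print(f"{model}: ${monthly_cost(model, 50e6, 10e6):,.2f}/month")
# opus-4.5: $500.00, sonnet-4.5: $300.00, haiku-4.5: $80.00
```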
Optimization Decision Tree
```
START: Have high costs?
│
Q1: Do you know what's causing the costs?
├─ NO  → Go to Step 2: Analyze Cost Drivers
└─ YES → Q2: Have you tried the effort parameter (Opus 4.5)?
    ├─ NO  → Apply effort parameter (medium/low)
    │        Expect 20-70% savings, 2-4 hours implementation
    └─ YES → Q3: Do you have long conversations (>50K tokens)?
        ├─ YES → Implement context editing
        │        Expect 60-90% savings on long tasks
        └─ NO  → Q4: Do you have 10+ tools in your agents?
            ├─ YES → Implement tool search + deferred loading
            │        Expect 85% context savings
            └─ NO  → Q5: Can you cache repeated content?
                ├─ YES → Implement prompt caching
                │        Expect 90% savings on cached content
                └─ NO  → Consider model selection
                         Expect 2-5x cost reduction
```
Common Optimization Scenarios
Scenario 1: Reducing Agent Costs 50%
Before:
- Opus 4.5 with high effort
- Long reasoning loops
- All 20+ tools always loaded
- Monthly cost: $2,000
Optimizations (in order of impact):
- Effort Parameter: Switch to medium effort → 30% savings ($600)
- Tool Optimization: Use tool search + deferred loading → 20% savings ($400)
- Context Editing: Clear old tool results → 10% savings ($200)
- Total: 60% cost reduction to $800/month
Timeline: 20-30 hours implementation
Scenario 2: Reducing Classification Costs 80%
Before:
- Opus 4.5 high effort (overkill for classification)
- Simple yes/no decisions
- Monthly cost: $1,000
Optimizations:
- Model Selection: Switch to Haiku 4.5 → 80% savings ($800)
- Effort Parameter: Use low effort → Additional 20% on Haiku
- Prompt Caching: Cache classification rules → 90% savings on cache
- Total: 85% cost reduction to $150/month
Timeline: 5-10 hours implementation
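Note that when optimizations apply sequentially, each later one saves a fraction of the already-reduced cost, so percentages multiply rather than add. The sketch below shows Scenario 2's first two steps landing at 84%; the caching step closes the remaining gap to the quoted ~85%:
```python
from functools import reduce

def stacked_reduction(savings):
    """Combined cost reduction when optimizations apply sequentially.

    Each fraction in `savings` is applied to the cost remaining after
    the previous step, so the factors multiply.
    """
    remaining = reduce(lambda cost, s: cost * (1 - s), savings, 1.0)
    return 1 - remaining

# Scenario 2: Haiku switch (80%), then low effort (20% of what's left)
print(f"{stacked_reduction([0.80, 0.20]):.0%}")  # 84%
```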
Related Skills
For deeper dives into specific optimization areas, see:
- claude-opus-4-5-guide: Effort parameter trade-offs and Opus 4.5 capabilities
- claude-context-management: Context editing strategies (60-90% savings on long conversations)
- claude-advanced-tool-use: Tool optimization patterns (37-85% efficiency gains)
- anthropic-expert: Admin API reference and prompt caching basics
For complete optimization strategies, cost prediction models, and ROI measurement frameworks, see references/ directory.