Agent Cost Optimizer

Overview

agent-cost-optimizer provides comprehensive cost tracking, budget enforcement, and ROI measurement for AI agent operations.

Purpose: Control and optimize AI spending while maximizing value delivered

Pattern: Task-based (7 operations for cost management)

Key Innovation: Real-time cost tracking with automatic budget enforcement and cost-effective fallbacks

Industry Context (2025):

Average AI spending: $85,521/month (36% YoY increase)
Only 50% of organizations can measure AI ROI
IT teams struggle with hidden costs

Solution: Comprehensive cost management from tracking to optimization

When to Use

Use agent-cost-optimizer when:

Tracking AI costs across skills
Preventing budget overruns
Measuring ROI (cost vs. value delivered)
Optimizing model selection (Opus vs. Sonnet vs. Haiku)
Planning AI budgets
Cost-effective development
Enterprise cost accountability

Prerequisites

Required

AI API access (Anthropic, OpenAI, Google)
Cost tracking capability (API usage data)

Optional

Paid.ai or AgentOps integration (advanced cost tracking)
Prometheus/Grafana (cost visualization)
Budget approval workflow

Cost Operations

Operation 1: Track Token Usage

Purpose: Monitor token consumption per skill invocation

Process:

Initialize Tracking:

{
  "tracking_id": "track_20250126_1200",
  "skill": "multi-ai-verification",
  "started_at": "2025-01-26T12:00:00Z",
  "tokens": {
    "prompt": 0,
    "completion": 0,
    "total": 0
  },
  "cost": {
    "amount_usd": 0.00,
    "model": "claude-sonnet-4-5"
  }
}

Track During Execution:

// After each AI call
trackTokens({
  prompt_tokens: response.usage.input_tokens,
  completion_tokens: response.usage.output_tokens,
  model: 'claude-sonnet-4-5'
});

// Update running totals

Finalize Tracking:

{
  "tracking_id": "track_20250126_1200",
  "skill": "multi-ai-verification",
  "completed_at": "2025-01-26T12:45:00Z",
  "duration_minutes": 45,
  "tokens": {
    "prompt": 15234,
    "completion": 8932,
    "total": 24166
  },
  "cost": {
    "amount_usd": 0.073,
    "model": "claude-sonnet-4-5",
    "rate": "$3 per million tokens"
  }
}

Save to Cost Log:

# Append to daily cost log
cat tracking.json >> .cost-tracking/$(date +%Y-%m-%d).json

Outputs:

Token usage per invocation
Cost per invocation
Model used
Daily/monthly aggregates

Validation:

Tracking initialized
Tokens counted accurately
Cost calculated correctly
Logs saved

Time Estimate: Automatic (integrated into skills)

Operation 2: Calculate Costs

Purpose: Compute accurate costs based on provider pricing

Pricing (2025 rates):

Anthropic (Claude):

Model	Input (per MTok)	Output (per MTok)
Claude Opus 4.5	$15	$75
Claude Sonnet 4.5	$3	$15
Claude Haiku 4.5	$0.80	$4

OpenAI (Codex):

Model	Input	Output
GPT-5.1-codex	$5	$15
o3	$10	$40
o4-mini	$1.50	$6

Google (Gemini):

Model	Input	Output
Gemini 2.5 Pro	$1.25	$5
Gemini 2.5 Flash	$0.15	$0.60

Process:

function calculateCost(usage, model) {
  const pricing = {
    'claude-sonnet-4-5': { input: 3, output: 15 },
    'claude-haiku-4-5': { input: 0.80, output: 4 },
    'claude-opus-4-5': { input: 15, output: 75 },
    // ... more models
  };

  const rates = pricing[model];
  const inputCost = (usage.prompt_tokens / 1_000_000) * rates.input;
  const outputCost = (usage.completion_tokens / 1_000_000) * rates.output;

  return {
    input_cost: inputCost,
    output_cost: outputCost,
    total_cost: inputCost + outputCost,
    currency: 'USD'
  };
}

Outputs:

Accurate cost per invocation
Model-specific pricing
Input vs. output cost breakdown

Operation 3: Enforce Budget Caps

Purpose: Prevent exceeding monthly budget limits

Process:

Set Budget:

{
  "monthly_budget_usd": 100,
  "skill_budgets": {
    "multi-ai-verification": 70,
    "multi-ai-research": 20,
    "multi-ai-testing": 10
  },
  "alert_thresholds": {
    "warning": 0.80,
    "critical": 0.95
  }
}

Check Before Operation:

async function checkBudget(skill, estimated_cost) {
  const usage = getCurrentMonthUsage();
  const remaining = budget.monthly_budget_usd - usage.total_cost;

  if (estimated_cost > remaining) {
    return {
      allowed: false,
      reason: `Budget exceeded: $${usage.total_cost}/$${budget.monthly_budget_usd}`,
      overage: estimated_cost - remaining
    };
  }

  if (usage.total_cost / budget.monthly_budget_usd > 0.80) {
    return {
      allowed: true,
      warning: `80% of monthly budget used ($${usage.total_cost}/$${budget.monthly_budget_usd})`
    };
  }

  return { allowed: true };
}

Enforce:

const budgetCheck = await checkBudget('multi-ai-verification', 0.50);

if (!budgetCheck.allowed) {
  console.log(`❌ Budget exceeded: ${budgetCheck.reason}`);
  console.log(`Options:`);
  console.log(`  1. Use cheaper model (Sonnet → Haiku)`);
  console.log(`  2. Skip optional verification layers`);
  console.log(`  3. Request budget increase`);
  return;
}

if (budgetCheck.warning) {
  console.log(`⚠️ ${budgetCheck.warning}`);
}

// Proceed with operation

Outputs:

Budget status checked
Operations blocked if over budget
Warnings at 80%
Cost-effective alternatives suggested

Operation 4: Optimize Model Selection

Purpose: Choose cost-effective model for each task

Decision Matrix:

Task Type	Recommended Model	Cost	Rationale
Simple verification (Layer 1-2)	Haiku	$	Rules-based, fast, cheap
Code generation	Sonnet	$$	Balanced quality/cost
Complex reasoning (architecture)	Opus	$$$	Best quality, worth premium
LLM-as-judge	Sonnet or external model	$$	Good judgment, reasonable cost
Test generation	Sonnet	$$	Comprehensive coverage needed
Research	Sonnet/Haiku mix	$-$$	Haiku for search, Sonnet for synthesis

Auto-Optimization:

function selectModel(task_type, criticality, budget_remaining) {
  // Critical + budget OK → Use Opus
  if (criticality === 'critical' && budget_remaining > 20) {
    return 'claude-opus-4-5';
  }

  // Standard → Use Sonnet
  if (criticality === 'standard') {
    return 'claude-sonnet-4-5';
  }

  // Budget low or simple task → Use Haiku
  if (budget_remaining < 5 || task_type === 'simple') {
    return 'claude-haiku-4-5';
  }

  return 'claude-sonnet-4-5'; // Default
}

Outputs:

Optimal model selected
Cost minimized
Quality maintained

Operation 5: Cost-Effective Caching

Purpose: Avoid re-computing identical operations

Process:

Cache Key Generation:

function generateCacheKey(operation, inputs) {
  // Hash inputs to create unique key
  const content_hash = crypto
    .createHash('sha256')
    .update(JSON.stringify(inputs))
    .digest('hex');

  return `${operation}_${content_hash}`;
}

Check Cache Before Operation:

const cacheKey = generateCacheKey('verify_code', {
  files: ['src/auth.ts'],
  file_hashes: {'src/auth.ts': 'abc123'}
});

const cached = readCache(cacheKey);

if (cached && !isExpired(cached, 24)) {
  // Use cached result
  console.log('📦 Using cached verification result');
  return cached.result;
}

// Cache miss → run verification
const result = await runVerification();

// Save to cache
saveCache(cacheKey, result, ttl: 24 hours);

Cache Structure:

{
  "cache_key": "verify_code_abc123def456",
  "created_at": "2025-01-26T12:00:00Z",
  "expires_at": "2025-01-27T12:00:00Z",
  "inputs": {
    "files": ["src/auth.ts"],
    "file_hashes": {"src/auth.ts": "abc123"}
  },
  "result": {
    "quality_score": 92,
    "layers_passed": 5,
    "issues": []
  },
  "cost_saved": 0.073
}

Outputs:

90% reduction in re-verification costs
Instant results for unchanged code
Cache hit/miss tracking

Validation:

Cache key correctly identifies identical operations
File changes invalidate cache
Expired cache not used
Cost savings tracked

Operation 6: Measure ROI

Purpose: Calculate return on investment for AI spending

Process:

Track Time Saved:

{
  "task": "Implement user authentication",
  "without_ai": {
    "estimated_hours": 40,
    "developer_rate": 100,
    "total_cost": 4000
  },
  "with_ai": {
    "actual_hours": 11.3,
    "developer_rate": 100,
    "developer_cost": 1130,
    "ai_cost": 2.50,
    "total_cost": 1132.50
  },
  "roi": {
    "time_saved_hours": 28.7,
    "cost_saved": 2867.50,
    "roi_percentage": 253,
    "payback_period_hours": 0.025
  }
}

Calculate ROI:

function calculateROI(task) {
  const time_saved = task.without_ai.estimated_hours - task.with_ai.actual_hours;
  const cost_saved = task.without_ai.total_cost - task.with_ai.total_cost;
  const roi_percentage = (cost_saved / task.with_ai.ai_cost) * 100;

  return {
    time_saved_hours: time_saved,
    cost_saved_usd: cost_saved,
    roi_percentage: roi_percentage,
    payback_period: task.with_ai.ai_cost / (task.without_ai.developer_rate * (time_saved / 40)) // weeks
  };
}

Monthly ROI Report:

# Monthly ROI Report - January 2025

## AI Spending
- Total AI costs: $87.50
- Breakdown:
  - multi-ai-verification: $62.30 (71%)
  - multi-ai-research: $18.40 (21%)
  - multi-ai-testing: $6.80 (8%)

## Time Savings
- Tasks completed: 8
- Total hours saved: 156 hours
- Average savings per task: 19.5 hours

## Cost Savings
- Developer cost avoided: $15,600 (156h × $100/h)
- AI costs: $87.50
- Net savings: $15,512.50

## ROI
- ROI: 17,728% ($177 saved per $1 spent)
- Payback period: 0.03 weeks (immediate)
- Value multiplier: 178x

## Recommendations
- ✅ Current spending highly cost-effective
- Continue using AI for all qualifying tasks
- Consider increasing budget (high ROI)

Outputs:

Comprehensive ROI analysis
Time and cost savings quantified
Monthly reports
Business justification for AI spending

Operation 7: Predict Costs

Purpose: Estimate costs before starting expensive operations

Process:

Historical Data:

{
  "operation": "multi-ai-verification",
  "mode": "all_5_layers",
  "historical_costs": [
    {"date": "2025-01-15", "tokens": 24166, "cost": 0.073},
    {"date": "2025-01-18", "tokens": 21893, "cost": 0.066},
    {"date": "2025-01-20", "tokens": 26543, "cost": 0.080}
  ],
  "avg_cost": 0.073,
  "std_dev": 0.007
}

Predict Before Operation:

const prediction = predictCost('multi-ai-verification', {
  mode: 'all_5_layers',
  code_size_lines: 850
});

console.log(`💰 Estimated cost: $${prediction.estimated_cost} ± $${prediction.std_dev}`);
console.log(`   Range: $${prediction.min_cost} - $${prediction.max_cost}`);
console.log(`   Confidence: ${prediction.confidence}%`);

// Check budget
if (prediction.estimated_cost > budget_remaining) {
  console.log(`⚠️ Estimated cost exceeds remaining budget`);
  console.log(`Options:`);
  console.log(`  1. Use Haiku (estimated: $${prediction.estimated_cost * 0.27})`);
  console.log(`  2. Skip Layer 5 (save ~60%: $${prediction.estimated_cost * 0.4})`);
  console.log(`  3. Increase budget`);
}

Outputs:

Cost predictions with confidence intervals
Budget impact assessment
Cost-effective alternatives suggested

Cost Optimization Strategies

Strategy 1: Model Selection

Baseline (All Sonnet): $100/month

Optimized (Smart selection):

Layer 1-2: Haiku ($20)
Layer 3-4: Sonnet ($40)
Layer 5: Sonnet ($30)
Total: $90/month (10% savings)

Aggressive (Maximum savings):

Layer 1-2: Haiku ($20)
Layer 3-4: Haiku ($15)
Layer 5: Sonnet ($30)
Total: $65/month (35% savings)

Trade-off: Some quality reduction at Layers 3-4

Strategy 2: Caching

Without Caching: Re-verify same code multiple times

With Caching (24-hour TTL):

First verification: $0.073
Same code next 24h: $0 (cache hit)
Savings: 90% on unchanged code

Implementation:

const cacheKey = hash(files_to_verify);
const cached = getCache(cacheKey);

if (cached && !isExpired(cached, 24)) {
  return cached.result; // $0 cost
}

const result = await verify(); // $0.073 cost
saveCache(cacheKey, result);

Strategy 3: Ensemble Optimization

Baseline (Always 5-agent ensemble):

5 agents × $0.073 = $0.365 per verification

Optimized (Conditional ensemble):

Critical features: 5 agents ($0.365)
Standard features: 3 agents ($0.219)
Simple changes: 1 agent ($0.073)
Average savings: 60%

Decision Logic:

function shouldUseEnsemble(criticality, code_size, budget) {
  if (criticality === 'critical') return 5;
  if (criticality === 'high' && budget > 20) return 3;
  return 1; // Single agent
}

Strategy 4: Layer Skipping

Full Verification (All 5 layers): ~$0.073

Fast-Track (Layers 1-2 only):

Rules + Functional only
~$0.015 (80% savings)
Use for: minor changes, docs, config

Decision:

if (lines_changed < 50 && files_changed.every(f => !isCritical(f))) {
  // Fast-track: Layers 1-2 only
  mode = 'fast_track';
  estimated_cost = 0.015;
} else {
  // Full verification
  mode = 'standard';
  estimated_cost = 0.073;
}

Budget Management

Monthly Budget Planning

Sample Budget ($100/month):

{
  "monthly_budget_usd": 100,
  "allocation": {
    "multi-ai-verification": {
      "budget": 70,
      "rationale": "Most expensive (LLM-as-judge)"
    },
    "multi-ai-research": {
      "budget": 20,
      "rationale": "Occasional use, tri-AI"
    },
    "multi-ai-testing": {
      "budget": 10,
      "rationale": "Mostly automated"
    },
    "buffer": 10
  },
  "assumptions": {
    "features_per_month": 8,
    "verifications_per_feature": 1.5,
    "research_per_month": 2
  }
}

Budget Tracking

Daily:

# Check today's spending
cat .cost-tracking/$(date +%Y-%m-%d).json | jq '[.[] | .cost.amount_usd] | add'

# Output: $3.45 today

Monthly:

# Check month-to-date
cat .cost-tracking/2025-01-*.json | jq '[.[] | .cost.amount_usd] | add'

# Output: $67.80 this month (68% of budget)

Projection:

// Project end-of-month
const days_elapsed = 26;
const days_in_month = 31;
const current_spend = 67.80;

const projected = (current_spend / days_elapsed) * days_in_month;
// = $81.14 projected (within budget ✅)

Cost Alerting

Alert Levels

80% Budget (Warning):

⚠️  BUDGET ALERT: 80% Used

**Current**: $80.00 / $100.00 (80%)
**Remaining**: $20.00
**Days left**: 5

**Projected EOMs**: $93.75 (within budget)

**Recommendations**:
- Monitor spending closely
- Use Haiku for simple tasks
- Cache aggressively
- Skip optional layers where safe

95% Budget (Critical):

🚨 CRITICAL: 95% Budget Used

**Current**: $95.00 / $100.00 (95%)
**Remaining**: $5.00
**Days left**: 5

**Projected EOM**: $110 (OVER BUDGET)

**Actions Required**:
1. Pause non-critical verifications
2. Use Haiku exclusively
3. Request budget increase OR
4. Defer work to next month

**Auto-throttling**: Enabled
- Only critical operations allowed
- All optional layers disabled
- Ensemble verification disabled

Budget Exceeded:

❌ BUDGET EXCEEDED

**Current**: $102.50 / $100.00 (102.5%)
**Overage**: $2.50

**Operations BLOCKED** until:
1. Budget increased OR
2. Next month (resets automatically)

**Emergency Override**: Requires approval

ROI Calculation Framework

Value Metrics

Quantifiable Value:

Time saved (hours)
Cost saved (developer time avoided)
Quality improvement (fewer bugs in production)
Faster time-to-market (days)

Formula:

ROI = ((Value Delivered - AI Costs) / AI Costs) × 100%

Example:

Feature without AI: 40 hours × $100/hour = $4,000
Feature with AI: 11.3 hours × $100/hour + $2.50 AI = $1,132.50

Value Delivered = $4,000 - $1,130 = $2,870
AI Costs = $2.50
ROI = ($2,870 / $2.50) × 100% = 114,800%

Monthly Reporting

Template:

# AI ROI Report - January 2025

## Summary
- **AI Spending**: $87.50
- **Value Delivered**: $15,600 (156 hours saved × $100/hour)
- **ROI**: 17,728%
- **Payback**: Immediate

## Details

### Features Delivered (8)
1. User authentication - 11.3h (was 40h), ROI: 114,800%
2. Payment integration - 8.5h (was 32h), ROI: 108,235%
[... more ...]

### Cost Breakdown
- Verification: $62.30 (71%) - Highest cost, highest value
- Research: $18.40 (21%) - Occasional, high impact
- Testing: $6.80 (8%) - Mostly automated, low cost

### Savings
- Time: 156 hours saved
- Cost: $15,512.50 net savings
- Quality: 4 bugs prevented (saved ~20 hours)

### Recommendations
- ✅ ROI is excellent (17,728%)
- Consider increasing budget (high returns)
- Current spending optimal

Quick Reference

Cost Operations

Operation	Purpose	Time	Automation
Track	Monitor token usage	Automatic	100%
Calculate	Compute costs	Automatic	100%
Enforce	Budget caps	Automatic	100%
Optimize	Model selection	Semi-auto	70%
Cache	Avoid re-compute	Automatic	100%
Measure ROI	Value analysis	Manual	30%
Predict	Cost estimation	Automatic	90%

Cost Optimization Strategies

Strategy	Savings	Trade-off	Recommended For
Model selection	10-35%	Some quality loss	All features
Caching	90%	Stale results risk	Unchanged code
Ensemble optimization	60%	Lower confidence	Non-critical
Layer skipping	80%	Less thorough	Minor changes

Budget Thresholds

< 80%: Normal operation
80-95%: Warning, optimize
95-100%: Critical, throttle
> 100%: Block operations

agent-cost-optimizer ensures cost-effective AI operations through real-time tracking, budget enforcement, model optimization, and ROI measurement - preventing budget overruns while maximizing value delivered.

For cost reports, see examples/. For optimization strategies, see Cost Optimization Strategies section.