agent-cost-optimizer
Agent Cost Optimizer
Overview
agent-cost-optimizer provides comprehensive cost tracking, budget enforcement, and ROI measurement for AI agent operations.
Purpose: Control and optimize AI spending while maximizing value delivered
Pattern: Task-based (7 operations for cost management)
Key Innovation: Real-time cost tracking with automatic budget enforcement and cost-effective fallbacks
Industry Context (2025):
- Average AI spending: $85,521/month (36% YoY increase)
- Only 50% of organizations can measure AI ROI
- IT teams struggle with hidden costs
Solution: Comprehensive cost management from tracking to optimization
When to Use
Use agent-cost-optimizer when:
- Tracking AI costs across skills
- Preventing budget overruns
- Measuring ROI (cost vs. value delivered)
- Optimizing model selection (Opus vs. Sonnet vs. Haiku)
- Planning AI budgets
- Keeping development cost-effective
- Supporting enterprise cost accountability
Prerequisites
Required
- AI API access (Anthropic, OpenAI, Google)
- Cost tracking capability (API usage data)
Optional
- Paid.ai or AgentOps integration (advanced cost tracking)
- Prometheus/Grafana (cost visualization)
- Budget approval workflow
Cost Operations
Operation 1: Track Token Usage
Purpose: Monitor token consumption per skill invocation
Process:
- Initialize Tracking:
{
  "tracking_id": "track_20250126_1200",
  "skill": "multi-ai-verification",
  "started_at": "2025-01-26T12:00:00Z",
  "tokens": { "prompt": 0, "completion": 0, "total": 0 },
  "cost": { "amount_usd": 0.00, "model": "claude-sonnet-4-5" }
}
- Track During Execution:
// After each AI call
trackTokens({
  prompt_tokens: response.usage.input_tokens,
  completion_tokens: response.usage.output_tokens,
  model: 'claude-sonnet-4-5'
});
// Update running totals
- Finalize Tracking:
{
  "tracking_id": "track_20250126_1200",
  "skill": "multi-ai-verification",
  "completed_at": "2025-01-26T12:45:00Z",
  "duration_minutes": 45,
  "tokens": { "prompt": 15234, "completion": 8932, "total": 24166 },
  "cost": {
    "amount_usd": 0.073,
    "model": "claude-sonnet-4-5",
    "rate": "$3 per million tokens"
  }
}
- Save to Cost Log:
# Append to daily cost log
cat tracking.json >> .cost-tracking/$(date +%Y-%m-%d).json
Outputs:
- Token usage per invocation
- Cost per invocation
- Model used
- Daily/monthly aggregates
Validation:
- Tracking initialized
- Tokens counted accurately
- Cost calculated correctly
- Logs saved
Time Estimate: Automatic (integrated into skills)
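The tracking steps above can be sketched as a small accumulator. This is a minimal illustration, not the skill's actual implementation: `createTracker`, `record`, and `finalize` are hypothetical names, and the usage object is assumed to carry `input_tokens` and `output_tokens` as in the snippet above.

```javascript
// Hypothetical helper: accumulates token usage across several AI calls.
function createTracker(skill, model, startedAt = new Date()) {
  const totals = { prompt: 0, completion: 0 };
  return {
    // Add the usage reported by one API response.
    record(usage) {
      totals.prompt += usage.input_tokens ?? 0;
      totals.completion += usage.output_tokens ?? 0;
    },
    // Produce a final record shaped like the cost-log entries above.
    finalize(completedAt = new Date()) {
      return {
        skill,
        model,
        started_at: startedAt.toISOString(),
        completed_at: completedAt.toISOString(),
        duration_minutes: Math.round((completedAt - startedAt) / 60000),
        tokens: { ...totals, total: totals.prompt + totals.completion },
      };
    },
  };
}

const tracker = createTracker(
  'multi-ai-verification',
  'claude-sonnet-4-5',
  new Date('2025-01-26T12:00:00Z')
);
tracker.record({ input_tokens: 10000, output_tokens: 5000 });
tracker.record({ input_tokens: 5234, output_tokens: 3932 });
const trackingRecord = tracker.finalize(new Date('2025-01-26T12:45:00Z'));
console.log(JSON.stringify(trackingRecord, null, 2));
```

Keeping the totals inside a closure means each skill invocation gets its own counter, so concurrent invocations do not mix their usage.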
Operation 2: Calculate Costs
Purpose: Compute accurate costs based on provider pricing
Pricing (2025 rates):
Anthropic (Claude):
| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Claude Opus 4.5 | $15 | $75 |
| Claude Sonnet 4.5 | $3 | $15 |
| Claude Haiku 4.5 | $0.80 | $4 |
OpenAI (Codex):
| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| GPT-5.1-codex | $5 | $15 |
| o3 | $10 | $40 |
| o4-mini | $1.50 | $6 |
Google (Gemini):
| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Gemini 2.5 Pro | $1.25 | $5 |
| Gemini 2.5 Flash | $0.15 | $0.60 |
Process:
function calculateCost(usage, model) {
const pricing = {
'claude-sonnet-4-5': { input: 3, output: 15 },
'claude-haiku-4-5': { input: 0.80, output: 4 },
'claude-opus-4-5': { input: 15, output: 75 },
// ... more models
};
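const rates = pricing[model];
// Fail with a clear message instead of a TypeError when a model has no pricing entry
if (!rates) throw new Error(`Unknown model: ${model}`);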
const inputCost = (usage.prompt_tokens / 1_000_000) * rates.input;
const outputCost = (usage.completion_tokens / 1_000_000) * rates.output;
return {
input_cost: inputCost,
output_cost: outputCost,
total_cost: inputCost + outputCost,
currency: 'USD'
};
}
Outputs:
- Accurate cost per invocation
- Model-specific pricing
- Input vs. output cost breakdown
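As a worked example, the sketch below applies the Sonnet rates from the table to the token counts in the Operation 1 sample. `costUsd` and `PRICING` are illustrative names, and the rates are copied from this document rather than fetched from any provider. The $0.073 figure in the Operation 1 sample matches a flat $3 per million applied to all 24,166 tokens; with separate input and output rates the same counts come to about $0.18.

```javascript
// Rates in USD per million tokens, copied from the pricing table in this document.
const PRICING = {
  'claude-sonnet-4-5': { input: 3, output: 15 },
  'claude-haiku-4-5': { input: 0.80, output: 4 },
  'claude-opus-4-5': { input: 15, output: 75 },
};

// Hypothetical helper: total cost in USD for one invocation.
function costUsd(usage, model) {
  const rates = PRICING[model];
  if (!rates) throw new Error(`No pricing configured for model: ${model}`);
  // Multiply first, divide once, to limit floating-point drift.
  return (
    (usage.prompt_tokens * rates.input +
      usage.completion_tokens * rates.output) / 1_000_000
  );
}

const sampleUsage = { prompt_tokens: 15234, completion_tokens: 8932 };
const sonnetCost = costUsd(sampleUsage, 'claude-sonnet-4-5');
console.log(sonnetCost.toFixed(4)); // prints 0.1797
```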
Operation 3: Enforce Budget Caps
Purpose: Prevent exceeding monthly budget limits
Process:
- Set Budget:
{
  "monthly_budget_usd": 100,
  "skill_budgets": {
    "multi-ai-verification": 70,
    "multi-ai-research": 20,
    "multi-ai-testing": 10
  },
  "alert_thresholds": { "warning": 0.80, "critical": 0.95 }
}
- Check Before Operation:
// `budget` is the configuration object from the previous step
async function checkBudget(skill, estimated_cost) {
  const usage = getCurrentMonthUsage();
  const remaining = budget.monthly_budget_usd - usage.total_cost;
  if (estimated_cost > remaining) {
    return {
      allowed: false,
      reason: `Estimated cost $${estimated_cost} exceeds remaining budget ($${usage.total_cost}/$${budget.monthly_budget_usd} used)`,
      overage: estimated_cost - remaining
    };
  }
  // Use the configured warning threshold rather than a hardcoded 0.80
  const used_fraction = usage.total_cost / budget.monthly_budget_usd;
  if (used_fraction >= budget.alert_thresholds.warning) {
    return {
      allowed: true,
      warning: `${Math.round(used_fraction * 100)}% of monthly budget used ($${usage.total_cost}/$${budget.monthly_budget_usd})`
    };
  }
  return { allowed: true };
}
- Enforce:
const budgetCheck = await checkBudget('multi-ai-verification', 0.50);
if (!budgetCheck.allowed) {
  console.log(`❌ Operation blocked: ${budgetCheck.reason}`);
  console.log(`Options:`);
  console.log(` 1. Use cheaper model (Sonnet → Haiku)`);
  console.log(` 2. Skip optional verification layers`);
  console.log(` 3. Request budget increase`);
  return;
}
if (budgetCheck.warning) {
  console.log(`⚠️ ${budgetCheck.warning}`);
}
// Proceed with operation
Outputs:
- Budget status checked
- Operations blocked if over budget
- Warnings at 80%
- Cost-effective alternatives suggested
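The budget configuration defines `skill_budgets`, but the check above only compares against the monthly total. One way to enforce the per-skill caps as well is sketched below; `checkSkillBudget` and the `spentBySkill` map are hypothetical, and the caps are copied from the sample configuration.

```javascript
// Per-skill caps in USD, copied from the sample budget configuration.
const skillBudgets = {
  'multi-ai-verification': 70,
  'multi-ai-research': 20,
  'multi-ai-testing': 10,
};

// spentBySkill is an assumed map of month-to-date spend per skill.
function checkSkillBudget(skill, estimatedCost, spentBySkill) {
  const cap = skillBudgets[skill];
  if (cap === undefined) {
    return { allowed: false, reason: `No budget configured for ${skill}` };
  }
  const remaining = cap - (spentBySkill[skill] ?? 0);
  if (estimatedCost > remaining) {
    return {
      allowed: false,
      reason: `${skill} would exceed its $${cap} cap`,
      remaining,
    };
  }
  return { allowed: true, remaining: remaining - estimatedCost };
}

const spent = { 'multi-ai-research': 19.8 };
console.log(checkSkillBudget('multi-ai-research', 0.5, spent).allowed);
console.log(checkSkillBudget('multi-ai-testing', 0.5, spent).allowed);
```

Running both the monthly check and the per-skill check keeps one heavily used skill from consuming the allocation of the others.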
Operation 4: Optimize Model Selection
Purpose: Choose cost-effective model for each task
Decision Matrix:
| Task Type | Recommended Model | Cost | Rationale |
|---|---|---|---|
| Simple verification (Layer 1-2) | Haiku | $ | Rules-based, fast, cheap |
| Code generation | Sonnet | $$ | Balanced quality/cost |
| Complex reasoning (architecture) | Opus | $$$ | Best quality, worth premium |
| LLM-as-judge | Sonnet or external model | $$ | Good judgment, reasonable cost |
| Test generation | Sonnet | $$ | Comprehensive coverage needed |
| Research | Sonnet/Haiku mix | $-$$ | Haiku for search, Sonnet for synthesis |
Auto-Optimization:
function selectModel(task_type, criticality, budget_remaining) {
  // Critical + budget OK → Use Opus
  if (criticality === 'critical' && budget_remaining > 20) {
    return 'claude-opus-4-5';
  }
  // Budget low or simple task → Use Haiku
  // (checked before the Sonnet default so a low budget is never bypassed)
  if (budget_remaining < 5 || task_type === 'simple') {
    return 'claude-haiku-4-5';
  }
  // Standard, and everything else → Use Sonnet
  return 'claude-sonnet-4-5';
}
Outputs:
- Optimal model selected
- Cost minimized
- Quality maintained
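The decision matrix can also be expressed as a lookup table, which keeps the task-to-model mapping in data rather than in branching logic. The sketch below is one possible arrangement: the task-type keys and the `pickModel` name are hypothetical, and the $5 low-budget cutoff mirrors the threshold used in the function above.

```javascript
// Defaults taken from the decision matrix; the key names are illustrative labels.
const DEFAULT_MODEL = {
  simple_verification: 'claude-haiku-4-5',
  code_generation: 'claude-sonnet-4-5',
  complex_reasoning: 'claude-opus-4-5',
  llm_as_judge: 'claude-sonnet-4-5',
  test_generation: 'claude-sonnet-4-5',
};

function pickModel(taskType, budgetRemainingUsd, lowBudgetUsd = 5) {
  // When the budget is nearly exhausted, fall back to the cheapest model.
  if (budgetRemainingUsd < lowBudgetUsd) return 'claude-haiku-4-5';
  // Unknown task types default to the balanced option.
  return DEFAULT_MODEL[taskType] ?? 'claude-sonnet-4-5';
}

console.log(pickModel('complex_reasoning', 50));
console.log(pickModel('complex_reasoning', 3));
console.log(pickModel('something_else', 50));
```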
Operation 5: Cost-Effective Caching
Purpose: Avoid re-computing identical operations
Process:
- Cache Key Generation:
const crypto = require('crypto');

function generateCacheKey(operation, inputs) {
  // Hash inputs to create unique key
  const content_hash = crypto
    .createHash('sha256')
    .update(JSON.stringify(inputs))
    .digest('hex');
  return `${operation}_${content_hash}`;
}
- Check Cache Before Operation:
const cacheKey = generateCacheKey('verify_code', {
  files: ['src/auth.ts'],
  file_hashes: {'src/auth.ts': 'abc123'}
});
const cached = readCache(cacheKey);
if (cached && !isExpired(cached, 24)) {
  // Use cached result
  console.log('📦 Using cached verification result');
  return cached.result;
}
// Cache miss → run verification
const result = await runVerification();
// Save to cache with a 24-hour TTL
saveCache(cacheKey, result, { ttl_hours: 24 });
- Cache Structure:
{
  "cache_key": "verify_code_abc123def456",
  "created_at": "2025-01-26T12:00:00Z",
  "expires_at": "2025-01-27T12:00:00Z",
  "inputs": {
    "files": ["src/auth.ts"],
    "file_hashes": {"src/auth.ts": "abc123"}
  },
  "result": { "quality_score": 92, "layers_passed": 5, "issues": [] },
  "cost_saved": 0.073
}
Outputs:
- Up to 90% reduction in re-verification costs (actual savings depend on the cache hit rate)
- Instant results for unchanged code
- Cache hit/miss tracking
Validation:
- Cache key correctly identifies identical operations
- File changes invalidate cache
- Expired cache not used
- Cost savings tracked
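One caveat with hashing `JSON.stringify(inputs)` is that the output depends on object key order, so two logically identical inputs can produce different keys. The sketch below shows one way around that by serializing with sorted keys. `stableStringify` and `stableCacheKey` are hypothetical helpers, and inputs are assumed to be plain JSON-compatible values.

```javascript
const { createHash } = require('node:crypto');

// Serialize with object keys sorted so key order does not affect the hash.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

function stableCacheKey(operation, inputs) {
  const hash = createHash('sha256')
    .update(stableStringify(inputs))
    .digest('hex');
  return `${operation}_${hash}`;
}

const keyA = stableCacheKey('verify_code', {
  files: ['src/auth.ts'],
  file_hashes: { 'src/auth.ts': 'abc123' },
});
const keyB = stableCacheKey('verify_code', {
  file_hashes: { 'src/auth.ts': 'abc123' },
  files: ['src/auth.ts'],
});
const keyC = stableCacheKey('verify_code', {
  files: ['src/auth.ts'],
  file_hashes: { 'src/auth.ts': 'def456' },
});
console.log(keyA === keyB, keyA === keyC);
```

Because the file hashes are part of the hashed input, any change to a file's content yields a different key, which is what invalidates the cache.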
Operation 6: Measure ROI
Purpose: Calculate return on investment for AI spending
Process:
- Track Time Saved:
{
  "task": "Implement user authentication",
  "without_ai": {
    "estimated_hours": 40,
    "developer_rate": 100,
    "total_cost": 4000
  },
  "with_ai": {
    "actual_hours": 11.3,
    "developer_rate": 100,
    "developer_cost": 1130,
    "ai_cost": 2.50,
    "total_cost": 1132.50
  },
  "roi": {
    "time_saved_hours": 28.7,
    "cost_saved": 2867.50,
    "roi_percentage": 114700,
    "payback_period_hours": 0.025
  }
}
- Calculate ROI:
function calculateROI(task) {
  const time_saved = task.without_ai.estimated_hours - task.with_ai.actual_hours;
  const cost_saved = task.without_ai.total_cost - task.with_ai.total_cost;
  // ROI relative to AI spend: net savings per dollar of AI cost
  const roi_percentage = (cost_saved / task.with_ai.ai_cost) * 100;
  return {
    time_saved_hours: time_saved,
    cost_saved: cost_saved,
    roi_percentage: roi_percentage,
    // Developer hours whose cost equals the AI spend
    payback_period_hours: task.with_ai.ai_cost / task.without_ai.developer_rate
  };
}
- Monthly ROI Report:
# Monthly ROI Report - January 2025
## AI Spending
- Total AI costs: $87.50
- Breakdown:
  - multi-ai-verification: $62.30 (71%)
  - multi-ai-research: $18.40 (21%)
  - multi-ai-testing: $6.80 (8%)
## Time Savings
- Tasks completed: 8
- Total hours saved: 156 hours
- Average savings per task: 19.5 hours
## Cost Savings
- Developer cost avoided: $15,600 (156h × $100/h)
- AI costs: $87.50
- Net savings: $15,512.50
## ROI
- ROI: 17,728% ($177 saved per $1 spent)
- Payback period: 0.03 weeks (immediate)
- Value multiplier: 178x
## Recommendations
- ✅ Current spending highly cost-effective
- Continue using AI for all qualifying tasks
- Consider increasing budget (high ROI)
Outputs:
- Comprehensive ROI analysis
- Time and cost savings quantified
- Monthly reports
- Business justification for AI spending
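To roll per-task figures up into the monthly numbers, a sketch along the following lines could be used. `monthlyRoi` and the task field names are hypothetical; ROI is taken as net savings divided by AI cost, which is an assumption about how the report's percentage is derived.

```javascript
// Each task records developer hours with and without AI, plus the AI spend.
function monthlyRoi(tasks, developerRate) {
  const hoursSaved = tasks.reduce(
    (sum, t) => sum + (t.hours_without_ai - t.hours_with_ai), 0);
  const aiCost = tasks.reduce((sum, t) => sum + t.ai_cost, 0);
  const grossSavings = hoursSaved * developerRate;
  const netSavings = grossSavings - aiCost;
  return {
    hours_saved: hoursSaved,
    ai_cost_usd: aiCost,
    net_savings_usd: netSavings,
    // Undefined ROI when nothing was spent on AI.
    roi_percentage: aiCost > 0 ? (netSavings / aiCost) * 100 : null,
  };
}

// The authentication task from the example above.
const roiSummary = monthlyRoi(
  [{ hours_without_ai: 40, hours_with_ai: 11.3, ai_cost: 2.5 }],
  100
);
console.log(roiSummary.net_savings_usd.toFixed(2));
console.log(Math.round(roiSummary.roi_percentage));
```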
Operation 7: Predict Costs
Purpose: Estimate costs before starting expensive operations
Process:
- Historical Data:
{
  "operation": "multi-ai-verification",
  "mode": "all_5_layers",
  "historical_costs": [
    {"date": "2025-01-15", "tokens": 24166, "cost": 0.073},
    {"date": "2025-01-18", "tokens": 21893, "cost": 0.066},
    {"date": "2025-01-20", "tokens": 26543, "cost": 0.080}
  ],
  "avg_cost": 0.073,
  "std_dev": 0.007
}
- Predict Before Operation:
const prediction = predictCost('multi-ai-verification', {
  mode: 'all_5_layers',
  code_size_lines: 850
});
console.log(`💰 Estimated cost: $${prediction.estimated_cost} ± $${prediction.std_dev}`);
console.log(` Range: $${prediction.min_cost} - $${prediction.max_cost}`);
console.log(` Confidence: ${prediction.confidence}%`);
// Check budget
if (prediction.estimated_cost > budget_remaining) {
  console.log(`⚠️ Estimated cost exceeds remaining budget`);
  console.log(`Options:`);
  console.log(` 1. Use Haiku (estimated: $${prediction.estimated_cost * 0.27})`);
  console.log(` 2. Skip Layer 5 (save ~60%: $${prediction.estimated_cost * 0.4})`);
  console.log(` 3. Increase budget`);
}
Outputs:
- Cost predictions with confidence intervals
- Budget impact assessment
- Cost-effective alternatives suggested
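The `avg_cost` and `std_dev` values in the historical data can be reproduced with a mean and sample standard deviation, as in this sketch. `predictFromHistory` is a hypothetical name, and the two-standard-deviation range is an assumed convention rather than something the skill specifies.

```javascript
// Estimate the next cost from past costs (USD) using mean and sample std dev.
function predictFromHistory(costs) {
  const n = costs.length;
  if (n === 0) throw new Error('No historical costs available');
  const mean = costs.reduce((sum, c) => sum + c, 0) / n;
  // Sample variance (n - 1 denominator); zero when only one point exists.
  const variance = n > 1
    ? costs.reduce((sum, c) => sum + (c - mean) ** 2, 0) / (n - 1)
    : 0;
  const stdDev = Math.sqrt(variance);
  return {
    estimated_cost: mean,
    std_dev: stdDev,
    min_cost: Math.max(0, mean - 2 * stdDev),
    max_cost: mean + 2 * stdDev,
  };
}

const costPrediction = predictFromHistory([0.073, 0.066, 0.080]);
console.log(
  costPrediction.estimated_cost.toFixed(3),
  costPrediction.std_dev.toFixed(3)
); // prints 0.073 0.007
```

With only three data points the range is a rough guide; the estimate tightens as more invocations are logged.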
Cost Optimization Strategies
Strategy 1: Model Selection
Baseline (All Sonnet): $100/month
Optimized (Smart selection):
- Layer 1-2: Haiku ($20)
- Layer 3-4: Sonnet ($40)
- Layer 5: Sonnet ($30)
- Total: $90/month (10% savings)
Aggressive (Maximum savings):
- Layer 1-2: Haiku ($20)
- Layer 3-4: Haiku ($15)
- Layer 5: Sonnet ($30)
- Total: $65/month (35% savings)
Trade-off: Some quality reduction at Layers 3-4
Strategy 2: Caching
Without Caching: Re-verify same code multiple times
With Caching (24-hour TTL):
- First verification: $0.073
- Same code next 24h: $0 (cache hit)
- Savings: 90% on unchanged code
Implementation:
const cacheKey = hash(files_to_verify);
const cached = getCache(cacheKey);
if (cached && !isExpired(cached, 24)) {
return cached.result; // $0 cost
}
const result = await verify(); // $0.073 cost
saveCache(cacheKey, result);
Strategy 3: Ensemble Optimization
Baseline (Always 5-agent ensemble):
- 5 agents × $0.073 = $0.365 per verification
Optimized (Conditional ensemble):
- Critical features: 5 agents ($0.365)
- Standard features: 3 agents ($0.219)
- Simple changes: 1 agent ($0.073)
- Average savings: ~60%, depending on the mix of critical, standard, and simple changes
Decision Logic:
function shouldUseEnsemble(criticality, code_size, budget) {
if (criticality === 'critical') return 5;
if (criticality === 'high' && budget > 20) return 3;
return 1; // Single agent
}
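The average saving from a conditional ensemble depends on how verifications are distributed across criticality levels. The sketch below computes the expected cost for an assumed workload mix; the 10/30/60 split is hypothetical and chosen only to illustrate the arithmetic, and `expectedEnsembleCost` is an illustrative name.

```javascript
const COST_PER_AGENT = 0.073; // per-agent verification cost used in this document

// mix maps agent count -> share of verifications; shares should sum to 1.
function expectedEnsembleCost(mix) {
  return Object.entries(mix).reduce(
    (sum, [agents, share]) => sum + Number(agents) * share * COST_PER_AGENT,
    0
  );
}

const baselineCost = 5 * COST_PER_AGENT;
// Assumed workload: 10% critical (5 agents), 30% standard (3), 60% simple (1).
const optimizedCost = expectedEnsembleCost({ 5: 0.1, 3: 0.3, 1: 0.6 });
const ensembleSavings = 1 - optimizedCost / baselineCost;
console.log(optimizedCost.toFixed(3), `${Math.round(ensembleSavings * 100)}%`);
```

A workload with a larger share of critical changes would see proportionally smaller savings.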
Strategy 4: Layer Skipping
Full Verification (All 5 layers): ~$0.073
Fast-Track (Layers 1-2 only):
- Rules + Functional only
- ~$0.015 (80% savings)
- Use for: minor changes, docs, config
Decision:
if (lines_changed < 50 && files_changed.every(f => !isCritical(f))) {
// Fast-track: Layers 1-2 only
mode = 'fast_track';
estimated_cost = 0.015;
} else {
// Full verification
mode = 'standard';
estimated_cost = 0.073;
}
Budget Management
Monthly Budget Planning
Sample Budget ($100/month):
{
"monthly_budget_usd": 100,
"allocation": {
"multi-ai-verification": {
"budget": 70,
"rationale": "Most expensive (LLM-as-judge)"
},
"multi-ai-research": {
"budget": 20,
"rationale": "Occasional use, tri-AI"
},
"multi-ai-testing": {
"budget": 10,
"rationale": "Mostly automated"
}
},
"assumptions": {
"features_per_month": 8,
"verifications_per_feature": 1.5,
"research_per_month": 2
}
}
Budget Tracking
Daily:
# Check today's spending
# (-s slurps the appended JSON objects into a single array before summing)
jq -s '[.[] | .cost.amount_usd] | add' .cost-tracking/$(date +%Y-%m-%d).json
# Example output: 3.45
Monthly:
# Check month-to-date
jq -s '[.[] | .cost.amount_usd] | add' .cost-tracking/2025-01-*.json
# Example output: 67.8 (68% of a $100 budget)
Projection:
// Project end-of-month
const days_elapsed = 26;
const days_in_month = 31;
const current_spend = 67.80;
const projected = (current_spend / days_elapsed) * days_in_month;
// = $80.84 projected (within budget ✅)
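The same projection can be computed from a date instead of hardcoded day counts. This is a minimal sketch assuming a linear burn rate and UTC calendar days; `projectMonthEnd` is a hypothetical helper name.

```javascript
// Linear projection: assumes the average daily spend so far continues.
function projectMonthEnd(currentSpend, today = new Date()) {
  const year = today.getUTCFullYear();
  const month = today.getUTCMonth();
  const daysElapsed = today.getUTCDate();
  // Day 0 of the next month is the last day of the current month.
  const daysInMonth = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  return (currentSpend / daysElapsed) * daysInMonth;
}

const projectedSpend = projectMonthEnd(67.80, new Date('2025-01-26T00:00:00Z'));
console.log(projectedSpend.toFixed(2)); // prints 80.84
```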
Cost Alerting
Alert Levels
80% Budget (Warning):
⚠️ BUDGET ALERT: 80% Used
**Current**: $80.00 / $100.00 (80%)
**Remaining**: $20.00
**Days left**: 5
**Projected EOM**: $93.75 (within budget)
**Recommendations**:
- Monitor spending closely
- Use Haiku for simple tasks
- Cache aggressively
- Skip optional layers where safe
95% Budget (Critical):
🚨 CRITICAL: 95% Budget Used
**Current**: $95.00 / $100.00 (95%)
**Remaining**: $5.00
**Days left**: 5
**Projected EOM**: $110 (OVER BUDGET)
**Actions Required**:
1. Pause non-critical verifications
2. Use Haiku exclusively
3. Request budget increase OR
4. Defer work to next month
**Auto-throttling**: Enabled
- Only critical operations allowed
- All optional layers disabled
- Ensemble verification disabled
Budget Exceeded:
❌ BUDGET EXCEEDED
**Current**: $102.50 / $100.00 (102.5%)
**Overage**: $2.50
**Operations BLOCKED** until:
1. Budget increased OR
2. Next month (resets automatically)
**Emergency Override**: Requires approval
ROI Calculation Framework
Value Metrics
Quantifiable Value:
- Time saved (hours)
- Cost saved (developer time avoided)
- Quality improvement (fewer bugs in production)
- Faster time-to-market (days)
Formula:
ROI = ((Value Delivered - AI Costs) / AI Costs) × 100%
Example:
Feature without AI: 40 hours × $100/hour = $4,000
Feature with AI: 11.3 hours × $100/hour + $2.50 AI = $1,132.50
Value Delivered = $4,000 - $1,130 = $2,870
AI Costs = $2.50
ROI = (($2,870 - $2.50) / $2.50) × 100% = 114,700%
Monthly Reporting
Template:
# AI ROI Report - January 2025
## Summary
- **AI Spending**: $87.50
- **Value Delivered**: $15,600 (156 hours saved × $100/hour)
- **ROI**: 17,728%
- **Payback**: Immediate
## Details
### Features Delivered (8)
1. User authentication - 11.3h (was 40h), ROI: 114,700%
2. Payment integration - 8.5h (was 32h), ROI: 108,235%
[... more ...]
### Cost Breakdown
- Verification: $62.30 (71%) - Highest cost, highest value
- Research: $18.40 (21%) - Occasional, high impact
- Testing: $6.80 (8%) - Mostly automated, low cost
### Savings
- Time: 156 hours saved
- Cost: $15,512.50 net savings
- Quality: 4 bugs prevented (saved ~20 hours)
### Recommendations
- ✅ ROI is excellent (17,728%)
- Consider increasing budget (high returns)
- Current spending optimal
Quick Reference
Cost Operations
| Operation | Purpose | Time | Automation |
|---|---|---|---|
| Track | Monitor token usage | Automatic | 100% |
| Calculate | Compute costs | Automatic | 100% |
| Enforce | Budget caps | Automatic | 100% |
| Optimize | Model selection | Semi-auto | 70% |
| Cache | Avoid re-compute | Automatic | 100% |
| Measure ROI | Value analysis | Manual | 30% |
| Predict | Cost estimation | Automatic | 90% |
Cost Optimization Strategies
| Strategy | Savings | Trade-off | Recommended For |
|---|---|---|---|
| Model selection | 10-35% | Some quality loss | All features |
| Caching | 90% | Stale results risk | Unchanged code |
| Ensemble optimization | 60% | Lower confidence | Non-critical |
| Layer skipping | 80% | Less thorough | Minor changes |
Budget Thresholds
- < 80%: Normal operation
- 80-95%: Warning, optimize
- 95-100%: Critical, throttle
- > 100%: Block operations
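The thresholds above map directly onto a small classifier. The sketch below is one way to express them; `budgetStatus` and its return labels are hypothetical, and treating exactly 100% as critical rather than blocked is an assumption based on the "> 100%" wording.

```javascript
// Classify month-to-date spend against the documented thresholds.
function budgetStatus(
  spent,
  monthlyBudget,
  thresholds = { warning: 0.80, critical: 0.95 }
) {
  const ratio = spent / monthlyBudget;
  if (ratio > 1) return 'blocked';
  if (ratio >= thresholds.critical) return 'critical';
  if (ratio >= thresholds.warning) return 'warning';
  return 'normal';
}

for (const spent of [67.8, 80, 95, 102.5]) {
  console.log(spent, budgetStatus(spent, 100));
}
```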
agent-cost-optimizer ensures cost-effective AI operations through real-time tracking, budget enforcement, model optimization, and ROI measurement - preventing budget overruns while maximizing value delivered.
For cost reports, see examples/. For optimization strategies, see Cost Optimization Strategies section.