Claude API Cost Optimization

Save 50-90% on Claude API costs with three officially verified techniques

Quick Reference

| Technique | Savings | Use When |
|---|---|---|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | 90% on cached input tokens | Repeated system prompts (>1K tokens) |
| Extended Thinking | ~80% | Complex reasoning tasks |
| Batch + Cache | ~95% on cached input tokens | Bulk tasks with shared context |
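
To see how the percentages in the table combine, here is a small arithmetic sketch. It assumes the discounts act as multipliers on the base input-token price (0.5x for batch, 0.1x for a cache read) and that they stack multiplicatively when both apply. The constant and function names are illustrative and not part of any SDK.

```python
# Price multipliers relative to the base input-token price.
# Values come from this guide; the names are illustrative only.
BATCH = 0.50       # Batch API: 50% off
CACHE_READ = 0.10  # cache hit: 90% off the cached input tokens


def savings(multiplier: float) -> float:
    """Convert a price multiplier into a percentage saved."""
    return round((1 - multiplier) * 100, 1)


# Assumption: both discounts apply to the same cached input tokens.
batch_plus_cache = BATCH * CACHE_READ  # 0.05 of the base price

print(savings(BATCH))             # 50.0
print(savings(CACHE_READ))        # 90.0
print(savings(batch_plus_cache))  # 95.0
```

The ~95% figure therefore describes cached input tokens in a batch. Output tokens and uncached input only receive the batch discount.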

1. Batch API (50% Off)

When to Use

  • Bulk translations
  • Daily content generation
  • Overnight report processing
  • NOT for real-time chat

Code Example

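import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",  # used to match results to requests
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}]
            }
        }
    ]
)

# Results are only readable once processing has ended
# (within 24h, usually <1h), so poll before fetching them.
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
    else:
        # errored, canceled, or expired
        print(f"{result.custom_id}: {result.result.type}")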

Key Finding: Bigger Batches = Faster!

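| Batch Size | Time/Request |
|---|---|
| Large (294 requests) | 0.45 min |
| Small (10 requests) | 9.84 min |

That is roughly a 22x difference in per-request time (9.84 / 0.45) in this measurement. Batch 100+ requests together where possible.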
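
One way to follow the advice above is to build the whole request list in one go and submit it as a single batch. The sketch below is a minimal helper under that assumption. `build_batch_requests` and the `task-NNN` ID scheme are hypothetical names chosen to match the earlier example, not part of the Anthropic SDK.

```python
from typing import Any


def build_batch_requests(
    prompts: list[str],
    model: str = "claude-sonnet-4-5",
    max_tokens: int = 1024,
) -> list[dict[str, Any]]:
    """Turn a list of prompts into Batch API request entries."""
    return [
        {
            # custom_id lets you match each result back to its prompt
            "custom_id": f"task-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


requests = build_batch_requests([f"Translate sentence {n}" for n in range(150)])
print(len(requests), requests[0]["custom_id"], requests[-1]["custom_id"])
# 150 task-001 task-150
```

The resulting list can be passed as `requests=requests` to `client.messages.batches.create(...)`, as in the code example above.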


2. Prompt Caching (90% Off)

When to Use

  • Long system prompts (>1K tokens)
  • Repeated instructions
  • RAG with large context

Code Example

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Your long system prompt here...",
        "cache_control": {"type": "ephemeral"}  # Enable caching!
    }],
    messages=[{"role": "user", "content": "User question"}]
)
# First call: the cached prefix is billed at +25% (cache write)
# Subsequent calls within the TTL: the cached prefix is billed at -90% (cache read)
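
To confirm that caching is actually taking effect, you can inspect the token counts reported on `response.usage`. The sketch below classifies a call as a cache write, a cache read, or neither. It uses stand-in objects with made-up numbers instead of a live API call, and `cache_status` is a hypothetical helper name.

```python
from types import SimpleNamespace


def cache_status(usage) -> str:
    """Classify a response's usage as a cache 'write', 'read', or 'none'."""
    # `or 0` guards against fields that are missing or None
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    if read > 0:
        return "read"
    if written > 0:
        return "write"
    return "none"


# Stand-ins for response.usage (values are made up for illustration)
first_call = SimpleNamespace(
    input_tokens=12, cache_creation_input_tokens=2048, cache_read_input_tokens=0
)
second_call = SimpleNamespace(
    input_tokens=12, cache_creation_input_tokens=0, cache_read_input_tokens=2048
)

print(cache_status(first_call))   # write
print(cache_status(second_call))  # read
```

In real use, pass `response.usage` from the call above. A result of `"none"` on a repeated prompt usually means the prefix was below the minimum length or the cache had expired.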

Cache Rules

  • Minimum cacheable prompt length: 1,024 tokens (Sonnet); shorter prompts are processed without caching
  • TTL: 5 minutes (refreshed each time the cache is read)
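
A quick way to judge whether caching pays off is to compare the cost of the shared prefix with and without it. The sketch below assumes the multipliers quoted in this guide (1.25x for the first write, 0.10x for each later read) and that every call arrives before the TTL expires. `relative_input_cost` is an illustrative name.

```python
def relative_input_cost(calls: int, cached: bool) -> float:
    """Cost of the shared prefix over `calls` requests.

    The unit is the price of sending the prefix once without caching.
    """
    if calls <= 0:
        return 0.0
    if not cached:
        return float(calls)
    # one cache write at 1.25x, then (calls - 1) cache reads at 0.10x
    return 1.25 + 0.10 * (calls - 1)


for n in (1, 2, 10):
    uncached = relative_input_cost(n, cached=False)
    cached = round(relative_input_cost(n, cached=True), 2)
    print(n, uncached, cached)
# 1 1.0 1.25
# 2 2.0 1.35
# 10 10.0 2.15
```

Under these assumptions a single call costs more with caching, and the savings start from the second call onward.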

3. Extended Thinking (~80% Off)

When to Use

  • Complex code architecture
  • Strategic planning
  • Mathematical reasoning

Code Example

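response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # upper bound on thinking tokens, which are billed as output tokens
    },
    messages=[{"role": "user", "content": "Design architecture for..."}]
)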
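
With thinking enabled, the response content holds thinking blocks as well as the final text, so reading `content[0].text` as in the earlier examples may not work. The sketch below separates the two block types. It uses stand-in objects with made-up text instead of a live response, and `split_blocks` is a hypothetical helper name.

```python
from types import SimpleNamespace


def split_blocks(content) -> tuple[str, str]:
    """Return (thinking, answer) text from a response's content blocks."""
    # Blocks of other types (e.g. redacted thinking) are ignored here
    thinking = "".join(b.thinking for b in content if b.type == "thinking")
    answer = "".join(b.text for b in content if b.type == "text")
    return thinking, answer


# Stand-in for response.content (made-up values for illustration)
content = [
    SimpleNamespace(type="thinking", thinking="Compare options A and B..."),
    SimpleNamespace(type="text", text="Use option A."),
]

thinking, answer = split_blocks(content)
print(answer)  # Use option A.
```

In real use, call `split_blocks(response.content)` on the response from the example above.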

Decision Flowchart

Can wait 24h? → Yes → Batch API (50% off)
                 ↓ No
Repeated prompts >1K? → Yes → Prompt Caching (90% off)
                         ↓ No
Complex reasoning? → Yes → Extended Thinking
                      ↓ No
Use normal API
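
The flowchart can also be written as a small function, which makes the order of the checks explicit. This is a sketch of the decision logic only. The function name, its parameters, and the use of the 1,024-token cache minimum as the ">1K" threshold are assumptions made for illustration.

```python
MIN_CACHE_TOKENS = 1024  # assumption: the ">1K" check uses the Sonnet cache minimum


def choose_technique(
    can_wait_24h: bool,
    repeated_prompt_tokens: int,
    complex_reasoning: bool,
) -> str:
    """Walk the flowchart top to bottom and return the first match."""
    if can_wait_24h:
        return "Batch API"
    if repeated_prompt_tokens >= MIN_CACHE_TOKENS:
        return "Prompt Caching"
    if complex_reasoning:
        return "Extended Thinking"
    return "Normal API"


print(choose_technique(True, 0, False))      # Batch API
print(choose_technique(False, 3000, False))  # Prompt Caching
print(choose_technique(False, 200, True))    # Extended Thinking
print(choose_technique(False, 200, False))   # Normal API
```

Like the flowchart, this returns a single technique. A bulk job with a long shared prompt would match the first branch, but it can also use caching, as the Batch + Cache row in the Quick Reference suggests.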

Made with 🐾 by Washin Village - Verified against official Anthropic documentation

First seen: Jan 29, 2026