Claude API Cost Optimization
Save 50-90% on Claude API costs with three techniques checked against the official Anthropic documentation.
Quick Reference
| Technique | Savings | Use When |
|---|---|---|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | Up to 90% on cached input tokens | Repeated system prompts (>1K tokens) |
| Extended Thinking | ~80% (claimed; thinking tokens are still billed as output tokens) | Complex reasoning tasks |
| Batch + Cache | ~95% on cached input tokens | Bulk tasks with shared context |
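To see where the combined figure in the table comes from, here is a minimal arithmetic sketch. It assumes the two discounts multiply (50% off for batch, 90% off for cache reads) and that they apply only to the cached input tokens; the constant and function names are illustrative, not part of any SDK.

```python
# Price multipliers relative to the base input-token price, taken from the table above.
BATCH_MULTIPLIER = 0.5        # Batch API: 50% off
CACHE_READ_MULTIPLIER = 0.1   # Prompt caching: cache reads cost 90% less


def savings_percent(use_batch: bool, use_cache: bool) -> float:
    """Return the percentage saved on cached input tokens for a combination.

    Assumption: the discounts stack multiplicatively.
    """
    multiplier = 1.0
    if use_batch:
        multiplier *= BATCH_MULTIPLIER
    if use_cache:
        multiplier *= CACHE_READ_MULTIPLIER
    return round((1.0 - multiplier) * 100, 2)


print(savings_percent(use_batch=True, use_cache=False))
print(savings_percent(use_batch=False, use_cache=True))
print(savings_percent(use_batch=True, use_cache=True))
```

Output tokens and uncached input tokens do not get the cache discount, so the saving on a whole bill will usually be lower than the combined figure.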
1. Batch API (50% Off)
When to Use
- Bulk translations
- Daily content generation
- Overnight report processing
- NOT for real-time chat
Code Example
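import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}],
            },
        }
    ]
)

# Results are only readable once processing has ended
# (within 24h, usually <1h), so poll the batch status first.
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
    else:
        # errored, canceled, or expired
        print(f"{result.custom_id}: {result.result.type}")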
Key Finding: Bigger Batches = Faster!
| Batch Size | Time/Request |
|---|---|
| Large (294) | 0.45 min |
| Small (10) | 9.84 min |
In this measurement the large batch was about 22x faster per request (9.84 / 0.45 ≈ 21.9). Batch 100+ requests together where possible.
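One way to get large batches is to build every request up front and submit them in a single call. The sketch below is a hypothetical helper (`build_requests` is not an SDK function) that produces request dictionaries in the same shape as the example above. It also attaches a shared system prompt with `cache_control`, as an illustration of the Batch + Cache row in the Quick Reference; whether a given batch request actually hits the cache is not guaranteed.

```python
def build_requests(prompts: list[str], system_prompt: str,
                   model: str = "claude-sonnet-4-5") -> list[dict]:
    """Build one batch request per prompt, all sharing the same system prompt."""
    built = []
    for index, prompt in enumerate(prompts, start=1):
        built.append({
            "custom_id": f"task-{index:03d}",  # must be unique within the batch
            "params": {
                "model": model,
                "max_tokens": 1024,
                "system": [{
                    "type": "text",
                    "text": system_prompt,
                    "cache_control": {"type": "ephemeral"},
                }],
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return built


# Illustrative input: 250 placeholder prompts and a placeholder system prompt.
batch_requests = build_requests(
    [f"Translate sentence {n}" for n in range(250)],
    "Shared instructions...",
)
print(len(batch_requests), batch_requests[0]["custom_id"])
# The list could then be passed to client.messages.batches.create(requests=batch_requests).
```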
2. Prompt Caching (90% Off)
When to Use
- Long system prompts (>1K tokens)
- Repeated instructions
- RAG with large context
Code Example
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Your long system prompt here...",
        "cache_control": {"type": "ephemeral"},  # mark this block as cacheable
    }],
    messages=[{"role": "user", "content": "User question"}],
)

# First call: cached tokens cost 25% more than base input (cache write)
# Later calls within the TTL: cached tokens cost 90% less (cache read)
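To confirm that caching is actually taking effect, the usage fields on the response can be inspected. The helper below is a hypothetical sketch (`cache_status` is not an SDK function); it assumes the usage object exposes `cache_creation_input_tokens` and `cache_read_input_tokens`, and it uses a stand-in object so it runs without an API call.

```python
from types import SimpleNamespace


def cache_status(usage) -> str:
    """Classify a response's usage object as a cache read, a cache write, or neither."""
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    if read > 0:
        return f"cache read: {read} tokens served from cache"
    if written > 0:
        return f"cache write: {written} tokens stored"
    return "no caching (prompt may be under the minimum length)"


# Stand-in for response.usage; the numbers are made up for illustration.
example_usage = SimpleNamespace(
    input_tokens=12,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=2048,
)
print(cache_status(example_usage))
```

With a real response, the call would be `cache_status(response.usage)`.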
Cache Rules
- Minimum cacheable prompt length: 1,024 tokens (Sonnet); shorter prompts are processed without caching
- TTL: 5 minutes (refreshed each time the cache is read)
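The write surcharge and read discount imply a break-even point, which a short calculation makes concrete. This sketch assumes every call after the first lands inside the 5-minute TTL and counts only the cached prompt tokens; the function name is illustrative.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # first call: +25%
CACHE_READ_MULTIPLIER = 0.10   # later calls: -90%


def relative_prompt_cost(calls: int, cached: bool) -> float:
    """Cost of sending the same prompt `calls` times, in units of one uncached send."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(calls)
    # One cache write, then (calls - 1) cache reads.
    return CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1)


for calls in (1, 2, 10):
    uncached = relative_prompt_cost(calls, cached=False)
    with_cache = round(relative_prompt_cost(calls, cached=True), 2)
    print(calls, uncached, with_cache)
```

Under these assumptions a single call is more expensive with caching, and caching pays off from the second call onward.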
3. Extended Thinking (~80% Off)
When to Use
- Complex code architecture
- Strategic planning
- Mathematical reasoning
Code Example
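import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # minimum 1,024; thinking tokens are billed as output tokens
    },
    messages=[{"role": "user", "content": "Design architecture for..."}],
)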
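With thinking enabled, the response content holds thinking blocks as well as text blocks, so reading `content[0].text` may not give the answer. The helper below is a hypothetical sketch (`split_response` is not an SDK function) that separates the two; it assumes blocks expose `type` plus either `thinking` or `text`, and it uses stand-in objects so it runs without an API call.

```python
from types import SimpleNamespace


def split_response(content) -> tuple[str, str]:
    """Return (reasoning, answer) by collecting thinking and text blocks separately."""
    thinking_parts, text_parts = [], []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
        # Other block types (for example redacted thinking) are skipped.
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Stand-in for response.content; the strings are made up for illustration.
fake_content = [
    SimpleNamespace(type="thinking", thinking="Compare monolith vs services..."),
    SimpleNamespace(type="text", text="Start with a modular monolith."),
]
reasoning, answer = split_response(fake_content)
print(answer)
```

With a real response, the call would be `split_response(response.content)`.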
Decision Flowchart
Can wait 24h? → Yes → Batch API (50% off)
↓ No
Repeated prompts >1K tokens? → Yes → Prompt Caching (90% off)
↓ No
Complex reasoning? → Yes → Extended Thinking
↓ No
Use normal API
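The flowchart can also be written as a small function. This is a sketch only: the function and parameter names are illustrative, and it uses the 1,024-token Sonnet minimum from the Cache Rules as the threshold for "repeated prompts".

```python
MIN_CACHEABLE_TOKENS = 1024  # Sonnet minimum from the Cache Rules section


def choose_technique(can_wait_24h: bool, repeated_prompt_tokens: int,
                     complex_reasoning: bool) -> str:
    """Walk the decision flowchart from top to bottom and return the first match."""
    if can_wait_24h:
        return "Batch API"
    if repeated_prompt_tokens >= MIN_CACHEABLE_TOKENS:
        return "Prompt Caching"
    if complex_reasoning:
        return "Extended Thinking"
    return "Normal API"


print(choose_technique(can_wait_24h=False, repeated_prompt_tokens=3000,
                       complex_reasoning=False))
```

Like the flowchart, the function returns a single technique; the Batch + Cache row in the Quick Reference covers the case where the first two conditions both hold.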
Made with 🐾 by Washin Village - Verified against official Anthropic documentation