context-optimization (SKILL.md)
Originally from shipshitdev/library
Context Optimization Techniques
Extend effective context capacity through compression, masking, caching, and partitioning. Done well, these techniques can double or triple effective capacity without moving to a larger model.
Optimization Strategies
| Strategy | Typical Gain | Use Case |
|---|---|---|
| Compaction | 50-70% | Message history dominates |
| Observation Masking | 60-80% | Tool outputs dominate |
| KV-Cache Optimization | 70%+ cache hits | Stable workloads |
| Context Partitioning | Variable | Complex multi-task |
Compaction
Summarize context when approaching limits:
```python
if context_tokens / context_limit > 0.8:
    context = compact_context(context)
```
Priority for compression:
- Tool outputs → replace with summaries
- Old turns → summarize early conversation
- Retrieved docs → summarize if recent versions exist
- Never compress system prompt
Summary generation by type:
- Tool outputs: Preserve findings, metrics, conclusions
- Conversational: Preserve decisions, commitments, context shifts
- Documents: Preserve key facts, remove supporting evidence
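A minimal sketch of this priority order, assuming chat messages as role-tagged dicts. `summarize` here is a truncating placeholder; a real implementation would call a model and apply the type-specific rules above. The system prompt and the most recent turns are never compressed.

```python
def summarize(text: str, max_chars: int = 80) -> str:
    """Placeholder summarizer; a real one would call a model."""
    return text[:max_chars] + ("…" if len(text) > max_chars else "")

def compact_context(messages: list[dict], keep_recent: int = 4) -> list[dict]:
    """Compress in priority order: tool outputs first, then older turns.
    The system prompt and recent turns pass through untouched."""
    compacted = []
    for i, msg in enumerate(messages):
        is_recent = i >= len(messages) - keep_recent
        if msg["role"] == "system" or is_recent:
            compacted.append(msg)  # never compress
        elif msg["role"] == "tool":
            compacted.append({**msg, "content": summarize(msg["content"])})
        else:
            compacted.append({**msg, "content": summarize(msg["content"], 120)})
    return compacted
```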
Observation Masking
Tool outputs can be 80%+ of tokens. Replace verbose outputs with references:
```python
if len(observation) > max_length:
    ref_id = store_observation(observation)
    return f"[Obs:{ref_id} elided. Key: {extract_key(observation)}]"
```
Masking rules:
- Never mask: Current task critical, most recent turn, active reasoning
- Consider: 3+ turns old, key points extractable, purpose served
- Always mask: Repeated outputs, boilerplate, already summarized
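The masking snippet above can be fleshed out with a simple in-memory reference store. `OBS_STORE`, `store_observation`, and `extract_key` are illustrative names, and a real `extract_key` would pull findings and metrics rather than truncate:

```python
OBS_STORE: dict[int, str] = {}
_next_id = 0

def store_observation(observation: str) -> int:
    """Stash the full text out of context; return a reference id."""
    global _next_id
    _next_id += 1
    OBS_STORE[_next_id] = observation
    return _next_id

def extract_key(observation: str, n: int = 60) -> str:
    """Placeholder: a real extractor would surface key findings."""
    return observation[:n]

def mask_observation(observation: str, max_length: int = 200) -> str:
    """Short outputs pass through; long ones become a reference."""
    if len(observation) <= max_length:
        return observation
    ref_id = store_observation(observation)
    return f"[Obs:{ref_id} elided. Key: {extract_key(observation)}]"
```

The full observation remains retrievable by id if the agent later needs it, so masking is reversible, unlike compaction.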
KV-Cache Optimization
Cache Key/Value tensors for requests with identical prefixes:
```python
# Cache-friendly ordering: stable content first
context = [
    system_prompt,     # Cacheable
    tool_definitions,  # Cacheable
    reused_templates,  # Reusable
    unique_content,    # Unique
]
```
Design for cache stability:
- Avoid dynamic content (timestamps)
- Use consistent formatting
- Keep structure stable across sessions
Context Partitioning
Split work across sub-agents with isolated contexts:
```python
# Each sub-agent has a clean, focused context
results = await gather(
    research_agent.search("topic A"),
    research_agent.search("topic B"),
    research_agent.search("topic C"),
)
# Coordinator synthesizes without carrying the full context
synthesized = await coordinator.synthesize(results)
```
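A runnable version of the same pattern using `asyncio`; `search` and `synthesize` are stand-ins for real sub-agent and coordinator calls:

```python
import asyncio

async def search(topic: str) -> str:
    """Stand-in for a sub-agent run with its own isolated context."""
    await asyncio.sleep(0)  # real sub-agents do model/tool I/O here
    return f"findings on {topic}"

async def synthesize(results: list[str]) -> str:
    """The coordinator sees only compact results, not full transcripts."""
    return "; ".join(results)

async def main() -> str:
    # Sub-agents run concurrently, each with a clean context
    results = await asyncio.gather(
        search("topic A"), search("topic B"), search("topic C")
    )
    return await synthesize(list(results))
```

The key property is that each sub-agent's intermediate tokens never enter the coordinator's context; only the distilled results do.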
Budget Management
```python
context_budget = {
    "system_prompt": 2000,
    "tool_definitions": 3000,
    "retrieved_docs": 10000,
    "message_history": 15000,
    "reserved_buffer": 2000,
}
# Monitor and trigger optimization at 70-80% utilization
```
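A small monitor over a budget like this one; the 70%/80% thresholds follow the compaction trigger earlier in this skill, and counting the reserved buffer toward the limit but not toward usage is one design choice among several:

```python
def utilization(usage: dict[str, int], budget: dict[str, int]) -> float:
    """Overall fill ratio; the reserved buffer counts toward the limit
    but never toward usage."""
    limit = sum(budget.values())
    used = sum(usage.get(k, 0) for k in budget if k != "reserved_buffer")
    return used / limit

def action_for(u: float) -> str:
    """Map utilization to the optimization step to take."""
    if u > 0.8:
        return "compact"
    if u > 0.7:
        return "monitor"
    return "ok"
```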
When to Optimize
| Signal | Action |
|---|---|
| Utilization >70% | Start monitoring |
| Utilization >80% | Apply compaction |
| Quality degradation | Investigate cause |
| Tool outputs dominate | Observation masking |
| Docs dominate | Summarization/partitioning |
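The table can be expressed as a dispatch function. The 50% thresholds for "dominates" and the decision to check the targeted signals before raw utilization are assumptions, not part of the table:

```python
def choose_action(util: float, tool_frac: float, doc_frac: float) -> str:
    """Pick an optimization from the utilization ratio and the fraction
    of tokens taken by tool outputs and retrieved docs."""
    if tool_frac > 0.5:          # tool outputs dominate
        return "observation_masking"
    if doc_frac > 0.5:           # retrieved docs dominate
        return "summarize_or_partition"
    if util > 0.8:
        return "compaction"
    if util > 0.7:
        return "monitor"
    return "none"
```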
Performance Targets
- Compaction: 50-70% reduction, <5% quality loss
- Masking: 60-80% reduction in masked observations
- Cache: 70%+ hit rate for stable workloads
Best Practices
- Measure before optimizing
- Apply compaction before masking
- Design for cache stability
- Partition before context becomes problematic
- Monitor effectiveness over time
- Balance token savings vs quality
- Test at production scale
- Implement graceful degradation