Context Engineering

Overview

context-engineering provides systematic strategies for optimizing Claude Code context window usage. It helps you monitor token consumption, reduce context load, design context-efficient skills, and apply proven optimization patterns.

Purpose: Maximize Claude Code effectiveness while managing token costs and maintaining conversation quality

The 5 Context Optimization Operations:

  1. Monitor Context Usage - Track token consumption, identify heavy consumers
  2. Reduce Context Load - Remove stale content, minimize loaded files
  3. Optimize Skill Design - Progressive disclosure, efficient reference loading
  4. Separate Planning/Execution - Keep execution context clean
  5. File-Based Strategies - Externalize large data (95% token savings)

Key Benefits:

  • 29-39% performance improvement with context editing strategies
  • 95% token savings using file-based approaches for large data
  • Sustained quality in multi-turn conversations
  • Cost reduction through efficient token usage
  • Prevention of context overflow and conversation drift

Context Window Sizes (2025):

  • Sonnet 4/4.5: 200k tokens (standard), 500k-1M (beta for Tier 4)
  • Auto-compaction: Triggers around 80% usage (~160k for 200k window)

When to Use

Use context-engineering when:

  1. Approaching Context Limits - Context usage >60-70%, need to optimize before hitting limits
  2. Building Large Skills - Creating skills with extensive documentation, need efficient loading strategies
  3. Token Cost Management - Reducing API costs through optimization
  4. Multi-Turn Conversations - Maintaining coherence across extended sessions
  5. Skill Design Phase - Planning context-efficient architecture from start
  6. Performance Optimization - Improving response quality and latency
  7. Conversation Quality - Preventing drift and maintaining focus
  8. MCP Integration - Managing context from Model Context Protocol servers
  9. Large Data Handling - Working with extensive datasets or outputs

Prerequisites

  • Understanding of Claude context windows
  • Access to context monitoring (Claude Code /context command)
  • Familiarity with progressive disclosure pattern
  • Skills under development or optimization

Operations

Operation 1: Monitor Context Usage

Purpose: Track token consumption, identify context-heavy elements, and detect optimization opportunities

When to Use This Operation:

  • Beginning of optimization effort (baseline measurement)
  • During development (continuous monitoring)
  • When approaching context limits (>60-70% usage)
  • Investigating performance issues

Process:

  1. Check Current Context Usage

    Use /context command in Claude Code to view:
    - Total tokens used
    - Percentage of context window
    - Files loaded
    - Recent tool calls
    
  2. Identify Heavy Consumers

    • Large files loaded (>5,000 tokens each)
    • Extensive conversation history
    • Many tool call results
    • Large CLAUDE.md files
  3. Analyze Usage Patterns

    • Which files are loaded but rarely referenced?
    • Are all tool results still relevant?
    • Is conversation history necessary?
    • Are there duplicate or redundant contexts?
  4. Document Baseline

    • Record current token usage
    • Note context-heavy elements
    • Identify optimization targets
  5. Set Optimization Goals

    • Target token reduction (e.g., reduce by 20%)
    • Performance improvement targets
    • Quality maintenance requirements

Validation Checklist:

  • Current context usage measured (tokens and percentage)
  • Context-heavy elements identified (files, history, tool results)
  • Baseline documented for comparison
  • Optimization targets set
  • High-impact optimization opportunities noted

Outputs:

  • Current context usage metrics
  • List of context-heavy elements
  • Baseline measurement
  • Optimization targets
  • Priority optimization opportunities

Time Estimate: 10-15 minutes

Example:

Context Usage Analysis
======================
Current Usage: 145,000 tokens (72% of 200k window)

Heavy Consumers:
1. CLAUDE.md: 25,000 tokens (17%)
2. Large skill files: 40,000 tokens (28%), largest three shown:
   - planning-architect/SKILL.md: 15,000 tokens
   - development-workflow/common-patterns.md: 12,000 tokens
   - review-multi/scoring-rubric.md: 8,000 tokens
3. Conversation history: 30,000 tokens (21%)
4. Tool call results: 20,000 tokens (14%)

Optimization Opportunities:
- Split large CLAUDE.md (25k → 10k target)
- Use references/ loading instead of full files (40k → 15k)
- Clear old tool results (20k → 5k)

Target: Reduce to ~100k tokens (50% of window, 31% reduction)
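
The baseline measurement above can be automated. Below is a minimal sketch, assuming a rough 4-characters-per-token heuristic (a common rule of thumb, not a real tokenizer); the file paths are hypothetical examples, not part of this skill:

```python
from pathlib import Path

CHARS_PER_TOKEN = 4      # rough heuristic, not an exact tokenizer
WINDOW = 200_000         # Sonnet 4/4.5 standard window

def estimate_tokens(path: Path) -> int:
    """Estimate a file's token count from its character count."""
    return len(path.read_text(errors="ignore")) // CHARS_PER_TOKEN

def baseline(paths):
    """Print per-file estimates, largest first, plus total window usage."""
    sized = sorted(((estimate_tokens(p), p) for p in paths), reverse=True)
    total = sum(tokens for tokens, _ in sized)
    for tokens, path in sized:
        print(f"{tokens:>8,} tokens  {path}")
    print(f"Total: ~{total:,} tokens ({total / WINDOW:.0%} of {WINDOW:,})")

# Hypothetical heavy consumers to measure
baseline([Path("CLAUDE.md"), *Path("skills").rglob("*.md")])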

Operation 2: Reduce Context Load

Purpose: Remove stale content, minimize loaded files, and reduce token consumption

When to Use This Operation:

  • Context usage >70% (approaching limits)
  • Performance degradation noticed
  • Before major operations (clear space)
  • Periodic maintenance (every few hours)

Process:

  1. Remove Stale Tool Results

    • Identify old tool call results no longer needed
    • Tool results from exploratory work
    • Superseded information
  2. Minimize File Loading

    • Only load files actively needed
    • Use Grep instead of Read for searching
    • Load specific sections, not entire files
    • Unload files when done with them
  3. Optimize Conversation History

    • Context editing auto-clears stale content
    • Summarize long conversations if needed
    • Start fresh session for new major tasks
  4. Reduce CLAUDE.md Size

    • Keep only essential long-term instructions
    • Move project-specific details to separate files
    • Use CLAUDE.local.md for temporary preferences
    • Target: <5,000 tokens for CLAUDE.md
  5. Apply Progressive Loading

    • Load overview/index files first
    • Load detailed references only when needed
    • Use skill references/ on-demand loading

Validation Checklist:

  • Stale tool results cleared
  • Only necessary files loaded
  • CLAUDE.md size optimized (<5,000 tokens if possible)
  • Progressive loading applied where relevant
  • Context usage reduced measurably

Outputs:

  • Reduced token count
  • Cleaner context window
  • List of removed/optimized elements
  • New context usage measurement

Time Estimate: 15-30 minutes

Example Reduction:

Before Optimization: 145,000 tokens (72%)

Actions Taken:
1. Cleared 50 old tool results: -15,000 tokens
2. Unloaded 3 large files no longer needed: -18,000 tokens
3. Optimized CLAUDE.md (split to CLAUDE.local.md): -12,000 tokens
4. Used references/ loading instead of full files: -25,000 tokens

After Optimization: 75,000 tokens (37%)

Reduction: 70,000 tokens (48% reduction)
Quality Impact: None - relevant context maintained
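
A simple budget check can catch an oversized CLAUDE.md before it bloats every session. Here is a hedged sketch using the same ~4-characters-per-token heuristic; the 5,000-token ceiling comes from this guide, and the file names are illustrative:

```python
from pathlib import Path

CHARS_PER_TOKEN = 4     # rough heuristic
BUDGET = 5_000          # CLAUDE.md ceiling suggested by this guide

def over_budget(path: str, budget: int = BUDGET) -> bool:
    """Warn when a file's estimated token count exceeds its budget."""
    tokens = len(Path(path).read_text(errors="ignore")) // CHARS_PER_TOKEN
    status = "split or trim" if tokens > budget else "within budget"
    print(f"{path}: ~{tokens:,} tokens ({status}, budget {budget:,})")
    return tokens > budget

over_budget("CLAUDE.md")               # project instructions
over_budget("CLAUDE.local.md", 2_000)  # hypothetical local-preferences budget
```

Run before long sessions: anything flagged is a candidate for splitting into separate on-demand files.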

Operation 3: Optimize Skill Design

Purpose: Design context-efficient skills using progressive disclosure, lazy loading, and token-aware architecture

When to Use This Operation:

  • Planning new skills (design for efficiency from start)
  • Refactoring existing skills (improve context efficiency)
  • Building large/complex skills (manage context proactively)
  • Creating skill ecosystems (coordinate context usage)

Process:

  1. Apply Progressive Disclosure

    • SKILL.md: Overview + essentials only (<1,200 lines, ~3,000-5,000 tokens)
    • references/: Detailed guides loaded on-demand (300-600 lines each, ~1,000-2,000 tokens)
    • scripts/: Automation loaded when needed

    Token Impact: 70-80% reduction vs monolithic (5k vs 20k+ tokens)

  2. Design for Lazy Loading

    • Separate content into focused reference files
    • Each reference file covers one topic
    • Load specific reference, not entire skill
    • Example: Load references/structure-review-guide.md not entire review-multi
  3. Optimize File Sizes

    • SKILL.md target: 800-1,200 lines (2,500-4,000 tokens)
    • Reference files: 300-600 lines (1,000-2,000 tokens each)
    • Keep files focused and concise
  4. Use Token-Efficient Formats

    • Tables instead of prose (higher information density)
    • Lists instead of paragraphs (more scannable)
    • Code blocks for examples (clear and concise)
    • Quick Reference sections (high-density lookup)
  5. Consider Context Budget

    • Estimate token usage for skill
    • Simple skill: 5,000-10,000 tokens total
    • Medium skill: 15,000-30,000 tokens total
    • Complex skill: 40,000-60,000 tokens total
    • Design within budget

Validation Checklist:

  • Progressive disclosure applied (SKILL.md + references/)
  • SKILL.md <1,200 lines (~4,000 tokens or less)
  • Reference files 300-600 lines each
  • Files focused on single topics (lazy loadable)
  • Token-efficient formats used (tables, lists, code blocks)
  • Estimated total tokens within budget
  • Skill can be partially loaded (references/ on-demand)

Outputs:

  • Context-efficient skill design
  • Token budget estimate
  • Progressive disclosure plan
  • Reference file organization

Time Estimate: 20-40 minutes (during planning phase)

Example:

Skill Design: api-integration

Token Budget Analysis:
- SKILL.md: 900 lines → ~3,000 tokens
- references/ (3 files):
  - api-guide.md: 400 lines → ~1,300 tokens
  - auth-patterns.md: 350 lines → ~1,200 tokens
  - examples.md: 300 lines → ~1,000 tokens
- README.md: 300 lines → ~1,000 tokens

Total if all loaded: ~7,500 tokens
Typical usage: SKILL.md only → 3,000 tokens (60% savings)
With 1 reference: 3,000 + 1,200 → 4,200 tokens (44% savings)

Progressive Disclosure Impact:
- Monolithic (all in SKILL.md): ~7,500 tokens always loaded
- Progressive (SKILL.md + on-demand refs): 3,000-4,500 tokens typical
- Token Savings: 40-60% depending on usage

Design: ✅ Context-efficient with progressive disclosure
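
The budget analysis in this example follows directly from line counts. A small sketch of the same arithmetic, assuming the ~3.3 tokens/line average this guide uses elsewhere (900 lines → ~3,000 tokens); the skill path is hypothetical:

```python
from pathlib import Path

TOKENS_PER_LINE = 3.3   # midpoint of this guide's 3-4 tokens/line rule of thumb

def skill_budget(skill_dir: str) -> int:
    """Estimate per-file and total token cost of a skill's markdown files."""
    total = 0
    for path in sorted(Path(skill_dir).rglob("*.md")):
        lines = len(path.read_text(errors="ignore").splitlines())
        tokens = int(lines * TOKENS_PER_LINE)
        total += tokens
        print(f"{path}: {lines} lines -> ~{tokens:,} tokens")
    print(f"Total if all loaded: ~{total:,} tokens")
    return total

skill_budget("skills/api-integration")   # hypothetical skill directory
```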

Operation 4: Separate Planning from Execution

Purpose: Keep execution context clean by separating exploratory planning from focused implementation

When to Use This Operation:

  • Starting complex development work
  • Context getting cluttered with exploration
  • Need clean context for implementation
  • Before critical/focused work

Process:

  1. Planning Phase (Separate Session)

    • Broad codebase exploration
    • Research and pattern discovery
    • Architecture decisions
    • Task breakdown
    • Output: Plan documents, task lists

    Characteristics: High context usage, exploratory, broad

  2. Execution Phase (Fresh Session)

    • Load plan documents (not exploration history)
    • Focused implementation
    • Specific file operations
    • Minimal context bloat

    Characteristics: Clean context, focused, efficient

  3. Session Transition

    • End planning session when plan complete
    • Save planning artifacts (plans, task lists, decisions)
    • Start new session for execution
    • Load only: plan docs, CLAUDE.md, immediate dependencies
  4. Maintain Clean Execution Context

    • Don't re-explore during execution
    • Follow plan, don't re-research
    • Load files as needed, unload when done
    • Keep focus on implementation

Validation Checklist:

  • Planning and execution in separate sessions (when appropriate)
  • Planning artifacts saved and documented
  • Execution session starts with clean context
  • Only plan docs and essentials loaded for execution
  • No re-exploration during execution
  • Context stays focused on current task

Outputs:

  • Clean execution context
  • Focused implementation
  • Reduced context bloat
  • Better performance and quality

Time Estimate: Planning decision (0-5 min), session management (as needed)

Example:

Planning Session (Context: 150k tokens, 75% usage):
- Explored 20 files for research
- Analyzed patterns across codebase
- Made architecture decisions
- Created detailed plan
- Output: skill-plan.md (comprehensive)

[End session, save plan]

Execution Session (Context: 30k tokens, 15% usage):
- Load: CLAUDE.md + skill-plan.md + development-workflow
- Context: Clean, focused, 30k tokens
- Implementation: Follow plan, build skill
- Load references as needed (not all at once)

Result: 80% context reduction (150k → 30k)
Quality: Higher (clean context, focused work)

Research Finding: "Separating planning from execution keeps implementation context clean" - confirmed by 2025 best practices


Operation 5: File-Based Optimization

Purpose: Externalize large data to temporary files for on-demand analysis, achieving 95% token savings

When to Use This Operation:

  • Handling large data sets (>5,000 tokens)
  • Processing extensive outputs (logs, reports)
  • Working with large MCP responses
  • Managing generated content

Process:

  1. Identify Large Data

    • Data >5,000 tokens (typically >10,000 characters)
    • Repeated reference to same large content
    • Extensive generated outputs
    • Large MCP server responses
  2. Externalize to Files

```bash
# Redirect large output straight to a temp file so the full
# payload never enters the conversation context
some_command > /tmp/large-data.txt
```

  3. Reference File Instead of Content

    Large dataset saved to: /tmp/analysis-data.json (50,000 tokens)
    
    To analyze: Read /tmp/analysis-data.json when needed
    

    Token Impact: 50,000 tokens → ~500 tokens (~99% reduction)

  4. Load On-Demand

    • Read file only when specific analysis needed
    • Process in chunks if necessary
    • Don't keep full content in context
  5. Clean Up Temp Files

    • Remove temp files when no longer needed
    • Don't accumulate unused files
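
The externalize-and-reference pattern from steps 2-3 can be wrapped in a helper. A minimal sketch, assuming the ~4-characters-per-token heuristic and this guide's 5,000-token threshold; the function name and file layout are illustrative:

```python
import tempfile
from pathlib import Path

CHARS_PER_TOKEN = 4   # rough heuristic
THRESHOLD = 5_000     # externalize above ~5k tokens, per this guide

def externalize_if_large(data: str, label: str) -> str:
    """Return small data as-is, or write it out and return a short reference."""
    tokens = len(data) // CHARS_PER_TOKEN
    if tokens <= THRESHOLD:
        return data   # small enough to keep in the conversation
    path = Path(tempfile.gettempdir()) / f"{label}.txt"
    path.write_text(data)
    # A ~500-token reference replaces the full payload in context
    return f"{label} saved to {path} (~{tokens:,} tokens); Read/Grep it on-demand"
```

The caller keeps only the returned reference string in the conversation; the payload stays on disk until a specific read is actually needed.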

Validation Checklist:

  • Large data identified (>5,000 tokens)
  • Data externalized to files
  • File paths documented (reference in conversation)
  • On-demand loading used (not full content in context)
  • Token savings measured (before/after)
  • Quality maintained (can access data when needed)
  • Temp files cleaned up when done

Outputs:

  • Externalized data files
  • File path references
  • Significant token savings (often 90-95%)
  • Maintained data accessibility

Time Estimate: 10-20 minutes (setup and management)

Example:

Scenario: Analyzing large log file (100,000 tokens)

Before Optimization:
- Full log in conversation context: 100,000 tokens
- Context usage: 50% just for log data

After Optimization:
1. Save log to /tmp/app-log.txt
2. Reference in context: "Log saved to /tmp/app-log.txt (100k tokens)"
3. Read specific sections when needed:
   - Read first 50 lines for overview
   - Grep for errors
   - Read relevant sections on-demand

Token Usage:
- Before: 100,000 tokens in context
- After: ~500 tokens (file reference) + ~2,000 tokens (specific reads)
- Savings: 97,500 tokens (97.5% reduction)

Quality: Maintained - can still analyze log on-demand
Access: Full log available when needed

Research Finding: "File-based approach achieves 95% token savings" - proven in 2025 optimization studies


Best Practices

1. Quality Over Quantity

Practice: Focus on relevant, high-quality context rather than loading everything

Rationale: Every piece should be current, accurate, and directly relevant to task

Application: Before loading file, ask: "Do I need this right now for current task?"

2. Progressive Disclosure Always

Practice: Design all skills with SKILL.md + references/ pattern

Rationale: 70-80% token reduction vs monolithic design

Application: SKILL.md <1,200 lines, details in references/ loaded on-demand

3. Monitor Regularly

Practice: Check context usage periodically, especially in long sessions

Rationale: Auto-compaction triggers at ~80%, but proactive monitoring prevents drift

Application: Check /context every 30-60 minutes in active development

4. File-Based for Large Data

Practice: Externalize data >5,000 tokens to files

Rationale: 95% token savings while maintaining accessibility

Application: Save to /tmp/, reference file path, load on-demand

5. Separate Planning from Execution

Practice: Plan in one session, execute in clean session with plan artifacts

Rationale: Keeps execution context focused, prevents exploratory noise

Application: When planning takes more than an hour, start a fresh session for implementation

6. Clean Context Before Critical Work

Practice: Reduce context load before important/complex operations

Rationale: Clean context improves quality and performance

Application: Before complex implementation, clear unnecessary context

7. Use Appropriate Tools

Practice: Choose tools that minimize context usage

Rationale: Some tools add more context than others

Application:

  • Grep instead of Read for searching (doesn't load full file)
  • Glob for finding files (doesn't load content)
  • Task agents for exploration (separate context)

8. Optimize CLAUDE.md

Practice: Keep CLAUDE.md concise (<5,000 tokens), split if needed

Rationale: CLAUDE.md loaded every session, large files waste context

Application:

  • Essential standards in CLAUDE.md (project level)
  • Temporary preferences in CLAUDE.local.md
  • Specific domain knowledge in separate files loaded as needed

Context Budgets for Skills

Simple Skills (Total: 5,000-10,000 tokens)

Structure:

  • SKILL.md only or SKILL.md + 1-2 small references
  • No scripts or minimal automation
  • ~400-800 lines total

Token Breakdown:

  • SKILL.md: 3,000-4,000 tokens
  • References (if any): 1,000-2,000 tokens each
  • README: 1,000 tokens

Example: format-validator, simple helpers


Medium Skills (Total: 15,000-30,000 tokens)

Structure:

  • SKILL.md + 3-5 references + scripts
  • ~1,500-3,000 lines total

Token Breakdown:

  • SKILL.md: 3,000-5,000 tokens
  • References: 3-5 files × ~2,000 tokens = 6,000-10,000 tokens
  • Scripts: 2-3 files × 1,000 tokens = 2,000-3,000 tokens
  • README: 1,000-1,500 tokens

Progressive Loading: Load SKILL.md (3-5k) + specific reference when needed (+1.5k) = 4.5-6.5k typical

Example: prompt-builder, skill-researcher


Complex Skills (Total: 40,000-60,000 tokens)

Structure:

  • SKILL.md + 7-10 references + 4+ scripts
  • ~4,000-7,000 lines total

Token Breakdown:

  • SKILL.md: 4,000-6,000 tokens
  • References: 7-10 files × 2,000 tokens = 14,000-20,000 tokens
  • Scripts: 4-6 files × 1,500 tokens = 6,000-9,000 tokens
  • README: 1,500-2,000 tokens

Progressive Loading: Load SKILL.md (4-6k) + 1-2 references as needed (+2-4k) = 6-10k typical

Example: review-multi, testing-validator

Key: Even complex skills only load 6-10k tokens typically (not full 40-60k)
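
A quick back-of-the-envelope check of that claim, using the tier numbers above (illustrative arithmetic only):

```python
# Typical progressive load vs total skill size, per the tiers above
tiers = {
    "simple":  {"total": (5_000, 10_000),  "typical": (3_000, 4_000)},
    "medium":  {"total": (15_000, 30_000), "typical": (4_000, 6_000)},
    "complex": {"total": (40_000, 60_000), "typical": (6_000, 10_000)},
}
for name, t in tiers.items():
    best = t["typical"][0] / t["total"][1]    # smallest load, largest skill
    worst = t["typical"][1] / t["total"][0]   # largest load, smallest skill
    print(f"{name}: loads {best:.0%}-{worst:.0%} of the full skill")
```

For complex skills this works out to loading only 10-25% of the full skill in typical use.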


Common Mistakes

Mistake 1: Loading Everything Upfront

Symptom: Context quickly fills with all skill files

Cause: Reading all files instead of progressive loading

Fix: Load SKILL.md first, load references/ only when needed for specific operations

Prevention: Follow progressive disclosure pattern

Mistake 2: Not Monitoring Context

Symptom: Unexpected context overflow, performance degradation

Cause: No visibility into token usage

Fix: Check /context regularly, monitor usage patterns

Prevention: Check context every 30-60 min in active sessions

Mistake 3: Large Monolithic CLAUDE.md

Symptom: Every session starts with 20k-50k tokens used

Cause: Putting everything in CLAUDE.md

Fix: Split CLAUDE.md - essentials only, separate files for detailed knowledge

Prevention: Keep CLAUDE.md <5,000 tokens, use multiple files

Mistake 4: Keeping Stale Tool Results

Symptom: Context bloated with old exploratory results

Cause: Not clearing old tool calls

Fix: Context editing auto-clears stale results; manage manually by starting a fresh session

Prevention: Fresh session for major transitions (planning → execution)

Mistake 5: Not Using File-Based Strategies

Symptom: Large data sets consuming 30-50% of context

Cause: Keeping large outputs in conversation

Fix: Save to /tmp/ files, reference file path, load on-demand

Prevention: Any data >5,000 tokens → externalize to file

Mistake 6: Ignoring Progressive Disclosure

Symptom: Skills with 2,000+ line SKILL.md files

Cause: Not using references/ for detailed content

Fix: Extract detailed content to references/, keep SKILL.md as overview

Prevention: Design with progressive disclosure from start (use planning-architect)


Quick Reference

Context Window Limits (2025)

| Model        | Standard    | Beta (Tier 4)  | Auto-Compact           |
|--------------|-------------|----------------|------------------------|
| Sonnet 4/4.5 | 200k tokens | 500k-1M tokens | ~80% (~160k for 200k)  |

Token Savings Strategies

| Strategy                   | Token Savings             | Application                              |
|----------------------------|---------------------------|------------------------------------------|
| Progressive Disclosure     | 70-80%                    | SKILL.md + references/ vs monolithic     |
| File-Based Externalization | 95%                       | Large data >5k tokens to /tmp/ files     |
| Context Editing            | 29-39% (performance gain) | Auto-clears stale content                |
| Lazy Loading               | 60-70%                    | Load references on-demand vs all upfront |
| Optimized CLAUDE.md        | Variable                  | Keep <5k tokens vs 20-50k bloat          |

Skill Token Budgets

| Skill Complexity | Total Tokens | Typical Load | Progressive Load        |
|------------------|--------------|--------------|-------------------------|
| Simple           | 5k-10k       | 3k-4k        | SKILL.md only           |
| Medium           | 15k-30k      | 4k-6k        | SKILL.md + 1 reference  |
| Complex          | 40k-60k      | 6k-10k       | SKILL.md + 2 references |

Context Usage Guidelines

| Usage % | Status      | Action                                                  |
|---------|-------------|---------------------------------------------------------|
| <50%    | ✅ Healthy  | Normal operation                                        |
| 50-70%  | ⚠️ Monitor  | Check periodically, plan optimization                   |
| 70-80%  | ⚠️ Optimize | Reduce context load soon                                |
| >80%    | ❌ Critical | Immediate optimization needed (auto-compact triggers)   |

Optimization Decision Tree

Is context >70%?
├─ Yes → Reduce immediately (Operation 2)
│   ├─ Clear stale tool results
│   ├─ Unload unnecessary files
│   └─ Start fresh session if needed
└─ No → Preventive optimization
    ├─ Is data >5k tokens? → File-based (Operation 5)
    ├─ Building skills? → Progressive disclosure (Operation 3)
    └─ Long session? → Consider planning/execution split (Operation 4)

Quick Optimization Actions

Immediate (When context >80%):

1. Check usage: /context command
2. Clear old tool results (context editing helps)
3. Start fresh session with essentials only
4. Load plan docs, not exploration history

Preventive (During development):

1. Design skills with progressive disclosure
2. Use file-based for large data (>5k tokens)
3. Monitor context every 30-60 min
4. Separate planning from execution (for complex work)

Common Commands

```
# Monitor context usage
/context

# Read specific lines (not full file)
Read file_path --limit 50

# Search without loading (uses Grep, more efficient)
Grep "pattern" path/

# Find files without loading content
Glob "*.py" path/

# Externalize large data
Bash: command > /tmp/output.txt
# Then reference: "See /tmp/output.txt for results"
```

Token Estimation

Quick Estimation:

  • 1 line of code/text ≈ 3-4 tokens (average)
  • 1,000 lines ≈ 3,000-4,000 tokens
  • Dense prose: 3-3.5 tokens/line
  • Code with comments: 2.5-3 tokens/line
  • Tables/lists: 2-2.5 tokens/line (more efficient)
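
These rules of thumb are easy to wrap in a helper. A sketch using the per-line averages from the list above (heuristic estimates, not tokenizer output):

```python
# Rough per-line averages from the list above -- estimates, not a tokenizer
TOKENS_PER_LINE = {"prose": 3.25, "code": 2.75, "table": 2.25}

def estimate(lines: int, kind: str = "prose") -> int:
    """Estimate token count from a line count and content type."""
    return int(lines * TOKENS_PER_LINE[kind])

print(estimate(1_000))           # ~3,250 -> inside the 3,000-4,000 range
print(estimate(600, "table"))    # ~1,350 tokens
```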

For More Information

  • Context monitoring: references/context-monitoring-guide.md
  • Reduction strategies: references/reduction-strategies.md
  • Optimization patterns: references/optimization-patterns.md
  • Analysis script: scripts/analyze-context-usage.py

context-engineering helps you maximize Claude Code effectiveness through strategic context management, ensuring optimal performance, quality, and cost-efficiency throughout development.
