Context Budget Management

Strategies for managing limited context windows efficiently using progressive disclosure and tiered loading.

When to Use

  • Conversation approaching context limits
  • Working with large codebases or documentation
  • Multi-turn interactions with accumulating context
  • Need to balance breadth vs. depth of information
  • Optimizing API costs (tokens = money)

When NOT to Use

  • Short, single-turn interactions
  • When full context fits comfortably
  • Simple Q&A without state accumulation

Quick Start

  1. Assess current context usage and remaining budget
  2. Tier information by relevance (core → domain → task → optional)
  3. Summarize or evict low-priority context
  4. Load new information only when needed
  5. Monitor and rebalance as conversation evolves

Core Principle: Progressive Disclosure

Load information only when needed, in order of relevance.

┌─────────────────────────────────────────────────────────┐
│  ALWAYS LOADED (~500 tokens)                            │
│  • Core identity/constraints                            │
│  • Critical safety rules                                │
│  • Navigation hints to deeper content                   │
└─────────────────────────────────────────────────────────┘
         ↓ Load on demand
┌─────────────────────────────────────────────────────────┐
│  DOMAIN CONTEXT (~2K tokens)                            │
│  • Project-specific rules                               │
│  • Architecture patterns                                │
│  • Key file locations                                   │
└─────────────────────────────────────────────────────────┘
         ↓ Load when task requires
┌─────────────────────────────────────────────────────────┐
│  TASK CONTEXT (~4K tokens)                              │
│  • Specific file contents                               │
│  • API documentation                                    │
│  • Implementation details                               │
└─────────────────────────────────────────────────────────┘
         ↓ Load only for deep work
┌─────────────────────────────────────────────────────────┐
│  FULL CONTEXT (remaining budget)                        │
│  • Complete files                                       │
│  • Full conversation history                            │
│  • Extensive examples                                   │
└─────────────────────────────────────────────────────────┘
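The tiering above can be sketched as a small loader that fetches each tier's content only on first request. This is an illustrative sketch, not a fixed API: the tier names and token budgets mirror the diagram, and the loader callables are placeholders for whatever actually fetches each tier's content.

```python
# Illustrative tier budgets from the diagram above.
TIER_BUDGETS = {
    "core": 500,      # always loaded
    "domain": 2_000,  # load on demand
    "task": 4_000,    # load when task requires
    "full": None,     # remaining budget, deep work only
}

class ProgressiveContext:
    """Load tiers lazily, in order of relevance."""

    def __init__(self, loaders):
        self.loaders = loaders  # tier name -> zero-arg loader callable
        self.loaded = {}        # tier name -> loaded content

    def get(self, tier):
        # Fetch a tier's content the first time it is requested,
        # then serve it from the cache on subsequent calls.
        if tier not in self.loaded:
            self.loaded[tier] = self.loaders[tier]()
        return self.loaded[tier]
```

A deeper tier is loaded only when `get` is first called for it, so a conversation that never needs task-level detail never pays for it.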

Context Tiers

Tier 1: Core (~500 tokens)

Always present, never evicted.

  • System identity and constraints
  • Critical safety/security rules
  • Pointers to deeper content
  • Session state (current task, branch, etc.)

Tier 2: Domain (~2K tokens)

Loaded when entering a domain, evicted when leaving.

  • Project CLAUDE.md / README
  • Architecture decisions
  • Naming conventions
  • Key abstractions and patterns

Tier 3: Task (~4K tokens)

Loaded for specific tasks, evicted when task changes.

  • Relevant file contents
  • API signatures and docs
  • Related test files
  • Error messages and stack traces

Tier 4: Full (remaining)

Loaded only when deep work requires it.

  • Complete file contents
  • Extensive conversation history
  • Large documentation sets
  • Full codebase context (via repomix bundles)

Strategies

1. Observation Masking

Remove low-value observations from context while preserving key information.

When to use: Repeated similar operations (file listings, search results)

How:

  • Keep only unique/relevant results
  • Summarize patterns instead of listing all
  • Drop intermediate states, keep final

BEFORE (high token cost):
- Listed 50 files in src/
- Listed 30 files in tests/
- Listed 20 files in docs/

AFTER (low token cost):
- Project structure: src/ (50 files), tests/ (30), docs/ (20)
- Key patterns: components in src/components/, tests mirror src/
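A minimal sketch of the masking step: repeated directory listings are collapsed into a single summary observation while other observations pass through. The `(kind, directory, n_files)` tuple shape is an assumption made for illustration.

```python
def mask_listings(observations):
    """Collapse repeated directory-listing observations into one summary.

    `observations` is a list of tuples; listing observations have the
    assumed shape ("listing", directory, n_files). Anything else is
    kept verbatim.
    """
    counts = {}
    kept = []
    for obs in observations:
        if obs[0] == "listing":
            _, directory, n_files = obs
            counts[directory] = n_files  # keep only the final count
        else:
            kept.append(obs)
    if counts:
        summary = "Project structure: " + ", ".join(
            f"{d} ({n} files)" for d, n in counts.items()
        )
        kept.append(("summary", summary))
    return kept
```

Three listing observations become one summary line, trading dozens of file names for a handful of tokens.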

2. LLM Summarization

Use summarization to compress context while preserving meaning.

When to use: Long conversation history, large documents

How:

  • Summarize completed discussion threads
  • Extract key decisions and action items
  • Compress verbose outputs to essentials

BEFORE: 2000 tokens of debugging discussion
AFTER: "Resolved: Auth timeout caused by missing JWT refresh. Fixed in auth.ts:45."
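One way to wire this in, sketched under assumptions: `summarize` is any callable that maps a list of messages to a short string (for example, an LLM call), and messages are plain role/content dicts. Neither is a fixed API.

```python
def compress_history(messages, summarize, keep_recent=4):
    """Replace older messages with a single summary message.

    `summarize` is an assumed callable (e.g. an LLM call) mapping a
    list of messages to a short string. The most recent `keep_recent`
    messages are kept verbatim so immediate context stays intact.
    """
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)
    header = {"role": "system",
              "content": f"Summary of earlier discussion: {summary}"}
    return [header] + recent
```

The completed thread collapses to one message while the live tail of the conversation is preserved word for word.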

3. Lazy Loading

Don't load information until it's needed.

When to use: Large reference materials, optional context

How:

  • Load file contents only when editing
  • Fetch documentation only when implementing
  • Defer examples until ambiguity arises
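The file-loading case can be sketched with a thin wrapper that defers the read until the content is actually used; the class name is arbitrary.

```python
class LazyFile:
    """Defer reading a file until its content is actually needed."""

    def __init__(self, path):
        self.path = path
        self._content = None

    @property
    def content(self):
        # Read from disk only on first access, then cache.
        if self._content is None:
            with open(self.path) as f:
                self._content = f.read()
        return self._content
```

Holding a `LazyFile` costs only the path; the token cost of the contents is paid only if editing actually begins.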

4. Tiered Eviction

Evict lower-priority context when budget is tight.

Eviction order (first to evict → last):

  1. Intermediate outputs (superseded by final)
  2. Exploration artifacts (search results, file listings)
  3. Completed task context
  4. Domain context (when switching domains)
  5. Core context (never evict)
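The eviction order can be sketched as a loop that drops one tier at a time until the budget fits. The item shape (dicts with `tier` and `tokens` keys) is an assumption for illustration; core context is deliberately absent from the order and therefore never removed.

```python
# First to evict -> last; "core" is intentionally excluded.
EVICTION_ORDER = [
    "intermediate",    # superseded outputs
    "exploration",     # search results, file listings
    "completed_task",
    "domain",
]

def evict_until_fits(items, budget):
    """Drop items tier by tier until total tokens fit the budget.

    `items` is a list of dicts with assumed "tier" and "tokens" keys.
    Core items survive because "core" never appears in EVICTION_ORDER.
    """
    items = list(items)
    for tier in EVICTION_ORDER:
        if sum(i["tokens"] for i in items) <= budget:
            break  # already within budget, stop evicting
        items = [i for i in items if i["tier"] != tier]
    return items
```

Eviction stops as soon as the budget is met, so higher tiers are only sacrificed when the cheaper evictions were not enough.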

5. Context Checkpointing

Save and restore context states for multi-branch work.

When to use: Working on multiple features, context switching

How:

  • Summarize current state before switching
  • Store summaries in persistent notes (git branches, files)
  • Restore with targeted loading when resuming
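A checkpoint can be as simple as a JSON file per task. This is a sketch under assumptions: the state dict's keys and the `.context-checkpoints` directory name are arbitrary choices, not part of any prescribed format.

```python
import json
import pathlib

def checkpoint(state, name, directory=".context-checkpoints"):
    """Persist a context summary so it survives a context switch.

    `state` is a plain dict (task, branch, key decisions); the
    directory name is an arbitrary convention for this sketch.
    """
    d = pathlib.Path(directory)
    d.mkdir(exist_ok=True)
    (d / f"{name}.json").write_text(json.dumps(state, indent=2))

def restore(name, directory=".context-checkpoints"):
    """Reload a previously saved context summary."""
    return json.loads((pathlib.Path(directory) / f"{name}.json").read_text())
```

On resuming, `restore` re-seeds the conversation with the summary instead of replaying the full history.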

Budget Estimation

Token Approximations

Content Type   Tokens/Unit
English text   ~0.75 tokens/word
Code           ~0.5 tokens/character
JSON/YAML      ~1 token/4 characters
Markdown       ~0.8 tokens/word

Model Context Limits (approximate)

Model           Context   Practical Budget
Claude Sonnet   200K      ~150K usable
Claude Opus     200K      ~150K usable
GPT-4           128K      ~100K usable
GPT-4o          128K      ~100K usable

Practical budget = Total - system prompt - response buffer
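The approximations and the budget formula above can be turned into a quick estimator. The rates are the rough figures from the table (they vary by tokenizer and model), and the 4K response buffer is an illustrative default, not a fixed value.

```python
def estimate_tokens(text, kind="english"):
    """Rough token estimate using the approximation table above."""
    words = len(text.split())
    chars = len(text)
    if kind == "english":
        return int(words * 0.75)
    if kind == "code":
        return int(chars * 0.5)
    if kind == "json":
        return chars // 4      # ~1 token per 4 characters
    if kind == "markdown":
        return int(words * 0.8)
    raise ValueError(f"unknown kind: {kind}")

def practical_budget(total, system_prompt_tokens, response_buffer=4_000):
    """Practical budget = total - system prompt - response buffer."""
    return total - system_prompt_tokens - response_buffer
```

These are planning-grade estimates only; an exact count requires the model's own tokenizer.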


Procedure

Step 1: Assess Current State

Check context usage:

  • How many turns in conversation?
  • How much file content loaded?
  • What's the current task focus?

Checkpoint: Identify if approaching limits.

Step 2: Classify Loaded Information

Sort into tiers:

  • What's core (always needed)?
  • What's domain (project-specific)?
  • What's task (current work)?
  • What's optional (can be evicted)?

Step 3: Apply Strategies

Choose appropriate strategy:

  • Near limit? → Evict lowest tier, summarize history
  • Switching tasks? → Checkpoint current, load new task context
  • Need more depth? → Load specific files on demand
  • Exploration phase? → Use observation masking
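The decision list above can be sketched as a small dispatch function; the 0.8 usage threshold and the strategy names are illustrative assumptions.

```python
def choose_strategy(usage_ratio, switching_task, exploring):
    """Map the current situation to a strategy, mirroring the list above.

    `usage_ratio` is loaded tokens / practical budget; the 0.8
    threshold is an illustrative choice.
    """
    if usage_ratio > 0.8:
        return "evict_and_summarize"       # near limit
    if switching_task:
        return "checkpoint_and_reload"     # switching tasks
    if exploring:
        return "observation_masking"       # exploration phase
    return "lazy_load_on_demand"           # default: depth on demand
```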

Step 4: Monitor and Rebalance

After each significant action:

  • Did we load new context?
  • Can we evict completed task context?
  • Should we summarize recent discussion?

Failure Modes & Recovery

Issue                        Recovery
Hit context limit mid-task   Summarize history, evict exploration artifacts
Lost important context       Re-read key files, check conversation summary
Slow responses               Reduce loaded context, use lazy loading
Inconsistent behavior        Reload core + domain tiers explicitly

Integration with Other Skills

/collab

  • Use git branches for context checkpointing
  • Branch summaries provide restoration points

/repomix

  • Generate compressed codebase bundles
  • Load bundles for broad context, files for depth

/skill-design

  • Skills use progressive disclosure by design
  • Apply same principles to conversation management

Security & Permissions

  • Required tools: None (advisory skill)
  • Confirmations: None (no destructive actions)
  • Trust model: Context management operates on metadata only; no external data is accessed

Metadata

author: Christian Kusmanow / Claude
version: 1.0.0
last_updated: 2026-01-23
source:
  - Platform Agent Architecture (60_Science/)
  - JetBrains Context Management Research (2025)
  - Agent Skills progressive disclosure pattern