performance-optimization
Table of Contents
- Overview
- Quick Start
- Token Budget Model
- Progressive Loading Patterns
- Optimization Workflow
- Quality Checks
Overview
Plugin ecosystems face a fundamental constraint: the skill description budget (2% of context window, ~16k characters at 200k tokens). Every skill loaded into context reduces the space available for actual work. This skill provides patterns to minimize token footprint while preserving full capability.
Core Principles
- Metadata-first discovery - Claude scans ~100 tokens of frontmatter to decide relevance before loading full content
- Progressive disclosure - Essential content loads immediately; advanced content lives in modules loaded on-demand
- Token budgeting - Track and enforce per-skill token limits aligned with the ecosystem budget
- Context-aware delivery - Load depth matches task complexity
Quick Start
# Estimate tokens for a skill
python plugins/abstract/scripts/validate_budget.py
# Check a single skill's token footprint
Skill(abstract:estimate-tokens) --path plugins/my-plugin/skills/my-skill/SKILL.md
# Audit full ecosystem budget
Skill(abstract:skills-eval) --focus token-efficiency
Token Budget Model
Budget Allocation (Claude Code v2.1.32+)
| Context Window | Budget (2%) | Per-Skill Target | Max Skills |
|---|---|---|---|
| 200k tokens | ~16,000 chars | 300-500 chars | ~40 |
| 1M tokens | ~80,000 chars | 300-500 chars | ~200 |
The SLASH_COMMAND_TOOL_CHAR_BUDGET env var overrides the default. The ecosystem validator uses 17,000 to provide growth headroom.
Per-Skill Targets
| Skill Size | Token Range | Strategy |
|---|---|---|
| Minimal | <300 tokens | Single SKILL.md, no modules |
| Standard | 300-800 tokens | SKILL.md + 1-2 modules |
| Large | 800-1500 tokens | Progressive loading required |
| Oversize | >1500 tokens | Split into separate skills |
Progressive Loading Patterns
Pattern 1: Module Decomposition
Keep SKILL.md lean with frontmatter + overview + quick start. Move implementation details to modules/:
my-skill/
SKILL.md # ~300 tokens: frontmatter, overview, quick start
modules/
implementation.md # Loaded when Claude needs detailed steps
examples.md # Loaded when user asks for examples
advanced.md # Loaded for complex scenarios
Pattern 2: Tiered Content
Structure SKILL.md with essential content first, deeper content gated behind clear section boundaries:
## Quick Start <!-- Always loaded, ~100 tokens -->
## Core Workflow <!-- Loaded for standard use, ~200 tokens -->
## Advanced Patterns <!-- Loaded on-demand via modules -->
Pattern 3: Conditional Module Loading
Use progressive_loading: true in frontmatter. Claude loads the base SKILL.md first, then fetches modules only when the task requires deeper guidance. This achieves 40-60% context reduction for complex skills.
Optimization Workflow
Step 1: Measure
# Get current token estimate
Skill(abstract:estimate-tokens) --path path/to/SKILL.md
Step 2: Identify Reduction Targets
- Move examples and edge cases to modules
- Compress repetitive patterns into tables
- Remove content duplicated from dependencies
- Replace verbose explanations with concise rules
Step 3: Restructure
- Extract sections >200 tokens into
modules/ - Ensure SKILL.md frontmatter description stays under 500 characters
- Add
progressive_loading: trueto frontmatter - List modules in frontmatter
modules:array
Step 4: Validate
# Re-measure and verify reduction
python plugins/abstract/scripts/validate_budget.py
# Target: 50%+ reduction from original
Quality Checks
Before finalizing optimization:
- SKILL.md frontmatter description is under 500 characters
- Quick Start section provides enough info for basic use
- All modules are listed in frontmatter
modules:array -
estimated_tokensin frontmatter reflects actual measured value -
progressive_loading: trueis set when modules exist - No functionality lost - advanced content accessible via modules
- Token reduction target met (50%+ for large skills)
Resources
- Modular Skills - Architecture patterns for skill decomposition
- ADR 0004: Skill Description Budget - The 2% budget rule scales with context window size (200k → ~16k chars, 1M → ~80k chars). Budget is split across all loaded skills.
SLASH_COMMAND_TOOL_CHAR_BUDGETenv var overrides the default. - Context Optimization (
Skill(conserve:context-optimization)) - MECW principles: measure context before acting, target 50% utilization, use progressive loading for large skills, prefer native tools over bash equivalents