Overview

Plugin ecosystems face a fundamental constraint: the skill description budget (2% of context window, ~16k characters at 200k tokens). Every skill loaded into context reduces the space available for actual work. This skill provides patterns to minimize token footprint while preserving full capability.

Core Principles

Metadata-first discovery - Claude scans ~100 tokens of frontmatter to decide relevance before loading full content
Progressive disclosure - Essential content loads immediately; advanced content lives in modules loaded on-demand
Token budgeting - Track and enforce per-skill token limits aligned with the ecosystem budget
Context-aware delivery - Load depth matches task complexity

Quick Start

# Validate skill descriptions against the ecosystem budget
python plugins/abstract/scripts/validate_budget.py

# Check a single skill's token footprint
Skill(abstract:estimate-tokens) --path plugins/my-plugin/skills/my-skill/SKILL.md

# Audit full ecosystem budget
Skill(abstract:skills-eval) --focus token-efficiency

Token Budget Model

Budget Allocation (Claude Code v2.1.32+)

Context Window	Budget (2%)	Per-Skill Target	Max Skills
200k tokens	~16,000 chars	300-500 chars	~40
1M tokens	~80,000 chars	300-500 chars	~200

The SLASH_COMMAND_TOOL_CHAR_BUDGET env var overrides the default budget. The ecosystem validator uses a limit of 17,000 characters to provide growth headroom.
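The arithmetic behind the table can be sketched in a few lines of Python. This is an illustration only, not the validator's actual code: the 4-characters-per-token ratio is an assumption inferred from the table (2% of 200k tokens is 4,000 tokens, shown as ~16,000 chars), the function names are hypothetical, and reading the env var as a plain integer character count is also an assumption.

```python
import os

BUDGET_PERCENT = 2     # the 2% rule from the table above
CHARS_PER_TOKEN = 4    # assumption: inferred from 200k tokens -> ~16,000 chars


def description_budget_chars(context_tokens: int) -> int:
    """Return the skill description budget in characters."""
    # Assumption: the override is a plain integer character count.
    override = os.environ.get("SLASH_COMMAND_TOOL_CHAR_BUDGET")
    if override:
        return int(override)
    return context_tokens * BUDGET_PERCENT * CHARS_PER_TOKEN // 100


def max_skills(budget_chars: int, per_skill_chars: int = 400) -> int:
    """Estimate how many descriptions fit at a given per-skill size."""
    # 400 is the midpoint of the 300-500 char per-skill target.
    return budget_chars // per_skill_chars
```

With no override set, a 200k-token window gives 16,000 characters and roughly 40 skills at the 400-character midpoint, matching the first table row.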

Per-Skill Targets

Skill Size	Token Range	Strategy
Minimal	<300 tokens	Single SKILL.md, no modules
Standard	300-800 tokens	SKILL.md + 1-2 modules
Large	800-1500 tokens	Progressive loading required
Oversize	>1500 tokens	Split into separate skills
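The size tiers above translate directly into a lookup. The following is a minimal sketch with a hypothetical function name. The table lists 800 in both the Standard and Large ranges, so treating upper bounds as inclusive (800 is Standard, 1500 is Large) is an assumption.

```python
def classify_skill(tokens: int) -> tuple[str, str]:
    """Map a token count to (size tier, strategy) from the table above."""
    if tokens < 300:
        return ("Minimal", "Single SKILL.md, no modules")
    # Assumption: upper bounds are inclusive, since the table's
    # ranges overlap at 800.
    if tokens <= 800:
        return ("Standard", "SKILL.md + 1-2 modules")
    if tokens <= 1500:
        return ("Large", "Progressive loading required")
    return ("Oversize", "Split into separate skills")
```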

Progressive Loading Patterns

Pattern 1: Module Decomposition

Keep SKILL.md lean with frontmatter + overview + quick start. Move implementation details to modules/:

my-skill/
  SKILL.md              # ~300 tokens: frontmatter, overview, quick start
  modules/
    implementation.md   # Loaded when Claude needs detailed steps
    examples.md         # Loaded when user asks for examples
    advanced.md         # Loaded for complex scenarios

Pattern 2: Tiered Content

Structure SKILL.md with essential content first, deeper content gated behind clear section boundaries:

## Quick Start          <!-- Always loaded, ~100 tokens -->
## Core Workflow        <!-- Loaded for standard use, ~200 tokens -->
## Advanced Patterns    <!-- Loaded on-demand via modules -->

Pattern 3: Conditional Module Loading

Set progressive_loading: true in frontmatter. Claude loads the base SKILL.md first, then fetches modules only when the task requires deeper guidance. This can reduce loaded context by 40-60% for complex skills.
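To see where the saving comes from, compare the tokens actually loaded with the tokens a monolithic skill would load. The sketch below uses a hypothetical function name, and the token counts in the usage note are made up for illustration rather than measured.

```python
def context_reduction_percent(
    base_tokens: int,
    module_tokens: dict[str, int],
    loaded: list[str],
) -> int:
    """Percent of tokens saved versus loading SKILL.md plus every module."""
    total = base_tokens + sum(module_tokens.values())
    used = base_tokens + sum(module_tokens[name] for name in loaded)
    return round(100 * (total - used) / total)
```

For example, with a 300-token base file and modules of 250, 150, and 300 tokens, loading only the 250-token module uses 550 of 1,000 tokens, a 45% reduction. Loading no modules saves 70%.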

Optimization Workflow

Step 1: Measure

# Get current token estimate
Skill(abstract:estimate-tokens) --path path/to/SKILL.md

Step 2: Identify Reduction Targets

Move examples and edge cases to modules
Compress repetitive patterns into tables
Remove content duplicated from dependencies
Replace verbose explanations with concise rules

Step 3: Restructure

Extract sections >200 tokens into modules/
Ensure SKILL.md frontmatter description stays under 500 characters
Add progressive_loading: true to frontmatter
List modules in frontmatter modules: array
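Finding extraction candidates can be roughly automated. The sketch below splits a SKILL.md on level-2 headings and flags sections above the 200-token threshold. It estimates tokens at 4 characters each, which is a crude heuristic and not necessarily how Skill(abstract:estimate-tokens) counts. The function name is hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def oversized_sections(markdown: str, limit_tokens: int = 200) -> list[tuple[str, int]]:
    """Return (heading, estimated_tokens) for each '## ' section over the limit."""
    sections: list[list] = []  # each entry: [heading, character count]
    current = None
    for line in markdown.splitlines(keepends=True):
        if line.startswith("## "):
            current = [line[3:].strip(), 0]
            sections.append(current)
        # Text before the first heading (e.g. frontmatter) is ignored.
        if current is not None:
            current[1] += len(line)
    return [
        (heading, chars // CHARS_PER_TOKEN)
        for heading, chars in sections
        if chars // CHARS_PER_TOKEN > limit_tokens
    ]
```

A line starting with "## " inside a fenced code block would be miscounted as a heading, so treat the output as a list of candidates to review rather than a final answer.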

Step 4: Validate

# Re-measure and verify reduction
python plugins/abstract/scripts/validate_budget.py
# Target: 50%+ reduction from original

Quality Checks

Before finalizing optimization:

SKILL.md frontmatter description is under 500 characters
Quick Start section provides enough info for basic use
All modules are listed in frontmatter modules: array
estimated_tokens in frontmatter reflects actual measured value
progressive_loading: true is set when modules exist
No functionality lost - advanced content accessible via modules
Token reduction target met (50%+ for large skills)
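Several of these checks are mechanical and could be scripted. The sketch below assumes the frontmatter has already been parsed into a dict (for instance by a YAML parser) and that module entries are stored as the same path strings found on disk. Both are assumptions, and the function name is hypothetical.

```python
def check_frontmatter(
    frontmatter: dict,
    module_files: list[str],
    measured_tokens: int,
) -> list[str]:
    """Return a list of checklist violations; an empty list means all pass."""
    problems = []
    if len(frontmatter.get("description", "")) >= 500:
        problems.append("description is not under 500 characters")
    # Every module on disk should appear in the frontmatter modules: array.
    listed = set(frontmatter.get("modules", []))
    missing = sorted(set(module_files) - listed)
    if missing:
        problems.append("modules not listed in frontmatter: " + ", ".join(missing))
    if module_files and frontmatter.get("progressive_loading") is not True:
        problems.append("progressive_loading: true is not set although modules exist")
    if frontmatter.get("estimated_tokens") != measured_tokens:
        problems.append("estimated_tokens does not match the measured value")
    return problems
```

The remaining items (a sufficient Quick Start, no lost functionality) need human review and are deliberately left out.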

Resources

Modular Skills - Architecture patterns for skill decomposition
ADR 0004: Skill Description Budget - The 2% budget rule scales with context window size (200k → ~16k chars, 1M → ~80k chars). Budget is split across all loaded skills. SLASH_COMMAND_TOOL_CHAR_BUDGET env var overrides the default.
Context Optimization (Skill(conserve:context-optimization)) - MECW principles: measure context before acting, target 50% utilization, use progressive loading for large skills, prefer native tools over bash equivalents
