Reverse-Claude: The Self-Configuring Runtime Intelligence Skill
This skill is Claude Code's mirror. It encodes how Claude Code works internally -- its permission cascade, plugin system, agent orchestrator, context compactor, and configuration surface -- extracted from ~13,000 lines of production source. Point this skill at Claude Code itself to understand, configure, and extend it.
When to Use
- Configuring Claude Code — set up permissions, hooks, CLAUDE.md rules, settings
- Building Claude Code-like systems — command safety, plugin lifecycle, agent orchestration
- Managing LLM context — compaction hierarchies, retry strategies, circuit breakers
- Understanding internals — how permissions cascade, how agents spawn, how compaction works
- Extending Claude Code — writing plugins, skills, hooks, MCP integrations
Module 1: Permissioned Invocation & Runtime Safety
Invariant
No command executes without passing through a multi-layered permission cascade that enforces deny > ask > allow > read-only > passthrough priority, with defense-in-depth against shell injection, env-var bypass, wrapper bypass, and parser differentials.
Pattern 1.1: Permission Decision Cascade
The master permission function implements a strict priority chain. Every command goes through this cascade:
1. AST parse (tree-sitter) -> reject if too complex
2. Sandbox auto-allow check -> still respects deny/ask rules
3. Exact match deny/ask/allow rules
4. Classifier-based deny/ask (LLM classifier)
5. Command operator permissions (pipes, redirects)
6. Split into subcommands -> per-subcommand check:
a. Exact match deny -> DENY
b. Exact match ask -> ASK
c. Prefix/wildcard deny -> DENY
d. Prefix/wildcard ask -> ASK
e. Path constraints -> ASK/DENY
f. Exact match allow -> ALLOW
g. Prefix/wildcard allow -> ALLOW
h. Mode-specific (auto mode) -> ALLOW
i. Read-only check -> ALLOW
j. Passthrough -> prompt user
7. Command injection check (regex battery)
8. Classifier async auto-approve (race the user)
Key rule: deny always wins. Even in auto-allow modes, deny rules are checked on both the full command AND each subcommand.
Pseudocode:
function checkPermission(input, context):
// 1. Exact match first (deny/ask/allow)
exact = checkExactMatch(input, rules)
if exact.behavior in ['deny', 'ask']: return exact
// 2. Prefix/wildcard deny/ask (BEFORE allow - prevents bypass)
{denyRules, askRules, allowRules} = matchRulesForInput(input)
if denyRules[0]: return deny
if askRules[0]: return ask
// 3. Path constraints (after deny so explicit rules take precedence)
pathResult = checkPathConstraints(input, cwd)
if pathResult != passthrough: return pathResult
// 4. Allow rules (exact, then prefix/wildcard)
if exact.behavior == 'allow': return exact
if allowRules[0]: return allow
// 5. Mode-specific permission
modeResult = checkPermissionMode(input)
if modeResult != passthrough: return modeResult
// 6. Read-only auto-allow
if isReadOnly(input): return allow
// 7. Passthrough -> prompt user with suggestions
return {behavior: 'passthrough', suggestions: suggestRule(command)}
Pattern 1.2: Asymmetric Env-Var Stripping
This is the most security-critical pattern. Allow and deny rules strip env vars differently:
- Allow rules strip ONLY safe env vars (build targets, logging, locale).
DOCKER_HOST=evil docker ps does NOT match allow(docker ps:*).
- Deny/ask rules strip ALL env vars via fixed-point iteration.
FOO=bar denied_cmd still matches deny(denied_cmd).
SAFE_ENV_VARS = {GOOS, GOARCH, NODE_ENV, RUST_LOG, LANG, TERM, NO_COLOR, ...}
NEVER_SAFE = {PATH, LD_PRELOAD, DYLD_*, PYTHONPATH, NODE_PATH, HOME, SHELL, BASH_ENV, ...}
function stripForAllowRules(cmd):
return stripOnlySafeEnvVars(cmd) // Conservative
function stripForDenyRules(cmd):
// Fixed-point iteration - strips interleaved wrappers + env vars
seen = Set(cmd)
queue = [cmd]
while queue not empty:
current = queue.pop()
envStripped = stripAllLeadingEnvVars(current)
wrapperStripped = stripSafeWrappers(current) // timeout, nice, nohup, etc.
for variant in [envStripped, wrapperStripped]:
if variant not in seen: queue.push(variant); seen.add(variant)
return all variants // Check deny rules against ALL
Why fixed-point: Commands like nohup FOO=bar timeout 5 evil_cmd require iterative stripping because wrappers and env vars interleave.
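The fixed-point loop above can be sketched in runnable TypeScript. This is a minimal illustration, not the production implementation: the wrapper list and the env-var regex are simplified stand-ins for the real, more precise patterns.

```typescript
// Hypothetical sketch of fixed-point stripping for deny-rule matching.
// SAFE_WRAPPERS and the env-var pattern are illustrative, not the real lists.
const SAFE_WRAPPERS = ["nohup", "nice", "timeout"];

function stripLeadingEnvVars(cmd: string): string {
  // Remove leading VAR=value assignments (simplified pattern)
  return cmd.replace(/^(\s*[A-Za-z_][A-Za-z0-9_]*=\S*\s+)+/, "");
}

function stripOneWrapper(cmd: string): string {
  const parts = cmd.trim().split(/\s+/);
  if (SAFE_WRAPPERS.includes(parts[0])) {
    // Drop the wrapper; for `timeout`, also drop its duration argument
    const skip = parts[0] === "timeout" ? 2 : 1;
    return parts.slice(skip).join(" ");
  }
  return cmd;
}

// Enumerate every variant reachable by interleaved stripping; deny rules
// are then matched against ALL of them, not just the original command.
function denyVariants(cmd: string): Set<string> {
  const seen = new Set<string>([cmd]);
  const queue = [cmd];
  while (queue.length > 0) {
    const current = queue.pop()!;
    for (const v of [stripLeadingEnvVars(current), stripOneWrapper(current)]) {
      if (!seen.has(v)) {
        seen.add(v);
        queue.push(v);
      }
    }
  }
  return seen;
}
```

Running `denyVariants("nohup FOO=bar timeout 5 evil_cmd")` yields the fully stripped `evil_cmd` among the variants, which is exactly what lets a deny rule on `evil_cmd` still fire.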
Pattern 1.3: Declarative Flag Validation (Read-Only Safety)
Commands are auto-allowed as "read-only" only if every flag is in a whitelist:
type CommandConfig = {
safeFlags: Record<string, 'none' | 'string' | 'number' | 'char'>
regex?: RegExp // Additional validation pattern
dangerCallback?: (raw, args) => boolean // Custom danger check
respectsDoubleDash?: boolean // Default: true
}
ALLOWLIST = {
'git diff': { safeFlags: {'--stat': 'none', '--cached': 'none', ...} },
'ps': { safeFlags: {'-e': 'none', ...}, dangerCallback: blockBsdEFlag },
'date': { safeFlags: {'-u': 'none', ...}, dangerCallback: blockTimeSet },
}
function isCommandSafe(command):
parsed = shellQuoteParse(command)
if hasOperators(parsed): return false // Pipes/redirects handled upstream
config = matchAllowlist(parsed.command)
if not config: return false
// CRITICAL: Reject ANY token containing $
// Prevents: git diff "$Z--output=/tmp/pwned" (parser differential)
// Prevents: rg . "$Z--pre=bash" FILE (RCE via flag injection)
if anyToken(t => t.contains('$')): return false
// Reject brace expansion: {a,b} or {1..5}
if anyToken(t => hasBraceExpansion(t)): return false
return validateFlags(parsed.args, config.safeFlags)
&& (!config.regex || config.regex.test(command))
&& (!config.dangerCallback || !config.dangerCallback(raw, args))
Pattern 1.4: Compound Command Security Gates
MAX_SUBCOMMANDS = 50 // Prevents DoS from exponential splitCommand
RULES:
1. cd + git blocked -> malicious dir could have bare repo hooks
2. Multiple cd blocked -> requires approval
3. Any subcommand denied -> whole compound denied
4. Prefix rules don't match across && boundaries
(cd:* must NOT match "cd /path && python3 evil.py")
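A minimal sketch of these rules in TypeScript, under loud assumptions: the splitter below is a naive regex split, whereas the real system splits via an AST parse; the function names and the ask/allow/deny return shape are mine.

```typescript
// Illustrative compound-command gate: deny wins if ANY subcommand is denied,
// prefix allow rules apply per-subcommand only (never across && boundaries),
// and an oversized compound is never auto-allowed.
const MAX_SUBCOMMANDS = 50;

function splitCompound(command: string): string[] {
  // Naive stand-in for the real AST-based splitCommand
  return command.split(/&&|\|\||;/).map(s => s.trim()).filter(Boolean);
}

function prefixAllowed(sub: string, allowPrefixes: string[]): boolean {
  return allowPrefixes.some(p => sub === p || sub.startsWith(p + " "));
}

function checkCompound(
  command: string,
  allowPrefixes: string[],
  denyPrefixes: string[],
): "allow" | "ask" | "deny" {
  const subs = splitCompound(command);
  if (subs.length > MAX_SUBCOMMANDS) return "ask";
  if (subs.some(sub => denyPrefixes.some(p => sub.startsWith(p)))) return "deny";
  // Auto-allow only if EVERY subcommand independently matches an allow prefix
  return subs.every(sub => prefixAllowed(sub, allowPrefixes)) ? "allow" : "ask";
}
```

With an allow prefix of `cd`, `cd /path && python3 evil.py` falls through to "ask" because the prefix matches only the first subcommand, never the whole compound.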
Pattern 1.5: Speculative Classifier (Race the User)
// Fire LLM classifier BEFORE permission prompt shows
classifierPromise = startSpeculativeCheck(command)
// Show permission prompt to user
showPrompt(command)
// If classifier returns allow AND user hasn't interacted:
if await classifierPromise == 'allow' && !userHasInteracted():
autoApprove() // Dismiss prompt
else:
discardClassifierResult() // User is already engaged
Pattern 1.6: Command Semantics (Exit Code Interpretation)
Not all non-zero exit codes are errors. The system overrides default interpretation per-command:
COMMAND_EXIT_CODE_OVERRIDES:
grep: exit 1 = "no match found" (not an error)
find: exit 1 = "partial access denied" (partial success)
diff: exit 1 = "files differ" (expected behavior)
test: exit 1 = "condition false" (not an error)
Without these overrides, the model would treat grep pattern file returning nothing as a tool failure and retry or apologize.
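The override table can be sketched as a small lookup keyed by binary and exit code. The command names and meanings come from the table above; the data-structure shape and function names are assumptions for illustration.

```typescript
// Per-command exit-code interpretation: exit 1 from grep/diff/find/test is a
// valid result, not a failure. Anything unlisted falls back to code !== 0.
type ExitMeaning = { isError: boolean; note?: string };

const EXIT_CODE_OVERRIDES: Record<string, Record<number, ExitMeaning>> = {
  grep: { 1: { isError: false, note: "no match found" } },
  find: { 1: { isError: false, note: "partial access denied" } },
  diff: { 1: { isError: false, note: "files differ" } },
  test: { 1: { isError: false, note: "condition false" } },
};

function interpretExit(command: string, code: number): ExitMeaning {
  const binary = command.trim().split(/\s+/)[0];
  return EXIT_CODE_OVERRIDES[binary]?.[code] ?? { isError: code !== 0 };
}
```

So `interpretExit("grep pattern file", 1)` reports success with a "no match found" note, while `grep` exiting 2 (a real usage error) still surfaces as a failure.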
Pattern 1.7: Git Operation Detection & Tracking
GIT_OPERATION_PATTERNS:
commit: /git\s+commit/
push: /git\s+push/
cherry-pick: /git\s+cherry-pick/
merge: /git\s+merge/
rebase: /git\s+rebase/
PR creation: /gh\s+pr\s+create/ or /glab\s+mr\s+create/
function detectAndTrackGitOps(command, exitCode):
for pattern in GIT_OPERATION_PATTERNS:
if pattern.test(command) and exitCode == 0:
otelCounter.increment(pattern.name)
if pattern.name == "commit":
kind = command.includes("--amend") ? "amended" : "committed"
trackCommitKind(kind)
Why this matters: Analytics, safety checks (e.g., warn before force-push), and session cost attribution all depend on knowing what git operations were performed.
Edge Cases
- Shell wrappers (timeout, nice, nohup) stripped with precise flag regexes -- previously [^ \t]+ matched $(id), allowing code execution
- base64 on macOS: respectsDoubleDash: false because macOS base64 ignores --
- Windows: xargs removed (UNC paths in file contents trigger SMB resolution)
- Git-internal path detection: blocks mkdir hooks && echo evil > hooks/pre-commit && git status
- Single quotes: backslash is NOT an escape inside single quotes (HackerOne fix)
- grep exit 1 is not an error — command semantics override default interpretation
- Git operation tracking fires OTel counters on successful git commands
Anti-Patterns
- Matching deny rules against only the full command string (bypass via env vars or wrappers)
- Trusting shell-quote parse output without checking for $ tokens
- Using symmetric stripping for allow and deny rules
- Allowing sh:*, env:*, bash:* as prefix rules (equivalent to allowing everything)
- Treating all non-zero exit codes as errors (grep 1, diff 1, find 1 are valid results)
Module 2: Plugin/Skill Discovery & Lifecycle
Invariant
Plugins are always loaded from markdown files via a deterministic pipeline: walk directories -> parse frontmatter -> validate -> hydrate variables -> register as typed commands. State is persisted in a versioned JSON file with idempotent migration. Updates are background-only and never mutate in-memory session state.
Pattern 2.1: Filesystem-to-Registry Command Loader
function loadPluginCommands():
if cached: return cache
for each enabledPlugin:
markdownFiles = walkDirectory(plugin.commandsPath)
markdownFiles = dedup(markdownFiles) // isDuplicatePath prevents double-load
// SKILL.md takes priority over sibling .md files
if directory has SKILL.md:
markdownFiles = [SKILL.md only]
for file in markdownFiles:
frontmatter = parseYAMLFrontmatter(file)
name = buildNamespacedName(file, plugin) // "pluginName:namespace:commandName"
command = createCommand(name, file, frontmatter, plugin)
registry.add(command)
cache = registry
return registry
function createCommand(name, file, frontmatter, plugin):
return {
type: 'prompt',
name,
description: frontmatter.description ?? extractFromMarkdown(content),
allowedTools: parseFrontmatter(frontmatter['allowed-tools']),
model: frontmatter.model,
getPromptForCommand(args, context):
content = file.content
if isSkill: content = "Base directory: ${dirname}\n\n" + content
content = substituteArguments(content, args)
content = substitutePluginVariables(content, plugin)
content = substituteUserConfig(content, userConfig)
content = content.replace('${CLAUDE_SKILL_DIR}', skillDir)
content = content.replace('${CLAUDE_SESSION_ID}', sessionId)
content = await executeShellCommandsInPrompt(content)
return [{type: 'text', text: content}]
}
Variable substitution chain (order matters):
- ${CLAUDE_PLUGIN_ROOT} -> plugin install path
- ${user_config.X} -> user settings (sensitive values -> placeholder)
- ${CLAUDE_SKILL_DIR} -> skill's directory
- ${CLAUDE_SESSION_ID} -> current session ID
- Shell commands in prompt -> executed and replaced
Pattern 2.2: Versioned State File with Idempotent Migration
// Schema evolution: V1 (flat map) -> V2 (per-scope arrays)
V1: { version: 1, plugins: Record<PluginId, InstalledPlugin> }
V2: { version: 2, plugins: Record<PluginId, PluginInstallationEntry[]> }
PluginScope = 'managed' | 'user' | 'project' | 'local'
function migrateToSingleFile():
if migrationCompleted: return // Idempotent guard
try:
// Attempt atomic rename first
renameSync(v2FilePath, mainFilePath)
migrationCompleted = true; return
catch ENOENT: pass
mainData = readFileSync(mainFilePath)
if mainData.version == 1:
v2Data = migrateV1ToV2(mainData) // All V1 entries -> 'user' scope
writeFileSync(mainFilePath, v2Data)
cleanupLegacyCache(v2Data) // Remove non-versioned flat directories
migrationCompleted = true // Set even on error to prevent retry
Dual-layer state model:
- Disk state: source of truth, updated by background operations
- Memory state: snapshot at startup, never updated by background ops
- Pending updates: detected by comparing disk vs memory install paths
Pattern 2.3: Background Update with Notification Buffer
let callback = null
let pendingNotification = null
function onPluginsUpdated(cb):
callback = cb
// Deliver buffered notification if update finished before listener mounted
if pendingNotification?.length > 0:
cb(pendingNotification)
pendingNotification = null
return () => { callback = null }
function autoUpdateInBackground():
// Fire and forget
void (async () => {
marketplaces = getAutoUpdateEnabled()
await Promise.allSettled(marketplaces.map(refreshMarketplace)) // git pull each
updated = await updatePlugins(marketplaces) // Per-plugin, per-scope
if updated.length > 0:
if callback: callback(updated)
else: pendingNotification = updated // Buffer for late listener
})()
Pattern 2.4: Error-Isolated Subsystem Loading
function initializePlugins():
{enabled, disabled, errors} = await loadAllPlugins()
// Each subsystem wrapped individually -- one failure doesn't block others
try: commands = await getPluginCommands()
catch e: errors.push({source: 'plugin-commands', error: e})
try: agents = await loadPluginAgents()
catch e: errors.push({source: 'plugin-agents', error: e})
try: await loadPluginHooks()
catch e: errors.push({source: 'plugin-hooks', error: e})
// Merge with LSP error preservation (don't clobber existing errors)
setState(prev => {
existingLsp = prev.errors.filter(isLspError)
deduped = existingLsp.filter(e => !newErrors.has(e.key))
return {...prev, errors: [...deduped, ...errors]}
})
Pattern 2.5: Imperative Plugin Registry with Settings Toggle
REGISTRY = new Map()
function registerBuiltinPlugin(definition):
REGISTRY.set(definition.name, definition)
function getBuiltinPlugins():
enabled, disabled = [], []
for (name, def) of REGISTRY:
if def.isAvailable && !def.isAvailable(): continue // Feature gate
isEnabled = settings.enabledPlugins[id] ?? (def.defaultEnabled ?? true)
plugin = buildLoadedPlugin(name, def, isEnabled)
if isEnabled: enabled.push(plugin)
else: disabled.push(plugin)
return {enabled, disabled}
Pattern 2.6: MCP Server Lifecycle & Name Contracts
MCP (Model Context Protocol) servers extend Claude Code with external tools. The lifecycle:
LIFECYCLE:
1. Config declares MCP servers (in settings.json or plugin manifest)
2. On startup, each server connection is initiated (status: pending)
3. Server connects and registers its tools (status: connected)
4. Tools are normalized and added to the tool registry
5. On failure, server marked as failed (tools unavailable)
6. On session end, all connections closed
NAME NORMALIZATION (critical for permission rules):
function normalizeMcpName(rawName):
return rawName
.replace(/[^a-zA-Z0-9]/g, '_') // non-alphanumeric -> underscore
.replace(/_{2,}/g, '_') // dedupe consecutive underscores
.replace(/^_|_$/g, '') // trim edges
PERMISSION RULE FORMAT:
mcp__serverName // all tools from this server
mcp__serverName__toolName // specific tool
mcp__serverName__* // wildcard (all tools)
Why name normalization matters: A server named my-cool.server becomes my_cool_server in tool names and permission rules. Without normalization, permission rules like mcp__my-cool.server__* would silently fail to match.
Edge Cases
- Symlinks accepted as skill directories
- ENOENT gracefully skipped (skill dir might not exist yet)
- Win32 path normalization: backslash -> forward slash for CLAUDE_SKILL_DIR
- source: 'bundled' (not 'builtin') to stay in analytics pipeline
- Delisted plugin enforcement runs on mount before commands load
- Settings divergence guard: isPluginInstalled checks BOTH install file AND settings
- MCP server names normalized: my-server.v2 becomes my_server_v2 in tool names
- MCP servers can require OAuth — auth flow handled at connection time
- Pending MCP servers polled every 500ms up to 30s before agent dispatch
Anti-Patterns
- Loading plugins synchronously at startup (blocks REPL)
- Mutating in-memory state from background update (stale cache bugs)
- Single error handler for all subsystems (one failure kills everything)
- Allowing plugin updates to auto-activate without session restart
- Not normalizing MCP server names (permission rules silently fail to match)
Module 3: Agent Orchestration & Skill Execution
Invariant
Every agent invocation flows through a single validation pipeline (permission check -> agent resolution -> MCP requirement check -> isolation setup -> execution mode routing -> result mapping), regardless of execution mode. Feature flags control schema shape and execution routing at runtime, enabling dead code elimination at build time.
Pattern 3.1: Polymorphic Execution Router
A single tool definition handles four execution modes through one call() entry point:
EXECUTION MODES:
1. Sync foreground -- while-loop consuming iterator, blocking
2. Async background -- fire-and-forget, progress tracked, notified on complete
3. Teammate spawn -- flat roster model (teammates can't spawn teammates)
4. Remote launch -- separate process entirely
function call(input, context):
// 1. Validate + resolve agent type
agent = resolveAgentType(input.subagent_type)
permissions = checkPermissions(input)
// 2. Wait for required MCP servers
if hasPendingRequiredServers:
poll every 500ms up to 30s:
if anyRequired.failed: break early
if noneRequired.pending: break
// 3. Setup isolation (worktree if requested)
worktree = input.isolation == 'worktree' ? createWorktree() : null
// 4. Route to execution mode
if input.run_in_background:
return asyncLaunch(agent, input, worktree)
else:
return syncExecute(agent, input, worktree)
Pattern 3.2: Foreground-to-Background Transition
The core concurrency pattern: a while-loop racing message iteration against a background signal:
function syncExecute(agent, input, worktree):
iterator = runAgent(agent, input)
backgroundSignal = createBackgroundPromise()
messages = []
while true:
result = await Promise.race([
iterator.next().then(r => ({type: 'message', result: r})),
backgroundSignal.then(() => ({type: 'background'}))
])
if result.type == 'background':
// Clean transition: stop foreground, spawn async continuation
void Promise.race([iterator.return(), sleep(1000)]) // 1s cleanup timeout
void continueAsAsync(messages, agent, input)
return {status: 'async_launched', agentId}
if result.result.done: break
messages.push(result.result.value)
return finalize(messages)
Auto-background timer: Agents can auto-background after 120s if feature-flagged.
Pattern 3.3: Conditional Schema Construction (Dead Code Elimination)
// Schema fields gated by feature flags at module load time
inputSchema = lazySchema(() => {
base = z.object({description, prompt, subagent_type?, model?, run_in_background?})
multiAgent = z.object({name?, team_name?, mode?})
full = base.merge(multiAgent).extend({isolation, cwd?})
// Feature gates strip fields -- model never sees removed parameters
if !feature('KAIROS'): full = full.omit({cwd: true})
if isBackgroundDisabled: full = full.omit({run_in_background: true})
return full
})
// Build-time elimination via string literal comparison
if ("external" === 'ant'): // Always false in external builds
// Entire remote agent block eliminated by bundler
Pattern 3.4: Multi-Tier Feature Flag Cache with Override Chain
PRIORITY (highest to lowest):
1. Env overrides -- deterministic, for eval harnesses
2. Config overrides -- runtime, for /config UI
3. In-memory cache -- authoritative after init
4. Disk cache -- survives restarts
5. Default value -- hardcoded fallback
function getFeatureValue(feature, defaultValue):
// 1. Env overrides (CLAUDE_INTERNAL_FC_OVERRIDES JSON)
overrides = getEnvOverrides()
if feature in overrides: return overrides[feature]
// 2. Config overrides (user-set via /config)
configOverrides = getConfigOverrides()
if feature in configOverrides: return configOverrides[feature]
// 3. Feature system disabled? Return default
if !isEnabled(): return defaultValue
// 4. Track exposure (deferred if pre-init)
trackExposure(feature)
// 5. In-memory cache (authoritative after SDK init)
if inMemoryCache.has(feature): return inMemoryCache.get(feature)
// 6. Disk cache (survives process restarts)
cached = diskConfig.cachedFeatures?.[feature]
return cached ?? defaultValue
Critical edge cases:
- Empty/malformed payload protection: {features: {}} must NOT clear caches (prevents total flag blackout)
- Auth availability race: client may be created before auth is available, needs destroy+recreate on auth arrival
- Pending exposure dedup: features accessed before init tracked in Set, logged once after init
- Refresh subscriber catch-up: if init completes before subscriber registers, fire on next microtask
- Client replacement guard: check client !== thisClient before AND after async operations
- Process handler accumulation: named refs stored so process.off() works across reinit cycles
Pattern 3.5: Discriminated Union Result Mapping
type Output =
| {status: 'completed', content, totalTokens}
| {status: 'async_launched', agentId, outputFile}
| {status: 'teammate_spawned', ...} // Internal only
| {status: 'remote_launched', taskId} // Internal only
function mapResultToWireFormat(data, toolUseId):
switch data.status:
'teammate_spawned': return teammateFormat(data)
'remote_launched': return remoteFormat(data)
'async_launched': return asyncFormat(data) // Include progress hints
'completed': return completedFormat(data) // Include usage trailer
data satisfies never // Exhaustiveness check at compile time
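A compilable sketch of this pattern with two of the variants (the wire-format strings are illustrative). The point is the `satisfies never` default branch: adding a new status variant becomes a compile error until every switch handles it.

```typescript
// Discriminated union result mapping with compile-time exhaustiveness.
type Output =
  | { status: "completed"; content: string; totalTokens: number }
  | { status: "async_launched"; agentId: string; outputFile: string };

function mapResult(data: Output): string {
  switch (data.status) {
    case "completed":
      return `done (${data.totalTokens} tokens)`;
    case "async_launched":
      return `running in background as ${data.agentId}`;
    default:
      // Unreachable when all variants are handled; a new variant
      // makes `data` non-never here and the build fails.
      return data satisfies never;
  }
}
```
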
Pattern 3.6: Cascading Abort Controller
Tool execution uses hierarchical abort controllers for clean cancellation:
HIERARCHY:
parentAbort (session-level, user Ctrl+C)
└── siblingAbort (per-turn, shared across concurrent tools)
└── toolAbort (per-tool, individual cancellation)
RULES:
- Parent abort cascades to all children (session termination)
- Sibling abort cascades to all tools in current turn
(triggered by Bash error in any concurrent tool)
- Tool abort cancels only that tool (timeout or individual failure)
- Combined signals: combinedAbortSignal(parent, child) fires on EITHER
function childAbortController(parentSignal):
child = new AbortController()
parentSignal.addEventListener('abort', () => child.abort(parentSignal.reason))
return child
Pattern 3.7: API Client Lifecycle (Multi-Provider)
PROVIDERS: Anthropic (default), Bedrock, Vertex
function getOrCreateClient():
// Memoized — one client per process
if cached and authUnchanged: return cached
client = new Anthropic({apiKey, baseURL, ...providerConfig})
cached = client
return client
// Critical: client replacement guard
// Auth may arrive after client creation (OAuth dialog pending)
function initializeWithAuth():
clientBeforeAuth = getOrCreateClient()
await authFlow()
if authNowAvailable and !clientCreatedWithAuth:
destroyClient(clientBeforeAuth)
cached = null // Force recreation with auth
newClient = getOrCreateClient()
return newClient
Why multi-provider matters: Bedrock uses SigV4 auth, Vertex uses Google OAuth, direct Anthropic uses API keys. The client abstraction hides this but the auth lifecycle is fundamentally different.
Edge Cases
- Recursive fork guard: prevents re-entry via querySource matching + message-scan fallback
- Teammates cannot spawn teammates (flat roster)
- In-process teammates cannot manage background agents (lifecycle tied to leader)
- Iterator cleanup with 1s timeout prevents blocking if MCP server cleanup hangs
- Partial result recovery: if error occurs but assistant messages exist, finalize with partial results
- One-shot agent optimization: skip agentId hint and usage trailer (~135 chars saved per invocation)
- AbortController hierarchy: parent->sibling->tool, Bash errors cascade to siblings only
- Client replacement: destroy+recreate when auth arrives after initial client creation
- Bedrock/Vertex: different auth flows but same client interface
Anti-Patterns
- Blocking startup on feature flag network fetch (must be non-blocking with disk cache fallback)
- Clearing feature flag caches on empty API response
- Symmetric schema for all build targets (prevents dead code elimination)
- Polling without early-exit on failure (30s wasted if MCP server already failed)
Module 4: Context Compaction & Retry
Invariant
Context is finite. The system must proactively compress history before hitting limits, using a hierarchy of increasingly expensive strategies, while preserving enough working state (recent files, plans, skills, tool pairings) for the model to continue without capability loss. Retry logic must classify errors and never amplify cascading failures.
Pattern 4.1: Compaction Hierarchy (Priority Order)
HIERARCHY (cheapest first):
1. Micro-compact (pre-request) -- clear stale tool results, zero API cost
2. Session memory compact -- use pre-computed running summary, zero summarization cost
3. Full compact -- API call to summarize all messages, most expensive
4. Reactive compact -- emergency fallback on prompt_too_long response
THRESHOLDS:
effectiveWindow = contextWindow - min(maxOutputTokens, 20K)
autoCompactAt = effectiveWindow - 13K buffer
warningAt = autoCompactAt - 20K
blockingAt = effectiveWindow - 3K
POST-COMPACT BUDGETS:
Files: 50K tokens total, 5K per file, max 5 files (most recently accessed)
Skills: 25K tokens total, 5K per skill (truncation keeps head where instructions live)
Pattern 4.2: Micro-Compact (Tiered In-Place Reduction)
Two independent strategies, chosen by cache state:
COMPACTABLE_TOOLS = {FileRead, Bash, Grep, Glob, WebSearch, WebFetch, FileEdit, FileWrite}
function microcompact(messages, querySource):
// Strategy 1: Time-based (cache cold -- gap > 60min)
if timeSinceLastAssistant > threshold:
toolIds = collectCompactableToolIds(messages)
keepSet = toolIds.slice(-max(1, config.keepRecent))
for each toolResult not in keepSet:
toolResult.content = '[Old tool result content cleared]'
resetCachedMCState() // Stale IDs would cause cache_edit errors
return {messages}
// Strategy 2: Cache-editing (cache warm, main thread only)
if supportsAPICacheEdits && isMainThread:
candidates = getCacheDeletionCandidates()
return {messages, cacheEdits: {deletedToolIds: candidates}}
// Strategy 3: No-op
return {messages}
The 60-minute default matches server-side cache TTL -- clearing is guaranteed post-cache-expiry, never forces a miss that wouldn't have happened.
Pattern 4.3: Full Compaction with State Preservation
function compactConversation(messages, options):
// 1. Pre-compact hooks (extensible plugin points)
hookResults = await runPreCompactHooks()
// 2. Strip images/attachments (reduce input before summarization)
stripped = stripImagesFromMessages(messages) // [image] markers
stripped = stripReinjectedAttachments(stripped) // Skill listings re-injected anyway
// 3. Summarize via forked agent (cache sharing) or streaming
for attempt in 0..MAX_PTL_RETRIES:
try:
summary = await summarizeViaForkedAgent(stripped, cacheSharingParams)
break
catch PromptTooLong:
// Truncate oldest API-round groups, retry
stripped = truncateHeadForPTLRetry(stripped, error)
if stripped == null: throw // Can't truncate further
// 4. Post-compact state restoration
fileAttachments = await createPostCompactFileAttachments(messages)
skillAttachments = createSkillAttachmentIfNeeded()
planAttachments = restorePlanIfNeeded()
boundaryMarker = createBoundaryMarker(metadata)
// 5. Post-compact cleanup
runPostCompactCleanup(querySource)
return {boundaryMarker, summary, attachments: [...files, ...skills, ...plans]}
Pattern 4.4: Prompt-Too-Long Retry with Grouped Truncation
function truncateHeadForPTLRetry(messages, ptlResponse):
// Strip own marker from previous retries (prevent stalling)
input = messages[0].isPTLMarker ? messages.slice(1) : messages
groups = groupMessagesByApiRound(input) // Group by assistant message.id
if groups.length < 2: return null // Can't truncate further
tokenGap = parseTokenGapFromError(ptlResponse)
if tokenGap:
// Drop oldest groups until gap covered
dropCount = 0; accumulated = 0
for group in groups:
accumulated += estimateTokens(group); dropCount++
if accumulated >= tokenGap: break
else:
dropCount = max(1, floor(groups.length * 0.2)) // Fallback: 20%
dropCount = min(dropCount, groups.length - 1) // Always keep at least one group
sliced = groups.slice(dropCount).flat()
// Prepend synthetic user marker if first remaining message is assistant
if sliced[0].type == 'assistant':
return [createUserMessage(PTL_RETRY_MARKER), ...sliced]
return sliced
Pattern 4.5: Session Memory Compact (Cursor-Based Preservation)
function sessionMemoryCompact(messages, sessionMemory):
// Find cursor: last summarized message
cursorIndex = messages.findIndex(m => m.id == sessionMemory.lastSummarizedId)
if cursorIndex < 0: return null // Fall back to legacy compact
// Expand backward to meet minimums
keepIndex = cursorIndex + 1
while keepIndex > 0:
tokens = estimateTokens(messages.slice(keepIndex))
textBlocks = countTextBlockMessages(messages.slice(keepIndex))
if tokens >= minTokens && textBlocks >= minTextBlockMessages: break
if tokens >= maxTokens: break // Hard cap
keepIndex--
// CRITICAL: Preserve API invariants
keepIndex = adjustForAPIInvariants(messages, keepIndex)
return {summary: sessionMemory.content, messagesToKeep: messages.slice(keepIndex)}
function adjustForAPIInvariants(messages, startIndex):
// Step 1: Pull in orphaned tool_use blocks
// (tool_result in kept range must have matching tool_use)
toolResultIds = collectToolResultIds(messages.slice(startIndex))
toolUseIds = collectToolUseIds(messages.slice(startIndex))
missing = toolResultIds.filter(id => !toolUseIds.has(id))
for i in reverse(0..startIndex):
if messages[i] has tool_use with missing id:
startIndex = i
// Step 2: Pull in thinking blocks sharing message.id
// (streaming yields separate messages per content block with same id)
keptIds = collectAssistantMessageIds(messages.slice(startIndex))
for i in reverse(0..startIndex):
if messages[i].id in keptIds:
startIndex = i
return startIndex
Pattern 4.6: Auto-Compact with Circuit Breaker
MAX_CONSECUTIVE_FAILURES = 3 // Was wasting ~250K API calls/day at 50+ failures
function autoCompactIfNeeded(messages, tokenCount, trackingState):
// Guards
if trackingState.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES: return
if querySource in ['session_memory', 'compact']: return // Recursion prevention
if isContextCollapseMode: return // Different system handles this
if tokenCount < autoCompactThreshold: return
// Priority: session memory compact > full compact
try:
if sessionMemoryAvailable && !hasCustomInstructions:
await sessionMemoryCompact(messages)
else:
await fullCompact(messages)
trackingState.consecutiveFailures = 0
catch:
trackingState.consecutiveFailures++
Pattern 4.7: Classified Retry with Adaptive Backoff
function withRetry(apiCall, options):
consecutive529s = options.initialConsecutive529Errors ?? 0
for attempt in 1..maxRetries:
try:
return await apiCall()
catch error:
classification = classifyError(error)
switch classification:
'fast_mode_429_short': // < 20s retry-after
await sleep(retryAfterMs)
continue with fast mode
'fast_mode_429_long': // Unknown/long retry-after
enterCooldown(min 10min, default 30min)
disable fast mode; continue
'non_foreground_529': // Background agent hit capacity
bail immediately // Never amplify cascading failures
'consecutive_529':
consecutive529s++
if consecutive529s >= 3: try model fallback or surface error
'max_tokens_overflow':
newMax = parseAvailableContext(error) - 1K_safety_buffer
if newMax >= thinkingBudget + 1: retry with newMax
'auth_error':
refreshCredentials(); getNewClient(); retry
'persistent_mode': // Unattended operation
retry indefinitely, max backoff 5min
reset cap 6hr, heartbeat every 30s
function getRetryDelay(attempt, retryAfterHeader, maxDelay = 32000):
if retryAfterHeader: return parseSeconds(retryAfterHeader) * 1000
baseDelay = min(500 * 2^(attempt-1), maxDelay)
jitter = random() * 0.25 * baseDelay
return baseDelay + jitter
Pattern 4.8: Scoped Post-Compact Cache Invalidation
function runPostCompactCleanup(querySource):
isMainThread = querySource in ['repl_main_thread*', 'sdk', undefined]
// Always reset (any thread)
resetMicrocompactState()
clearSystemPromptSections()
clearClassifierApprovals()
clearSpeculativeChecks()
sweepFileContentCache()
clearSessionMessagesCache()
// Main thread only (sub-agents share module-level state)
if isMainThread:
resetContextCollapse()
getUserContext.cache.clear() // Outer memoize
resetMemoryFilesCache() // Inner cache
// Intentionally NOT cleared: skill content, skill names
// (skills are re-injected post-compact anyway)
Pattern 4.9: Thinking Block Budget Accounting
Modern Claude models support extended thinking. Token accounting must handle:
THINKING BLOCKS:
- thinking: visible reasoning (counts toward output tokens)
- redacted_thinking: reasoning hidden from user (still counts toward output)
- Budget: thinkingBudget field in API request (1024-2048 minimum)
TOKEN ESTIMATION WITH THINKING:
effectiveOutput = outputTokens + thinkingTokens
maxTokensOverride >= thinkingBudget + 1 // Must leave room for at least 1 output token
// During message normalization:
if model doesn't support thinking:
strip all thinking/redacted_thinking blocks from messages
else:
preserve thinking blocks for context continuity
COST IMPACT:
Thinking tokens billed at output token rate.
A 4K-token thinking budget on every turn doubles effective output cost.
Cost tracker must account: mu.outputTokens += usage.output_tokens (includes thinking)
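A worked example of the cost impact, using a made-up per-token rate purely for illustration:

```python
# Illustrative only: the per-token rate below is a placeholder, not real pricing.
OUTPUT_RATE_PER_TOKEN = 15 / 1_000_000  # hypothetical $15 per 1M output tokens

def turn_output_cost(visible_output_tokens: int, thinking_tokens: int) -> float:
    # Thinking tokens are billed at the output rate, so they add directly.
    effective_output = visible_output_tokens + thinking_tokens
    return effective_output * OUTPUT_RATE_PER_TOKEN

plain = turn_output_cost(4000, 0)
with_thinking = turn_output_cost(4000, 4000)
assert with_thinking == 2 * plain  # a 4K thinking budget doubles effective output cost
```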
Pattern 4.10: Session Memory (Long-Term Fact Persistence)
Session memory is orthogonal to compaction (which summarizes messages): it preserves facts across sessions.

SESSION MEMORY LIFECYCLE:
1. Extraction: background process analyzes conversation, extracts facts
2. Persistence: facts stored with lastSummarizedMessageId as cursor
3. Retrieval: on session resume, inject memory as context
4. Compaction integration: if memory available, use as compact summary
(avoids expensive summarization API call)
CURSOR MODEL:
lastSummarizedMessageId tracks what's been processed.
Messages after this ID are "unsummarized" — must be preserved during compaction.
If cursor not found in messages (edited externally), fall back to legacy compact.
Edge Cases
- Prompt-too-long during compact itself: retry by truncating oldest API-round groups (max 3 retries)
- API user abort during compact: detected via `isApiErrorMessage` flag
- WebSocket idle timeout during long compaction: keep-alive heartbeat every 30s
- `slice(-0)` returns the full array: `keepRecent` is floored to `max(1, ...)`
- Sub-agent compaction must NOT reset main-thread module-level state
- SDK sometimes drops 529 status during streaming: check `"type":"overloaded_error"` in the message body
- Fast mode overage disabled: read from response header, enter cooldown
- Pre-seeded 529 counter from streaming fallback carries across retry instances
- Thinking blocks: strip if model doesn't support them, budget must be >= 1024
- `maxTokensOverride` must leave room for thinking budget + 1 output token
- Session memory cursor not found: falls back to legacy full compact
- Resumed session with no cursor: treats all messages as unsummarized
Anti-Patterns
- Running full compaction when micro-compact would suffice (wasted API calls)
- No circuit breaker on auto-compact (can waste 250K+ API calls/day on stuck sessions)
- Retrying 529s from background agents (amplifies cascading failures)
- Clearing all caches after sub-agent compaction (corrupts main thread state)
- Not preserving API invariants when truncating (orphaned tool_results crash the API)
- Setting retry delay without jitter (thundering herd)
Cross-Cutting Patterns
Signal-Based Refresh with Catch-Up
Used by both feature flags and plugin updates:
signal = createSignal()
function onRefresh(listener):
unsub = signal.subscribe(listener)
// Catch-up: if init already completed, fire once on next microtask
if alreadyInitialized:
queueMicrotask(() => { if stillSubscribed: listener() })
return () => { subscribed = false; unsub() }
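A minimal synchronous Python sketch of the catch-up subscribe pattern (the real code defers the catch-up call to the next microtask; firing synchronously keeps this sketch self-contained, and all names here are hypothetical):

```python
from typing import Callable

class RefreshSignal:
    """Subscribe/emit with catch-up: late subscribers still see the initial load."""
    def __init__(self):
        self._listeners: list[Callable[[], None]] = []
        self._initialized = False

    def mark_initialized(self):
        self._initialized = True

    def subscribe(self, listener: Callable[[], None]) -> Callable[[], None]:
        self._listeners.append(listener)
        # Catch-up: if init already completed, fire once for this subscriber.
        # (The original queues this on a microtask and re-checks subscription.)
        if self._initialized:
            listener()
        def unsubscribe():
            if listener in self._listeners:
                self._listeners.remove(listener)
        return unsubscribe

    def emit(self):
        for listener in list(self._listeners):
            listener()
```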
Safe Config Mutation with Change Detection
function setConfigOverride(feature, value):
saveConfig(current => {
overrides = current.overrides ?? {}
if value === undefined:
if !(feature in overrides): return current // No-op, same ref
{[feature]: _, ...rest} = overrides
return Object.keys(rest).length == 0 ? omit(current, 'overrides') : {...current, overrides: rest}
if isEqual(overrides[feature], value): return current // No-op
return {...current, overrides: {...overrides, [feature]: value}}
})
signal.emit() // Subscribers do their own change detection
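A minimal Python sketch of the same no-op detection (a dict-based stand-in for `saveConfig`; returning the identical object is the signal to subscribers that nothing changed; `None` stands in for `undefined`):

```python
def set_config_override(config: dict, feature: str, value):
    """Return config unchanged (same object) when the write would be a no-op."""
    overrides = config.get("overrides", {})
    if value is None:  # None plays the role of `undefined`: clear this override
        if feature not in overrides:
            return config  # no-op: return the same reference
        rest = {k: v for k, v in overrides.items() if k != feature}
        new = {k: v for k, v in config.items() if k != "overrides"}
        if rest:
            new["overrides"] = rest  # drop the key entirely when empty
        return new
    if overrides.get(feature) == value:
        return config  # no-op: value already set
    return {**config, "overrides": {**overrides, feature: value}}
```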
API-Round Grouping (Not Human-Turn)
// Messages grouped by new assistant message.id, not by human turns
// This enables operation on single-prompt agentic sessions
function groupByApiRound(messages):
groups = []; current = []; lastAssistantId = null
for msg in messages:
if msg.type == 'assistant' && msg.id != lastAssistantId && current.length > 0:
groups.push(current); current = [msg]
else: current.push(msg)
if msg.type == 'assistant': lastAssistantId = msg.id
if current.length > 0: groups.push(current)
return groups
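A direct, runnable translation of `groupByApiRound` (plain dicts stand in for the real message objects):

```python
def group_by_api_round(messages: list[dict]) -> list[list[dict]]:
    """Group messages by assistant message id (API round), not by human turn."""
    groups, current, last_assistant_id = [], [], None
    for msg in messages:
        if (msg["type"] == "assistant" and msg.get("id") != last_assistant_id
                and current):
            groups.append(current)   # a new assistant id starts a new round
            current = [msg]
        else:
            current.append(msg)
        if msg["type"] == "assistant":
            last_assistant_id = msg.get("id")
    if current:
        groups.append(current)
    return groups
```

Note that streamed chunks sharing an assistant id stay in one round, which is what lets truncation operate on single-prompt agentic sessions.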
Module 5: Self-Configuration (Point It At Itself)
This module makes the skill actionable. When invoked, Claude can use these patterns to actually configure Claude Code's behavior.
Invariant
Claude Code's behavior is fully configurable through four surfaces: settings.json (permissions, hooks, env, model), CLAUDE.md files (project instructions), plugin manifests (extensions), and the /config command (interactive). Settings merge across sources with strict priority: policy > local > project > user > plugin.
Pattern 5.1: Settings File Hierarchy
PRIORITY (highest wins):
1. Policy settings — /etc/claude-code/managed-settings.json (admin-controlled)
2. Flag settings — inline via SDK or file path
3. Local settings — .claude/settings.local.json (gitignored, personal)
4. Project settings — .claude/settings.json (committed, shared)
5. User settings — ~/.claude/settings.json (global personal)
6. Plugin settings — via plugin manifests (lowest priority)
MERGE RULES:
- Objects: deep merge (higher priority keys override)
- Arrays: concatenate and deduplicate across sources
- Primitives: highest priority source wins
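The merge rules above can be sketched as a hypothetical helper (assuming string-valued permission arrays; the real merger handles more value shapes):

```python
def merge_settings(sources: list[dict]) -> dict:
    """Merge settings; sources are ordered highest-priority first.
    Objects deep-merge, arrays concatenate + dedupe, primitives: highest wins."""
    result: dict = {}
    for source in reversed(sources):  # apply lowest priority first
        for key, value in source.items():
            existing = result.get(key)
            if isinstance(existing, dict) and isinstance(value, dict):
                # value comes from a higher-priority source, so it goes first
                result[key] = merge_settings([value, existing])
            elif isinstance(existing, list) and isinstance(value, list):
                merged = existing + value
                result[key] = list(dict.fromkeys(merged))  # dedupe, keep order
            else:
                result[key] = value  # higher priority overrides
    return result
```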
Pattern 5.2: Permission Rules
Configure what Claude can do without asking:
// In settings.json or .claude/settings.json
{
"permissions": {
"allow": [
"Read", // All file reads
"Bash(git:*)", // All git commands
"Bash(npm test)", // Exact command
"Bash(cargo build:*)", // Prefix match
"Write(src/**/*.ts)", // Glob match on paths
"mcp__serverName__*" // All tools from an MCP server
],
"deny": [
"Bash(rm -rf:*)", // Block destructive commands
"Bash(curl:*)", // Block network from shell
"Write(.env*)" // Block writing secrets
],
"defaultMode": "auto" // auto | default | plan | acceptEdits | dontAsk
}
}
Rule syntax:
- `Tool` — matches all uses of that tool
- `Tool(exact command)` — exact string match
- `Tool(prefix:*)` — prefix match (word boundary enforced)
- `Tool(glob pattern)` — glob matching for file paths
- `mcp__server__tool` — MCP tool permission
- Deny always beats allow (same priority logic as Module 1)
Pattern 5.3: Hooks (Automated Behaviors)
Hooks run shell commands, prompts, agents, or HTTP calls in response to events:
{
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "echo 'About to run bash'",
"timeout": 5000
}
]
}
],
"PostToolUse": [
{
"matcher": "Write",
"hooks": [
{
"type": "command",
"command": "npx prettier --write $CLAUDE_FILE_PATH",
"timeout": 10000,
"statusMessage": "Formatting..."
}
]
}
],
"UserPromptSubmit": [
{
"hooks": [
{
"type": "prompt",
"prompt": "Check if the user's request is clear before proceeding"
}
]
}
],
"SessionStart": [
{
"hooks": [
{
"type": "command",
"command": "cat ~/daily-context.md",
"statusMessage": "Loading daily context..."
}
]
}
]
}
}
Hook events:
| Event | Fires When | Common Use |
|---|---|---|
| PreToolUse | Before any tool runs | Validation, logging |
| PostToolUse | After any tool completes | Formatting, linting |
| UserPromptSubmit | User sends a message | Input validation |
| SessionStart | New session begins | Context loading |
| SessionEnd | Session closes | Cleanup, summaries |
| PreCompact | Before compaction | State preservation |
| PostCompact | After compaction | Cache warming |
| Stop | Agent stops | Notifications |
| Notification | System notification | Alerts, logging |
| TaskCreated | New task created | Progress tracking |
| TaskCompleted | Task finished | Reporting |
Hook types:
| Type | What It Does | Best For |
|---|---|---|
| command | Runs shell command | Formatting, linting, git hooks |
| prompt | Sends prompt to Claude | Validation, context injection |
| agent | Spawns a sub-agent | Complex automated workflows |
| http | Makes HTTP request | External service integration |
Pattern 5.4: CLAUDE.md Project Instructions
The instruction hierarchy that controls Claude's behavior per-project:
DISCOVERY ORDER (all loaded, lower overrides higher):
1. /etc/claude-code/CLAUDE.md — system-wide (admin)
2. ~/.claude/CLAUDE.md — user global
3. <project-root>/CLAUDE.md — project root (committed)
4. <project-root>/.claude/CLAUDE.md — project config dir
5. <project-root>/.claude/rules/*.md — modular rules (all .md files loaded)
6. <project-root>/CLAUDE.local.md — local overrides (gitignored)
FEATURES:
- @include directives: @path, @./relative, @~/home, @/absolute
- Works in leaf text nodes only (not inside code blocks)
- Circular reference prevention built in
- Frontmatter extraction with glob patterns
Best practice for team projects:
project/
  CLAUDE.md                    # Shared conventions, architecture overview
  CLAUDE.local.md              # Personal preferences (gitignored)
  .claude/
    settings.json              # Shared permissions and hooks
    settings.local.json        # Personal permissions (gitignored)
    rules/
      testing.md               # Testing conventions
      api-patterns.md          # API design rules
      security.md              # Security requirements
Pattern 5.5: Configuration Recipes
Recipe: Lock down a production repo
{
"permissions": {
"deny": [
"Bash(rm -rf:*)", "Bash(git push --force:*)", "Bash(git reset --hard:*)",
"Bash(DROP TABLE:*)", "Bash(DELETE FROM:*)",
"Write(.env*)", "Write(*secret*)", "Write(*credential*)"
],
"defaultMode": "plan"
}
}
Recipe: Auto-format on every file write
{
"hooks": {
"PostToolUse": [{
"matcher": "Write",
"hooks": [{
"type": "command",
"command": "npx prettier --write \"$CLAUDE_FILE_PATH\" 2>/dev/null || true",
"timeout": 10000,
"statusMessage": "Formatting..."
}]
}]
}
}
Recipe: Auto-lint after bash commands
{
"hooks": {
"PostToolUse": [{
"matcher": "Bash",
"hooks": [{
"type": "command",
"command": "if echo \"$CLAUDE_TOOL_INPUT\" | grep -q 'git commit'; then npm run lint 2>/dev/null; fi",
"timeout": 30000,
"statusMessage": "Linting..."
}]
}]
}
}
Recipe: Inject daily context on session start
{
"hooks": {
"SessionStart": [{
"hooks": [{
"type": "command",
"command": "echo '---'; echo 'Active sprint:'; cat ~/sprint-context.md 2>/dev/null; echo '---'; echo 'Recent PRs:'; gh pr list --limit 5 2>/dev/null",
"timeout": 10000,
"statusMessage": "Loading context..."
}]
}]
}
}
Recipe: Full autonomous mode for trusted projects
{
"permissions": {
"allow": [
"Read", "Write(src/**)", "Write(tests/**)",
"Bash(git:*)", "Bash(npm:*)", "Bash(cargo:*)",
"Bash(python:*)", "Bash(pytest:*)",
"Edit", "Glob", "Grep"
],
"deny": [
"Bash(rm -rf /)", "Bash(sudo:*)",
"Write(.env*)", "Write(*secret*)"
],
"defaultMode": "auto"
}
}
Recipe: Modular project rules via .claude/rules/
<!-- .claude/rules/api-patterns.md -->
When writing API endpoints:
- Always validate input with zod schemas
- Return consistent error shapes: {error: string, code: number}
- Add OpenAPI JSDoc comments to every endpoint
- Rate limit all public endpoints
<!-- .claude/rules/testing.md -->
Testing conventions:
- Every new function gets a test file in __tests__/
- Use vitest, not jest
- Mock external services, never databases
- Minimum 80% branch coverage for new code
Pattern 5.6: Plugin/Skill Installation
# Install a plugin marketplace
claude plugin install anthropics/claude-code-plugins
# Enable/disable specific plugins
# In settings.json:
{
"enabledPlugins": {
"my-plugin@marketplace-name": true,
"another-plugin@marketplace-name": false
}
}
# Create a local skill (no marketplace needed)
mkdir -p ~/.claude/skills/my-skill/
cat > ~/.claude/skills/my-skill/SKILL.md << 'EOF'
---
name: my-skill
description: What this skill does
userInvocable: true
---
# My Skill
Instructions for Claude when this skill is invoked...
EOF
Edge Cases
- `settings.local.json` is always gitignored -- safe for personal API keys and preferences
- Policy settings (`/etc/claude-code/managed-settings.json`) cannot be overridden by user/project
- `allowManagedPermissionRulesOnly: true` locks permission rules to admin-only
- `disableAllHooks: true` kills all hooks including the status line
- Hooks with `once: true` fire only once per session
- `async: true` hooks don't block tool execution
- Invalid settings fields are preserved in files for user correction (not silently dropped)
Anti-Patterns
- Putting secrets in `settings.json` instead of `settings.local.json`
- Using `dontAsk` mode without deny rules (no safety net)
- Writing hooks that modify the same files Claude is editing (race conditions)
- Putting implementation details in CLAUDE.md (it's for conventions and rules, not code)
- Using `Bash(*)` as an allow rule (equivalent to no permission system)
Module 6: Query Engine & Main Loop Orchestration
Invariant
Every iteration of the main loop produces exactly one API call, consumes its streamed response, executes any tool calls concurrently, and either terminates or continues with accumulated messages. The loop is an AsyncGenerator — the caller pulls messages one at a time. Recovery from errors is state-machine-based with named transitions.
Pattern 6.1: The Agentic Message Loop
function* query(messages, systemPrompt, canUseTool, tools):
state = { messages, turnCount: 1, recoveryCount: 0 }
while true:
msgs = state.messages
# 1. Context compression pipeline (ORDER MATTERS)
msgs = applyToolResultBudget(msgs) // trim large results
msgs = snipOldMessages(msgs) // remove beyond horizon
msgs = microcompactLargeResults(msgs) // summarize or cache-edit
msgs = collapseArchivedSegments(msgs) // project collapsed view
compacted = autocompactIfOverThreshold(msgs)
if compacted: msgs = compacted
# 2. Pre-flight guard
if tokenCount(msgs) > HARD_LIMIT and !autoRecoveryEnabled:
yield ErrorMessage("prompt too long"); return
# 3. Stream model response; execute tools concurrently
toolExecutor = StreamingToolExecutor(tools, canUseTool)
assistantMsgs = []
for event in callModel(msgs, systemPrompt):
if event.type == "assistant":
assistantMsgs.append(event)
for block in event.toolUseBlocks:
toolExecutor.startTool(block) // concurrent execution begins
yield event
for result in toolExecutor.completedResults():
yield result // drain during streaming
# 4. No tool calls: attempt recovery or finish
if not hasToolUse:
if isPromptTooLong: try collapse/reactive compact or return
if isMaxOutputTokens and recoveryCount < 3:
inject resumeNudge; recoveryCount++; continue
return // turn complete
# 5. Drain remaining tool results
toolResults = drain(toolExecutor.remaining())
# 6. Inject post-tool attachments
toolResults += getMemoryAttachments() + getSkillAttachments()
# 7. Guard turn limit
if ++turnCount > maxTurns: return
# 8. Continue loop
state.messages = msgs + assistantMsgs + toolResults
Pattern 6.2: Session Lifecycle Wrapper (QueryEngine)
class QueryEngine:
messages: Message[] // full conversation history
totalUsage: Usage // cumulative token counts
permissionDenials: [] // tracked for final result
function* submitMessage(prompt):
# Wrap canUseTool to track denials
wrappedCanUseTool = (tool, input) =>
result = canUseTool(tool, input)
if result != "allow": permissionDenials.append(denial)
return result
# Process user input (slash commands, attachments)
{newMessages, shouldQuery} = processUserInput(prompt)
messages.push(...newMessages)
if not shouldQuery:
yield commandOutput; yield Result(success); return
# Consume query generator
for message in query(messages, systemPrompt, wrappedCanUseTool):
switch message.type:
"assistant": messages.push(message); yield toSDK(message)
"user": messages.push(message); yield toSDK(message)
"stream_event": trackUsage(message.event) // accumulate tokens
"system" where "compact_boundary":
messages.splice(0, boundaryIndex) // GC pre-compact messages
"attachment" where "max_turns_reached":
yield Result(error_max_turns); return
if totalCost >= maxBudget:
yield Result(error_max_budget); return
yield Result(success, cost, usage, denials)
Pattern 6.3: Streaming Tool Execution (Concurrent + Ordered)
class StreamingToolExecutor:
tools: TrackedTool[] = []
siblingAbort = childAbortController(parentAbort)
addTool(block):
safe = toolDef.isConcurrencySafe(block.input)
tools.push({id, block, status: "queued", safe, results: []})
processQueue()
canExecute(safe):
executing = tools.filter(t => t.status == "executing")
return executing.empty OR (safe AND all executing are safe)
processQueue():
for tool in tools where status == "queued":
if canExecute(tool.safe):
launchAsync(tool) // chains processQueue on completion
else if NOT tool.safe:
break // preserve ordering for exclusive tools
launchAsync(tool):
tool.status = "executing"
for update in runToolUse(tool.block, context):
if update.isError AND tool.isBash:
siblingAbort.abort("sibling_error") // cascade kill
tool.results.push(update)
tool.status = "completed"
processQueue() // unblock next
*getCompletedResults(): // ordered drain
for tool in tools:
if tool.status == "completed":
tool.status = "yielded"
yield all tool.results
else if tool.status == "executing" AND NOT tool.safe:
break // hold order for exclusive tools
Key insight: Concurrent-safe tools fan out in parallel. Non-concurrent tools serialize. Bash errors cascade-cancel all siblings via a child AbortController. Results are always emitted in tool-receive order regardless of completion order.
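The gating predicate behind `canExecute` reduces to a few lines (a sketch over plain booleans rather than tracked tool objects):

```python
def can_execute(executing_safety: list[bool], candidate_safe: bool) -> bool:
    """A safe tool may join other safe tools; an exclusive tool needs an idle
    executor; nothing may join while an exclusive tool is running."""
    if not executing_safety:
        return True                       # nothing running: anything may start
    return candidate_safe and all(executing_safety)
```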
Edge Cases
- Recovery is state-machine-based: implicit states via the `State` object plus `continue` (next_turn, reactive_compact_retry, max_output_tokens_recovery, stop_hook_blocking)
- Permissions are callback-injected, not hardcoded — the entire permission system is a single `canUseTool` function parameter
- Compact boundary handling: QueryEngine splices pre-boundary messages for GC — the only place conversation memory is physically freed
- Agent scope isolation: queue draining is scoped by `agentId` to prevent cross-agent message leaks
- Auto-background after 120s if feature-flagged
Anti-Patterns
- Hardcoding permission checks in the query loop (makes testing/extension impossible)
- Running compaction layers in wrong order (snip must come before microcompact before autocompact)
- Not draining tool results during streaming (wastes tool I/O latency overlap)
- Missing turn limit guard (infinite loops on tool-calling models)
Module 7: Message Pipeline & Cost Tracking
Invariant
Every internal Message[] must be normalized into strict (UserMessage | AssistantMessage)[] before any API call. Cost state is a monotonically increasing accumulator per session, partitioned by model. Every API error is classified by a string tag for retry logic AND converted to a user-visible message — dual-function design with shared cascade.
Pattern 7.1: Message Normalization Pipeline
function normalizeMessagesForAPI(messages, availableTools):
toolNames = Set(availableTools.map(t => t.name))
# Phase 1: Reorder attachments up to tool_result/assistant boundaries
reordered = reorderAttachments(messages)
reordered = reordered.filter(m => not m.isVirtual)
# Phase 2: Build strip-map for past media errors
# If assistant error matches known media error, walk backward to find
# the user message with the offending image/doc block and mark it
stripTargets = Map<messageUUID, Set<blockType>>
for i, msg in reordered:
if msg is synthetic API error:
blockTypes = KNOWN_ERROR_TO_BLOCK_TYPES[errorText]
if blockTypes:
for j = i-1 downto 0:
if reordered[j] is user and isMeta:
stripTargets[reordered[j].uuid].addAll(blockTypes)
break
# Phase 3: Filter, merge, normalize
result = []
for msg in reordered:
skip: progress messages, non-command system messages
user: strip unavailable tool_refs, strip flagged media blocks,
merge with previous if consecutive
assistant: strip unsupported thinking blocks, normalize tool_use inputs,
insert placeholder if consecutive
attachment: convert to user message content, merge with previous user
result.append(normalized)
# Phase 4: Ensure tool_use / tool_result pairing
ensureToolResultPairing(result)
return result
Key insight: The strip-map for past media errors is crucial — rather than re-sending a 50MB PDF on every retry, it retroactively removes the offending block from the user message that originally contained it.
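A minimal checker for the Phase 4 invariant (a hypothetical helper; the real pipeline repairs pairing rather than just detecting it):

```python
def find_orphaned_tool_uses(messages: list[dict]) -> list[str]:
    """Return tool_use ids with no matching tool_result in the following
    user message; sending these as-is would crash the API."""
    orphans = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant":
            continue
        tool_use_ids = {b["id"] for b in msg["content"] if b["type"] == "tool_use"}
        if not tool_use_ids:
            continue
        result_ids = set()
        if i + 1 < len(messages) and messages[i + 1]["role"] == "user":
            result_ids = {b.get("tool_use_id") for b in messages[i + 1]["content"]
                          if b["type"] == "tool_result"}
        orphans.extend(sorted(tool_use_ids - result_ids))
    return orphans
```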
Pattern 7.2: Cost Accumulation Model
state = {
totalCostUSD: 0,
modelUsage: Map<modelName, ModelUsage>, // per-model buckets
}
struct ModelUsage:
inputTokens, outputTokens: int
cacheReadInputTokens, cacheCreationInputTokens: int
webSearchRequests: int
costUSD: float
contextWindow, maxOutputTokens: int
function addToTotalSessionCost(cost, usage, model):
mu = state.modelUsage.getOrCreate(model, ModelUsage())
mu.inputTokens += usage.input_tokens
mu.outputTokens += usage.output_tokens
mu.cacheReadTokens += usage.cache_read_input_tokens ?? 0
mu.cacheWriteTokens += usage.cache_creation_input_tokens ?? 0
mu.costUSD += cost
state.totalCostUSD += cost
# OTel counters for observability
costCounter.add(cost, {model})
tokenCounter.add(usage.input_tokens, {model, type: "input"})
tokenCounter.add(usage.output_tokens, {model, type: "output"})
# Recursive: account for advisor/sub-model usage
for advisorUsage in getAdvisorUsage(usage):
advisorCost = calculateUSDCost(advisorUsage.model, advisorUsage)
addToTotalSessionCost(advisorCost, advisorUsage, advisorUsage.model)
function saveSessionCosts(sessionId):
persist {sessionId, state} to config // enables resume
function restoreSessionCosts(sessionId):
stored = loadFromConfig(sessionId)
if stored?.sessionId == sessionId: state = stored; return true
return false
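A stripped-down sketch of the accumulator (only a subset of the `ModelUsage` fields, with OTel counters and advisor recursion omitted):

```python
from collections import defaultdict

class CostTracker:
    """Monotonic per-session cost accumulator, partitioned by model."""
    def __init__(self):
        self.total_cost_usd = 0.0
        self.model_usage = defaultdict(lambda: {"input": 0, "output": 0,
                                                "cost_usd": 0.0})

    def add(self, model: str, cost: float, input_tokens: int, output_tokens: int):
        mu = self.model_usage[model]
        mu["input"] += input_tokens
        mu["output"] += output_tokens   # includes thinking tokens
        mu["cost_usd"] += cost
        self.total_cost_usd += cost
```

Per-model buckets are what make it possible to attribute spend to a cheap classifier model separately from the main model.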
Pattern 7.3: Error Classification (Dual-Function Cascade)
CLASSIFICATION (for machines — retry logic, analytics):
function classifyAPIError(error) -> string:
if error.aborted: return "aborted"
if error is timeout: return "api_timeout"
if "Repeated 529" in message: return "repeated_529"
if status == 429: return "rate_limit"
if status == 529 or "overloaded_error": return "server_overload"
if "prompt is too long": return "prompt_too_long"
if "PDF pages" in message: return "pdf_too_large"
if status == 400 and "image exceeds": return "image_too_large"
if status == 400 and "tool_use ids": return "tool_use_mismatch"
if "x-api-key" in message: return "invalid_api_key"
if status == 403 and "token revoked": return "token_revoked"
if status in {401, 403}: return "auth_error"
if status >= 500: return "server_error"
if status >= 400: return "client_error"
if error is ConnectionError: return "connection_error"
return "unknown"
CONVERSION (for humans — user-visible messages):
function errorToMessage(error, model) -> AssistantMessage:
// Same cascade but produces contextual help text:
// - 429: parses rate-limit headers for reset times
// - prompt_too_long: preserves raw error in errorDetails
// (reactive compact parses "137500 > 135000" to calculate token gap)
// - auth errors: shows re-auth guidance
// - media errors: suggests stripping images
Pattern 7.4: Post-Tool Hook Lifecycle
async *runPostToolUseHooks(context, tool, input, output):
for result in executePostToolHooks(tool.name, input, output):
match result:
{cancelled}: yield attach("hook_cancelled")
{blockingError}: yield attach("hook_blocking_error", error)
{preventContinuation}: yield attach("hook_stopped"); return // hard stop
{additionalContexts}: yield attach("hook_additional_context", contexts)
{updatedMCPToolOutput}: output = result.output; yield {output}
on error:
yield attach("hook_error_during_execution", formatError(err))
# CRITICAL: Hook 'allow' never overrides settings 'deny'
resolveHookPermission(hookResult, tool, input, context):
if hookResult.behavior == "allow":
ruleCheck = checkRuleBasedPermissions(tool, input)
if ruleCheck == "deny": return ruleCheck // settings win
if ruleCheck == "ask": return promptUser()
return hookResult // fall through to normal flow
Edge Cases
- Bedrock requires merging consecutive same-role messages (Claude API doesn't)
- Message normalization must strip thinking blocks if model doesn't support them
- Cost tracker recursively accounts for advisor/sub-model usage (e.g., Haiku classifier costs)
- Session cost is persist-able to config, enabling resume across process restarts
- Error classification returns ~20 string tags — `"unknown"` is the catch-all
- The `errorDetails` field in prompt-too-long errors feeds back into reactive compact token gap parsing
- Async hooks run shell commands in background, polled for completion via `checkForAsyncHookResponses()`
- Hook registry uses `Promise.allSettled` so one hook failure doesn't orphan others
Anti-Patterns
- Not stripping media blocks from past failed messages (re-sends 50MB PDF every retry)
- Single error handler for all API errors (need separate classify + convert paths)
- Not tracking cost per-model (can't attribute spend to classifiers vs main model)
- Letting hook `allow` override settings `deny` (security hole)
- Synchronous hook execution (blocks the query loop)
Smoke Tests
Test 1: Permission Cascade Priority
Given: deny rule "rm:*", allow rule "rm -rf node_modules"
When: command "rm -rf node_modules" is checked
Then: result is DENY (deny always wins over allow)
Given: no rules configured
When: command "cat README.md" is checked
Then: result is ALLOW (read-only auto-allow)
Given: allow rule "git:*"
When: command "FOO=bar git status" is checked
Then: result is ALLOW (safe env var stripped for allow matching)
Given: allow rule "docker:*"
When: command "DOCKER_HOST=evil docker ps" is checked
Then: result is ASK (DOCKER_HOST not in safe env vars for allow)
Test 2: Compaction Hierarchy Selection
Given: 15 minutes since last assistant message, cache warm
When: microcompact runs
Then: cache-editing path selected (not time-based)
Given: 90 minutes since last assistant message
When: microcompact runs
Then: time-based path selected, old tool results cleared to '[Old tool result content cleared]'
Given: session memory available with cursor at message #50, 100 messages total
When: auto-compact triggers
Then: session memory compact used (not full compact), messages after cursor preserved
with API invariants (orphaned tool_use pulled in)
Given: 3 consecutive auto-compact failures
When: auto-compact would trigger again
Then: circuit breaker fires, no API call made
Test 3: Agent Foreground-to-Background Transition
Given: agent running in sync foreground mode
When: background signal fires mid-execution
Then: 1. Current iterator gets 1s cleanup timeout
2. Existing messages replayed through progress tracker
3. New async context continues iteration
4. Caller receives {status: 'async_launched'} immediately
5. Worktree preserved (not auto-cleaned)