error-recovery
Error Recovery Protocol
When an error occurs, stop, think, and choose the right recovery strategy. No blind retries: understand the error signal first, then act.
Core principle: Every error carries a signal. Read the signal first, then act.
Error Classification
Classify every error into one of 4 categories — the recovery strategy depends on the category:
Transient Error
Retrying usually fixes it. Infrastructure or network related.
- Examples: timeout, rate limit (429), connection drop, temporary service outage
- Strategy: Wait & Retry with exponential backoff
Configuration Error
Environment or setup issue. Code is correct but setup is wrong.
- Examples: missing env variable, wrong file path, permission denied, missing dependency
- Strategy: Fix & Continue — identify the issue, fix it, re-run
Logic Error
Code or approach is wrong. Retrying produces the same error.
- Examples: KeyError, TypeError, wrong algorithm, expectation mismatch
- Strategy: Alternative Approach — try a different method
Permanent / External Error
Outside your control and cannot be fixed from your side. External service or permission boundary.
- Examples: 403 Forbidden, 404 Not Found, quota exceeded, API deprecated
- Strategy: Escalation — inform the user, ask for direction
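The four categories above can be sketched as a small classifier. This is only an illustration under stated assumptions: the `Category` enum, the `classify` function, and the choice of Python exception types and status codes are hypothetical mappings of the listed examples, not part of the protocol itself.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    # Each value is the recovery strategy named for that category.
    TRANSIENT = "Wait & Retry"
    CONFIGURATION = "Fix & Continue"
    LOGIC = "Alternative Approach"
    PERMANENT = "Escalation"


# Hypothetical lookup sets built from the examples listed above.
TRANSIENT_STATUS = {429}
PERMANENT_STATUS = {403, 404}


def classify(error: BaseException, status: Optional[int] = None) -> Optional[Category]:
    """Return the category for an error, or None when it cannot be determined."""
    if status in TRANSIENT_STATUS:
        return Category.TRANSIENT
    if status in PERMANENT_STATUS:
        return Category.PERMANENT
    if isinstance(error, (TimeoutError, ConnectionError)):
        return Category.TRANSIENT
    if isinstance(error, (FileNotFoundError, PermissionError, ModuleNotFoundError)):
        return Category.CONFIGURATION
    if isinstance(error, (KeyError, TypeError)):
        return Category.LOGIC
    return None  # unknown category: a reason to escalate


print(classify(TimeoutError("read timed out")))   # Category.TRANSIENT
print(classify(KeyError("user_id")).value)        # Alternative Approach
```

A real classifier would likely also inspect the error message, since the same exception type can signal different categories depending on context.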
Retry Strategy
For transient errors, use exponential backoff:
Attempt 1: Retry immediately
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds -> move on or escalate
Maximum retries: 3 attempts. If all 3 fail → re-evaluate the category.
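A minimal sketch of this schedule follows. It assumes the cap of 3 retries applies, so the waits are 0, 2, and 4 seconds and the 8-second step is treated as the point to move on or escalate rather than a fourth retry. The names `retry_with_backoff`, `is_transient`, and the injectable `sleep` parameter are hypothetical.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Waits before each retry, in seconds (assumed reading of the schedule above).
RETRY_DELAYS: Sequence[float] = (0, 2, 4)


def retry_with_backoff(
    operation: Callable[[], T],
    is_transient: Callable[[BaseException], bool],
    delays: Sequence[float] = RETRY_DELAYS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation; on a transient failure, retry once per entry in delays."""
    last_error: Optional[BaseException] = None
    # The first pass is the original call; each later pass is one retry.
    for wait in (None, *delays):
        if wait is not None:
            sleep(wait)
        try:
            return operation()
        except Exception as error:
            if not is_transient(error):
                raise  # not transient: retrying will not help
            last_error = error
    # All retries failed: the caller should re-evaluate the category.
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in normal use the default `time.sleep` applies.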
Rate limit (429) special rule:
- If the response has a Retry-After header, wait that duration
- Otherwise wait 60 seconds, then retry
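The rate limit rule can be sketched as a helper that computes the wait time. The function name `rate_limit_wait` and the fallback to 60 seconds on an unparseable value are assumptions. The sketch handles both forms the HTTP specification allows for Retry-After: a number of seconds or an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

DEFAULT_RATE_LIMIT_WAIT = 60.0  # seconds, from the rule above


def rate_limit_wait(headers: Mapping[str, str], now: Optional[datetime] = None) -> float:
    """Return how many seconds to wait after a 429 response."""
    # Header names are case-insensitive, so compare in lowercase.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return DEFAULT_RATE_LIMIT_WAIT
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay given in seconds
    try:
        retry_at = parsedate_to_datetime(value)  # delay given as an HTTP date
    except (TypeError, ValueError):
        return DEFAULT_RATE_LIMIT_WAIT  # assumption: unparseable value uses the default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


print(rate_limit_wait({"Retry-After": "120"}))  # 120.0
print(rate_limit_wait({}))                      # 60.0
```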
Decision Tree
Error received
        |
Classify the error
        |
+------------------------------------+
| Transient? -> Wait & Retry (max 3) |
| Config?    -> Fix & Continue       |
| Logic?     -> Alternative approach |
| Permanent? -> Escalation           |
+------------------------------------+
        |
Every strategy fails -> Escalation
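The tree above can be read as a dispatch table with escalation as the fallback. The sketch below is one possible shape, not a prescribed implementation: `recover`, `STRATEGY_BY_CATEGORY`, and the `handlers` mapping of strategy names to callables are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Category name -> strategy name, as in the tree above.
STRATEGY_BY_CATEGORY: Dict[str, str] = {
    "Transient": "Wait & Retry",
    "Config": "Fix & Continue",
    "Logic": "Alternative approach",
    "Permanent": "Escalation",
}


def recover(category: Optional[str], handlers: Dict[str, Callable[[], bool]]) -> str:
    """Run the strategy for a category; escalate if it fails or is unknown.

    handlers maps a strategy name to a hypothetical callable that returns
    True when the recovery worked.
    """
    strategy = STRATEGY_BY_CATEGORY.get(category or "", "Escalation")
    if strategy == "Escalation":
        return "Escalated"
    handler = handlers.get(strategy)
    if handler is not None and handler():
        return "Recovered"
    return "Escalated"  # every strategy failed


print(recover("Config", {"Fix & Continue": lambda: True}))  # Recovered
print(recover("Permanent", {}))                             # Escalated
```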
Escalation Protocol
Escalate to the user when:
- 3 retries failed
- Permanent / external error
- Two different strategies failed in a row
- Error category cannot be determined
ERROR ESCALATION
================================
Failed step : [step name]
Error : [error message summary]
Category : [Transient / Config / Logic / Permanent]
Tried : [what was attempted — short list]
Result : All strategies exhausted
================================
Options:
A) [Alternative approach suggestion]
B) [Simpler / partial solution]
C) Skip this step, continue
D) Stop the task
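As a rough sketch, the four escalation triggers and the report header could be coded as below. `RecoveryState`, `should_escalate`, and `format_escalation` are hypothetical names, and the options list is left out because it depends on the task at hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecoveryState:
    category: Optional[str]  # None when the category cannot be determined
    failed_retries: int = 0
    failed_strategies: List[str] = field(default_factory=list)


def should_escalate(state: RecoveryState) -> bool:
    """Check the four escalation triggers listed above."""
    tried = state.failed_strategies
    two_different_in_a_row = len(tried) >= 2 and tried[-1] != tried[-2]
    return (
        state.failed_retries >= 3
        or state.category == "Permanent"
        or two_different_in_a_row
        or state.category is None
    )


def format_escalation(step: str, error: str, category: str, tried: List[str]) -> str:
    """Render the header part of the escalation report."""
    rule = "=" * 32
    rows = [
        ("Failed step", step),
        ("Error", error),
        ("Category", category),
        ("Tried", ", ".join(tried)),
        ("Result", "All strategies exhausted"),
    ]
    body = [f"{label:<11} : {value}" for label, value in rows]
    return "\n".join(["ERROR ESCALATION", rule, *body, rule])


print(format_escalation("fetch data", "HTTP 403", "Permanent", ["retry", "new token"]))
```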
Partial Success
For bulk operations where some items succeed and some fail:
PARTIAL SUCCESS
================================
Successful : N / Total
Failed : M items
================================
Failed items:
- [item]: [reason]
Options:
A) Retry only failed items
B) Continue with successful items, skip failed
C) Cancel all
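One way to collect the data for this report is to run each item independently and record failures instead of aborting on the first one. The helpers `run_bulk` and `format_partial_success` below are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Item = TypeVar("Item")


def run_bulk(
    items: Iterable[Item], action: Callable[[Item], object]
) -> Tuple[List[Item], List[Tuple[Item, str]]]:
    """Apply action to every item; return (succeeded, failed-with-reason)."""
    succeeded: List[Item] = []
    failed: List[Tuple[Item, str]] = []
    for item in items:
        try:
            action(item)
        except Exception as error:
            failed.append((item, f"{type(error).__name__}: {error}"))
        else:
            succeeded.append(item)
    return succeeded, failed


def format_partial_success(succeeded: List[Item], failed: List[Tuple[Item, str]]) -> str:
    """Render the report header and the failed-items list."""
    rule = "=" * 32
    total = len(succeeded) + len(failed)
    lines = [
        "PARTIAL SUCCESS",
        rule,
        f"Successful : {len(succeeded)} / {total}",
        f"Failed     : {len(failed)} items",
        rule,
        "Failed items:",
    ]
    lines += [f"- {item}: {reason}" for item, reason in failed]
    return "\n".join(lines)
```

Option A then amounts to calling `run_bulk` again with only the items from the `failed` list.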
Error Log
Log every error and recovery attempt:
[ERROR LOG]
Step : [step name / number]
Error : [message]
Category : [type]
Attempt 1: [strategy] -> [result]
Attempt 2: [strategy] -> [result]
Result : Recovered / Escalated
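The log template could be backed by a small record type such as the one sketched here. `ErrorLogEntry` and its methods are hypothetical; where the rendered text is stored (file, memory, another skill) is left open.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ErrorLogEntry:
    step: str
    error: str
    category: str
    attempts: List[Tuple[str, str]] = field(default_factory=list)  # (strategy, result)
    result: str = "Escalated"  # set to "Recovered" once a strategy works

    def record(self, strategy: str, result: str) -> None:
        """Append one recovery attempt in the order it was tried."""
        self.attempts.append((strategy, result))

    def render(self) -> str:
        """Produce text in the layout of the template above."""
        lines = [
            "[ERROR LOG]",
            f"Step : {self.step}",
            f"Error : {self.error}",
            f"Category : {self.category}",
        ]
        for number, (strategy, result) in enumerate(self.attempts, start=1):
            lines.append(f"Attempt {number}: {strategy} -> {result}")
        lines.append(f"Result : {self.result}")
        return "\n".join(lines)


entry = ErrorLogEntry(step="3 / upload", error="connection reset", category="Transient")
entry.record("Wait & Retry", "failed")
entry.record("Wait & Retry", "succeeded")
entry.result = "Recovered"
print(entry.render())
```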
When to Skip
- Error is expected behavior (e.g., "file not found" when checking existence)
- User said "ignore errors, continue"
- One-off, non-repeatable task
Guardrails
- Never blind-retry a logic error — retrying won't help, change the approach.
- Always log every attempt — even successful recoveries need a record.
- Cross-skill: integrates with checkpoint-guardian (risk assessment before retry), memory-ledger (logs errors and fixes), and agent-reviewer (retrospective analysis).