resilient-execution
Overview
The resilient-execution skill prevents premature failure by enforcing a minimum of 3 genuinely different approaches before escalating to the user. It provides a structured error classification system, an approach cascade methodology, and transparent logging of each attempt. Without this skill, agents give up too early — with it, they systematically exhaust alternatives and only escalate with full evidence.
Announce at start: "I'm using the resilient-execution skill — I will try multiple approaches before escalating."
Phase 1: Error Classification
When an approach fails, immediately classify the error before retrying:
| Error Type | Definition | Indicators | Correct Response |
|---|---|---|---|
| Transient | Temporary infrastructure failure | Network timeout, rate limit, 503 error, lock contention | Wait briefly, retry the same approach |
| Environmental | Missing or misconfigured dependency | Module not found, wrong version, missing env var, permission denied | Fix the environment, then retry same approach |
| Logical | Wrong approach or incorrect assumption | Wrong output, unexpected behavior, type mismatch, wrong API usage | Rethink the approach entirely |
| Fundamental | Genuinely impossible with available tools | API does not exist, hardware limitation, missing capability | Escalate to user with evidence |
STOP: Classify the error before choosing your next approach. Wrong classification leads to wasted retries.
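The classification table can be read as a lookup from error indicators to a response. The following is a minimal Python sketch, assuming errors arrive as plain message strings. The `ErrorType` enum, the `classify` function, and the keyword lists are illustrative names and are not defined by the skill. Falling back to Logical for unmatched messages is also an assumption.

```python
from enum import Enum


class ErrorType(Enum):
    TRANSIENT = "Transient"
    ENVIRONMENTAL = "Environmental"
    LOGICAL = "Logical"
    FUNDAMENTAL = "Fundamental"


# Hypothetical keyword lists drawn from the Indicators column.
# They are checked in order and the first matching type wins.
INDICATORS = [
    (ErrorType.TRANSIENT,
     ("timeout", "timed out", "rate limit", "503", "lock contention")),
    (ErrorType.ENVIRONMENTAL,
     ("module not found", "no module named", "wrong version",
      "env var", "permission denied")),
    (ErrorType.FUNDAMENTAL,
     ("api does not exist", "hardware limitation", "missing capability")),
]

# Correct Response column, keyed by error type.
RESPONSES = {
    ErrorType.TRANSIENT: "Wait briefly, retry the same approach",
    ErrorType.ENVIRONMENTAL: "Fix the environment, then retry same approach",
    ErrorType.LOGICAL: "Rethink the approach entirely",
    ErrorType.FUNDAMENTAL: "Escalate to user with evidence",
}


def classify(message: str) -> ErrorType:
    """Return the first error type whose indicator appears in the message.

    Unmatched messages fall back to LOGICAL (an assumption of this sketch),
    since an unrecognized failure calls for a rethink, not a blind retry.
    """
    text = message.lower()
    for error_type, keywords in INDICATORS:
        if any(keyword in text for keyword in keywords):
            return error_type
    return ErrorType.LOGICAL
```

Keyword matching is only a first pass. The classification recorded in the attempt log should still reflect the root cause found by investigation.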
Phase 2: Approach Cascade
Execute the cascade systematically. Each attempt must be a genuinely different strategy.
Attempt 1: Primary approach (most direct solution)
| fails
v
Classify error -> Can same approach work with a fix?
| YES -> Fix and retry (does NOT count as a new attempt)
| NO -> Proceed to Attempt 2
v
Attempt 2: Alternative approach 1 (different technique)
| fails
v
Classify error -> Is this fundamentally blocked?
| YES -> Proceed directly to escalation
| NO -> Proceed to Attempt 3
v
Attempt 3: Alternative approach 2 (different path entirely)
| fails
v
Circuit breaker -> Present findings to user with full evidence
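The cascade above can be sketched as a loop over approaches. This is a simplified illustration, not a definitive implementation: `run_cascade`, `try_fix`, and `is_fundamental` are hypothetical names, the sketch applies both checks after every failure instead of at the specific steps shown in the diagram, and it allows only one fix per attempt to avoid looping forever.

```python
from typing import Callable, List, Tuple

# Each approach is a (name, callable) pair; the callable raises on failure.
Approach = Tuple[str, Callable[[], object]]


def run_cascade(
    approaches: List[Approach],
    try_fix: Callable[[Exception], bool],
    is_fundamental: Callable[[Exception], bool],
) -> Tuple[object, List[str]]:
    """Run approaches in order and return (result, log).

    The result is None when every approach failed or a failure was
    fundamental; the log then ends with an escalation entry.
    """
    log: List[str] = []
    for number, (name, run) in enumerate(approaches, start=1):
        fix_used = False
        while True:
            try:
                result = run()
                log.append(f"Attempt {number} ({name}): success")
                return result, log
            except Exception as error:
                log.append(f"Attempt {number} ({name}): failed with {error!r}")
                if is_fundamental(error):
                    log.append("Fundamentally blocked: escalate")
                    return None, log
                # Fix-and-retry stays inside the same attempt number.
                if not fix_used and try_fix(error):
                    fix_used = True
                    log.append(f"Attempt {number} ({name}): fix applied, retrying")
                    continue
                break  # move on to the next, genuinely different approach
    log.append("Circuit breaker: escalate with full evidence")
    return None, log
```

Passing exactly three approaches matches the 3-attempt minimum. The returned log is the raw material for the per-attempt entries and the escalation report.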
For Each Attempt, Log:
### Attempt N: [Approach Name]
**Strategy:** [what makes this different from previous attempts]
**What I tried:** [specific description with commands/code]
**What happened:** [exact error or unexpected result]
**Why it failed:** [root cause analysis]
**Classification:** [Transient / Environmental / Logical / Fundamental]
**What to try next:** [reasoning for next approach]
STOP: Log every attempt before moving to the next. Do NOT skip logging — it is evidence for the escalation report.
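One way to keep log entries uniform is to store each attempt as a record and render it into the template above. This is a sketch under assumptions: the `AttemptLog` class and its field names are hypothetical and only mirror the template's labels.

```python
from dataclasses import dataclass


@dataclass
class AttemptLog:
    number: int
    approach: str
    strategy: str
    tried: str
    happened: str
    why_failed: str
    classification: str  # Transient / Environmental / Logical / Fundamental
    next_step: str

    def render(self) -> str:
        """Render the entry in the per-attempt log template."""
        return "\n".join([
            f"### Attempt {self.number}: {self.approach}",
            f"**Strategy:** {self.strategy}",
            f"**What I tried:** {self.tried}",
            f"**What happened:** {self.happened}",
            f"**Why it failed:** {self.why_failed}",
            f"**Classification:** {self.classification}",
            f"**What to try next:** {self.next_step}",
        ])
```

For example, `AttemptLog(1, "JSON.parse", "Parse the file directly", "JSON.parse() on the file", "SyntaxError", "File contains comments", "Logical", "Use a JSONC-aware parser").render()` produces a seven-line entry that can be pasted into the escalation report unchanged.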
Phase 3: Alternative Approach Selection
When the primary approach fails, select the next approach using this decision table:
| Failure Type | Strategy 1 | Strategy 2 | Strategy 3 |
|---|---|---|---|
| Library/API does not work | Different library | Direct implementation (no library) | Shell command / external tool |
| Algorithm produces wrong result | Different algorithm | Decompose into smaller steps | Simplify constraints, solve easier version |
| Permission/access denied | Different access method | Escalate with manual steps | Work around via alternative path |
| Tool limitation | Different tool | Combine multiple tools | Provide manual instructions |
| Integration failure | Mock the dependency | Use alternative interface | Isolate and test components separately |
| Performance issue | Different data structure | Batch/stream processing | Approximate solution |
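The decision table can also be encoded as data, so that the next strategy is looked up and not chosen ad hoc. A minimal sketch, assuming failure types are matched by their exact table label; `STRATEGIES` and `next_strategy` are hypothetical names:

```python
from typing import Dict, List, Optional

# Rows of the decision table: failure type -> strategies in order.
STRATEGIES: Dict[str, List[str]] = {
    "Library/API does not work": [
        "Different library",
        "Direct implementation (no library)",
        "Shell command / external tool",
    ],
    "Algorithm produces wrong result": [
        "Different algorithm",
        "Decompose into smaller steps",
        "Simplify constraints, solve easier version",
    ],
    "Permission/access denied": [
        "Different access method",
        "Escalate with manual steps",
        "Work around via alternative path",
    ],
    "Tool limitation": [
        "Different tool",
        "Combine multiple tools",
        "Provide manual instructions",
    ],
    "Integration failure": [
        "Mock the dependency",
        "Use alternative interface",
        "Isolate and test components separately",
    ],
    "Performance issue": [
        "Different data structure",
        "Batch/stream processing",
        "Approximate solution",
    ],
}


def next_strategy(failure_type: str, tried: List[str]) -> Optional[str]:
    """Return the first untried strategy for a failure type, or None."""
    for strategy in STRATEGIES.get(failure_type, []):
        if strategy not in tried:
            return strategy
    return None
```

A `None` result means the row is exhausted (or the failure type is not in the table), which is the point at which the escalation report applies.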
Alternative Strategy Hierarchy
Try these in order of preference:
- Different tool — use a different library, API, or command
- Different algorithm — solve the same problem a different way
- Decompose — break the problem into smaller, solvable parts
- Simplify — remove constraints and solve a simpler version first
- Work around — achieve the goal through a different path entirely
- Manual steps — provide clear instructions the user can follow themselves
Phase 4: Escalation Report
After 3 genuine attempts with different approaches, produce this report:
## Execution Report
I tried 3 different approaches to [goal]:
### Attempt 1: [Approach Name]
**Strategy:** [description]
**Result:** Failed because [specific reason]
**Error:** [exact error message or unexpected output]
### Attempt 2: [Approach Name]
**Strategy:** [description]
**Result:** Failed because [specific reason]
**Error:** [exact error message or unexpected output]
### Attempt 3: [Approach Name]
**Strategy:** [description]
**Result:** Failed because [specific reason]
**Error:** [exact error message or unexpected output]
### Root Cause Analysis
[Why all three approaches failed — identify the common blocker]
### Recommended Next Steps
- **Option A:** [what the user could try]
- **Option B:** [alternative path]
- **Option C:** [if applicable]
### What I Need From You to Proceed
[Specific ask — access, information, permission, or decision]
STOP: Do NOT escalate without this report. The user needs evidence that 3 genuine attempts were made.
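The report template and its 3-attempt gate can be enforced mechanically. The following sketch assumes each attempt is summarized as a four-field tuple; `build_report` and the `Attempt` alias are hypothetical names, not part of the skill:

```python
from string import ascii_uppercase
from typing import List, Tuple

# (approach name, strategy, reason it failed, exact error)
Attempt = Tuple[str, str, str, str]


def build_report(
    goal: str,
    attempts: List[Attempt],
    root_cause: str,
    options: List[str],
    ask: str,
) -> str:
    """Render the escalation report; refuse if fewer than 3 attempts exist."""
    if len(attempts) < 3:
        raise ValueError("Escalation requires at least 3 genuine attempts")
    lines = [
        "## Execution Report",
        f"I tried {len(attempts)} different approaches to {goal}:",
    ]
    for number, (name, strategy, reason, error) in enumerate(attempts, start=1):
        lines += [
            f"### Attempt {number}: {name}",
            f"**Strategy:** {strategy}",
            f"**Result:** Failed because {reason}",
            f"**Error:** {error}",
        ]
    lines += ["### Root Cause Analysis", root_cause, "### Recommended Next Steps"]
    # Options are lettered A, B, C, ... in the order given.
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"- **Option {letter}:** {option}")
    lines += ["### What I Need From You to Proceed", ask]
    return "\n".join(lines)
```

The length check only counts entries. Whether each entry is a genuine attempt still has to be judged against the decision table that follows.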
Decision Table: When Retries Count as "Genuine"
| Counts as Genuine Attempt | Does NOT Count |
|---|---|
| Different library or tool | Same library with different import |
| Different algorithm or data structure | Same algorithm with tweaked parameters |
| Different architectural approach | Same approach with minor code changes |
| Manual workaround vs automated | Same automation with retry loop |
| Breaking problem into sub-problems | Same monolithic approach with logging added |
| Using an entirely different API | Same API with different authentication method (unless auth was the error) |
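A rough way to apply this table is to count distinct combinations of tool and technique, so that parameter tweaks and plain retries collapse into one attempt. This is a deliberately coarse sketch: the `Attempt` class and `count_genuine` are hypothetical, and the sketch cannot capture judgment calls such as the authentication exception in the last row.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Attempt:
    tool: str                 # library, API, or command used
    technique: str            # algorithm or architectural approach
    params: Tuple = ()        # tuning parameters; ignored when counting


def count_genuine(attempts: Iterable[Attempt]) -> int:
    """Count distinct (tool, technique) pairs across the attempts."""
    return len({(a.tool, a.technique) for a in attempts})


history = [
    Attempt("requests", "REST", (("timeout", 5),)),
    Attempt("requests", "REST", (("timeout", 30),)),  # tweak only, same attempt
    Attempt("official-sdk", "REST"),                  # different tool
    Attempt("requests", "GraphQL"),                   # different API
]
genuine = count_genuine(history)  # 3
```

Here four entries yield three genuine attempts, because the second entry differs from the first only in a parameter.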
Anti-Patterns / Common Mistakes
| What NOT to Do | Why It Fails | What to Do Instead |
|---|---|---|
| Retry the same approach 3 times and call it "3 attempts" | Same approach = same failure. Not genuine alternatives. | Each attempt must use a meaningfully different strategy |
| Give up after 1 failure | Misses 2+ viable approaches | Always try at least 3 genuinely different approaches |
| Skip error classification | Without classification, you retry the wrong things | Classify BEFORE choosing next approach |
| Hide failed attempts from the user | User cannot help without context | Log and report every attempt transparently |
| Escalate without trying manual workaround | Many things that fail in automation work manually | Always consider manual steps as Approach 3 |
| Blame the platform without investigation | "Platform limitation" is often wrong | Search for workarounds before declaring impossible |
| Fix environment issues and count as new attempt | Fixing env + retrying same approach is 1 attempt | Only count genuinely different strategies |
| Skip logging intermediate attempts | Loses evidence trail, cannot produce escalation report | Log every attempt immediately |
Anti-Rationalization Guards
| Thought | Reality |
|---|---|
| "This genuinely cannot be done" | Have you tried 3 different approaches? Probably not. |
| "The error is clear, I know what is wrong" | Clear errors can have hidden root causes. Investigate. |
| "I have already tried everything" | List what you tried. There are always more options. |
| "The user should fix this themselves" | Provide a manual path, but try 3 approaches first. |
| "This is a platform limitation" | Limitations often have workarounds. Search for them. |
| "The same error keeps happening" | The same error across different approaches means the root cause is not what you assumed. Reclassify. |
| "This is taking too long" | Giving up takes longer when the user has to start over. |
| "A simpler version would not be useful" | A working simple version beats a broken complex one. |
Do NOT escalate without 3 genuine attempts. Period.
Integration Points
| Skill | Relationship |
|---|---|
| circuit-breaker | Activated after resilient-execution exhausts retries at the loop level |
| task-management | Invokes resilient-execution when a task step fails |
| self-learning | Records failure patterns to avoid repeating them in future sessions |
| planning | Uses failure history to choose more robust approaches |
| auto-improvement | Tracks retry success rates and approach effectiveness |
| verification-before-completion | Invokes resilient-execution if verification fails |
Concrete Examples
Example: File Parsing Failure
Attempt 1: JSON.parse() on the file
Result: SyntaxError — file contains comments (JSONC format)
Classification: Logical — wrong parser for this format
Attempt 2: Strip comments with regex, then JSON.parse()
Result: Failed — nested block comments not handled
Classification: Logical — regex too simple for comment stripping
Attempt 3: Use `jsonc-parser` library (handles JSONC natively)
Result: Success — file parsed correctly
Example: API Integration Failure
Attempt 1: Direct HTTP request to API endpoint
Result: 403 Forbidden — authentication required
Classification: Environmental — missing auth config
Fix: Add API key from .env
Result: 429 Too Many Requests — rate limited
Classification: Transient — wait and retry
Retry: Same request after a short wait (still Attempt 1)
Result: 200 OK but response format changed from docs
Classification: Logical — API version mismatch
Attempt 2: Use official SDK instead of raw HTTP
Result: SDK throws "unsupported region" error
Classification: Environmental — region config needed
Attempt 3: Use GraphQL endpoint instead of REST
Result: Success — GraphQL endpoint supports all regions
Key Principles
- Never give up silently — always show what was tried
- Genuine alternatives — each attempt must be a meaningfully different approach, not the same thing with minor tweaks
- Root cause analysis — understand WHY before trying the next approach
- Learn from failure — update memory with what did not work and why
- Transparent — show the user your reasoning at each step
- Classify first — error type determines whether to retry same approach or try a new one
Skill Type
RIGID — The 3-attempt minimum is a HARD-GATE. Error classification is mandatory before each retry. The escalation report format must be followed exactly. Do not relax these requirements regardless of perceived simplicity.