Recovery Skill

When to Use

- Context window exhausted mid-workflow
- Session interrupted or lost
- Need to resume from last completed step
- Workflow state needs reconstruction

Step 1: Identify Last Completed Step

Check gate files for last successful validation:
- Location: .claude/context/history/gates/{workflow_id}/
- Find highest step number with validation_status: "pass"
- This is the last successfully completed step
Review reasoning files for progress:
- Location: .claude/context/history/reasoning/{workflow_id}/
- Read reasoning files up to last completed step
- Extract context and decisions made
Identify artifacts created:
- Check artifact registry: .claude/context/artifacts/registry-{workflow_id}.json
- List all artifacts created up to last step
- Verify artifact files exist
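
The gate check in Step 1 could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the gate file layout (one JSON file per step carrying an integer `step` field next to `validation_status`) and the function name `last_completed_step` are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Optional


def last_completed_step(gate_dir: Path) -> Optional[int]:
    """Return the highest step number whose gate file records a pass.

    Assumes (hypothetically) that every gate file is a JSON object with an
    integer "step" field and a "validation_status" field.
    """
    passed = []
    for gate_file in sorted(gate_dir.glob("*.json")):
        try:
            gate = json.loads(gate_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or corrupted gate file: treat as not passed
        if not isinstance(gate, dict):
            continue
        step = gate.get("step")
        if gate.get("validation_status") == "pass" and isinstance(step, int):
            passed.append(step)
    # None signals that no step has passed validation yet
    return max(passed, default=None)
```

A caller would point it at `.claude/context/history/gates/{workflow_id}/` and resume from the returned step plus one; a `None` result means the workflow has to start from its first step.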

Step 2: Load Plan Documents

Read plan document (stateless):
- Load plan-{workflow_id}.json from artifact registry
- Extract current workflow state
- Identify completed vs pending tasks
Load relevant phase plan (if multi-phase):
- Check if project is multi-phase (exceeds phase_size_max_lines threshold)
- Load active phase plan: plan-{workflow_id}-phase-{n}.json
- Understand phase boundaries and dependencies
Understand current state:
- Map completed tasks to plan
- Identify next steps
- Check for dependencies
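
As a rough sketch of mapping completed tasks to the plan and checking dependencies: the plan schema used here (a `tasks` list whose entries have `id`, `status`, and an optional `depends_on` list) is an assumption, since the actual layout of plan-{workflow_id}.json is not specified in this document.

```python
from typing import Any


def split_tasks(plan: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Partition task IDs into (completed, pending), preserving plan order."""
    completed: list[str] = []
    pending: list[str] = []
    for task in plan.get("tasks", []):
        bucket = completed if task.get("status") == "completed" else pending
        bucket.append(task["id"])
    return completed, pending


def next_ready_tasks(plan: dict[str, Any]) -> list[str]:
    """Return pending tasks whose dependencies are all completed."""
    completed, pending = split_tasks(plan)
    done = set(completed)
    by_id = {task["id"]: task for task in plan.get("tasks", [])}
    return [
        task_id for task_id in pending
        if set(by_id[task_id].get("depends_on", [])) <= done
    ]


# Hypothetical plan used only for illustration
plan = {
    "tasks": [
        {"id": "design", "status": "completed"},
        {"id": "implement", "status": "pending", "depends_on": ["design"]},
        {"id": "test", "status": "pending", "depends_on": ["implement"]},
    ]
}
print(next_ready_tasks(plan))  # ['implement']
```

Only `implement` is ready in this example, because `test` still waits on a pending task.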

Step 3: Context Recovery

Load artifacts from last completed step:
- Read artifact registry
- Load all artifacts with validation_status: "pass"
- Verify artifact integrity
Read reasoning files for context:
- Load reasoning files from completed steps
- Extract key decisions and context
- Understand workflow progression
Reconstruct workflow state:
- Combine plan, artifacts, and reasoning
- Create recovery state document
- Validate state consistency
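
One possible shape for the recovery state document and its consistency check is sketched below. The registry fields (`artifacts`, `path`, `step`, `sha256`) and both function names are hypothetical; the real registry format may differ.

```python
import hashlib
from pathlib import Path
from typing import Any


def build_recovery_state(workflow_id: str, last_step: int,
                         registry: dict[str, Any],
                         decisions: list[str]) -> dict[str, Any]:
    """Combine passing registry entries and extracted decisions."""
    artifacts = [
        entry for entry in registry.get("artifacts", [])
        if entry.get("validation_status") == "pass"
        and entry.get("step", 0) <= last_step
    ]
    return {
        "workflow_id": workflow_id,
        "last_completed_step": last_step,
        "next_step": last_step + 1,
        "artifacts": artifacts,
        "decisions": decisions,
    }


def find_problems(state: dict[str, Any], base_dir: Path) -> list[str]:
    """Return consistency problems; an empty list means the state looks usable."""
    problems = []
    for entry in state["artifacts"]:
        path = base_dir / entry["path"]
        if not path.is_file():
            problems.append(f"missing artifact: {entry['path']}")
            continue
        expected = entry.get("sha256")  # optional field, assumed for this sketch
        if expected and hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {entry['path']}")
    return problems
```

Any non-empty result from `find_problems` would route into the cases listed under Error Handling (missing or corrupted artifacts) before execution resumes.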

Step 4: Resume Execution

Continue from next step:
- Identify next step after last completed
- Load step requirements from plan
- Prepare inputs for next step
Planner updates plan status (stateless):
- Update plan-{workflow_id}.json with current status
- Mark completed steps
- Update progress tracking
Orchestrator coordinates next agents:
- Pass recovered artifacts to next step
- Resume workflow execution
- Monitor for additional interruptions
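
Because another interruption can occur while the plan is being updated, it may be worth writing the plan file atomically. The sketch below assumes the same hypothetical `tasks`/`id`/`status` plan schema and is not taken from the planner agent itself.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def mark_completed(plan_path: Path, completed_ids: set[str]) -> dict[str, Any]:
    """Mark the given tasks as completed and rewrite the plan atomically."""
    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    for task in plan.get("tasks", []):
        if task.get("id") in completed_ids:
            task["status"] = "completed"

    # Write to a temporary file in the same directory, then swap it in.
    # os.replace is atomic when source and target share a filesystem, so a
    # reader never sees a half-written plan.
    fd, tmp_name = tempfile.mkstemp(dir=str(plan_path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(plan, handle, indent=2)
    os.replace(tmp_name, plan_path)
    return plan
```

Re-reading the file before every update, rather than relying on an in-memory copy, keeps the update consistent with whatever was last persisted.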

</execution_process>

Failure Classification

When a task fails, classify the failure type:

| Failure Type | Indicators | Recovery Action |
| --- | --- | --- |
| BROKEN_BUILD | Build errors, syntax errors, module not found | ROLLBACK + fix |
| VERIFICATION_FAILED | Test failures, validation errors, assertion errors | RETRY with fix (max 3 attempts) |
| CIRCULAR_FIX | Same error 3+ times, similar approaches repeated | SKIP or ESCALATE |
| CONTEXT_EXHAUSTED | Token limit reached, maximum length exceeded | Compress context, continue |
| UNKNOWN | No pattern match | RETRY once, then ESCALATE |
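
A classifier along these lines could implement the table. It is only a sketch: the case-insensitive substring matching, the extra Python exception names among the indicators, and the rule of treating the third identical error message as circular are assumptions, not behavior documented in references/failure-types.md.

```python
from typing import Sequence

# Indicator substrings follow the table above; exception names such as
# "modulenotfounderror" are added here as assumed Python-specific variants.
INDICATORS = (
    ("BROKEN_BUILD", ("build error", "syntax error", "syntaxerror",
                      "module not found", "modulenotfounderror")),
    ("VERIFICATION_FAILED", ("test fail", "validation error",
                             "assertion error", "assertionerror")),
    ("CONTEXT_EXHAUSTED", ("token limit", "maximum length")),
)

RECOVERY_ACTION = {
    "BROKEN_BUILD": "ROLLBACK + fix",
    "VERIFICATION_FAILED": "RETRY with fix (max 3 attempts)",
    "CIRCULAR_FIX": "SKIP or ESCALATE",
    "CONTEXT_EXHAUSTED": "Compress context, continue",
    "UNKNOWN": "RETRY once, then ESCALATE",
}


def classify_failure(error: str, previous_errors: Sequence[str] = ()) -> str:
    """Map an error message to one of the failure types in the table."""
    normalized = error.strip().lower()
    # The same error seen twice before makes this the third occurrence.
    repeats = sum(1 for prev in previous_errors
                  if prev.strip().lower() == normalized)
    if repeats + 1 >= 3:
        return "CIRCULAR_FIX"
    for failure_type, needles in INDICATORS:
        if any(needle in normalized for needle in needles):
            return failure_type
    return "UNKNOWN"
```

For example, `RECOVERY_ACTION[classify_failure("Maximum length exceeded")]` yields the "Compress context, continue" action.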

Circular Fix Detection

Iron Law: If the same approach has been tried 3+ times without success, STOP.

When a circular fix is detected:

1. Stop the current approach immediately.
2. Document what was tried (approaches, errors, files).
3. Try a fundamentally different approach (different library, different pattern, simpler implementation).
4. If it still fails, ESCALATE to human intervention.

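Detection Algorithm:

1. Extract keywords from the description of the current approach (excluding stop words).
2. Compare them with the keywords from each of the last 3 attempts.
3. If the Jaccard similarity (shared keywords divided by total distinct keywords) exceeds 30% against 2 or more of those attempts, flag the fix as circular.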

Example:

Attempt 1: "Using async await for fetch"
Attempt 2: "Using async/await with try-catch"
Attempt 3: "Trying async await pattern again"
=> CIRCULAR FIX DETECTED - Stop and try callback pattern instead
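
The detection algorithm can be sketched in a few lines. The stop-word list below is an assumption and it matters: filler words such as "using", "trying", and "again" have to be excluded for the example attempts above to cross the 30% threshold. The function names are likewise illustrative.

```python
import re

# Hypothetical stop-word list; the real list is not given in this document.
STOP_WORDS = frozenset({
    "a", "an", "and", "again", "for", "in", "of", "on", "or",
    "the", "to", "trying", "using", "with",
})


def keywords(text: str) -> frozenset[str]:
    """Lowercase alphanumeric tokens with stop words removed."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_circular(current: str, previous: list[str],
                threshold: float = 0.30, min_matches: int = 2,
                window: int = 3) -> bool:
    """Flag the current approach if it resembles enough recent attempts."""
    current_keywords = keywords(current)
    similar = sum(
        1 for attempt in previous[-window:]
        if jaccard(current_keywords, keywords(attempt)) > threshold
    )
    return similar >= min_matches


earlier = ["Using async await for fetch", "Using async/await with try-catch"]
print(is_circular("Trying async await pattern again", earlier))  # True
print(is_circular("Switch to callback pattern", earlier))        # False
```

With this stop-word list, the third attempt scores 0.5 and 0.4 against the two earlier ones, so it is flagged, while the callback approach shares no keywords with either and passes.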

Attempt Count Thresholds

| Failure Type | Max Attempts | Then Action |
| --- | --- | --- |
| VERIFICATION_FAILED | 3 | SKIP + ESCALATE |
| UNKNOWN | 2 | ESCALATE |
| BROKEN_BUILD | 1 | ROLLBACK (if good commit exists) |
| CIRCULAR_FIX | 0 | Immediately SKIP |
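
The thresholds translate naturally into a lookup. In this sketch, `attempts_made` counts attempts including the one that just failed, and escalating a broken build when no good commit exists is an assumption, since the table does not say what happens in that case. CONTEXT_EXHAUSTED has no row in the table, so it is rejected here.

```python
# (max attempts, action once the limit is reached), copied from the table
THRESHOLDS = {
    "VERIFICATION_FAILED": (3, "SKIP + ESCALATE"),
    "UNKNOWN": (2, "ESCALATE"),
    "BROKEN_BUILD": (1, "ROLLBACK"),
    "CIRCULAR_FIX": (0, "SKIP"),
}


def next_action(failure_type: str, attempts_made: int,
                has_good_commit: bool = True) -> str:
    """Decide whether to retry or apply the follow-up action."""
    if failure_type not in THRESHOLDS:
        raise ValueError(f"no attempt threshold defined for {failure_type!r}")
    max_attempts, then_action = THRESHOLDS[failure_type]
    if attempts_made < max_attempts:
        return "RETRY"
    if failure_type == "BROKEN_BUILD" and not has_good_commit:
        return "ESCALATE"  # assumed fallback: nothing to roll back to
    return then_action


print(next_action("VERIFICATION_FAILED", 1))  # RETRY
print(next_action("VERIFICATION_FAILED", 3))  # SKIP + ESCALATE
print(next_action("CIRCULAR_FIX", 0))         # SKIP
```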

References

See references/ for detailed patterns:

- failure-types.md - Failure classification details and indicators
- recovery-actions.md - Recovery action decision tree and execution
- merge-strategies.md - File merge strategies for multi-agent scenarios

<best_practices>

Recovery Validation Checklist

- Last completed step identified correctly
- Plan document loaded and validated
- All artifacts from completed steps available
- Reasoning files reviewed for context
- Workflow state reconstructed accurately
- No duplicate work will be performed
- Next step inputs prepared
- Recovery logged in reasoning file

</best_practices>

<error_handling>

Error Handling

- Missing plan document: Request planner to recreate plan from requirements
- Missing artifacts: Request artifact recreation from source agent
- Corrupted artifacts: Request artifact recreation with validation
- Incomplete reasoning: Use artifact registry and gate files to reconstruct state

</error_handling>

# 1. Check gate files for last completed step
ls .claude/context/history/gates/{workflow_id}/

# 2. Load plan document
cat .claude/context/artifacts/plan-{workflow_id}.json

# 3. Review reasoning files
cat .claude/context/history/reasoning/{workflow_id}/*.json

# 4. Resume from next step

</usage_example>

<usage_example> Natural language invocation:

"Resume the workflow from where we left off"
"Recover the workflow state and continue"
"What was the last completed step?"

</usage_example>

Related

Planner Agent: .claude/agents/core/planner.md
Memory files: .claude/context/memory/

Memory Protocol (MANDATORY)

Before starting:

cat .claude/context/memory/learnings.md

After completing:

- New pattern -> .claude/context/memory/learnings.md
- Issue found -> .claude/context/memory/issues.md
- Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
