circuit-breaker by pixel-process-ug/superkit-agents

Overview

The circuit-breaker skill is a safety mechanism that prevents infinite loops, resource exhaustion, and accidental destruction of critical files during autonomous development. It operates at the loop level, complementing resilient-execution, which operates at the task level. Without circuit-breaker protection, autonomous loops can waste hours on stagnant problems, exhaust API limits, or accidentally destroy configuration files. This skill enforces hard boundaries that keep autonomous operations productive and safe.

Announce at start: "Circuit breaker is active — monitoring for stagnation, rate limits, and file protection."

Phase 1: Circuit State Check

Before each loop iteration, check the current circuit state:

+-----------+    threshold     +-----------+     cooldown     +------------+
|  CLOSED   |----exceeded----->|   OPEN    |----elapsed------>| HALF-OPEN  |
|  (normal) |                  |  (halted) |                  |  (probe)   |
+-----------+                  +-----------+                  +-----+------+
     ^                               ^                              |
     |                               +-----------failure------------+
     |                                                              |
     +---------------------------success----------------------------+

State	Meaning	Action
CLOSED	Normal operation	Execute iteration, monitor all thresholds
OPEN	Halted due to threshold breach	Report status, wait for cooldown, or escalate
HALF-OPEN	Probing after cooldown	Allow ONE iteration. If success: close. If failure: re-open.

STOP: Check circuit state BEFORE executing any loop iteration. Do NOT execute if circuit is OPEN.

Phase 2: Stagnation Detection

Monitor these thresholds continuously during autonomous operation:

Condition	Threshold	Detection Method	Action
No progress	3 consecutive loops with zero meaningful changes	Track files modified + tasks completed per loop	OPEN circuit
Identical errors	5 consecutive loops producing the same error	Compare error messages across iterations	OPEN circuit
Output decline	70% or greater decline in output volume over the last 3 iterations	Compare output line count across last 3 iterations	OPEN circuit
Permission denials	3 consecutive tool permission failures	Track permission errors	OPEN circuit
Test fix loop	>80% of effort spent on test fixes only	Track work type per iteration	OPEN circuit, investigate root cause
Circular approach	Same 2-3 approaches alternating without resolution	Track approach history	OPEN circuit
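Four of these thresholds reduce to comparisons over a short history of per-iteration records. The following is a minimal Python sketch, not the skill's implementation: the `Iteration` record and its field names are hypothetical, permission denials are counted once per iteration, and "output decline" is read as comparing the first and last of the last three iterations. Test-fix-loop and circular-approach detection need a work-type and approach log, so they are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Iteration:
    """Hypothetical per-iteration record; field names are illustrative."""
    files_modified: int
    tasks_completed: int
    error: Optional[str]       # normalized error message, None if no error
    output_lines: int
    permission_denied: bool    # a tool permission failure occurred this loop


def detect_stagnation(history: List[Iteration]) -> List[str]:
    """Return reasons to OPEN the circuit; `history` is ordered oldest to newest."""
    reasons: List[str] = []
    last3, last5 = history[-3:], history[-5:]
    if len(last3) == 3 and all(
        i.files_modified == 0 and i.tasks_completed == 0 for i in last3
    ):
        reasons.append("no progress for 3 loops")
    first_error = last5[0].error if last5 else None
    if len(last5) == 5 and first_error is not None and all(
        i.error == first_error for i in last5
    ):
        reasons.append("identical error for 5 loops")
    if len(last3) == 3 and last3[0].output_lines > 0:
        decline = 1 - last3[-1].output_lines / last3[0].output_lines
        if decline >= 0.70:
            reasons.append(f"output declined {decline:.0%}")
    if len(last3) == 3 and all(i.permission_denied for i in last3):
        reasons.append("3 consecutive permission denials")
    return reasons
```

An empty result means the loop may continue; any reason returned is a threshold breach, so the circuit opens immediately.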

Stagnation Scoring

Each iteration, compute a progress score:

Indicator	Score
New test passing that was previously failing	+3
Task marked complete	+5
File modified with meaningful changes	+1
Build/lint error resolved	+2
Same error as previous iteration	-2
No files modified	-3
Reverted previous changes	-1

Threshold: If the cumulative score over the last 3 iterations is negative, OPEN the circuit.

STOP: If any threshold is breached, OPEN the circuit immediately. Do NOT attempt "one more try."
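The scoring table and the 3-iteration threshold translate directly into a sum over a sliding window. In this minimal Python sketch the indicator keys are hypothetical names, and each indicator is assumed to score once per occurrence (for example, +1 per meaningfully changed file), which the table does not state explicitly.

```python
from typing import Dict, List

# Points per occurrence of each indicator, taken from the scoring table.
SCORES: Dict[str, int] = {
    "test_now_passing": 3,              # new test passing that was previously failing
    "task_completed": 5,
    "file_meaningfully_changed": 1,
    "build_or_lint_error_resolved": 2,
    "same_error_as_previous": -2,
    "no_files_modified": -3,
    "reverted_previous_changes": -1,
}


def iteration_score(events: Dict[str, int]) -> int:
    """Score one iteration; `events` maps indicator name -> occurrences."""
    return sum(SCORES[name] * count for name, count in events.items())


def should_open(scores: List[int], window: int = 3) -> bool:
    """OPEN when the cumulative score over the last `window` iterations is negative."""
    return len(scores) >= window and sum(scores[-window:]) < 0
```

One productive iteration can offset bad ones: scores of [5, -1, -3] sum to +1 and keep the circuit closed, while [5, -1, -5] sum to -1 and open it.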

Phase 3: Recovery Protocol

When the circuit opens, follow this recovery sequence:

Cooldown Period

Default: 30 minutes before retry
Purpose: Prevents rapid cycling through the same failing state
After cooldown: Circuit enters HALF-OPEN state

HALF-OPEN Behavior

Allow exactly ONE iteration to execute
If successful (positive progress score): Close circuit, resume normal operation
If failed (same stagnation pattern): Re-open circuit, double the cooldown timer
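The Phase 1 state check and the cooldown and HALF-OPEN rules above together describe a small state machine. The following is a minimal Python sketch of it, not the skill's own implementation: `CircuitBreaker` and its method names are hypothetical, times are plain seconds so the logic can be driven with explicit timestamps, and resetting the cooldown to its default once the circuit closes is an assumption the skill does not state.

```python
import time
from enum import Enum
from typing import Optional


class CircuitState(Enum):
    CLOSED = "CLOSED"        # normal operation
    OPEN = "OPEN"            # halted after a threshold breach
    HALF_OPEN = "HALF-OPEN"  # one probe iteration allowed after cooldown


class CircuitBreaker:
    """Loop-level breaker: call allow_iteration() before, record_result() after."""

    def __init__(self, cooldown_s: float = 30 * 60) -> None:
        self.base_cooldown_s = cooldown_s
        self.cooldown_s = cooldown_s
        self.state = CircuitState.CLOSED
        self.opened_at = 0.0
        self.probe_started = False

    @staticmethod
    def _now(now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def trip(self, now: Optional[float] = None) -> None:
        """OPEN the circuit and start the cooldown clock."""
        self.state = CircuitState.OPEN
        self.opened_at = self._now(now)
        self.probe_started = False

    def allow_iteration(self, now: Optional[float] = None) -> bool:
        if self.state is CircuitState.OPEN:
            if self._now(now) - self.opened_at < self.cooldown_s:
                return False  # still cooling down: report status, wait, or escalate
            self.state = CircuitState.HALF_OPEN  # cooldown elapsed
        if self.state is CircuitState.HALF_OPEN:
            if self.probe_started:
                return False  # exactly ONE probe iteration per cooldown
            self.probe_started = True
        return True

    def record_result(self, success: bool, now: Optional[float] = None) -> None:
        """success: no threshold breached (and, for a probe, a positive progress score)."""
        if self.state is CircuitState.HALF_OPEN:
            if success:
                self.state = CircuitState.CLOSED
                self.cooldown_s = self.base_cooldown_s  # assumption: reset on close
                self.probe_started = False
            else:
                self.cooldown_s *= 2  # failed probe: double the cooldown, re-open
                self.trip(now)
        elif not success:
            self.trip(now)  # CLOSED and a threshold was breached
```

A loop would call `allow_iteration()` before each iteration and `record_result()` after evaluating thresholds. With the default 30-minute cooldown, successive failed probes lead to waits of 60 and then 120 minutes.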

Recovery Strategy Decision Table

Stagnation Type	Strategy 1	Strategy 2	Strategy 3	Strategy 4
No progress (stuck on same task)	Regenerate plan with fresh analysis	Break stuck task into 3+ subtasks	Skip to next task, return later	Escalate to user
Identical errors (same error repeating)	Change approach entirely	Check if error is environmental	Search for known issue/workaround	Escalate with error log
Test fix loop (tests keep breaking)	Review test assumptions	Check if implementation approach is flawed	Simplify implementation scope	Escalate with test analysis
Circular approach (alternating same fixes)	Step back and re-analyze root cause	Try approach NOT yet attempted	Reduce scope to minimal working version	Escalate with approach history

STOP: After recovery, monitor the next 3 iterations closely. If stagnation recurs, escalate immediately.
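The decision table reads as an ordered ladder per stagnation type. This Python sketch assumes strategies are tried left to right, one per circuit opening, and treats the STOP rule above (stagnation recurring during the 3 monitored iterations) as a signal to jump straight to escalation. The dictionary keys and `next_strategy` are hypothetical names.

```python
from typing import Dict, List

# Strategy ladders from the recovery decision table; the last rung always escalates.
STRATEGIES: Dict[str, List[str]] = {
    "no_progress": [
        "Regenerate plan with fresh analysis",
        "Break stuck task into 3+ subtasks",
        "Skip to next task, return later",
        "Escalate to user",
    ],
    "identical_errors": [
        "Change approach entirely",
        "Check if error is environmental",
        "Search for known issue/workaround",
        "Escalate with error log",
    ],
    "test_fix_loop": [
        "Review test assumptions",
        "Check if implementation approach is flawed",
        "Simplify implementation scope",
        "Escalate with test analysis",
    ],
    "circular_approach": [
        "Step back and re-analyze root cause",
        "Try approach NOT yet attempted",
        "Reduce scope to minimal working version",
        "Escalate with approach history",
    ],
}


def next_strategy(kind: str, strategies_tried: int,
                  recurred_after_recovery: bool = False) -> str:
    """Pick the next rung; recurrence right after a recovery escalates immediately."""
    ladder = STRATEGIES[kind]
    if recurred_after_recovery:
        return ladder[-1]
    # Once every non-escalation strategy has been tried, keep escalating.
    return ladder[min(strategies_tried, len(ladder) - 1)]
```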

Phase 4: Rate Limiting

Track and enforce API usage limits:

Parameter	Default	Purpose
MAX_CALLS_PER_HOUR	100	Prevents API overuse
Reset window	Rolling 1-hour window	Calls older than one hour stop counting toward the limit
Countdown display	Active	Shows remaining calls before limit

Rate Limit Behavior

Track API calls per rolling hour
At 80% of limit: display warning, prioritize remaining calls
At 100% of limit: pause execution, display countdown to reset
Never exceed limit — wait for reset window
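A rolling window can be tracked with a queue of call timestamps. This is a minimal Python sketch under stated assumptions: `RollingRateLimiter` and its methods are hypothetical names, timestamps are seconds supplied by the caller (or `time.monotonic()`), and "countdown to reset" is read as the time until the oldest call leaves the window.

```python
import time
from collections import deque
from typing import Deque, Optional


class RollingRateLimiter:
    """Counts API calls in a rolling window (defaults: 100 calls per hour)."""

    def __init__(self, max_calls: int = 100, window_s: float = 3600.0) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls: Deque[float] = deque()  # timestamps of calls inside the window

    def remaining(self, now: float) -> int:
        # Rolling reset: drop calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        return self.max_calls - len(self.calls)

    def should_warn(self, now: float) -> bool:
        """True at 80% usage or more (integer math avoids float rounding)."""
        used = self.max_calls - self.remaining(now)
        return used * 5 >= self.max_calls * 4

    def seconds_until_next_slot(self, now: float) -> float:
        """Countdown to display while paused at 100% (0.0 if a call is free now)."""
        if self.remaining(now) > 0:
            return 0.0
        return self.calls[0] + self.window_s - now

    def try_call(self, now: Optional[float] = None) -> bool:
        """Record a call if under the limit; never exceed it."""
        now = time.monotonic() if now is None else now
        if self.remaining(now) <= 0:
            return False  # at 100%: pause and wait for the window to roll
        self.calls.append(now)
        return True
```

Because old calls expire one by one, capacity returns gradually instead of all at once at the top of the hour.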

Three-Layer Timeout Detection

For long-running operations (especially API calls with extended limits):

Layer	Detection	Fallback
1. Timeout guard	Exit code 124 or timeout signal	Capture partial output, log what completed
2. JSON validation	Parse response structure	Attempt text extraction from raw response
3. Text fallback	Raw output capture	Log everything, report for human review
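A minimal Python sketch of the three layers, assuming the long-running operation is a subprocess whose output should be JSON; `run_guarded` and `parse_response` are hypothetical helpers. Exit code 124 is what GNU `timeout` returns when it kills a command, so it only applies when the operation is wrapped in that utility; Python's own `timeout=` argument raises `subprocess.TimeoutExpired` instead, and the sketch treats both as a timeout. Extracting the outermost `{...}` span is one possible text-extraction heuristic, not the skill's prescribed method.

```python
import json
import subprocess
from typing import Any, Dict, List


def parse_response(raw: str) -> Dict[str, Any]:
    """Layers 2 and 3: structured JSON if possible, otherwise raw text."""
    try:
        return {"layer": "json", "data": json.loads(raw)}
    except json.JSONDecodeError:
        pass
    # Layer 2 fallback: extract the outermost {...} span from noisy output.
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        try:
            return {"layer": "extracted", "data": json.loads(raw[start:end + 1])}
        except json.JSONDecodeError:
            pass
    # Layer 3: keep everything verbatim and flag it for human review.
    return {"layer": "text", "data": raw, "needs_review": True}


def run_guarded(cmd: List[str], timeout_s: float) -> Dict[str, Any]:
    """Layer 1: enforce a timeout and keep partial output if it fires."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        raw, timed_out = proc.stdout, proc.returncode == 124  # GNU timeout(1) exit code
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or b""  # captured output may arrive as bytes here
        raw = partial.decode(errors="replace") if isinstance(partial, bytes) else partial
        timed_out = True
    result = parse_response(raw)
    result["timed_out"] = timed_out
    return result
```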

Phase 5: File Protection

Protected Paths

Path	Type	Why Protected
`.ralph/`	Directory	Loop state and configuration
`.ralphrc`	File	Ralph configuration
`IMPLEMENTATION_PLAN.md`	File	Current plan — source of truth for loop
`AGENTS.md`	File	Agent definitions
`specs/`	Directory	Specifications — source of truth for features
`.claude/`	Directory	Claude Code configuration
`CLAUDE.md`	File	Agent operating manual
`memory/`	Directory	Persisted learnings across sessions

Protection Mechanisms

Mechanism	How It Works	When It Triggers
Allowlist enforcement	Only permitted tools can modify protected files	Before any file write to protected path
Integrity validation	Check protected files exist after each iteration	End of every loop iteration
Pre-operation checks	Verify protected files before destructive operations	Before `rm`, `git clean`, `git checkout .`
Restricted commands	Block `git clean`, `git rm` on protected paths, `rm -rf` on config dirs	When command targets protected path
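The restricted-commands rule can be approximated with a check on the command string before it runs. The sketch below is a heuristic, not a shell parser: it assumes commands run from the project root with repo-relative paths, treats every non-flag word as a path (so options that take values, `git -C`, globs, and pipelines are not handled), and `blocked_reason` is a hypothetical name. Matching on path components rather than string prefixes keeps `specs-archive/` from being mistaken for `specs/`.

```python
import shlex
from pathlib import PurePosixPath
from typing import Optional

# Protected paths from the table; a trailing "/" marks a directory.
PROTECTED = [".ralph/", ".ralphrc", "IMPLEMENTATION_PLAN.md", "AGENTS.md",
             "specs/", ".claude/", "CLAUDE.md", "memory/"]


def is_protected(path: str) -> bool:
    """True if a repo-relative path is, or lies inside, a protected path.

    "." (the whole working tree) counts, since it contains every protected path.
    """
    parts = PurePosixPath(path).parts
    if not parts:
        return True
    for entry in PROTECTED:
        target = PurePosixPath(entry).parts
        if parts == target or (entry.endswith("/") and parts[:len(target)] == target):
            return True
    return False


def blocked_reason(command: str) -> Optional[str]:
    """Return why a shell command is restricted, or None if it may run."""
    argv = shlex.split(command)
    if len(argv) >= 2 and argv[0] == "git" and argv[1] in ("clean", "rm"):
        # Naive parsing: every non-flag word after the subcommand is a pathspec.
        targets = [a for a in argv[2:] if not a.startswith("-")]
        if argv[1] == "clean" and not targets:
            targets = ["."]  # no pathspec: git clean acts on the current directory
        hits = [t for t in targets if is_protected(t)]
        if hits:
            return f"git {argv[1]} on protected path(s): {', '.join(hits)}"
    elif argv and argv[0] == "rm":
        hits = [t for t in argv[1:] if not t.startswith("-") and is_protected(t)]
        if hits:
            return f"rm on protected path(s): {', '.join(hits)}"
    return None
```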

Pre-Destructive Operation Checklist

Before any rm, git clean, or git checkout .:

List all files that will be affected
Check each against the protected paths list
If ANY protected file would be affected: ABORT and report
If safe: proceed with caution
After operation: verify all protected files still exist

STOP: If a protected file is missing after any operation, halt immediately and restore it.
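For `git clean`, the first checklist step can use Git's dry run: `git clean -n` (add `-d` to include directories) prints one `Would remove <path>` line per path without deleting anything. The Python sketch below assumes that unquoted output format and repo-relative paths; the helper names are hypothetical. Snapshotting which protected paths exist beforehand lets the final check flag only paths the operation actually removed, since a project may not have every protected path.

```python
import os
import posixpath
from typing import Iterable, List

# Protected paths from the table, relative to the project root.
PROTECTED = [".ralph", ".ralphrc", "IMPLEMENTATION_PLAN.md", "AGENTS.md",
             "specs", ".claude", "CLAUDE.md", "memory"]


def affected_by_git_clean(dry_run_output: str) -> List[str]:
    """Step 1: list affected paths from `git clean -n` ("Would remove <path>" lines)."""
    prefix = "Would remove "
    return [line[len(prefix):] for line in dry_run_output.splitlines()
            if line.startswith(prefix)]


def protected_hits(affected: Iterable[str]) -> List[str]:
    """Steps 2-3: any hit means ABORT and report."""
    hits = []
    for path in affected:
        norm = posixpath.normpath(path)
        if any(norm == p or norm.startswith(p + "/") for p in PROTECTED):
            hits.append(path)
    return hits


def snapshot(root: str) -> List[str]:
    """Record which protected paths exist before the operation."""
    return [p for p in PROTECTED if os.path.exists(os.path.join(root, p))]


def missing_after(root: str, before: List[str]) -> List[str]:
    """Step 5: protected paths that existed before but are gone now."""
    return [p for p in before if not os.path.exists(os.path.join(root, p))]
```

A caller would abort when `protected_hits(...)` is non-empty, run the real command otherwise, and halt and restore if `missing_after(...)` returns anything.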

Phase 6: Monitoring and Metrics

Track these metrics across loop iterations:

Metric	Purpose	Alert Threshold
Loop count	Total iterations executed	>20 for a single task
Tasks completed	Progress measurement	0 for 3+ iterations
Files modified	Change velocity	0 for 3+ iterations
Test pass rate	Quality trend	Declining for 3+ iterations
Error frequency	Stagnation early warning	Increasing for 3+ iterations
Output volume	Productivity trend	70% decline
API calls remaining	Rate limit proximity	<20% remaining
Progress score	Overall health	Negative for 3 iterations
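Most alert thresholds in this table are streak or trend checks over per-iteration values. In this minimal Python sketch, `LoopMetrics`, `trend`, and `zero_streak` are hypothetical names, "declining/increasing for 3+ iterations" is read as three consecutive changes in the same direction, and output volume and progress score are left out because they follow the Phase 2 rules.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def trend(values: Sequence[float], steps: int = 3) -> int:
    """+1 if the last `steps` changes all rose, -1 if all fell, otherwise 0."""
    tail = list(values)[-(steps + 1):]
    if len(tail) < steps + 1:
        return 0
    deltas = [b - a for a, b in zip(tail, tail[1:])]
    if all(d > 0 for d in deltas):
        return 1
    if all(d < 0 for d in deltas):
        return -1
    return 0


def zero_streak(values: Sequence[int], n: int = 3) -> bool:
    """True if the last n entries are all zero."""
    return len(values) >= n and all(v == 0 for v in list(values)[-n:])


@dataclass
class LoopMetrics:
    """Hypothetical metrics store; list fields hold one entry per iteration."""
    loops_on_task: int = 0
    tasks_completed: List[int] = field(default_factory=list)
    files_modified: List[int] = field(default_factory=list)
    test_pass_rate: List[float] = field(default_factory=list)
    error_count: List[int] = field(default_factory=list)
    api_calls_remaining: int = 100
    max_calls: int = 100

    def alerts(self) -> List[str]:
        out = []
        if self.loops_on_task > 20:
            out.append("more than 20 loops on a single task")
        if zero_streak(self.tasks_completed):
            out.append("no tasks completed for 3+ iterations")
        if zero_streak(self.files_modified):
            out.append("no files modified for 3+ iterations")
        if trend(self.test_pass_rate) < 0:
            out.append("test pass rate declining for 3+ iterations")
        if trend(self.error_count) > 0:
            out.append("error frequency increasing for 3+ iterations")
        if self.api_calls_remaining * 5 < self.max_calls:  # under 20% remaining
            out.append("fewer than 20% of API calls remaining")
        return out
```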

Per-Iteration Status Log

## Iteration [N] — [timestamp]
- Circuit state: CLOSED / HALF-OPEN
- Tasks completed: [N]
- Files modified: [list]
- Tests: [X passed, Y failed, Z skipped]
- Errors encountered: [list]
- Progress score: [+/- N]
- API calls remaining: [N]
- Stagnation risk: LOW / MEDIUM / HIGH
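If the status log is generated rather than written by hand, a small renderer keeps entries consistent with the template. In this Python sketch `IterationStatus` is a hypothetical container; how the stagnation risk level is chosen is not defined by the skill, so the caller passes it in.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IterationStatus:
    """Hypothetical container for one status entry; fields follow the template."""
    n: int
    timestamp: str
    state: str                 # "CLOSED" or "HALF-OPEN"
    tasks_completed: int
    files_modified: List[str]
    passed: int
    failed: int
    skipped: int
    errors: List[str]
    progress_score: int
    api_calls_remaining: int
    stagnation_risk: str       # "LOW", "MEDIUM" or "HIGH", supplied by the caller

    def render(self) -> str:
        return "\n".join([
            f"## Iteration {self.n} \u2014 {self.timestamp}",  # separator as in the template
            f"- Circuit state: {self.state}",
            f"- Tasks completed: {self.tasks_completed}",
            f"- Files modified: {', '.join(self.files_modified) or 'none'}",
            f"- Tests: {self.passed} passed, {self.failed} failed, {self.skipped} skipped",
            f"- Errors encountered: {', '.join(self.errors) or 'none'}",
            f"- Progress score: {self.progress_score:+d}",  # always signed, e.g. +3 or -2
            f"- API calls remaining: {self.api_calls_remaining}",
            f"- Stagnation risk: {self.stagnation_risk}",
        ])
```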

Anti-Patterns / Common Mistakes

What NOT to Do	Why It Fails	What to Do Instead
Ignore stagnation signals	Wastes hours on unsolvable problems	Open circuit at threshold breach
Manually override open circuit	Bypasses safety mechanism	Follow recovery protocol properly
Skip file protection checks	Config deletion derails entire project	Always verify protected files after operations
Set cooldown to zero	Rapid cycling through same failure	Respect 30-minute minimum cooldown
Count test-fix-only iterations as progress	Masks the real problem (flawed approach)	Flag >80% test-fix effort as stagnation
Delete and recreate protected files	Loses configuration state	Never delete protected files, only update
Ignore rate limit warnings	Hits hard limit mid-operation	Prioritize when at 80% of limit
Run destructive commands without pre-checks	May delete protected files	Always check affected files first

Anti-Rationalization Guards

Thought	Reality
"One more try will fix it"	That is what you said 3 iterations ago. Open the circuit.
"The error is almost fixed"	"Almost" for 5 iterations means the approach is wrong.
"I cannot stop now, I am so close"	Sunk cost fallacy. Open circuit, reassess.
"The cooldown is too long"	The cooldown prevents wasting more time on the same failure.
"These config files are not important"	They are protected for a reason. Do not delete them.
"The rate limit will not be hit"	Track it. Do not guess.
"This is a different error"	Check if it is truly different or the same root cause manifesting differently.

Do NOT override an open circuit. Follow the recovery protocol.

Integration Points

Skill	Relationship
`resilient-execution`	Task-level retries (3 attempts). Circuit-breaker activates AFTER resilient-execution exhausts retries within individual tasks.
`autonomous-loop`	Circuit-breaker monitors the loop. Opens circuit when loop-level stagnation detected.
`ralph-status`	Status block provides metrics for stagnation detection.
`verification-before-completion`	Circuit-breaker ensures verification passes before closing a loop.
`self-learning`	Stagnation patterns are persisted to memory for future avoidance.
`auto-improvement`	Circuit-breaker events feed into improvement metrics.

Scope Clarification

Scope	Skill	Behavior
Task-level	`resilient-execution`	Try 3 approaches for a single failing task
Loop-level	`circuit-breaker`	Halt the entire loop when patterns indicate systemic failure

The circuit breaker activates AFTER resilient-execution has exhausted its retries within individual tasks. If tasks keep failing despite 3 retries each, the circuit breaker detects the pattern.

Process Summary

Before each loop iteration: Check circuit state (CLOSED/HALF-OPEN/OPEN)
If OPEN: Report status, wait for cooldown, or escalate
If HALF-OPEN: Allow one probe iteration, evaluate result
If CLOSED: Execute normally, monitor all thresholds
After each iteration: Update metrics, compute progress score, evaluate thresholds
If threshold exceeded: Open circuit, report reason, begin cooldown
After cooldown: Enter HALF-OPEN, allow one probe
After probe: Close if successful, re-open with doubled cooldown if failed

Skill Type

RIGID — Thresholds and protection rules must be followed exactly. Do not relax circuit breaker conditions. Do not override open circuits. Do not skip file protection checks. Do not ignore stagnation signals.