Sandbox Runner Skill
Run solutions through recursive sandboxing, iterating up to 100 times before presenting them to a human.
Concept
"Let agents fail 100 times in simulation so humans see only the winning solution."
Sandbox Loop
[Solution] → Sandbox → Test → Pass? → [Present to Human]
                ↑               │
                └───── No ──────┘ (iterate, max 100x)
Configuration
sandbox:
  max_iterations: 100
  early_exit_threshold: 0.95  # Confidence to stop early
  checkpoint_every: 10        # Save state every N iterations
  tests:
    - unit: "npm run test"
    - lint: "npm run lint"
    - build: "npm run build"
  success_criteria:
    all_tests_pass: true
    no_lint_errors: true
    build_success: true
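As a rough illustration, the configuration above could be mirrored in code as a small validated object. This is only a sketch: `SandboxConfig` and `should_checkpoint` are hypothetical names, not part of the skill, and the `tests` entries are modeled as a name-to-command mapping for convenience.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SandboxConfig:
    """Hypothetical in-memory mirror of the `sandbox:` settings."""

    max_iterations: int = 100
    early_exit_threshold: float = 0.95  # confidence needed to stop early
    checkpoint_every: int = 10          # save state every N iterations
    # Assumption: test name -> shell command, in the order they should run.
    tests: dict[str, str] = field(default_factory=lambda: {
        "unit": "npm run test",
        "lint": "npm run lint",
        "build": "npm run build",
    })

    def __post_init__(self) -> None:
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        if not 0.0 <= self.early_exit_threshold <= 1.0:
            raise ValueError("early_exit_threshold must be within [0, 1]")
        if self.checkpoint_every < 1:
            raise ValueError("checkpoint_every must be at least 1")


def should_checkpoint(config: SandboxConfig, iteration: int) -> bool:
    """True on every `checkpoint_every`-th iteration (1-based count)."""
    return iteration % config.checkpoint_every == 0
```

Validating at construction time means a bad threshold or a zero iteration budget is rejected before any sandbox run starts.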
Iteration Protocol
Per Iteration:
- Apply solution
- Run test suite
- Collect failures
- Analyze failures
- Generate fix
- Loop until pass OR max iterations
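The per-iteration steps above can be sketched as a plain loop. This is a minimal illustration, not the skill's actual implementation: `run_sandbox_loop`, `LoopOutcome`, and the two injected callables (`run_tests`, `generate_fix`) are hypothetical, and the early-exit checks other than "all tests pass" are left out to keep the core loop visible.

```python
from typing import Callable, NamedTuple, Sequence, TypeVar

S = TypeVar("S")  # whatever type represents a candidate solution


class LoopOutcome(NamedTuple):
    solution: object
    iterations_run: int
    passed: bool


def run_sandbox_loop(
    solution: S,
    run_tests: Callable[[S], Sequence[str]],
    generate_fix: Callable[[S, Sequence[str]], S],
    max_iterations: int = 100,
) -> LoopOutcome:
    """Iterate apply -> test -> fix until the tests pass or the budget runs out.

    `run_tests` applies the solution, runs the suite, and returns the names of
    the failing tests (empty means success). `generate_fix` analyzes those
    failures and returns a revised solution.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    for iteration in range(1, max_iterations + 1):
        failures = run_tests(solution)
        if not failures:
            return LoopOutcome(solution, iteration, True)
        # Skip the fix on the final iteration: it would never be tested.
        if iteration < max_iterations:
            solution = generate_fix(solution, failures)
    return LoopOutcome(solution, max_iterations, False)
```

Passing the test runner and the fixer in as callables keeps the loop independent of how tests are executed, so it can be exercised with toy functions before being wired to real commands.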
State Tracking:
iteration:
  number: 42
  status: running | passed | failed
  failures: ["test A", "test B"]
  fixes_attempted: [...]
  confidence: 0.72
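One way to hold this state in code, and to serialize it when a checkpoint is due, is a small dataclass. The sketch below is an assumption about representation only: `IterationState`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, and JSON is chosen here simply because it ships with the standard library.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Literal

Status = Literal["running", "passed", "failed"]


@dataclass
class IterationState:
    """Hypothetical mirror of the `iteration:` fields."""

    number: int
    status: Status = "running"
    failures: list[str] = field(default_factory=list)
    fixes_attempted: list[str] = field(default_factory=list)
    confidence: float = 0.0


def save_checkpoint(state: IterationState) -> str:
    """Serialize the state to a JSON string suitable for writing to disk."""
    return json.dumps(asdict(state), sort_keys=True)


def load_checkpoint(payload: str) -> IterationState:
    """Rebuild the state from a string produced by `save_checkpoint`."""
    return IterationState(**json.loads(payload))
```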
Early Exit Conditions
Stop before 100 iterations if:
- All tests pass (success)
- Confidence > 0.95 (the configured early_exit_threshold)
- Same failure 5x in a row (stuck)
- Critical error detected
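The four conditions above could be checked after each iteration with a single function. This is a sketch under stated assumptions: `StopReason` and `check_early_exit` are hypothetical names, the precedence (critical error first, then success, confidence, stuck) is a choice made here rather than something the skill specifies, and "same failure" is interpreted as an identical set of failing test names.

```python
from enum import Enum
from typing import Optional, Sequence


class StopReason(Enum):
    SUCCESS = "success"
    CONFIDENT = "confident"
    STUCK = "stuck"
    CRITICAL = "critical"


def check_early_exit(
    failure_history: Sequence[frozenset],
    confidence: float,
    critical_error: bool = False,
    threshold: float = 0.95,
    stuck_after: int = 5,
) -> Optional[StopReason]:
    """Return why the loop should stop now, or None to keep iterating.

    `failure_history` holds one set of failing test names per completed
    iteration, oldest first.
    """
    if critical_error:
        return StopReason.CRITICAL
    if not failure_history:
        return None  # nothing has run yet
    latest = failure_history[-1]
    if not latest:
        return StopReason.SUCCESS
    if confidence > threshold:  # strictly greater, matching "Confidence > 0.95"
        return StopReason.CONFIDENT
    recent = failure_history[-stuck_after:]
    if len(recent) == stuck_after and all(f == latest for f in recent):
        return StopReason.STUCK
    return None
```

Returning a reason rather than a bare boolean lets the caller record in the iteration log why a run ended before the iteration limit.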
Output Format
sandbox_result:
  iterations_run: 47
  final_status: passed | failed
  solution:
    code_changes: [...]
    confidence: 0.89
  iteration_log:
    - {iter: 1, status: failed, fixes: [...]}
    - {iter: 2, status: failed, fixes: [...]}
    - {iter: 47, status: passed}
  ready_for_hitl: true
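For illustration, a result with this shape might be assembled from the iteration log as follows. `build_result` is a hypothetical helper, and two things here are assumptions rather than documented behavior: that every iteration contributes one log entry, and that `ready_for_hitl` is true exactly when the final status is `passed`.

```python
from typing import Any


def build_result(
    iteration_log: list[dict[str, Any]],
    code_changes: list[str],
    confidence: float,
) -> dict[str, Any]:
    """Assemble a `sandbox_result` mapping from a non-empty iteration log."""
    if not iteration_log:
        raise ValueError("iteration_log must contain at least one entry")
    final_status = iteration_log[-1]["status"]
    return {
        "sandbox_result": {
            "iterations_run": len(iteration_log),
            "final_status": final_status,
            "solution": {
                "code_changes": code_changes,
                "confidence": confidence,
            },
            "iteration_log": iteration_log,
            # Assumption: only passing runs are handed to the human reviewer.
            "ready_for_hitl": final_status == "passed",
        }
    }
```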
Integration
- Input from: Swarm orchestrator
- Output to: HITL auditor
- Rollback: Restore to pre-sandbox state on failure
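The rollback step could be approximated with a snapshot-and-restore context manager. This is a sketch, not the skill's mechanism: `rollback_on_failure` is a hypothetical name, "failure" is assumed to surface as a raised exception, and a full directory copy is used for simplicity where a real project might prefer version control.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def rollback_on_failure(workdir: Path) -> Iterator[None]:
    """Snapshot `workdir`, then restore it if the wrapped block raises."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot"
        shutil.copytree(workdir, snapshot)  # pre-sandbox state
        try:
            yield
        except Exception:
            # Discard everything the sandbox run changed, then put the
            # snapshot back and let the caller see the original error.
            shutil.rmtree(workdir)
            shutil.copytree(snapshot, workdir)
            raise
```

A sandbox run would go inside `with rollback_on_failure(project_dir): ...`, where `project_dir` is whichever directory the run modifies. On success the snapshot is simply discarded when the temporary directory is cleaned up.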