codex-safe-experiment
Codex Safe Experiment Skill
LIBRARY-FIRST PROTOCOL (MANDATORY)
Before writing ANY code, you MUST check:
Step 1: Library Catalog
- Location: .claude/library/catalog.json
- If match >70%: REUSE or ADAPT
Step 2: Patterns Guide
- Location: .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
- If pattern exists: FOLLOW documented approach
Step 3: Existing Projects
- Location: D:\Projects\*
- If found: EXTRACT and adapt
Decision Matrix
| Match | Action |
|---|---|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
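The matrix above can be sketched as a small shell function. This is only an illustration: the function name `choose_action`, its three arguments, the top-to-bottom precedence of the rows, and the treatment of exactly 70% as ADAPT are all assumptions, and how the match percentage is computed is outside this sketch.

```shell
# Hypothetical helper: map the Decision Matrix rows to an action.
# $1 = library match percentage (integer 0-100)
# $2 = "yes" if a documented pattern exists, otherwise "no"
# $3 = "yes" if similar code exists in a project, otherwise "no"
choose_action() {
  match_pct=$1
  pattern_exists=$2
  in_project=$3
  if [ "$match_pct" -gt 90 ]; then
    echo "REUSE"      # Library >90%
  elif [ "$match_pct" -ge 70 ]; then
    echo "ADAPT"      # Library 70-90% (70 itself counted here by assumption)
  elif [ "$pattern_exists" = "yes" ]; then
    echo "FOLLOW"     # Pattern exists
  elif [ "$in_project" = "yes" ]; then
    echo "EXTRACT"    # In project
  else
    echo "BUILD"      # No match: build, then add to the library
  fi
}
```

For example, `choose_action 80 no no` selects the ADAPT row, while `choose_action 10 no no` falls through to BUILD.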
Evidential Frame Activation (Kanitsal Cerceve)
Source verification mode is active.
Purpose
Use Codex CLI's sandbox mode to try risky changes in isolation. Network access is disabled and only the current working directory (CWD) is accessible, so a failed experiment cannot reach anything outside the project.
When to Use This Skill
- Risky refactoring that might break things
- Experimental approaches before committing
- Testing destructive operations safely
- Trying new libraries or patterns
- Major architectural changes
- Security-sensitive experiments
When NOT to Use This Skill
- Simple, low-risk changes
- When network access is needed
- When the task needs files outside the project directory
- Production debugging
- Quick fixes (use codex-iterative-fix)
Workflow
Phase 1: Experiment Design
- Define what you want to try
- Identify risk factors
- Plan verification steps
- Set success criteria
Phase 2: Sandbox Execution
# Full sandbox mode (network disabled, CWD only)
./scripts/multi-model/codex-yolo.sh "Refactor auth system" task-id "." 10 sandbox
# Via delegate.sh
./scripts/multi-model/delegate.sh codex "Try experimental approach" --sandbox
# Direct Codex (workspace-write sandbox, matching the Invocation Pattern section)
bash -lc "codex --full-auto --sandbox workspace-write exec 'Experiment with X'"
Phase 3: Evaluation
- Review what Codex tried
- Evaluate if experiment succeeded
- Decide: Apply to real codebase?
- If yes: Apply changes outside sandbox
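One way to make the review step concrete is to run the experiment against a scratch copy and inspect a unified diff before touching the real codebase. The sketch below uses only `mktemp`, `cp`, and `diff`. The function names `make_scratch_copy` and `review_changes` are hypothetical, and this is an illustration of the review idea, not a description of how Codex's sandbox works.

```shell
# Copy the project into a scratch directory so the experiment never
# modifies the original tree. Prints the scratch path on success.
make_scratch_copy() {
  src=$1
  scratch=$(mktemp -d) || return 1
  cp -R "$src/." "$scratch/" || return 1
  echo "$scratch"
}

# Print a unified diff between the original tree ($1) and the experimented
# copy ($2). diff exits with 1 when the trees differ, which is the expected
# case here, so only exit codes above 1 are treated as errors.
review_changes() {
  status=0
  diff -ru "$1" "$2" || status=$?
  [ "$status" -le 1 ]
}
```

A typical flow under these assumptions would be `copy=$(make_scratch_copy .)`, then running the experiment inside `$copy`, then `review_changes . "$copy"` to read the diff before deciding whether to apply it.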
Sandbox Isolation Layers
| Layer | Protection |
|---|---|
| Network | DISABLED - no external connections |
| Filesystem | CWD only - no parent access |
| OS-Level | Seatbelt (macOS) / Docker |
| Commands | Blocked: rm -rf, sudo, etc. |
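Because the filesystem layer limits access to the CWD, it can help to check during experiment design whether every file the experiment needs sits under the project directory. The following is a minimal pre-flight sketch. `is_inside_dir` is a hypothetical helper, it only resolves existing directories, and it does not reproduce how the sandbox itself enforces the restriction.

```shell
# Hypothetical pre-flight check: is the file at $2 located under directory $1?
# Returns 0 if inside, 1 if outside, 2 if either directory cannot be resolved.
# Assumes $1 is not the filesystem root.
is_inside_dir() {
  root=$(cd "$1" 2>/dev/null && pwd -P) || return 2
  target_dir=$(cd "$(dirname "$2")" 2>/dev/null && pwd -P) || return 2
  case "$target_dir/" in
    "$root"/*) return 0 ;;  # resolved parent directory is root or below it
    *) return 1 ;;
  esac
}

# Example (paths are placeholders):
#   is_inside_dir . ./src/auth.js || echo "needs files outside the project"
```

If the check fails for any required file, the experiment falls under "When NOT to Use This Skill".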
Success Criteria
- Experiment ran safely in sandbox
- Results evaluated
- Decision made: apply or discard
- No unintended side effects
Example Usage
Example 1: Major Refactoring
User: "Refactor entire auth system to use new pattern"
Sandbox Process:
1. Clone relevant files to sandbox context
2. Codex implements new pattern
3. Run tests in sandbox
4. Evaluate results
5. If good: Apply to real codebase
Output:
- Experiment: Success
- Tests: 45/47 passing (2 need adjustment)
- Recommendation: Apply with minor fixes
Example 2: Library Migration
User: "Try migrating from moment.js to dayjs"
Sandbox Process:
1. Install dayjs in sandbox
2. Replace moment calls
3. Run tests
4. Compare bundle size
Output:
- Migration: Feasible
- Breaking changes: 3 date format strings
- Bundle reduction: 65KB
- Recommendation: Proceed with migration
Integration with Meta-Loop
META-LOOP IMPLEMENT PHASE:
    |
    +---> High-risk change detected
    |         |
    |         +---> codex-safe-experiment
    |         |         |
    |         |         +---> Sandbox: Try change
    |         |         +---> Evaluate: Success?
    |         |         +---> If yes: Apply for real
    |         |
    |         +---> Continue to TEST phase
Memory Integration
Results stored at:
- Key: multi-model/codex/experiment/{project}/{task_id}
- Tags: WHO=codex-safe-experiment, WHY=sandboxed-trial
- Contains: Experiment results, recommendation, diffs
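The key layout above can be assembled with a small helper. This sketch only builds the key and tag strings. The function names are hypothetical, and the actual call that writes to memory is not shown because the source does not specify it.

```shell
# Hypothetical helper: build the memory key for an experiment.
# $1 = project name, $2 = task id
experiment_key() {
  printf 'multi-model/codex/experiment/%s/%s\n' "$1" "$2"
}

# Hypothetical helper: the tags listed above, as a single string.
experiment_tags() {
  printf 'WHO=codex-safe-experiment, WHY=sandboxed-trial\n'
}
```

For a project called `my-app` and task `task-42` (both placeholder values), `experiment_key my-app task-42` yields `multi-model/codex/experiment/my-app/task-42`.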
Invocation Pattern
# Via router with experiment keywords
./scripts/multi-model/multi-model-router.sh "Try refactoring X approach"
# Direct sandbox mode
bash -lc "codex --sandbox workspace-write exec 'Experiment with X'"
Guardrails
NEVER:
- Apply sandbox results without review
- Skip the evaluation phase
- Use sandbox for production debugging
- Trust sandbox results blindly
ALWAYS:
- Review sandbox diffs before applying
- Document what was tried
- Store results for future reference
- Have a rollback plan ready
Decision Framework
| Experiment Result | Action |
|---|---|
| All tests pass | Apply changes |
| Minor failures | Fix then apply |
| Major failures | Discard, try different approach |
| Unexpected behavior | Investigate before deciding |
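The framework above can be approximated from test counts. In the sketch below, the function name `decide_action`, the 10% cutoff between "minor" and "major" failures, and the handling of an empty test run are assumptions made for illustration. The source does not define these thresholds.

```shell
# Hypothetical helper: suggest an action from sandbox test results.
# $1 = tests passed, $2 = tests total,
# $3 = "yes" if unexpected behavior was observed (optional, default "no")
decide_action() {
  passed=$1
  total=$2
  unexpected=${3:-no}
  # Unexpected behavior, or no tests at all, means the result cannot be trusted yet.
  if [ "$unexpected" = "yes" ] || [ "$total" -eq 0 ]; then
    echo "investigate"
    return 0
  fi
  failed=$((total - passed))
  if [ "$failed" -eq 0 ]; then
    echo "apply"
  elif [ $((failed * 100)) -le $((total * 10)) ]; then
    echo "fix-then-apply"   # assumed cutoff: at most 10% of tests failing
  else
    echo "discard"
  fi
}
```

With the numbers from Example 1, `decide_action 45 47` falls in the "Minor failures" row, which matches that example's "Apply with minor fixes" recommendation.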
Related Skills
- codex-iterative-fix: After experiment, for cleanup
- codex-audit: Audit experimental changes
- testing-quality: Generate tests for experiments
- llm-council: Decide on experimental approaches
Verification Checklist
- Experiment ran in sandbox
- Results captured and evaluated
- Decision documented
- If applied: Changes verified
- Memory-MCP updated
[commit|confident] CODEX_SAFE_EXPERIMENT_COMPLETE