Ralph Orchestrator

Wraps ralph-orchestrator for E2E autonomous development loops from Claude Code.

Quick Start

Prerequisites: Ensure the ralph.yml in your project (or the default at ~/Projects/ralph.yml) contains:

cli:
  backend: "claude"
  args: ["--dangerously-skip-permissions"]

Usage:

# Validate environment
python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --check

# Standard workflow: plan + run
python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --plan "Add user authentication" --run

# Multi-model workflow: Codex architecture + Ralph implementation + Codex review
# (See "Multi-Model Integration" section below)

Workflow

Phase 0: Prompt Refinement & Optimization

Purpose: Transform the user's feature request into an optimal planning prompt for Codex (GPT 5.3 High) while preserving all nuance and intent.

Guiding Principles (January 2026 Research):

Systems-first architecture (not prompt-first)
Remove planning preambles
Set reasoning effort explicitly (medium/high/xhigh are available; this workflow always uses high, see Step 3)
Structured output formatting
Chain-of-thought decomposition

Step 1: Information Gathering

Ask clarifying questions ONLY if:

Feature scope is ambiguous
Technical constraints are unspecified
Domain context is missing
Performance/security requirements unclear
Integration points undefined

Example clarifying questions:

AskUserQuestion:
  questions:
    - question: "What is the primary domain/tech stack for this feature?"
      header: "Domain"
      options:
        - label: "Web API (REST/GraphQL)"
          description: "Backend API service"
        - label: "Frontend (React/Vue/etc)"
          description: "User interface components"
        - label: "Data Pipeline"
          description: "ETL, streaming, batch processing"
        - label: "Infrastructure/DevOps"
          description: "CI/CD, deployment, monitoring"
      multiSelect: false

    - question: "What are the critical constraints?"
      header: "Constraints"
      options:
        - label: "Performance (low latency, high throughput)"
          description: "Speed and scale are paramount"
        - label: "Security/Compliance (GDPR, SOC2, etc)"
          description: "Regulatory requirements must be met"
        - label: "Cost optimization"
          description: "Minimize infrastructure/operational costs"
        - label: "Developer experience"
          description: "Easy to maintain and extend"
      multiSelect: true

CRITICAL: Only ask questions for genuinely unclear aspects. If user provides sufficient detail, proceed directly to refactoring.
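
The "ask only when unclear" rule can be sketched as a check over the five conditions listed in Step 1. This is a minimal illustration, not part of the skill: the aspect keys and the `missing_aspects` helper are hypothetical names, and in practice the "known" values would come from the user's request plus the codebase summary.

```python
# Hypothetical sketch: decide which clarifying questions are actually needed.
# The keys mirror the five conditions from Step 1; the names are assumptions.
ASPECTS = {
    "scope": "Feature scope is ambiguous",
    "technical_constraints": "Technical constraints are unspecified",
    "domain": "Domain context is missing",
    "performance_security": "Performance/security requirements unclear",
    "integration_points": "Integration points undefined",
}


def missing_aspects(known: dict[str, str]) -> list[str]:
    """Return the aspects that the request and codebase context leave unanswered."""
    return [key for key in ASPECTS if not known.get(key, "").strip()]


known = {
    "scope": "JWT login and refresh endpoints",
    "technical_constraints": "Redis for session store",
    "domain": "Web API",
    "performance_security": "",
    "integration_points": "existing user database",
}
print(missing_aspects(known))  # ['performance_security']
```

An empty result means no questions should be asked and the workflow can go straight to Step 2.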

Step 2: Prompt Refactoring

Transform user request using this structure:

refactored_prompt = f"""Role: Lead Software Architect specializing in {domain}

Context:
- Project: {project_name}
- Tech Stack: {tech_stack}
- Constraints: {constraints_from_user_or_questions}
- Existing Architecture: {codebase_summary}

User Intent (verbatim):
{original_user_request}

Task: Design the system architecture for: {feature_summary}

Output Requirements (Ralph TDD-Style Planning):

## Architecture Overview
1. System Design Summary
   - High-level architecture (Mermaid component diagram)
   - Data flow patterns
   - Integration points with existing systems
   - Technology choices with rationale

2. Component Breakdown
   - Core components with clear responsibilities
   - Component interactions and dependencies
   - API contracts (JSON schema or TypeScript types)
   - Data models with relationships and constraints

3. Risk Analysis & Edge Cases
   - Failure scenarios (network, data, auth, rate limits)
   - Security considerations (auth, encryption, attack surface)
   - Performance implications
   - Migration strategy if modifying existing schemas

## Implementation Plan (TDD Format)

For each implementation step, provide:

### Step N: [Clear Objective]

**Objective:** What we're building and why (one sentence)

**Implementation:**
- Specific files to create/modify
- Concrete code changes (functions, classes, interfaces)
- Configuration or schema updates
- Dependencies or libraries needed

**Test Requirements (DEFINE BEFORE IMPLEMENTING):**
- Unit tests: What specific behavior to test
- Integration tests: What interactions to verify
- Test fixtures or mocks needed
- Acceptance criteria (how to know it works)

**Integration:**
- How this step connects to other components
- What interfaces or contracts it depends on
- What it exposes for future steps

**Demo/Verification:**
- Command to run after implementing
- Expected output or behavior
- How to manually verify correctness

---

4. Implementation Checklist
   - [ ] Step 1: [Description]
   - [ ] Step 2: [Description]
   - [ ] Step 3: [Description]
   - ...

5. Step Dependencies
   - Mermaid flowchart showing which steps must complete before others
   - Identify parallel vs sequential work

6. Estimated Scope
   - Table with: Step | Files Modified | Complexity (Low/Medium/High)

## Ralph Guardrails (CRITICAL)

Your architecture must align with these principles:
- **Fresh context each iteration** - Each build step can be executed independently
- **Verification is mandatory** - Tests/typecheck/lint must pass before moving forward
- **Search before assuming** - Don't assume functionality is missing, verify first
- **Backpressure over prescription** - Design gates that reject bad work (tests fail = stop)
- **Disk is state** - Use files, git, and explicit artifacts for handoff between steps

Format: Structured markdown with Mermaid diagrams and JSON schemas where applicable.

Constraints:
- Follow existing codebase patterns: {existing_patterns}
- Maintain compatibility with: {dependencies}
- Optimize for: {performance_goals}
- {user_specific_constraints}

DO NOT include preambles, status updates, or meta-commentary. Proceed directly to architectural design.
"""

Step 3: Reasoning Effort Selection

ALWAYS use high reasoning effort:

# Always use high for quality results
reasoning_effort = "high"

High reasoning effort ensures thorough analysis for both planning and review phases. The marginal cost increase is justified by significantly better architectural decisions and more comprehensive code review findings.

Step 4: Nuance Preservation Rules

MUST preserve:

All specific technical requirements mentioned by user
Performance targets, SLAs, metrics
Security/compliance requirements
User preferences for libraries, frameworks, patterns
Timeline constraints
Budget constraints

Example preservation:

User: "Add authentication with JWT, must support refresh tokens, Redis for session store"

Refactored (preserves all specifics):
- Authentication: JWT-based (access + refresh tokens)
- Session Store: Redis (as specified)
- Adds architectural context: token rotation, expiry policies, security best practices

NOT:
- Authentication: Consider OAuth2, SAML, or JWT  ❌ (changes user's choice)
- Session Store: Evaluate Redis, Memcached, or in-memory  ❌ (introduces alternatives)

ONLY change if user explicitly revises during follow-up:

User: "Add rate limiting"
You ask: "Which rate limiting strategy - token bucket, sliding window, or fixed window?"
User: "Actually, let's use sliding window with Redis"
Refactored: Rate limiting using sliding window algorithm with Redis backing store ✓
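
One way to guard the preservation rule is a simple check that every technology the user named still appears in the refactored prompt. The sketch below is an assumption about how such a check could look; `dropped_terms` is a hypothetical helper, and it only detects terms that went missing, not alternatives that were added next to them.

```python
import re


def dropped_terms(required_terms: list[str], refactored_prompt: str) -> list[str]:
    """Return user-specified terms that no longer appear in the refactored prompt."""
    text = refactored_prompt.lower()
    return [
        term
        for term in required_terms
        # Match whole terms only, case-insensitively.
        if not re.search(rf"(?<!\w){re.escape(term.lower())}(?!\w)", text)
    ]


user_terms = ["JWT", "refresh tokens", "Redis"]
good = (
    "Authentication: JWT-based (access + refresh tokens)\n"
    "Session Store: Redis (as specified)"
)
bad = (
    "Authentication: Consider OAuth2, SAML, or JWT\n"
    "Session Store: Evaluate Memcached or in-memory"
)
print(dropped_terms(user_terms, good))  # []
print(dropped_terms(user_terms, bad))   # ['refresh tokens', 'Redis']
```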

Step 5: Anti-Patterns to Avoid

❌ DO NOT:

Ask questions already answered in user's request
Add verbose preambles ("I will now analyze...")
Over-constrain output format with micro-management
Change user's technical choices without explicit approval
Ask questions for the sake of thoroughness when context is clear

✓ DO:

Ask targeted questions only for genuine ambiguity
Preserve all user-specified technologies, patterns, constraints
Structure the prompt for optimal Codex reasoning
Use high reasoning effort (see Step 3)
Let Codex reason without forcing meta-commentary

Example Refactoring

User Request:

"Add user authentication to the app"

Clarifying Questions:

1. What authentication method? (OAuth2, JWT, Session-based, Magic links)
2. What are the security requirements? (MFA, password policies, session timeout)
3. Integration points? (Existing user DB, third-party providers)

User Answers:

"JWT with refresh tokens, must support Google OAuth, want MFA optional, Redis for sessions"

Refactored Prompt:

Role: Lead Software Architect specializing in Web Application Security

Context:
- Project: [app_name]
- Tech Stack: [detected from codebase]
- Constraints: Must integrate with existing user database, production Redis available
- Existing Architecture: [codebase summary]

User Intent (verbatim):
"Add user authentication to the app using JWT with refresh tokens, must support Google OAuth, want MFA optional, Redis for sessions"

Task: Design the authentication system architecture

Output Requirements:
1. System Design Overview
   - Authentication flow diagram (Mermaid)
   - Token lifecycle (access + refresh)
   - OAuth2 integration with Google
   - MFA workflow (optional enrollment)
   - Session management via Redis

2. Component Breakdown
   - Auth service (JWT generation, validation, refresh)
   - OAuth2 client (Google integration)
   - MFA service (TOTP, backup codes)
   - Session manager (Redis-backed)
   - Middleware (route protection)

3. API Contracts
   - POST /auth/login (email/password → JWT)
   - POST /auth/oauth/google (OAuth flow)
   - POST /auth/refresh (refresh token → new access token)
   - POST /auth/mfa/enroll (optional MFA setup)
   - POST /auth/mfa/verify (TOTP verification)

4. Data Models
   - User schema (id, email, password_hash, mfa_enabled, mfa_secret)
   - RefreshToken schema (token_id, user_id, expires_at, revoked)
   - Session schema in Redis (session_id, user_id, metadata, ttl)

5. Edge Cases & Error Handling
   - Expired tokens, revoked sessions, concurrent logins
   - OAuth failures (Google API down, user cancels)
   - MFA lockout scenarios, backup codes
   - Rate limiting for login attempts

6. Security Architecture
   - Password hashing (bcrypt/argon2)
   - Token signing algorithm (RS256)
   - HTTPS-only cookies for refresh tokens
   - CSRF protection for OAuth callbacks
   - Rate limiting and brute-force protection

7. Testing Strategy
   - Unit: Token generation/validation, password hashing
   - Integration: OAuth flow, MFA enrollment
   - E2E: Full login/logout cycles, session persistence

8. Implementation Sequence
   1. Core JWT service (generation, validation)
   2. User model + password hashing
   3. Login/logout endpoints
   4. Refresh token mechanism
   5. Google OAuth integration
   6. Optional MFA (last, can be skipped for MVP)

Format: Structured markdown with Mermaid diagrams for flows

Constraints:
- Use existing Redis instance (connection details in .env)
- Follow existing API route structure (/api/v1/*)
- Maintain compatibility with current User model
- JWT secret must be environment variable
- Optimize for: Security first, then developer experience

DO NOT include preambles or status updates. Proceed directly to architectural design.

Reasoning Effort: high (authentication is complex, security-critical)

Phase 1: Validation

python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --check
ralph preflight

Verifies:

ralph CLI installed and in PATH
Current directory is a git repository
At least one commit exists
ralph preflight validates configuration, backend availability, and project structure
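
The first three checks can be approximated with the standard library. This is a sketch under stated assumptions, not the actual logic of orchestrate.py: `check_environment` is a hypothetical name, the `which` and `run` parameters exist only to make the function testable, and the `ralph preflight` step is not covered.

```python
import shutil
import subprocess
from pathlib import Path


def check_environment(cwd: Path, which=shutil.which, run=subprocess.run) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if which("ralph") is None:
        problems.append("ralph not found")
    if which("git") is None:
        problems.append("git not found")
        return problems  # the remaining checks need git

    def git_ok(*args: str) -> bool:
        result = run(["git", *args], cwd=cwd, capture_output=True, text=True)
        return result.returncode == 0

    if not git_ok("rev-parse", "--is-inside-work-tree"):
        problems.append("Not a git repository")
    elif not git_ok("rev-parse", "--verify", "HEAD"):
        # HEAD cannot be resolved in a repository without commits.
        problems.append("No commits")
    return problems
```

Calling `check_environment(Path.cwd())` in a project directory returns the same error labels used in the Error Handling table, which keeps the suggested actions easy to look up.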

Phase 2: Planning (PDD)

Option A: Interactive planning session

python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --plan "Add JWT authentication"

Creates:

specs/ directory with requirements.md, design.md, implementation-plan.md
PROMPT.md referencing the specs

Option B: Generate from tasks (skip planning)

python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --generate \
  --title "Add Authentication" \
  --tasks "Add NextAuth.js" "Create login page" "Add session provider"

Phase 3: Execution

Standard (sequential):

python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --run

Or combined:

python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --plan "description" --run
python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --generate --title "..." --tasks "..." --run

Parallel execution (when PROMPT.md contains task waves):

If PROMPT.md uses the wave format (see "Parallel Task Execution" section below), independent tasks within each wave run in parallel via git worktrees + Task subagents.

Phase 4: Report

Script automatically reports:

Exit code interpretation
Task completion summary
Any errors or partial progress

CLI Options

| Option | Description |
|---|---|
| `--check` | Validate environment (ralph, git, etc.) |
| `--plan TEXT` | Run PDD planning session with description |
| `--generate` | Generate PROMPT.md from --title and --tasks |
| `--title TEXT` | Title for generated PROMPT.md |
| `--tasks TEXT...` | Task list for generated PROMPT.md |
| `--context TEXT` | Additional context for PROMPT.md |
| `--run` | Execute ralph run after plan/generate |
| `--max-iterations N` | Iteration limit (default: 50) |
| `--backend NAME` | Override backend (default: claude) |
| `--dry-run` | Preview without executing |
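
For readers who want to see how these options fit together, here is a minimal argparse sketch that mirrors the table. It is an illustration only and may differ from the parser in orchestrate.py; `build_parser` is a hypothetical name, and the defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI options."""
    parser = argparse.ArgumentParser(prog="orchestrate.py")
    parser.add_argument("--check", action="store_true", help="validate environment")
    parser.add_argument("--plan", metavar="TEXT", help="run PDD planning session")
    parser.add_argument("--generate", action="store_true", help="generate PROMPT.md")
    parser.add_argument("--title", metavar="TEXT", help="title for PROMPT.md")
    parser.add_argument("--tasks", metavar="TEXT", nargs="+", help="task list")
    parser.add_argument("--context", metavar="TEXT", help="additional context")
    parser.add_argument("--run", action="store_true", help="execute ralph run")
    parser.add_argument("--max-iterations", metavar="N", type=int, default=50)
    parser.add_argument("--backend", metavar="NAME", default="claude")
    parser.add_argument("--dry-run", action="store_true", help="preview only")
    return parser


args = build_parser().parse_args(
    ["--generate", "--title", "Add Input Validation",
     "--tasks", "Add zod schema", "Validate /users endpoint", "--run"]
)
print(args.tasks, args.run, args.max_iterations)
# ['Add zod schema', 'Validate /users endpoint'] True 50
```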

Default Workflow: Multi-Model (Codex + Ralph + Exa/Ref)

When ralph-orchestrator skill is invoked, it executes the full multi-model cycle:

1. Prompt Refinement - Clarify user intent, refactor for optimal Codex ingestion
2. Codex Planning - GPT 5.3 High generates thorough architecture (single sequential call)
3. PROMPT Generation - Extract tasks from architecture (with wave annotations for parallelism)
4. Ralph Implementation - Claude Opus 4.6 executes via orchestrate.py; independent task waves run in parallel via worktree + Task subagents
5. Codex Review - GPT 5.3 High analyzes implementation for edge cases (single sequential call)
6. Issue Resolution - Parse review findings, fix independent issues in parallel via Task subagents; Exa/Ref research only if fix isn't obvious from review
7. Report - Deliver results with review findings

No decision tree - this is the complete workflow every time.

Exit Codes

| Code | Meaning | Action |
|---|---|---|
| 0 | LOOP_COMPLETE | Success - all tasks done |
| 1 | Failure | Check .ralph/ logs for details |
| 2 | Limit exceeded | Partial progress, may resume |
| 130 | Interrupted | User cancelled |
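
A wrapper script can turn these codes into a readable report line. The mapping below is copied from the table; the `describe_exit` helper and the fallback for unlisted codes are assumptions made for this sketch.

```python
# Exit codes as documented in the table above.
EXIT_CODES = {
    0: ("LOOP_COMPLETE", "Success - all tasks done"),
    1: ("Failure", "Check .ralph/ logs for details"),
    2: ("Limit exceeded", "Partial progress, may resume"),
    130: ("Interrupted", "User cancelled"),
}


def describe_exit(code: int) -> str:
    """Format an exit code as 'exit N: meaning (action)'."""
    # Assumption: unlisted codes are treated as failures worth inspecting.
    meaning, action = EXIT_CODES.get(code, ("Unknown", "Check .ralph/ logs for details"))
    return f"exit {code}: {meaning} ({action})"


print(describe_exit(2))  # exit 2: Limit exceeded (Partial progress, may resume)
```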

Error Handling

| Error | Cause | Action |
|---|---|---|
| `ralph not found` | CLI not installed | `npm install -g @ralph-orchestrator/ralph-cli` |
| `Not a git repository` | No .git directory | `git init && git add . && git commit -m "init"` |
| `No commits` | Empty git history | `git add . && git commit -m "Initial commit"` |
| `PROMPT.md exists` | Previous run | Use `--run` to resume or delete PROMPT.md |

PROMPT.md Format

Generated PROMPT.md follows this structure:

# {title}

## Objective
{description from --plan or context}

## Tasks
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3

## Context
{additional context if provided}

## Constraints
- Follow existing code patterns
- Run tests after changes
- Commit after each task

## Acceptance Criteria
- All tasks completed
- Tests passing
- No linting errors
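
The structure above is easy to produce from a title and a task list. The following is a sketch of one way to render it, assuming the Context section is omitted when no context is given; `render_prompt_md` is a hypothetical helper, not the generator used by orchestrate.py.

```python
def render_prompt_md(title: str, objective: str, tasks: list[str], context: str = "") -> str:
    """Render a PROMPT.md document in the flat (sequential) format."""
    lines = [f"# {title}", "", "## Objective", objective, "", "## Tasks"]
    lines += [f"- [ ] {task}" for task in tasks]
    if context:
        # Assumption: the Context section is skipped when nothing was provided.
        lines += ["", "## Context", context]
    lines += [
        "",
        "## Constraints",
        "- Follow existing code patterns",
        "- Run tests after changes",
        "- Commit after each task",
        "",
        "## Acceptance Criteria",
        "- All tasks completed",
        "- Tests passing",
        "- No linting errors",
    ]
    return "\n".join(lines) + "\n"


print(render_prompt_md("Add Authentication", "Add login to the app",
                       ["Add NextAuth.js", "Create login page"]))
```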

PROMPT.md Wave Format (Parallel Execution)

When Codex architecture identifies independent implementation steps, generate PROMPT.md with wave annotations. Tasks within the same wave have no dependencies on each other and run in parallel via worktrees.

# {title}

## Objective
{description}

## Tasks

### Wave 1 (parallel - no dependencies)
- [ ] Add authentication module (src/auth/)
- [ ] Add logging system (src/logging/)
- [ ] Add configuration loader (src/config/)

### Wave 2 (depends on Wave 1)
- [ ] Add authorization middleware (depends: authentication)
- [ ] Add audit logging (depends: logging)

### Wave 3 (depends on all above)
- [ ] Add integration tests

## Constraints
- Each module must have test coverage
- All tests must pass before merging waves

Wave detection: If PROMPT.md contains ### Wave N headings, Phase 3 uses parallel execution. Otherwise, it falls back to standard sequential ralph execution.
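
Wave detection amounts to scanning for `### Wave N` headings and collecting the unchecked task lines under each one. The sketch below shows one possible implementation; `has_waves` and `parse_waves` are hypothetical names, and the regular expressions assume the exact heading and checkbox format shown in the template above.

```python
import re

WAVE_HEADING = re.compile(r"^### Wave (\d+)\b")
TASK_LINE = re.compile(r"^- \[ \] (.+)$")


def has_waves(prompt_md: str) -> bool:
    """True if any line is a '### Wave N' heading."""
    return any(WAVE_HEADING.match(line) for line in prompt_md.splitlines())


def parse_waves(prompt_md: str) -> list[list[str]]:
    """Return unchecked tasks grouped by wave, in document order."""
    waves: list[list[str]] = []
    in_wave = False
    for line in prompt_md.splitlines():
        if WAVE_HEADING.match(line):
            waves.append([])
            in_wave = True
        elif line.startswith("## "):
            in_wave = False  # a new top-level section ends the task list
        elif in_wave and (match := TASK_LINE.match(line)):
            waves[-1].append(match.group(1))
    return waves


sample = """## Tasks

### Wave 1 (parallel - no dependencies)
- [ ] Add authentication module (src/auth/)
- [ ] Add logging system (src/logging/)

### Wave 2 (depends on Wave 1)
- [ ] Add authorization middleware (depends: authentication)

## Constraints
- Each module must have test coverage
"""
print(has_waves(sample), [len(wave) for wave in parse_waves(sample)])  # True [2, 1]
```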

When to use waves:

≥2 tasks touch entirely different directories/modules
Tasks have no shared file dependencies
Each task is substantial enough to justify worktree overhead (≥5 ralph iterations expected)

When NOT to use waves (keep sequential):

Tasks edit overlapping files
Tasks have tight sequential dependencies
Small tasks (<5 iterations each, overhead > benefit)
Single-task PROMPT.md
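
The first two criteria (no shared files, no unmet dependencies) can be turned into a simple layering pass. This is a sketch under assumptions: `PlannedTask` and `group_into_waves` are hypothetical, tasks must be listed after the tasks they depend on, the example paths for the middleware and audit tasks are invented for illustration, and the iteration-count threshold is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedTask:
    name: str
    paths: set[str]                      # directories or files the task touches
    depends: set[str] = field(default_factory=set)


def group_into_waves(tasks: list[PlannedTask]) -> list[list[str]]:
    """Place each task in the earliest wave after its dependencies with no path overlap."""
    waves: list[list[PlannedTask]] = []
    wave_of: dict[str, int] = {}
    for task in tasks:  # assumes dependencies appear earlier in the list
        idx = max((wave_of[dep] + 1 for dep in task.depends), default=0)
        # Push the task later while it shares a path with a task already in that wave.
        while idx < len(waves) and any(task.paths & other.paths for other in waves[idx]):
            idx += 1
        if idx == len(waves):
            waves.append([])
        waves[idx].append(task)
        wave_of[task.name] = idx
    return [[t.name for t in wave] for wave in waves]


plan = [
    PlannedTask("authentication", {"src/auth/"}),
    PlannedTask("logging", {"src/logging/"}),
    PlannedTask("config", {"src/config/"}),
    PlannedTask("authorization", {"src/middleware/"}, {"authentication"}),
    PlannedTask("audit", {"src/audit/"}, {"logging"}),
    PlannedTask("integration tests", {"tests/"},
                {"authentication", "logging", "config", "authorization", "audit"}),
]
for number, wave in enumerate(group_into_waves(plan), start=1):
    print(f"Wave {number}: {wave}")
```

If the result has a single task per wave, the plan is effectively sequential and the flat PROMPT.md format is the better fit.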

Examples

Example 1: Full PDD workflow

# Interactive planning + execution
python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py \
  --plan "Add user authentication with OAuth2 (Google, GitHub)" \
  --run

Example 2: Quick task execution

# Skip planning, provide tasks directly
python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py \
  --generate \
  --title "Add Input Validation" \
  --tasks "Add zod schema" "Validate /users endpoint" "Add error messages" \
  --run

Example 3: Dry run preview

# See what would be generated without executing
python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py \
  --generate \
  --title "Refactor Auth" \
  --tasks "Extract auth logic" "Add tests" \
  --dry-run

Example 4: Custom iteration limit

python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py \
  --plan "Complex feature" \
  --run \
  --max-iterations 100

Example 5: Multi-model workflow with prompt refinement

User: "Add rate limiting to API endpoints"

# Phase 0: Prompt Refinement
Claude asks clarifying questions:
  1. Which rate limiting strategy? (token bucket, sliding window, fixed window)
  2. What backing store? (Redis, in-memory, database)
  3. Rate limit scope? (per-user, per-IP, per-API-key)

User answers:
  - Sliding window algorithm
  - Redis backing store
  - Per-API-key with fallback to per-IP

Claude refactors into optimal Codex prompt:
  Role: Lead Software Architect specializing in API Infrastructure
  Context: Express.js API, Redis available, ~10k requests/sec
  User Intent: "Add rate limiting with sliding window, Redis, per-API-key + per-IP fallback"
  [... full structured prompt with 8 output requirements ...]
  Reasoning effort: high (complex multi-tier rate limiting)

# Phase 1: Codex Planning (background execution)
Run: codex exec -o .ralph/codex-architecture.md -- "REFINED_PROMPT" 2>/dev/null & wait $!
Read: .ralph/codex-architecture.md
  → Contains architecture with sliding window implementation, Redis schema, middleware design

# Phase 2: PROMPT Generation
Generate PROMPT.md from Codex architecture with wave annotations:
  Wave 1 (parallel): sliding-window module, Redis store adapter, config loader
  Wave 2 (sequential): middleware integration, tests

# Phase 3: Ralph Implementation (parallel waves)
Wave 1: 3 Task subagents in worktrees (sliding-window, Redis adapter, config)
  → All 3 complete in ~40 min (vs ~90 min sequential)
Wave 2: sequential ralph run for middleware + tests
  → Merges wave 1 results first, then implements integration

# Phase 4: Codex Review (single sequential call, background)
Run: codex exec review --base HEAD~N > .ralph/codex-review.md 2>&1 & wait $!
Read: .ralph/codex-review.md
  → Finds: rate limit doesn't reset on restart, missing concurrency tests
  → Suggests: Redis persistence config, add concurrent request test

# Phase 5: Fix review issues (parallel - different files)
  2 issues in 2 files → 2 parallel Task subagents
  Subagent 1: Apply Redis persistence fix (clear suggestion, no Exa/Ref needed)
  Subagent 2: Add concurrency tests (clear suggestion, no Exa/Ref needed)
  Both complete in ~5 min

# Phase 6: Report
Complete implementation with review findings, all fixes applied

# Result: User's nuanced request preserved (sliding window, Redis, per-API-key)
# Parallel execution: ~65 min total (vs ~120 min sequential)
# Uses: 2 Codex requests (planning + review), parallel Claude for implementation + fixes

Example 6: Prompt refinement with preservation

User: "Add authentication with JWT, must support refresh tokens, Redis for session store"

# Phase 0: Prompt Refinement
Claude analyzes: All technical choices specified, no ambiguity
  ✓ Auth method: JWT (specified)
  ✓ Features: Refresh tokens (specified)
  ✓ Session store: Redis (specified)

No clarifying questions needed - user was explicit.

Claude refactors preserving ALL specifics:
  Role: Lead Software Architect specializing in Web Application Security
  User Intent (verbatim): "Add authentication with JWT, must support refresh tokens, Redis for session store"

  Output Requirements:
  1. System Design Overview
     - JWT-based authentication (access + refresh tokens as specified)
     - Redis session management (as specified)
     [...]

  Constraints:
  - MUST use JWT for authentication (user requirement)
  - MUST implement refresh token mechanism (user requirement)
  - MUST use Redis for session storage (user requirement)

  Reasoning effort: high (authentication is security-critical)

# Continues to Phase 1 (Codex Planning) with refined prompt
# User's technical choices (JWT, refresh tokens, Redis) are preserved and emphasized

Integration with Claude Code

Automatic multi-model workflow:

1. User discusses requirements with Claude Code
2. User: "Run ralph to implement this" or "Use ralph-orchestrator"
3. Claude invokes ralph-orchestrator skill
4. Skill executes automatically:
   - Refines user request into optimal Codex prompt (asks clarifying questions if needed)
   - Calls Codex for architectural planning with refined prompt (single sequential call)
   - Generates PROMPT.md from architecture (with wave annotations for parallelism if applicable)
   - Runs ralph implementation; independent task waves run in parallel via worktrees + Task subagents
   - Calls Codex for code review (single sequential call)
   - Fixes review issues; independent issues are fixed in parallel via Task subagents, each driven by Codex review findings; Exa/Ref only if suggested fix is unclear
   - Reports results with review findings
5. User receives complete implementation + quality analysis + working solutions

When to use ralph-orchestrator:

Well-defined feature implementation
Multiple related tasks
Want thorough architecture + fast implementation + comprehensive review
Can run unattended

When NOT to use:

Task requires clarification
Exploratory/research work
Single trivial change (< 5 lines)

Core Workflow: Multi-Model (Codex + Ralph)

Automatic execution: When ralph-orchestrator is invoked, this full workflow executes automatically.

Architecture: GPT 5.3 High for planning and code review + Claude Opus 4.6 for fast implementation.

Rationale: Developer reports describe GPT 5.3 as "much more thorough" at planning and better at finding "subtle edge cases" in review, while Claude Opus 4.6 is faster at execution (7-8 min vs 20-26 min) and at shipping working code.

Full Cycle: Refine → Codex Plan → Ralph Implement → Codex Review

Phase 0: Prompt Refinement

# Gather user's feature request
user_request = "Add user authentication with JWT and Google OAuth"

# Ask clarifying questions ONLY if unclear
if requires_clarification(user_request):
    answers = AskUserQuestion(questions=[...])
    user_request = incorporate_answers(user_request, answers)

# Detect domain and complexity
domain = detect_domain(user_request, codebase_summary)
constraints = extract_constraints(user_request, codebase_summary)
reasoning_effort = "high"  # Always high - set in ~/.codex/config.toml

# Refactor into optimal Codex prompt (preserving all user nuance)
refined_prompt = f"""Role: Lead Software Architect specializing in {domain}

Context:
- Project: {project_name}
- Tech Stack: {tech_stack}
- Constraints: {constraints}
- Existing Architecture: {codebase_summary}

User Intent (verbatim):
{user_request}

Task: Design the system architecture for: {extract_feature_summary(user_request)}

Output Requirements (Ralph TDD-Style Planning):

## Architecture Overview
1. System Design Summary
   - High-level architecture (Mermaid component diagram)
   - Data flow patterns
   - Integration points with existing systems
   - Technology choices with rationale

2. Component Breakdown
   - Core components with clear responsibilities
   - Component interactions and dependencies
   - API contracts (JSON schema or TypeScript types)
   - Data models with relationships and constraints

3. Risk Analysis & Edge Cases
   - Failure scenarios (network, data, auth, rate limits)
   - Security considerations (auth, encryption, attack surface)
   - Performance implications
   - Migration strategy if modifying existing schemas

## Implementation Plan (TDD Format)

For each implementation step, provide:

### Step N: [Clear Objective]

**Objective:** What we're building and why (one sentence)

**Implementation:**
- Specific files to create/modify
- Concrete code changes (functions, classes, interfaces)
- Configuration or schema updates
- Dependencies or libraries needed

**Test Requirements (DEFINE BEFORE IMPLEMENTING):**
- Unit tests: What specific behavior to test
- Integration tests: What interactions to verify
- Test fixtures or mocks needed
- Acceptance criteria (how to know it works)

**Integration:**
- How this step connects to other components
- What interfaces or contracts it depends on
- What it exposes for future steps

**Demo/Verification:**
- Command to run after implementing
- Expected output or behavior
- How to manually verify correctness

---

4. Implementation Checklist
   - [ ] Step 1: [Description]
   - [ ] Step 2: [Description]
   - [ ] Step 3: [Description]
   - ...

5. Step Dependencies
   - Mermaid flowchart showing which steps must complete before others
   - Identify parallel vs sequential work

6. Estimated Scope
   - Table with: Step | Files Modified | Complexity (Low/Medium/High)

## Ralph Guardrails (CRITICAL)

Your architecture must align with these principles:
- **Fresh context each iteration** - Each build step can be executed independently
- **Verification is mandatory** - Tests/typecheck/lint must pass before moving forward
- **Search before assuming** - Don't assume functionality is missing, verify first
- **Backpressure over prescription** - Design gates that reject bad work (tests fail = stop)
- **Disk is state** - Use files, git, and explicit artifacts for handoff between steps

Format: Structured markdown with Mermaid diagrams and JSON schemas where applicable.

Constraints:
- Follow existing codebase patterns: {existing_patterns}
- Maintain compatibility with: {dependencies}
- Optimize for: {performance_goals}
- {user_specific_constraints}

DO NOT include preambles, status updates, or meta-commentary. Proceed directly to architectural design.
"""

Phase 1: Codex Architectural Planning

# Call Codex CLI with refined prompt
# ALWAYS use model_reasoning_effort="high" (set in ~/.codex/config.toml) for thorough architecture
# Write prompt to temp file to handle multi-line content
cat > /tmp/codex-prompt.txt << 'PROMPT_EOF'
{refined_prompt}
PROMPT_EOF

# Run codex in background to avoid token waste in main context
codex exec -o .ralph/codex-architecture.md -- "$(cat /tmp/codex-prompt.txt)" 2>/dev/null &
CODEX_PID=$!

# Wait for completion (or poll periodically)
wait $CODEX_PID

# Read architecture from file
# Use Read tool on .ralph/codex-architecture.md

Phase 2: Generate PROMPT.md from Codex Architecture

# Extract tasks from Codex architecture
tasks = extract_implementation_tasks(architecture)

# Analyze task dependencies from Codex architecture
# Group into waves: tasks touching different directories/modules with no shared files = same wave
# Tasks with explicit dependencies or shared files = later wave
waves = group_tasks_into_waves(tasks, architecture)

# Generate PROMPT.md with wave annotations if ≥2 independent tasks exist
# format_tasks_with_waves emits ### Wave N headings if parallel, a flat list if sequential
Write tool:
  file_path: "PROMPT.md"
  content: f"""# {title}

## Objective
{objective}

## Architecture (from Codex GPT 5.3 High)

{architecture}

## Implementation Tasks
{format_tasks_with_waves(waves)}

## Constraints
- Follow the architectural design above
- Run tests after each component
- Commit after completing each task
- Use meaningful commit messages

## Acceptance Criteria
- All tasks completed per architecture
- Tests passing (unit + integration)
- No linting errors
- Security review passed
- Performance benchmarks met
"""

Phase 3: Execute Ralph Implementation

Sequential (no waves in PROMPT.md):

python ~/.claude/skills/ralph-orchestrator/scripts/orchestrate.py --run --max-iterations 100

Parallel (waves detected in PROMPT.md):

When PROMPT.md contains ### Wave N headings, execute independent tasks in parallel using git worktrees + Task subagents:

# Parse waves from PROMPT.md
waves = parse_waves_from_prompt_md("PROMPT.md")

for wave_idx, wave_tasks in enumerate(waves):
    print(f"Wave {wave_idx + 1}/{len(waves)}: {len(wave_tasks)} tasks")

    if len(wave_tasks) == 1:
        # Single task - run in main workspace (standard ralph)
        nohup ralph run -a --continue --max-iterations 30 > .ralph/ralph.log 2>&1 & disown
        # Monitor until completion
    else:
        # Multiple independent tasks - parallel worktrees + subagents
        for task in wave_tasks:
            # 1. Create isolated git worktree
            worktree_path = f".worktrees/wave{wave_idx + 1}-{task.id}"
            git worktree add {worktree_path} HEAD

            # 2. Write single-task PROMPT.md in worktree
            Write(file_path=f"{worktree_path}/PROMPT.md", content=single_task_prompt)

            # 3. Spawn Task subagent to run ralph in worktree
            Task(
                subagent_type="general-purpose",
                description=f"Ralph: {task.title}",
                prompt=f"""Execute ralph in isolated worktree for this task:

**Task:** {task.title}
**Description:** {task.description}
**Workspace:** {worktree_path}

1. Navigate: cd {worktree_path}
2. Verify PROMPT.md contains this single task
3. Run ralph detached:
   nohup ralph run -a --continue --max-iterations 30 > .ralph/ralph.log 2>&1 & disown
4. Monitor: tail -20 .ralph/ralph.log every 2-3 minutes
5. Wait for LOOP_COMPLETE or 60 minute timeout
6. Run tests: {test_command}

Output format:
STATUS: complete | partial | failed
ITERATIONS: [count]
COMMITS: [git log --oneline]
FILES: [modified files list]
TESTS: passing | failing
""",
                run_in_background=False
            )

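        # After all wave tasks complete, merge worktrees back
        for task in completed_tasks:
            worktree_path = f".worktrees/wave{wave_idx + 1}-{task.id}"
            # The worktree was added at HEAD (detached), so merge its tip commit
            git merge $(git -C {worktree_path} rev-parse HEAD)
            git worktree remove {worktree_path}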

    # Run full test suite after wave merge
    run_tests()

Worktree cleanup: Always remove worktrees after merge (git worktree remove), even on failure.

Merge conflicts: If merge fails, report conflicting files and fall back to sequential re-implementation of the conflicting task.
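
The cleanup and conflict rules can be combined in a try/finally so that the worktree is removed whether or not the merge succeeds. This is a sketch, not the skill's implementation: `merge_worktree` is a hypothetical helper, it merges the worktree's tip commit rather than a named branch, it uses `git worktree remove --force` (which discards uncommitted changes in the worktree), and the `run` parameter exists only so the function can be tested without a repository.

```python
import subprocess


def merge_worktree(path: str, run=subprocess.run) -> tuple[bool, list[str]]:
    """Merge a worktree's tip commit into the current branch, then remove the worktree.

    Returns (merged, conflicting_files). The worktree is removed even on failure.
    """
    try:
        tip = run(
            ["git", "-C", path, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        if run(["git", "merge", "--no-edit", tip]).returncode == 0:
            return True, []
        # List unmerged paths so they can be reported, then restore the pre-merge state.
        conflicts = run(
            ["git", "diff", "--name-only", "--diff-filter=U"],
            capture_output=True, text=True,
        ).stdout.splitlines()
        run(["git", "merge", "--abort"])
        return False, conflicts
    finally:
        run(["git", "worktree", "remove", "--force", path])
```

A caller would report the returned file list and queue the affected task for sequential re-implementation when `merged` is False.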

Phase 4: Codex Code Review

After LOOP_COMPLETE (exit code 0):

# Get base commit for review scope (adjust N based on implementation commits)
base_commit=$(git rev-parse HEAD~10)

# Run Codex review in background to avoid token waste
codex exec review --base "$base_commit" > .ralph/codex-review.md 2>&1 &
CODEX_PID=$!

# Wait for completion
wait $CODEX_PID

# Read review from file
# Use Read tool on .ralph/codex-review.md

Review output includes:

Subtle edge cases or bugs - Does the code handle all scenarios?
Error handling gaps - Missing try/catch, unhandled errors?
Security vulnerabilities - Injection, XSS, auth bypass, data leaks?
Performance issues - N+1 queries, memory leaks, blocking operations?
Maintainability concerns - Code smells, tight coupling, poor naming?
Test coverage gaps - Missing test cases, untested paths?
Documentation needs - Unclear APIs, missing comments for complex logic?
Architectural deviations - Did implementation follow the plan?
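
Instead of guessing N in HEAD~N, the review base can be recorded before Phase 3 starts and reused afterwards. The sketch below assumes that approach; `current_commit`, `review_argv`, and `run_review` are hypothetical helpers, and only the `codex exec review --base` form documented in this file is used.

```python
import subprocess


def current_commit(run=subprocess.run) -> str:
    """Return the SHA of HEAD; call this before the ralph run to record the review base."""
    result = run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def review_argv(base: str) -> list[str]:
    """Build the non-interactive review command for changes since `base`."""
    return ["codex", "exec", "review", "--base", base]


def run_review(base: str, output_path: str = ".ralph/codex-review.md",
               run=subprocess.run) -> int:
    """Run the review, redirecting stdout and stderr to the output file."""
    with open(output_path, "w") as out:
        return run(review_argv(base), stdout=out, stderr=subprocess.STDOUT).returncode


print(review_argv("abc123"))  # ['codex', 'exec', 'review', '--base', 'abc123']
```

Typical use would be `base = current_commit()` before Phase 3 and `run_review(base)` after LOOP_COMPLETE, so the review scope matches exactly the commits ralph produced.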

Codex CLI Reference (v0.98.0)

ALWAYS use Codex CLI directly (no MCP). This avoids timeout issues and provides more control.

If unsure about available options, run codex exec --help or codex exec review --help first.

CRITICAL: Required config.toml settings

Codex config at ~/.codex/config.toml MUST have:

model = "gpt-5.3-codex"
model_reasoning_effort = "high"

All codex exec and codex exec review calls inherit these settings, so there is no need to pass them on every invocation.

CRITICAL: Non-Interactive Execution

Always use codex exec (not bare codex) for non-interactive use. Use -o FILE to capture the final response to a file instead of stdout.

# ✓ CORRECT - Non-interactive with file output, background execution
codex exec -o .ralph/codex-output.md -- "prompt" 2>/dev/null &
wait $!
# Then use Read tool on .ralph/codex-output.md

# ❌ WRONG - bare `codex` opens interactive TUI
codex "prompt"

# ❌ WRONG - `-q` flag does not exist
codex -q "prompt"

# ❌ WRONG - `reasoningEffort` is not a valid config key
codex exec -c 'reasoningEffort="high"' -- "prompt"
# ✓ CORRECT config key (if you need to override config.toml)
codex exec -c model_reasoning_effort="high" -- "prompt"

CLI Syntax:

# Planning/prompts - use `codex exec` for non-interactive
codex exec -o .ralph/codex-plan.md -- "Your prompt here" 2>/dev/null & wait $!

# Code review - use `codex exec review` for non-interactive (no -o, use redirect)
codex exec review --base HEAD~10 > .ralph/codex-review.md 2>&1 & wait $!

# Review uncommitted changes
codex exec review --uncommitted > .ralph/codex-review.md 2>&1 & wait $!

# Review specific commit
codex exec review --commit abc123 > .ralph/codex-review.md 2>&1 & wait $!

Available CLI Options:

Command	Option	Description
`codex exec`	`"prompt"`	Non-interactive prompt (positional arg)
`codex exec`	`-o FILE`	Write final response to file
`codex exec`	`-m MODEL`	Override model (default from config.toml)
`codex exec`	`-c key=value`	Config override (e.g., `model_reasoning_effort="high"`)
`codex exec`	`-s SANDBOX`	Sandbox mode: `read-only`, `workspace-write`, `danger-full-access`
`codex exec`	`--full-auto`	Convenience: `-a on-request --sandbox workspace-write`
`codex exec`	`-C DIR`	Working directory
`codex exec review`	`--base <ref>`	Review changes against base branch/commit
`codex exec review`	`--commit <sha>`	Review specific commit
`codex exec review`	`--uncommitted`	Review staged/unstaged changes
`codex exec review`	`--title <text>`	Title for review summary
`codex exec review`	`-m MODEL`	Override model

NOTE: codex exec review does NOT have -o. Use shell redirect: > file 2>&1
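
If the commands are assembled from a script rather than typed, the difference between the two subcommands can be captured in small builders. This is a sketch only: the function names are hypothetical, and it uses just the flags listed in the table above.

```python
import subprocess
from typing import List, Optional


def codex_exec_cmd(prompt: str, output_file: str,
                   reasoning_effort: Optional[str] = None) -> List[str]:
    """Build argv for `codex exec`, writing the final response to output_file."""
    cmd = ["codex", "exec", "-o", output_file]
    if reasoning_effort is not None:
        # Overrides model_reasoning_effort from ~/.codex/config.toml.
        cmd += ["-c", f'model_reasoning_effort="{reasoning_effort}"']
    return cmd + ["--", prompt]


def codex_review_cmd(base: str) -> List[str]:
    """Build argv for `codex exec review`; it has no -o flag."""
    return ["codex", "exec", "review", "--base", base]


def run_review(base: str, out_path: str) -> int:
    """Run the review, redirecting stdout and stderr to out_path."""
    with open(out_path, "w") as out:
        proc = subprocess.run(codex_review_cmd(base),
                              stdout=out, stderr=subprocess.STDOUT)
    return proc.returncode
```

Passing the prompt as a single list element avoids shell quoting problems with multi-line prompts.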

Multi-line prompts (use temp file or stdin):

# Option A: Temp file with command substitution
cat > /tmp/codex-prompt.txt << 'EOF'
Your multi-line
prompt here
EOF
codex exec -o .ralph/codex-output.md -- "$(cat /tmp/codex-prompt.txt)" 2>/dev/null & wait $!

# Option B: Stdin (use - as prompt arg)
codex exec -o .ralph/codex-output.md -- - < /tmp/codex-prompt.txt 2>/dev/null & wait $!

Common errors:

Error	Cause	Fix
`unexpected argument '-q'`	`-q` flag doesn't exist	Use `codex exec "prompt"`
`unknown config key 'reasoningEffort'`	Wrong key name	Use `model_reasoning_effort`
TUI opens instead of running	Used bare `codex`	Use `codex exec`
Empty output file	Model config missing	Check `~/.codex/config.toml` has model + reasoning

Phase 5: Issue Resolution (parallel fixes from Codex review)

After Codex review completes, parse findings and fix issues. Independent issues (different files) run in parallel via Task subagents. Each subagent receives the full Codex review finding for its issue.

Step 1: Parse and categorize review findings

# Read Codex review output
review = Read(file_path=".ralph/codex-review.md")

# Parse findings into structured issues
# Each finding has: file, line, severity, category, description, suggested_fix
findings = parse_codex_review(review)

# Categorize by file independence
by_file = {}
for finding in findings:
    by_file.setdefault(finding.file, []).append(finding)

# Independent: files with single issue (can parallelize)
independent = [issues[0] for f, issues in by_file.items() if len(issues) == 1]
# Dependent: files with multiple issues (must serialize)
dependent = [issue for f, issues in by_file.items() if len(issues) > 1 for issue in issues]
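
The categorization step can be tried without a real review by feeding it hand-built findings. In this sketch the `Finding` dataclass and `split_by_file` are hypothetical; the field names follow the comment above, and the actual format of the Codex review output is not assumed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Finding:
    """Hypothetical structured finding parsed from the review."""
    file: str
    line: int
    severity: str
    category: str
    description: str
    suggested_fix: str


def split_by_file(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Return (independent, dependent) findings, preserving input order."""
    by_file: Dict[str, List[Finding]] = defaultdict(list)
    for finding in findings:
        by_file[finding.file].append(finding)
    # A file with exactly one finding can be fixed in parallel with others.
    independent = [fs[0] for fs in by_file.values() if len(fs) == 1]
    # Several findings in one file must be fixed one after another.
    dependent = [f for fs in by_file.values() if len(fs) > 1 for f in fs]
    return independent, dependent


sample = [
    Finding("auth.ts", 47, "CRITICAL", "security", "SQL injection", "Use parameterized queries"),
    Finding("db.ts", 23, "HIGH", "performance", "N+1 query", "Use eager loading"),
    Finding("db.ts", 80, "MEDIUM", "errors", "Unhandled rejection", "Add error handling"),
]
independent, dependent = split_by_file(sample)
print([f.file for f in independent])  # ['auth.ts']
print([f.line for f in dependent])    # [23, 80]
```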

Step 2: Spawn parallel fix subagents (≥2 independent issues)

if len(independent) >= 2:
    # Spawn parallel Task subagents
    for finding in independent:
        Task(
            subagent_type="general-purpose",
            description=f"Fix {finding.severity} in {finding.file}",
            prompt=f"""Fix this issue identified by Codex GPT 5.3 High code review.

## Codex Review Finding (AUTHORITATIVE - this is what needs fixing)

**File:** {finding.file}:{finding.line}
**Severity:** {finding.severity}
**Category:** {finding.category}
**Description:** {finding.description}
**Suggested Fix:** {finding.suggested_fix}

## Instructions

1. **Read the file:**
   Read(file_path="{finding.file}")

2. **Understand the issue:**
   - Locate line {finding.line} and surrounding context
   - Understand why the Codex review flagged this
   - Understand the suggested fix

3. **Apply the fix:**
   - Follow the Codex review's suggested fix as the primary approach
   - Use Edit tool to apply changes
   - Maintain existing code style and conventions

4. **If the suggested fix is unclear or insufficient:**
   - Search for working examples: mcp__exa__get_code_context_exa(query="{finding.category} {language} fix example")
   - Check official docs: mcp__Ref__ref_search_documentation(query="{finding.category} best practices")
   - Apply the pattern learned from examples/docs
   NOTE: Only use Exa/Ref if the Codex review's suggested fix is ambiguous or doesn't resolve the issue.

5. **Run tests:**
   Bash(command="{test_command}")

6. **Report result:**

If fixed:

STATUS: fixed
FILE: {finding.file}
CHANGES: [1-2 sentence description]
TESTS: [passing count]
RESEARCH_USED: none | exa | ref | both

If failed:

STATUS: failed
FILE: {finding.file}
ERROR: [what went wrong]
BLOCKER: [why couldn't fix]

""",
            run_in_background=False
        )
else:
    # <2 independent issues - fix them sequentially in main thread
    # (dependent issues are handled in Step 3)
    fix_sequentially(independent)

Step 3: Fix dependent issues sequentially

# Issues in the same file must be fixed sequentially to avoid conflicts
for finding in dependent:
    # Same fix logic as subagent prompt, but in main thread
    # Read file → apply Codex suggested fix → test
    # Escalate to Exa/Ref only if fix fails
    fix_sequentially([finding])

Step 4: Run full test suite after all fixes

# Verify all fixes work together
{test_command}

Exa/Ref escalation (conditional):

Exa/Ref are NOT mandatory research steps
Use them ONLY when the Codex review's suggested fix is:
- Ambiguous (review says "consider using X" without specifics)
- Insufficient (applied fix but tests still fail)
- Missing (review identifies problem but doesn't suggest solution)
When escalating:
- Exa: mcp__exa__get_code_context_exa(query="working example {pattern} {language}") for battle-tested code
- Ref: mcp__Ref__ref_search_documentation(query="{framework} {api} documentation") for canonical docs
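
The three escalation conditions reduce to a simple predicate. The sketch below is illustrative only: `should_escalate` is a hypothetical name, and whether a suggestion counts as ambiguous is a judgment the agent supplies as a flag.

```python
from typing import Optional


def should_escalate(suggested_fix: Optional[str],
                    ambiguous: bool,
                    tests_pass_after_fix: bool) -> bool:
    """Return True when Exa/Ref research is warranted for a finding."""
    missing = not (suggested_fix or "").strip()   # review gave no solution
    insufficient = not tests_pass_after_fix       # fix applied, tests still fail
    return missing or ambiguous or insufficient


print(should_escalate("Use parameterized queries", False, True))  # False
print(should_escalate(None, False, True))                         # True
```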

Example parallel fix execution:

Codex review finds 4 issues:
  1. auth.ts:47 - SQL injection (CRITICAL) - "Use parameterized queries"
  2. api.ts:89 - Missing rate limit (HIGH) - "Add express-rate-limit middleware"
  3. db.ts:23 - N+1 query (HIGH) - "Use DataLoader or eager loading"
  4. utils.ts:12 - Logging leak (MEDIUM) - "Redact PII fields before logging"

All in different files → 4 parallel Task subagents

Subagent 1: Read auth.ts → apply parameterized queries (clear fix, no Exa/Ref needed) → test ✓
Subagent 2: Read api.ts → apply rate limit (unclear middleware setup) → Ref lookup → apply → test ✓
Subagent 3: Read db.ts → apply DataLoader (need example) → Exa lookup → apply → test ✓
Subagent 4: Read utils.ts → add PII redaction (clear fix) → test ✓

Duration: ~8 min parallel (vs ~25 min sequential)
Exa/Ref used: 2 of 4 subagents (only where needed)

Rate Limit Strategy

Assumptions:

Claude Max $100/month → High Opus 4.6 limits (abundant)
Codex $20/month → Medium GPT 5.3 High limits (~50 requests/day)

Usage per feature:

Planning: 1 Codex request
Review: 1 Codex request
Implementation: Unlimited Claude (ralph run)

Result: ~25 plan+implement+review cycles per day (about 50 Codex requests ÷ 2 Codex requests per cycle)
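
The daily budget is simple integer division over the Codex quota. A small sketch, assuming the ~50 requests/day figure above and a hypothetical helper name:

```python
def cycles_per_day(daily_codex_requests: int,
                   codex_requests_per_cycle: int = 2) -> int:
    """Full cycles that fit in the daily Codex quota (planning + review = 2)."""
    return daily_codex_requests // codex_requests_per_cycle


print(cycles_per_day(50))     # 25
print(cycles_per_day(50, 3))  # 16, if each cycle also needs one review re-run
```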

What Happens Automatically

Prompt Refinement Phase (always executed):

Analyzes user's feature request for clarity
Asks clarifying questions only if genuinely unclear
Refactors user request into optimal Codex planning prompt
Preserves all user intent, nuance, and technical choices
Selects appropriate reasoning effort (medium/high/xhigh)

Codex Planning Phase (always executed):

Receives refined, optimized prompt
Analyzes feature requirements with appropriate reasoning depth
Designs system architecture using systems-first approach
Identifies components and interactions
Plans implementation sequence
Considers edge cases and security

Ralph Implementation Phase (always executed):

Claude Opus 4.6 executes via orchestrate.py
Fast iteration (7-8 min typical)
Ships working code consistently
Commits after each task
If PROMPT.md has wave format: Independent task waves run in parallel via git worktrees + Task subagents (30-50% speedup on implementation phase)

Codex Review Phase (always executed, single sequential call):

Analyzes implementation for edge cases
Checks security vulnerabilities
Validates error handling
Reviews test coverage
Assesses maintainability
Provides severity-rated findings

Issue Resolution Phase (always executed after review):

Parses Codex review findings into structured issues
Categorizes by file independence (same file = sequential, different files = parallel)
≥2 independent issues: Spawns parallel Task subagents, each receiving full Codex review finding
Each subagent applies the Codex-suggested fix directly
Exa/Ref research is conditional - only used when Codex suggested fix is ambiguous, insufficient, or missing
Runs full test suite after all fixes applied

Example: Full Multi-Model Cycle

# 1. User describes feature
User: "Add rate limiting to our API endpoints"

# 2. Claude calls Codex CLI for architecture (background, file output)
mkdir -p .ralph
codex exec -o .ralph/codex-architecture.md -- "ARCH_PROMPT" 2>/dev/null & wait $!
# Read .ralph/codex-architecture.md
# Contains: Token bucket algorithm, Redis backing store, middleware pattern,
# rate limit configs, error responses, monitoring

# 3. Claude generates PROMPT.md with architecture + tasks (wave format if applicable)
# If Codex architecture shows independent modules:
#   Wave 1: Rate limit middleware, Redis store, Config loader (parallel)
#   Wave 2: Integration + monitoring (depends on Wave 1)
# Otherwise: flat task list (sequential)

# 4. Claude executes Ralph
# Sequential: python orchestrate.py --run --max-iterations 100 (launch detached with nohup ... & disown; see "Execution from Claude Code")
# Parallel waves: spawn Task subagents per wave with worktrees
# Wave 1: 3 parallel worktrees (middleware, store, config)
# Wave 2: sequential (depends on wave 1 merge)

# 5. Ralph completes (LOOP_COMPLETE)
# Exit code 0, all tasks done

# 6. Claude calls Codex CLI for review (single sequential call, background)
codex exec review --base HEAD~10 > .ralph/codex-review.md 2>&1 & wait $!
# Read .ralph/codex-review.md
# Contains: Found edge case - rate limit doesn't reset on server restart,
# Missing test for concurrent requests, Redis connection error handling needed

# 7. Fix issues from review (parallel if in different files)
# Parse review: 3 issues in 3 different files → spawn 3 parallel fix subagents
# Subagent 1: Fix rate limit persistence (Codex suggested Redis SAVE config) → apply directly
# Subagent 2: Add concurrency tests (Codex suggested test patterns) → apply directly
# Subagent 3: Fix Redis error handling (ambiguous suggestion) → Ref lookup → apply
# All 3 complete in ~8 min vs ~20 min sequential

# 8. Run full test suite → all passing
# Report: "Fixed 3 review issues (2 direct, 1 with Ref docs). All tests passing."

Execution from Claude Code

CRITICAL: Timeout Behavior

Claude Code's Bash tool has timeouts that WILL kill long-running processes:

Method	Timeout	Use Case
`Bash` tool (default)	2 min	Quick commands
`Bash` tool with `timeout: 600000`	10 min	Medium tasks
`run_in_background: true`	Still has timeout	NOT suitable for ralph loops
`nohup ... & disown`	NO timeout	Required for ralph loops

Running Ralph Loops (Correct Way)

ALWAYS use nohup + disown for ralph loops:

# Start ralph detached (survives timeouts and disconnects)
nohup ralph run -a --continue --max-iterations 100 > .ralph/ralph.log 2>&1 & disown
echo "PID: $!"

# Monitor progress
tail -f .ralph/ralph.log

❌ WRONG - Will get killed:

# These are killed when the Bash tool timeout expires
ralph run --max-iterations 100
python orchestrate.py --run --max-iterations 100

✓ CORRECT - Detached execution:

nohup ralph run -a --max-iterations 100 > .ralph/ralph.log 2>&1 & disown
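
When the loop is started from a Python script instead of the shell, a similar detachment can be approximated with `subprocess.Popen`. This is a sketch under assumptions: `start_detached` is a hypothetical helper, it is POSIX-only, and it has not been verified against Claude Code's timeout handling.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def start_detached(cmd: Sequence[str], log_path: str) -> int:
    """Start cmd in its own session, appending output to log_path; return its PID."""
    log = Path(log_path)
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("ab") as out:
        proc = subprocess.Popen(
            list(cmd),
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # new session, detached from the caller's terminal
        )
    return proc.pid


# Example (not executed here):
# pid = start_detached(
#     ["ralph", "run", "-a", "--continue", "--max-iterations", "100"],
#     ".ralph/ralph.log",
# )
```

Saving the returned PID makes it possible to check on or stop the process later.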

Monitoring Detached Processes

# Watch live output
tail -f .ralph/ralph.log

# Check if still running
ps aux | grep ralph | grep -v grep

# Check latest output
tail -50 .ralph/ralph.log

# Check git progress (what's been committed)
git log --oneline -10
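
The same checks can be made from Python when the PID was saved at launch. A minimal POSIX-only sketch; `is_running` and `tail` are hypothetical helper names.

```python
import os
from typing import List


def is_running(pid: int) -> bool:
    """Return True if a process with this PID exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def tail(path: str, n: int = 50) -> List[str]:
    """Return the last n lines of a log file."""
    with open(path, errors="replace") as f:
        return f.readlines()[-n:]
```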

Stopping a Detached Process

# Find and kill
kill $(pgrep -f "ralph run")

# Or by PID if you saved it
kill <PID>

Troubleshooting

"Too many consecutive failures"

Symptom: Ralph terminates with "consecutive_failures" after several rapid iterations (~1 second each).

Cause: Claude API rate limiting. When ralph runs iterations back-to-back, the API may reject rapid consecutive requests, so each iteration ends in ~1 second without actually invoking the Claude agent. Adding a 60-second delay between iterations resolves this.

Solution: Use the loop runner script with delays (see "Loop Runner Script (Recommended)" below).

Debug steps:

Run single iteration with verbose output:

nohup ralph run -a --continue --max-iterations 1 --verbose > .ralph/debug.log 2>&1 & disown
tail -f .ralph/debug.log

Check what was happening:

# Ralph's scratchpad shows current state
cat .ralph/agent/scratchpad.md

# Event history
ralph events

# Task list
ralph tools task ready

If debug iteration succeeds: The original failure was transient. Resume normally:

nohup ralph run -a --continue --max-iterations 100 > .ralph/ralph.log 2>&1 & disown

Output Truncation

❌ NEVER use | head -N when capturing ralph output:

# BAD - truncates output, can't see actual errors
ralph run --verbose 2>&1 | head -100

✓ ALWAYS capture full output to file:

# GOOD - full output preserved
nohup ralph run -a --verbose > .ralph/debug.log 2>&1 & disown

Resuming After Failure

# Continue from where it left off (preserves scratchpad, tasks, memories)
nohup ralph run -a --continue --max-iterations 100 > .ralph/ralph.log 2>&1 & disown

Checking Progress

# Completed tasks
ralph tools task list | grep -v "^ID"

# Git commits made
git log --oneline -10

# Current iteration state
cat .ralph/agent/scratchpad.md

# Ralph's learned patterns
cat .ralph/agent/memories.md

Environment Diagnostics

When troubleshooting, run ralph doctor first for a full environment health check:

ralph doctor

Reports backend availability, config validity, tool versions, and common misconfigurations.

Common Issues

Issue	Cause	Fix
Process killed mid-iteration	Claude Code timeout	Use `nohup ... & disown`
TUI errors in background	No terminal for TUI	Add `-a` (autonomous/headless mode)
"consecutive_failures"	API rate limiting	Use loop script with 60s+ delays
Intermittent failures even with delays	Accumulated rate limit state	Script auto-doubles delay after failures
Can't see errors	Output truncated	Don't use `| head -N`; capture full output to a file
Loop stuck	Iteration taking too long	Check `tail -f .ralph/ralph.log`
Tasks not progressing	Previous iteration incomplete	`ralph run -a --continue`
Unknown environment issue	Misconfiguration	Run `ralph doctor` for diagnostics

Verifying Claude is Being Invoked

If iterations complete in <10 seconds without output, Claude may not be starting:

# Check process tree while ralph is running
pstree -p $(pgrep -f "ralph run") | head -20

# Should show: ralph → claude → (MCP servers, docker, etc.)
# If no claude subprocess, the API call is failing silently

Loop Runner Script (Recommended)

For reliable multi-iteration runs, use a loop script with delays between iterations:

#!/bin/bash
# scripts/ralph-loop.sh - Ralph loop with rate limit protection
# Usage: ./scripts/ralph-loop.sh [max_iterations] [delay_seconds]

MAX_ITERATIONS=${1:-20}
DELAY=${2:-60}  # 60 second delay to avoid API rate limits
LOG_DIR=".ralph/loop-logs"
CONSECUTIVE_FAILURES=0
MAX_CONSECUTIVE_FAILURES=3

mkdir -p "$LOG_DIR"

echo "=== Ralph Loop Runner ==="
echo "Max iterations: $MAX_ITERATIONS | Delay: ${DELAY}s"

for i in $(seq 1 $MAX_ITERATIONS); do
    TIMESTAMP=$(date +%Y%m%d-%H%M%S)
    LOG_FILE="$LOG_DIR/iteration-$i-$TIMESTAMP.log"

    echo "[$TIMESTAMP] Starting iteration $i/$MAX_ITERATIONS..."
    rm -f .ralph/loop.lock

    START=$(date +%s)
    ralph run -a --continue --max-iterations 1 > "$LOG_FILE" 2>&1
    DURATION=$(($(date +%s) - START))

    if [ $DURATION -lt 10 ]; then
        echo "  ⚠ Iteration completed too quickly (${DURATION}s) - likely failed"
        CONSECUTIVE_FAILURES=$((CONSECUTIVE_FAILURES + 1))

        if [ $CONSECUTIVE_FAILURES -ge $MAX_CONSECUTIVE_FAILURES ]; then
            echo "  ✗ $MAX_CONSECUTIVE_FAILURES consecutive failures - doubling delay"
            DELAY=$((DELAY * 2))
            CONSECUTIVE_FAILURES=0
        fi
    else
        echo "  ✓ Iteration completed in ${DURATION}s"
        CONSECUTIVE_FAILURES=0
        git log --oneline -1
    fi

    # Check for completion
    grep -q "LOOP_COMPLETE" "$LOG_FILE" 2>/dev/null && echo "=== LOOP_COMPLETE ===" && break

    [ $i -lt $MAX_ITERATIONS ] && echo "  Waiting ${DELAY}s..." && sleep $DELAY
done

echo -e "\n=== Summary ===" && git log --oneline -10

Usage:

chmod +x scripts/ralph-loop.sh
nohup ./scripts/ralph-loop.sh 20 60 > .ralph/loop-runner.log 2>&1 & disown
tail -f .ralph/loop-runner.log  # Monitor progress

Why 60 seconds? The Claude API rate-limits consecutive requests. In testing, 60s was a sufficient cooldown; as a fallback, the script doubles the delay after 3 consecutive failures.

Observed iteration times:

Failed iterations: 0-1 seconds (API rejected)
Successful iterations: 3-8 minutes (actual work)
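
The script's failure detection and delay doubling can be read as a small state transition. The sketch below restates that logic in Python for clarity; `next_state` is a hypothetical name, and the thresholds mirror the values in the script above.

```python
from typing import Tuple

FAST_FAIL_SECONDS = 10        # iterations shorter than this count as failures
MAX_CONSECUTIVE_FAILURES = 3  # failures in a row before the delay doubles


def next_state(duration: int, delay: int, failures: int) -> Tuple[int, int]:
    """Return (delay, failures) after one iteration that ran for `duration` seconds."""
    if duration >= FAST_FAIL_SECONDS:
        return delay, 0          # success resets the failure counter
    failures += 1
    if failures >= MAX_CONSECUTIVE_FAILURES:
        return delay * 2, 0      # double the delay and start counting again
    return delay, failures


delay, failures = 60, 0
for duration in [1, 1, 1, 300, 0]:
    delay, failures = next_state(duration, delay, failures)
print(delay, failures)  # 120 1
```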

Dependencies

ralph-orchestrator CLI: npm install -g @ralph-orchestrator/ralph-cli
codex CLI: Required for multi-model workflow (planning + review)
Python 3.8+
Git repository with commits