End-to-End Orchestrator

Overview

end-to-end-orchestrator provides a single-command, complete development workflow, coordinating all 5 multi-ai skills from research through production deployment.

Purpose: Transform "I want feature X" into production-ready code through automated skill coordination

Pattern: Workflow-based (5-stage pipeline with quality gates)

Key Innovation: Automatic orchestration of research → planning → implementation → testing → verification with failure recovery and quality gates

The Complete Pipeline:

Input: Feature description
1. Research (multi-ai-research) [optional]
  ↓ [Quality Gate: Research complete]
2. Planning (multi-ai-planning)
  ↓ [Quality Gate: Plan ≥90/100]
3. Implementation (multi-ai-implementation)
  ↓ [Quality Gate: Tests pass, coverage ≥80%]
4. Testing (multi-ai-testing)
  ↓ [Quality Gate: Coverage ≥95%, verified]
5. Verification (multi-ai-verification)
  ↓ [Quality Gate: Score ≥90/100, all layers pass]
Output: Production-ready code
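
The gates in this diagram can be read as predicates over the pipeline's shared state. The sketch below is a minimal, hypothetical model of that idea; the interfaces, field names, and predicate shapes are assumptions made for illustration (not the skill's actual API), while the thresholds match the gates shown above.

```typescript
// Illustrative pipeline model; field names and gate predicates are assumptions.
interface PipelineState {
  planScore?: number;          // from Stage 2
  coverage?: number;           // from Stages 3-4
  testQualityScore?: number;   // from Stage 4
  verificationScore?: number;  // from Stage 5
  allLayersPass?: boolean;     // from Stage 5
}

interface StageDefinition {
  stage: number;
  name: string;
  skill: string;
  optional: boolean;
  gatePasses: (s: PipelineState) => boolean;
}

const PIPELINE: StageDefinition[] = [
  { stage: 1, name: 'Research', skill: 'multi-ai-research', optional: true,
    gatePasses: () => true }, // research gate (≥95/100) is checked within Stage 1
  { stage: 2, name: 'Planning', skill: 'multi-ai-planning', optional: false,
    gatePasses: s => (s.planScore ?? 0) >= 90 },
  { stage: 3, name: 'Implementation', skill: 'multi-ai-implementation', optional: false,
    gatePasses: s => (s.coverage ?? 0) >= 80 },
  { stage: 4, name: 'Testing', skill: 'multi-ai-testing', optional: false,
    gatePasses: s => (s.coverage ?? 0) >= 95 && (s.testQualityScore ?? 0) >= 90 },
  { stage: 5, name: 'Verification', skill: 'multi-ai-verification', optional: false,
    gatePasses: s => (s.verificationScore ?? 0) >= 90 && s.allLayersPass === true },
];
```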

When to Use

Use end-to-end-orchestrator when:

  • Implementing a complete feature (not a quick fix)
  • You want an automated workflow (not manual skill chaining)
  • Production quality is required (all gates must pass)
  • Time optimization matters (parallel execution where possible)
  • You need failure recovery (automatic retry/rollback)

When NOT to Use:

  • Quick fixes (<30 minutes)
  • Exploratory work (uncertain requirements)
  • Manual control preferred (step through each phase)

Prerequisites

Required

  • All 5 multi-ai skills installed:
    • multi-ai-research
    • multi-ai-planning
    • multi-ai-implementation
    • multi-ai-testing
    • multi-ai-verification

Optional

  • agent-memory-system (for learning from past work)
  • hooks-manager (for automation)
  • Gemini CLI, Codex CLI (for tri-AI research)

Complete Workflow

Stage 1: Research (Optional)

Purpose: Ground implementation in proven patterns

Process:

  1. Determine if Research Needed:

    // Check if objective is familiar
    const similarWork = await recallMemory({
      type: 'episodic',
      query: objective
    });
    
    let needsResearch;
    if (similarWork.length === 0) {
      // Unfamiliar domain → research needed
      needsResearch = true;
    } else {
      // Familiar → can skip research, use past learnings
      needsResearch = false;
    }
    
  2. Execute Research (if needed):

    Use multi-ai-research for "[domain] implementation patterns and best practices"
    

    What It Provides:

    • Claude research: Official docs, codebase patterns
    • Gemini research: Web best practices, latest trends
    • Codex research: GitHub patterns, code examples
    • Quality: ≥95/100 with 100% citations
  3. Quality Gate: Research Complete:

    ✅ Research findings documented
    ✅ Patterns identified (minimum 2)
    ✅ Best practices extracted (minimum 3)
    ✅ Quality score ≥95/100
    

    If Fail: Research incomplete → retry research OR proceed without (user decides)

Outputs:

  • Research findings (.analysis/ANALYSIS_FINAL.md)
  • Patterns and best practices
  • Implementation recommendations

Time: 30-60 minutes (can skip if familiar domain)

Next: Proceed to Stage 2
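
As a rough illustration of the Stage 1 gate, the check below assumes research results have been summarized into a small JSON object; the `ResearchFindings` shape and field names are hypothetical, while the thresholds mirror the gate criteria listed above.

```typescript
// Minimal sketch of the Stage 1 quality gate; field names are assumptions.
interface ResearchFindings {
  qualityScore: number;    // 0-100
  patterns: string[];
  bestPractices: string[];
  documentPath: string;    // e.g. '.analysis/ANALYSIS_FINAL.md'
}

function researchGatePasses(r: ResearchFindings): boolean {
  return (
    r.documentPath.length > 0 &&    // research findings documented
    r.patterns.length >= 2 &&       // minimum 2 patterns identified
    r.bestPractices.length >= 3 &&  // minimum 3 best practices extracted
    r.qualityScore >= 95            // quality score ≥95/100
  );
}
```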


Stage 2: Planning

Purpose: Create agent-executable plan with quality ≥90/100

Process:

  1. Load Research Context (if research done):

    let context = "";
    
    if (researchDone) {
      context = await readFile('.analysis/ANALYSIS_FINAL.md');
    }
    
  2. Invoke Planning:

    Use multi-ai-planning to create plan for [objective]
    
    ${context ? `Research findings available in: .analysis/ANALYSIS_FINAL.md` : ''}
    
    Create comprehensive plan following 6-step workflow.
    

    What It Does:

    • Analyzes objective
    • Hierarchical decomposition (8-15 tasks)
    • Maps dependencies, identifies parallel
    • Plans verification for all tasks
    • Scores quality (0-100)
  3. Quality Gate: Plan Approved:

    ✅ Plan created
    ✅ Quality score ≥90/100
    ✅ All tasks have verification
    ✅ Dependencies mapped
    ✅ No circular dependencies
    

    If Fail (score <90):

    • Review gap analysis
    • Apply recommended fixes
    • Re-verify
    • Retry up to 2 times
    • If still <90: Escalate to human review
  4. Save Plan to Shared State:

    # Save for next stage
    cp plans/[plan-id]/plan.json .multi-ai-context/plan.json
    

Outputs:

  • plan.json (machine-readable)
  • PLAN.md (human-readable)
  • COORDINATION.md (execution guide)
  • Quality ≥90/100

Time: 1.5-3 hours

Next: Proceed to Stage 3
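
The approve-or-escalate behavior of the Stage 2 gate (retry up to 2 times, then hand off to a human) could look roughly like the sketch below. The `Plan` shape and the three helper functions are hypothetical stand-ins for whatever the planning skill actually exposes.

```typescript
// Sketch of the Stage 2 approval loop, assuming hypothetical helpers.
interface Plan {
  qualityScore: number;
  gapAnalysis: string[];
}

declare function runPlanning(objective: string): Promise<Plan>;
declare function applyGapFixes(plan: Plan): Promise<void>;
declare function escalateToHuman(reason: string, plan: Plan): Promise<void>;

async function approvePlan(objective: string): Promise<Plan> {
  let plan = await runPlanning(objective);
  let retries = 0;
  while (plan.qualityScore < 90 && retries < 2) {
    await applyGapFixes(plan);            // apply recommended fixes from gap analysis
    plan = await runPlanning(objective);  // re-verify the revised plan
    retries++;
  }
  if (plan.qualityScore < 90) {
    await escalateToHuman('Plan quality still <90 after 2 retries', plan);
  }
  return plan;
}
```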


Stage 3: Implementation

Purpose: Execute plan with TDD, produce working code

Process:

  1. Load Plan:

    const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));
    console.log(`📋 Loaded plan: ${plan.objective}`);
    console.log(`   Tasks: ${plan.tasks.length}`);
    console.log(`   Estimated: ${plan.metadata.estimated_total_hours} hours`);
    
  2. Invoke Implementation:

    Use multi-ai-implementation following plan in .multi-ai-context/plan.json
    
    Execute all 6 steps:
    1. Explore & gather context
    2. Plan architecture (plan already created, refine as needed)
    3. Implement incrementally with TDD
    4. Coordinate multi-agent (if parallel tasks)
    5. Integration & E2E testing
    6. Quality verification before commit
    
    Success criteria from plan.
    

    What It Does:

    • Explores codebase (progressive disclosure)
    • Implements incrementally (<200 lines per commit)
    • Test-driven development (tests first)
    • Multi-agent coordination for parallel tasks
    • Continuous testing during implementation
    • Doom loop prevention (max 3 retries)
  3. Quality Gate: Implementation Complete:

    ✅ All plan tasks implemented
    ✅ All tests passing
    ✅ Coverage ≥80% (gate), ideally ≥95%
    ✅ No regressions
    ✅ Doom loop avoided (< max retries)
    

    If Fail:

    • Identify failing task
    • Retry with different approach
    • If 3 failures: Escalate to human
    • Save state for recovery
  4. Save Implementation State:

    # Save for next stage
    echo '{
      "status": "implemented",
      "files_changed": [...],
      "tests_run": 95,
      "tests_passed": 95,
      "coverage": 87,
      "commits": ["abc123", "def456"]
    }' > .multi-ai-context/implementation-status.json
    

Outputs:

  • Working code
  • Tests passing
  • Coverage ≥80%
  • Commits created

Time: 3-10 hours (varies by complexity)

Next: Proceed to Stage 4
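
Doom loop prevention during implementation depends on tracking how often a task has failed. A minimal sketch, assuming failures are appended to `.multi-ai-context/failure-history.json` (one of the shared state files listed under State Management below); the file schema used here is an assumption.

```typescript
// Sketch of doom-loop tracking; the failure-history schema is illustrative.
import { readFileSync, writeFileSync, existsSync } from 'fs';

const HISTORY = '.multi-ai-context/failure-history.json';
const MAX_RETRIES = 3;

function recordFailure(taskId: string, error: string): number {
  const history: Record<string, string[]> = existsSync(HISTORY)
    ? JSON.parse(readFileSync(HISTORY, 'utf8'))
    : {};
  (history[taskId] ??= []).push(error);
  writeFileSync(HISTORY, JSON.stringify(history, null, 2));
  return history[taskId].length; // number of failed attempts so far
}

// After each failed attempt:
// if (recordFailure(task.id, message) >= MAX_RETRIES) → escalate with full error context.
```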


Stage 4: Testing (Independent Verification)

Purpose: Verify tests are comprehensive and prevent gaming

Process:

  1. Load Implementation Context:

    const implStatus = JSON.parse(
      readFile('.multi-ai-context/implementation-status.json')
    );
    
    console.log(`🧪 Testing implementation:`);
    console.log(`   Files changed: ${implStatus.files_changed.length}`);
    console.log(`   Current coverage: ${implStatus.coverage}%`);
    
  2. Invoke Independent Testing:

    Use multi-ai-testing independent verification workflow
    
    Verify:
    - Tests in: tests/
    - Code in: src/
    - Specifications in: .multi-ai-context/plan.json
    
    Workflows to execute:
    1. Test quality verification (independent agent)
    2. Coverage validation (≥95% target)
    3. Edge case discovery (AI-powered)
    4. Multi-agent ensemble scoring (if critical feature)
    
    Score test quality (0-100).
    

    What It Does:

    • Independent verification (separate agent from impl)
    • Checks tests match specifications (not just what code does)
    • Generates additional edge case tests
    • Multi-agent ensemble for quality scoring
    • Prevents overfitting
  3. Quality Gate: Testing Verified:

    ✅ Test quality score ≥90/100
    ✅ Coverage ≥95% (target achieved)
    ✅ Independent verification passed
    ✅ No test gaming detected
    ✅ Edge cases covered
    

    If Fail:

    • Review test quality issues
    • Generate additional tests
    • Re-verify
    • Max 2 retries, then escalate
  4. Save Testing State:

    echo '{
      "status": "tested",
      "test_quality_score": 92,
      "coverage": 96,
      "tests_total": 112,
      "edge_cases": 23,
      "gaming_detected": false
    }' > .multi-ai-context/testing-status.json
    

Outputs:

  • Test quality ≥90/100
  • Coverage ≥95%
  • Independent verification passed

Time: 1-3 hours

Next: Proceed to Stage 5
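
A simple way to picture the Stage 4 gate is a check against `testing-status.json`. The field names below mirror the example saved in step 4 and are assumptions rather than a fixed schema.

```typescript
// Illustrative Stage 4 gate check against testing-status.json.
import { readFileSync } from 'fs';

const status = JSON.parse(
  readFileSync('.multi-ai-context/testing-status.json', 'utf8')
);

const testingGatePasses =
  status.test_quality_score >= 90 &&   // test quality score ≥90/100
  status.coverage >= 95 &&             // coverage ≥95%
  status.gaming_detected === false;    // no test gaming detected

console.log(testingGatePasses ? '✅ Stage 4 gate passed' : '❌ Stage 4 gate failed');
```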


Stage 5: Verification (Multi-Layer QA)

Purpose: Final quality assurance before production

Process:

  1. Load All Context:

    const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));
    const implStatus = JSON.parse(readFile('.multi-ai-context/implementation-status.json'));
    const testStatus = JSON.parse(readFile('.multi-ai-context/testing-status.json'));
    
    console.log(`🔍 Final verification:`);
    console.log(`   Objective: ${plan.objective}`);
    console.log(`   Implementation: ${implStatus.status}`);
    console.log(`   Testing: ${testStatus.coverage}% coverage`);
    
  2. Invoke Multi-Layer Verification:

    Use multi-ai-verification for complete quality check
    
    Verify:
    - Code: src/
    - Tests: tests/
    - Plan: .multi-ai-context/plan.json
    
    Execute all 5 layers:
    1. Rules-based (linting, types, schema, SAST)
    2. Functional (tests, coverage, examples)
    3. Visual (if UI: screenshots, a11y)
    4. Integration (E2E, API compatibility)
    5. Quality scoring (LLM-as-judge, 0-100)
    
    All 5 quality gates must pass.
    

    What It Does:

    • Runs all 5 verification layers
    • Each layer is independent
    • LLM-as-judge for holistic assessment
    • Agent-as-a-Judge can execute tools to verify claims
    • Multi-agent ensemble for critical features
  3. Quality Gate: Production Ready:

    ✅ Layer 1 (Rules): PASS
    ✅ Layer 2 (Functional): PASS, coverage 96%
    ✅ Layer 3 (Visual): PASS or SKIPPED
    ✅ Layer 4 (Integration): PASS
    ✅ Layer 5 (Quality): 92/100 ≥90 ✅
    
    ALL GATES PASSED → PRODUCTION APPROVED
    

    If Fail:

    • Review gap analysis from failed layer
    • Apply recommended fixes
    • Re-verify from failed layer (not all 5)
    • Max 2 retries per layer
    • If still failing: Escalate to human
  4. Generate Final Report:

    # Feature Implementation Complete
    
    **Objective**: [from plan]
    
    ## Pipeline Execution Summary
    
    ### Stage 1: Research
    - Status: ✅ Complete
    - Quality: 97/100
    - Time: 52 minutes
    
    ### Stage 2: Planning
    - Status: ✅ Complete
    - Quality: 94/100
    - Tasks: 23
    - Time: 1.8 hours
    
    ### Stage 3: Implementation
    - Status: ✅ Complete
    - Files changed: 15
    - Lines added: 847
    - Commits: 12
    - Time: 6.2 hours
    
    ### Stage 4: Testing
    - Status: ✅ Complete
    - Test quality: 92/100
    - Coverage: 96%
    - Tests: 112
    - Time: 1.5 hours
    
    ### Stage 5: Verification
    - Status: ✅ Complete
    - Quality score: 92/100
    - All layers: PASS
    - Time: 1.2 hours
    
    ## Final Metrics
    
    - **Total Time**: 11.3 hours
    - **Quality**: 92/100
    - **Coverage**: 96%
    - **Status**: ✅ PRODUCTION READY
    
    ## Commits
    - abc123: feat: Add database schema
    - def456: feat: Implement OAuth integration
    - [... 10 more ...]
    
    ## Next Steps
    - Create PR for team review
    - Deploy to staging
    - Production release
    
  5. Save to Memory (if agent-memory-system available):

    await storeMemory({
      type: 'episodic',
      event: {
        description: `Complete implementation: ${objective}`,
        outcomes: {
          total_time: 11.3,
          quality_score: 92,
          test_coverage: 96,
          stages_completed: 5
        },
        learnings: extractedDuringPipeline
      }
    });
    

Outputs:

  • Production-ready code
  • Comprehensive final report
  • Commits created
  • PR ready (if requested)
  • Memory saved for future learning

Time: 30-90 minutes

Result: ✅ PRODUCTION READY
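
The final "production approved" decision combines the five layer results with the quality score. The sketch below assumes a simple result shape (an assumption for illustration); note that SKIPPED layers, such as Visual for non-UI work, do not block approval.

```typescript
// Minimal sketch of the Stage 5 production-ready decision.
interface LayerResult {
  layer: 1 | 2 | 3 | 4 | 5;
  status: 'PASS' | 'FAIL' | 'SKIPPED';
}

function productionReady(layers: LayerResult[], qualityScore: number): boolean {
  const allLayersPass = layers.every(l => l.status !== 'FAIL'); // SKIPPED is allowed
  return allLayersPass && qualityScore >= 90;
}

// Example: Layers 1, 2, 4, 5 PASS, Layer 3 SKIPPED (no UI), quality 92/100
// productionReady(results, 92) === true → PRODUCTION APPROVED
```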


Failure Recovery

Failure Handling at Each Stage

Recovery strategy by failing stage:

Research Fails:

  • Retry with different sources
  • Skip research (use memory if available)
  • Escalate to human if critical gap

Planning Fails (score <90):

  • Review gap analysis
  • Apply fixes automatically if possible
  • Retry planning (max 2 attempts)
  • Escalate if still <90

Implementation Fails:

  • Identify failing task
  • Automatic rollback to last checkpoint
  • Retry with alternative approach
  • Doom loop prevention (max 3 retries)
  • Escalate with full error context

Testing Fails (coverage <80% or quality <90):

  • Generate additional tests for gaps
  • Retry verification
  • Max 2 retries
  • Escalate with coverage report

Verification Fails (score <90 or layer fails):

  • Apply auto-fixes for Layer 1-2 issues
  • Manual fixes needed for Layer 3-5
  • Re-verify from failed layer (not all 5)
  • Max 2 retries per layer
  • Escalate with quality report

Escalation Protocol

When to Escalate to Human:

  1. Any stage fails 3 times (doom loop)
  2. Planning quality <80 after 2 retries
  3. Implementation doom loop detected
  4. Verification score <80 after 2 retries
  5. Budget exceeded (if cost tracking enabled)
  6. Circular dependency detected
  7. Irrecoverable error (file system, permissions)
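
In code terms, the retry- and score-based triggers above reduce to a small predicate over per-stage attempt data. The sketch below is hypothetical (the `StageAttempt` shape is an assumption); the remaining triggers (budget, circular dependencies, irrecoverable errors) would be checked separately.

```typescript
// Sketch of the escalation decision for retry/score-based triggers.
interface StageAttempt {
  stage: number;    // 1-5
  failures: number; // failed attempts so far
  score?: number;   // stage quality score, if applicable
}

function shouldEscalate(a: StageAttempt): boolean {
  if (a.failures >= 3) return true;                                          // doom loop
  if (a.stage === 2 && a.failures >= 2 && (a.score ?? 0) < 80) return true;  // planning <80 after 2 retries
  if (a.stage === 5 && a.failures >= 2 && (a.score ?? 0) < 80) return true;  // verification <80 after 2 retries
  return false;
}
```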

Escalation Format:

# ⚠️ ESCALATION REQUIRED

**Stage**: Implementation (Stage 3)
**Failure**: Doom loop detected (3 failed attempts)

## Context
- Objective: Implement user authentication
- Failing Task: 2.2.2 Token generation
- Error: Tests fail with "undefined userId" repeatedly

## Attempts Made
1. Attempt 1: Added userId to payload → Same error
2. Attempt 2: Changed payload structure → Same error
3. Attempt 3: Different JWT library → Same error

## Root Cause Analysis
- Tests expect `user.id` but implementation uses `user.userId`
- Mismatch in data model between test and implementation
- Auto-fix failed 3 times

## Recommended Actions
1. Review test specifications vs. implementation
2. Align data model (user.id vs. user.userId)
3. Manual intervention required

## State Saved
- Checkpoint: checkpoint-003 (before attempts)
- Rollback available: `git checkout checkpoint-003`
- Continue after fix: Resume from Task 2.2.2

Parallel Execution Optimization

Identifying Parallel Opportunities

From Plan:

const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));

// Plan identifies parallel groups
const parallelGroups = plan.parallel_groups;

// Example:
// Group 1: Tasks 2.1, 2.2, 2.3 (independent)
// Can execute in parallel

Executing Parallel Tasks

Pattern:

// Stage 3: Implementation with parallel tasks

const parallelGroup = plan.parallel_groups.find(g => g.group_id === 'pg2');

// Spawn parallel implementation agents
// (spawnAgent stands in for the actual sub-agent call, e.g. the Task tool)
const results = await Promise.all(
  parallelGroup.tasks.map(taskId => {
    const task = plan.tasks.find(t => t.id === taskId);

    return spawnAgent({
      description: `Implement ${task.description}`,
      prompt: `Implement task ${task.id}: ${task.description}

      Specifications from plan:
      ${JSON.stringify(task, null, 2)}

      Success criteria:
      ${task.verification.success_criteria.join('\n')}

      Write implementation and tests.
      Report completion status.`
    });
  })
);

// Verify all parallel tasks completed
const allSucceeded = results.every(r => r.status === 'complete');

if (allSucceeded) {
  // Proceed to integration
} else {
  // Handle failures
}

Time Savings: 20-40% faster than sequential execution


State Management

Cross-Skill State Sharing

Shared Context Directory: .multi-ai-context/

Standard Files:

.multi-ai-context/
├── research-findings.json     # From multi-ai-research
├── plan.json                  # From multi-ai-planning
├── implementation-status.json # From multi-ai-implementation
├── testing-status.json        # From multi-ai-testing
├── verification-report.json   # From multi-ai-verification
├── pipeline-state.json        # Orchestrator state
└── failure-history.json       # For doom loop detection

Benefits:

  • Skills don't duplicate work
  • Later stages read earlier outputs
  • Failure recovery knows full state
  • Memory can be saved from shared state
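
Reading and writing these files can be wrapped in two small helpers. The directory name comes from this skill, but the helper functions themselves are only a sketch of one possible approach.

```typescript
// Lightweight helpers for the shared context directory (illustrative only).
import { readFileSync, writeFileSync, mkdirSync } from 'fs';
import { join } from 'path';

const CONTEXT_DIR = '.multi-ai-context';

function readState<T>(file: string): T {
  return JSON.parse(readFileSync(join(CONTEXT_DIR, file), 'utf8')) as T;
}

function writeState(file: string, data: unknown): void {
  mkdirSync(CONTEXT_DIR, { recursive: true });
  writeFileSync(join(CONTEXT_DIR, file), JSON.stringify(data, null, 2));
}

// e.g. writeState('implementation-status.json', { status: 'implemented', coverage: 87 });
```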

Progress Tracking

Real-Time Progress:

{
  "pipeline_id": "pipeline_20250126_1200",
  "objective": "Implement user authentication",
  "started_at": "2025-01-26T12:00:00Z",
  "current_stage": 3,
  "stages": [
    {
      "stage": 1,
      "name": "Research",
      "status": "complete",
      "duration_minutes": 52,
      "quality": 97
    },
    {
      "stage": 2,
      "name": "Planning",
      "status": "complete",
      "duration_minutes": 108,
      "quality": 94
    },
    {
      "stage": 3,
      "name": "Implementation",
      "status": "in_progress",
      "started_at": "2025-01-26T13:48:00Z",
      "tasks_total": 23,
      "tasks_complete": 15,
      "tasks_remaining": 8,
      "percent_complete": 65
    },
    {
      "stage": 4,
      "name": "Testing",
      "status": "pending"
    },
    {
      "stage": 5,
      "name": "Verification",
      "status": "pending"
    }
  ],
  "estimated_completion": "2025-01-26T20:00:00Z",
  "quality_target": 90,
  "current_quality_estimate": 92
}

Query Progress:

# Check current status
cat .multi-ai-context/pipeline-state.json | jq '.current_stage, .stages[2].percent_complete'

# Output: Stage 3, 65% complete

Workflow Modes

Standard Mode (Full Pipeline)

All 5 Stages:

Research → Planning → Implementation → Testing → Verification

Time: 8-20 hours
Quality: Maximum (all gates, ≥90)
Use For: Production features, complex implementations


Fast Mode (Skip Research)

4 Stages (familiar domains):

Planning → Implementation → Testing → Verification

Time: 6-15 hours
Quality: High (all gates except research)
Use For: Familiar domains, time-sensitive features


Quick Mode (Essential Gates Only)

Implementation + Basic Verification:

Planning → Implementation → Testing (basic) → Verification (Layers 1-2 only)

Time: 3-8 hours
Quality: Good (essential gates only)
Use For: Internal tools, prototypes
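
One way these modes could be expressed is as a stage-selection map; the structure below is illustrative, not a documented configuration format of the skill.

```typescript
// Hypothetical mapping from workflow mode to the stages it runs.
type Mode = 'standard' | 'fast' | 'quick';

const MODES: Record<Mode, { stages: number[]; note: string }> = {
  standard: { stages: [1, 2, 3, 4, 5], note: 'All gates, quality ≥90' },
  fast:     { stages: [2, 3, 4, 5],    note: 'Skip research for familiar domains' },
  quick:    { stages: [2, 3, 4, 5],    note: 'Basic testing; verification Layers 1-2 only' },
};
```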


Best Practices

1. Always Run Planning Stage

Even for "simple" features - planning quality ≥90 prevents issues

2. Use Memory to Skip Research

If similar work done before, recall patterns instead of researching

3. Monitor Progress

Check .multi-ai-context/pipeline-state.json to track progress

4. Trust the Quality Gates

If gate fails, there's a real issue - don't skip fixes

5. Save State Frequently

Each stage completion saves state (enables recovery)

6. Review Final Report

Complete understanding of what was built and quality achieved


Integration Points

With All 5 Multi-AI Skills

Coordinates:

  1. multi-ai-research (Stage 1)
  2. multi-ai-planning (Stage 2)
  3. multi-ai-implementation (Stage 3)
  4. multi-ai-testing (Stage 4)
  5. multi-ai-verification (Stage 5)

Provides:

  • Automatic skill invocation
  • Quality gate enforcement
  • Failure recovery
  • State management
  • Progress tracking
  • Final reporting

With agent-memory-system

Before Pipeline:

  • Recall similar past work
  • Load learned patterns
  • Skip research if memory sufficient

After Pipeline:

  • Save complete episode to memory
  • Extract learnings
  • Update procedural patterns
  • Improve estimation accuracy

With hooks-manager

Session Hooks:

  • SessionStart: Load pipeline state
  • SessionEnd: Save pipeline progress
  • PostToolUse: Track stage completions

Notification Hooks:

  • Send telemetry on stage completions
  • Alert on gate failures
  • Track quality scores

Quick Reference

The 5-Stage Pipeline

| Stage | Skill | Time | Quality Gate | Output |
|-------|-------|------|--------------|--------|
| 1 | multi-ai-research | 30-60m | ≥95/100 | Research findings |
| 2 | multi-ai-planning | 1.5-3h | ≥90/100 | Executable plan |
| 3 | multi-ai-implementation | 3-10h | Tests pass, ≥80% cov | Working code |
| 4 | multi-ai-testing | 1-3h | ≥95% cov, quality ≥90 | Verified tests |
| 5 | multi-ai-verification | 1-3h | ≥90/100, all layers | Production ready |

Total: 8-20 hours → Production-ready feature

Workflow Modes

| Mode | Stages | Time | Quality | Use For |
|------|--------|------|---------|---------|
| Standard | All 5 | 8-20h | Maximum | Production features |
| Fast | 2-5 (skip research) | 6-15h | High | Familiar domains |
| Quick | 2-5 (basic) | 3-8h | Good | Internal tools |

Quality Gates

  • Research: ≥95/100, patterns identified
  • Planning: ≥90/100, all tasks verifiable
  • Implementation: Tests pass, coverage ≥80%
  • Testing: Quality ≥90/100, coverage ≥95%
  • Verification: ≥90/100, all 5 layers pass

end-to-end-orchestrator provides complete automation from feature description to production-ready code, coordinating all 5 multi-ai skills with quality gates, failure recovery, and state management, delivering enterprise-grade development workflows in a single command.

For examples, see examples/. For recovery details, see the Failure Recovery section above.
