loop-plan-evaluator

SKILL.md

Loop Plan Evaluator Agent — Step 2: EVALUATE PLAN

Pre-execution quality gate. Verifies the plan is correct and scoped before any implementation begins. This prevents the exact problem that caused the PLAN-005 design system rebuild — an agent executing work that was already done.

For major tracks (architecture, features with 5+ tasks, integrations, infrastructure), this step also invokes the Board of Directors for multi-perspective expert review.

Inputs Required

  1. Track's plan.md — the plan to evaluate (including DAG)
  2. Track's spec.md — requirements to check against
  3. conductor/tracks.md — completed tracks (overlap check)
  4. Track's metadata.json — track type and priority
  5. Codebase state — what files/components already exist

Evaluation Passes

Pass 1: Scope Alignment

Check every task against spec.md:

For Each Task Check
Is it in spec? Task must trace to a specific spec requirement
Is it needed? Would removing this task leave a spec requirement unmet?
Is it scoped? Does the task do only what spec asks, not more?

Output:

### Scope Alignment: PASS ✅ / FAIL ❌
- Tasks in spec: [X]/[Y]
- Tasks NOT in spec (scope creep): [list]
- Spec requirements NOT covered: [list]

Pass 2: Overlap Detection

Cross-reference with tracks.md and the codebase:

Check Method
Track overlap Compare plan tasks against completed track deliverables
File overlap Check if planned files already exist in codebase
Component overlap Check if planned components already exist

Output:

### Overlap Detection: PASS ✅ / FAIL ❌
- Overlapping tasks: [list with which track already did them]
- Files that already exist: [list]
- Recommendation: [SKIP/MODIFY/PROCEED for each overlap]

Pass 3: Dependency Check

Verify task ordering and prerequisites:

Check Question
Track deps Are prerequisite tracks marked complete in tracks.md?
Task ordering Do later tasks depend on earlier tasks being done first?
External deps Are required packages/APIs available?

Output:

### Dependencies: PASS ✅ / FAIL ❌
- Missing track dependencies: [list]
- Misordered tasks: [list]
- Missing external dependencies: [list]

Pass 4: Task Quality

Evaluate each task for clarity and completeness:

Check Criteria
Specific Action is clear (not vague like "set up infrastructure")
Acceptance criteria Can you objectively verify completion?
File targets Expected file paths are listed?
Session-sized Can be completed in one sitting?

Output:

### Task Quality: PASS ✅ / FAIL ❌
- Vague tasks: [list with suggestions to clarify]
- Missing acceptance criteria: [list]
- Oversized tasks (should split): [list]

Pass 5: DAG Validation

Verify the dependency graph is valid for parallel execution:

Check Method
DAG exists Plan contains dag: block with nodes and parallel_groups
No cycles Topological sort succeeds (no circular dependencies)
Valid refs All depends_on references point to existing task IDs
File conflicts Parallel groups with shared files have coordination strategy
Levels correct Tasks in same parallel_group are at same topological level

Cycle Detection Algorithm:

def detect_cycles(dag):
    """Returns True if cycle exists, False otherwise."""
    visited = set()
    rec_stack = set()

    def dfs(node_id):
        visited.add(node_id)
        rec_stack.add(node_id)

        node = next((n for n in dag['nodes'] if n['id'] == node_id), None)
        for dep in node.get('depends_on', []):
            if dep not in visited:
                if dfs(dep):
                    return True
            elif dep in rec_stack:
                return True  # Cycle detected

        rec_stack.remove(node_id)
        return False

    for node in dag['nodes']:
        if node['id'] not in visited:
            if dfs(node['id']):
                return True
    return False

Output:

### DAG Validation: PASS ✅ / FAIL ❌
- DAG present: yes/no
- Nodes: [count]
- Parallel groups: [count]
- Cycle detected: yes/no (list cycle path if yes)
- Invalid references: [list of broken depends_on]
- Conflict issues: [list parallel groups with unhandled file conflicts]

Pass 6: Board of Directors Review (Major Tracks Only)

For major tracks, invoke the Board of Directors for expert deliberation:

When to invoke Board:

  • Track type is architecture, integration, or infrastructure
  • Track has 5+ tasks
  • Track touches security (auth, payments, data protection)
  • Track is high priority (P0)
  • Plan version > 1 (previously failed evaluation)

Board Invocation:

// If track qualifies for board review
if (isMajorTrack(metadata)) {
  // Initialize board session via message bus
  const boardResult = await invokeBoardMeeting(
    proposal: plan.md content,
    context: { spec, metadata, dag }
  );

  // Store board session in metadata
  metadata.loop_state.board_sessions.push({
    session_id: boardResult.session_id,
    checkpoint: "EVALUATE_PLAN",
    verdict: boardResult.verdict,
    vote_summary: boardResult.votes,
    conditions: boardResult.conditions,
    timestamp: new Date().toISOString()
  });

  // Board verdict affects overall evaluation
  if (boardResult.verdict === "REJECTED") {
    return FAIL with board conditions;
  }
}

Output:

### Board Review: PASS ✅ / FAIL ❌ / SKIPPED ⏭️
- Board invoked: yes/no (reason if no)
- Directors voted: [CA, CPO, CSO, COO, CXO]
- Verdict: APPROVED / APPROVED_WITH_REVIEW / REJECTED
- Vote breakdown: [X] APPROVE / [Y] REJECT
- Conditions from board:
  1. [Condition 1] (from [Director])
  2. [Condition 2] (from [Director])

Verdict

## Plan Evaluation Report

**Track**: [track-id]
**Evaluator**: loop-plan-evaluator
**Date**: [YYYY-MM-DD]
**Execution Mode**: SEQUENTIAL | PARALLEL

### Results
| Pass | Status |
|------|--------|
| Scope Alignment | PASS ✅ / FAIL ❌ |
| Overlap Detection | PASS ✅ / FAIL ❌ |
| Dependencies | PASS ✅ / FAIL ❌ |
| Task Quality | PASS ✅ / FAIL ❌ |
| DAG Validation | PASS ✅ / FAIL ❌ |
| Board Review | PASS ✅ / FAIL ❌ / SKIPPED ⏭️ |

### Parallel Execution Summary
- **Total Tasks**: [count]
- **Parallel Groups**: [count]
- **Max Concurrency**: [max workers in a parallel group]
- **Conflict-Free Groups**: [count]
- **Coordinated Groups**: [count with shared resources]

### Board Decision (if applicable)
- **Verdict**: [APPROVED / APPROVED_WITH_REVIEW / REJECTED]
- **Vote**: [X APPROVE / Y REJECT]
- **Conditions**: [count] conditions attached
- **Session ID**: [board-{timestamp}]

### Verdict: PASS ✅ → Proceed to Parallel Execution
### Verdict: FAIL ❌ → Return to Planner with fixes:
1. [Fix 1]
2. [Fix 2]

### Board Conditions (carry forward):
1. [Condition from board that must be verified in EVALUATE_EXECUTION]

Metadata Checkpoint Updates

The plan evaluator MUST update the track's metadata.json at key points:

On Start

{
  "loop_state": {
    "current_step": "EVALUATE_PLAN",
    "step_status": "IN_PROGRESS",
    "step_started_at": "[ISO timestamp]",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "IN_PROGRESS",
        "started_at": "[ISO timestamp]",
        "agent": "loop-plan-evaluator"
      }
    }
  }
}

On PASS

{
  "loop_state": {
    "current_step": "PARALLEL_EXECUTE",
    "step_status": "NOT_STARTED",
    "execution_mode": "PARALLEL",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "PASSED",
        "completed_at": "[ISO timestamp]",
        "verdict": "PASS",
        "checks": {
          "scope_alignment": true,
          "overlap_detection": true,
          "dependencies": true,
          "task_quality": true,
          "dag_validation": true,
          "board_review": true
        },
        "cto_review": {
          "status": "PASSED",
          "reviewed_at": "[timestamp if run]"
        },
        "dag_summary": {
          "total_tasks": 8,
          "parallel_groups": 3,
          "max_concurrency": 4,
          "conflict_free_groups": 2,
          "coordinated_groups": 1
        }
      },
      "PARALLEL_EXECUTE": {
        "status": "NOT_STARTED"
      }
    },
    "board_sessions": [
      {
        "session_id": "board-20260201-123456",
        "checkpoint": "EVALUATE_PLAN",
        "verdict": "APPROVED",
        "vote_summary": {
          "CA": "APPROVE",
          "CPO": "APPROVE",
          "CSO": "APPROVE",
          "COO": "APPROVE",
          "CXO": "APPROVE"
        },
        "conditions": [
          "Add caching layer (CA)",
          "Security audit before launch (CSO)"
        ],
        "timestamp": "[ISO timestamp]"
      }
    ]
  }
}

On FAIL

{
  "loop_state": {
    "current_step": "PLAN",
    "step_status": "NOT_STARTED",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "FAILED",
        "completed_at": "[ISO timestamp]",
        "verdict": "FAIL",
        "checks": {
          "scope_alignment": true,
          "overlap_detection": false,
          "dependencies": true,
          "task_quality": false
        },
        "failure_reasons": [
          "Overlap with existing track: component already built",
          "Task 3 is too vague"
        ]
      },
      "PLAN": {
        "status": "NOT_STARTED",
        "plan_version": 2
      }
    }
  }
}

Update Protocol

  1. read_file current metadata.json
  2. Update loop_state.checkpoints.EVALUATE_PLAN with verdict and checks
  3. If PASS: Advance current_step to EXECUTE
  4. If FAIL: Reset current_step to PLAN, increment plan_version
  5. write_file back to metadata.json

Handoff

  • PASS → Conductor dispatches loop-executor (Step 3)
  • FAIL → Conductor dispatches loop-planner to revise plan, then re-evaluates
Weekly Installs
4
GitHub Stars
275
First Seen
12 days ago
Installed on
cline4
gemini-cli4
github-copilot4
codex4
kimi-cli4
cursor4