quality-integration-test

Auto Mode

When --yes or -y is passed: auto-confirm the test plan, skip interactive validation, and use defaults for layer detection.

Maestro Integration Test (CSV Wave)

Usage

$quality-integration-test "3"
$quality-integration-test -c 4 "3 --max-iterations 8"
$quality-integration-test -y "3 --target-coverage 90"
$quality-integration-test --continue "integration-test-phase3-20260318"

Flags:

  • -y, --yes: Skip all confirmations (auto mode)
  • -c, --concurrency N: Max concurrent agents within each wave (default: 4)
  • --continue: Resume existing session

Output Directory: .workflow/.csv-wave/{session-id}/

Core Outputs: tasks.csv (master state), results.csv (final export), discoveries.ndjson (shared exploration), context.md (human-readable report), summary.json (structured output for downstream commands)


Overview

Linear pipeline test execution using spawn_agents_on_csv. Progressive L0 → L1 → L2 → L3 layers where each layer depends on the previous passing. Self-iterating 6-phase cycle (Explore → Design → Develop → Test → Reflect → Adjust) with adaptive strategy engine.

Core workflow: Explore Codebase → Design Test Plan → Progressive Layer Execution → Reflect → Adjust Strategy → Iterate

┌─────────────────────────────────────────────────────────────────────────┐
│              INTEGRATION TEST CSV WAVE WORKFLOW                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Phase 1: Exploration → CSV                                              │
│     ├─ Resolve phase directory from arguments                            │
│     ├─ Explore codebase for integration points                           │
│     ├─ Discover test infrastructure and existing tests                   │
│     ├─ Load pre-generated tests from quality-test-gen                    │
│     ├─ Design L0-L3 test plan                                            │
│     ├─ Generate tasks.csv with rows per layer + module                   │
│     └─ User validates test plan (skip if -y)                             │
│                                                                          │
│  Phase 2: Wave Execution Engine (Linear Pipeline)                        │
│     ├─ Wave 1: L0 Static Analysis                                        │
│     │   ├─ Type checking (tsc --noEmit)                                  │
│     │   ├─ Linting (eslint / ruff)                                       │
│     │   └─ Results: pass/fail per check                                  │
│     ├─ Wave 2: L1 Unit Tests (parallel per module)                       │
│     │   ├─ Each module agent runs unit tests independently               │
│     │   ├─ Discoveries shared (test commands, fixtures)                  │
│     │   └─ Results: tests_passed + tests_failed per module               │
│     ├─ Wave 3: L2 Integration Tests                                      │
│     │   ├─ Cross-module + API + DB tests                                 │
│     │   ├─ Uses L1 context for test commands and patterns                │
│     │   └─ Results: tests_passed + tests_failed + coverage               │
│     ├─ Wave 4: L3 E2E Tests                                              │
│     │   ├─ Full user flow tests                                          │
│     │   ├─ Uses L2 context for integration points                        │
│     │   └─ Results: tests_passed + tests_failed + coverage               │
│     └─ discoveries.ndjson shared across all waves (append-only)          │
│                                                                          │
│  Phase 3: Reflect + Iterate                                              │
│     ├─ Calculate overall pass rate                                       │
│     ├─ Reflect on results (what worked, what failed, patterns)           │
│     ├─ Adjust strategy (conservative/aggressive/surgical/reflective)     │
│     ├─ If pass_rate < target: iterate (back to Phase 2)                  │
│     ├─ If pass_rate >= target OR max_iterations: finalize                │
│     ├─ Export results.csv + summary.json                                 │
│     ├─ Generate context.md + reflection-log.md                           │
│     └─ Display summary with next steps                                   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

CSV Schema

tasks.csv (Master State)

id,title,description,test_layer,test_scope,deps,context_from,wave,status,findings,tests_passed,tests_failed,coverage,error
"1","L0 Type Check","Run TypeScript type checking with tsc --noEmit. Report all type errors with file:line references.","L0-static","src/**/*.ts","","","1","","","","","",""
"2","L0 Lint","Run ESLint on all source files. Report errors and warnings with file:line references.","L0-static","src/**/*.ts","","","1","","","","","",""
"3","L1 Auth Module","Run unit tests for auth module: token verification, session management, password hashing. Isolated tests with mocked dependencies.","L1-unit","src/auth/**/*.ts","1;2","1;2","2","","","","","",""
"4","L1 API Module","Run unit tests for API module: route handlers, middleware, validators. Isolated tests with mocked DB.","L1-unit","src/api/**/*.ts","1;2","1;2","2","","","","","",""
"5","L1 Utils Module","Run unit tests for utility functions: validation, formatting, helpers. Pure function tests.","L1-unit","src/utils/**/*.ts","1;2","1;2","2","","","","","",""
"6","L2 API Integration","Run integration tests: API endpoints with real middleware chain, DB fixtures, cross-module data flow.","L2-integration","src/api/**/*.ts;src/auth/**/*.ts","3;4;5","3;4;5","3","","","","","",""
"7","L2 DB Integration","Run integration tests: database queries, migrations, transaction handling with test DB.","L2-integration","src/db/**/*.ts","3;4;5","3;4;5","3","","","","","",""
"8","L3 User Flows","Run E2E tests: login flow, CRUD operations, error handling. Full browser/process execution.","L3-e2e","src/**/*.ts","6;7","6;7","4","","","","","",""

Columns:

| Column | Phase | Description |
|--------|-------|-------------|
| id | Input | Unique task identifier (string) |
| title | Input | Short task title |
| description | Input | Detailed test execution instructions for this layer/scope |
| test_layer | Input | Test layer: L0-static / L1-unit / L2-integration / L3-e2e |
| test_scope | Input | Semicolon-separated file/module globs to test |
| deps | Input | Semicolon-separated dependency task IDs (previous-layer tasks) |
| context_from | Input | Semicolon-separated task IDs whose findings this task needs |
| wave | Computed | Wave number: 1=L0, 2=L1, 3=L2, 4=L3 |
| status | Output | pending / completed / failed / skipped |
| findings | Output | Key findings summary: failures, patterns, coverage notes (max 500 chars) |
| tests_passed | Output | Count of passing tests |
| tests_failed | Output | Count of failing tests |
| coverage | Output | Coverage percentage for this scope (e.g., 87.5%) |
| error | Output | Error message if failed |

Per-Wave CSV (Temporary)

Each wave generates wave-{N}.csv with an extra prev_context column populated from predecessor findings.
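The derivation can be illustrated with a short sketch. This is illustrative only, assuming tasks.csv has already been parsed into row objects keyed by column name (CSV parsing and serialization are handled elsewhere); buildWaveRows is a hypothetical helper, not part of the documented command surface:

// Sketch: derive wave-N rows with prev_context from the master task list.
function buildWaveRows(masterRows, waveNumber) {
  const byId = new Map(masterRows.map(r => [r.id, r]))
  return masterRows
    .filter(r => r.wave === String(waveNumber) && r.status === 'pending')
    .map(r => ({
      ...r,
      // One "[Task N: title] findings" line per predecessor in context_from
      prev_context: r.context_from
        .split(';')
        .filter(Boolean)
        .map(depId => {
          const dep = byId.get(depId)
          return dep ? `[Task ${dep.id}: ${dep.title}] ${dep.findings}` : ''
        })
        .filter(Boolean)
        .join('\n')
    }))
}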


Output Artifacts

| File | Purpose | Lifecycle |
|------|---------|-----------|
| tasks.csv | Master state: all tasks with status/findings | Updated after each wave |
| wave-{N}.csv | Per-wave input (temporary) | Created before wave, deleted after |
| results.csv | Final export of all task results | Created in Phase 3 |
| discoveries.ndjson | Shared exploration board | Append-only, carries across waves |
| context.md | Human-readable integration test report | Created in Phase 3 |
| summary.json | Structured output for downstream commands | Created in Phase 3 |
| reflection-log.md | Per-iteration reflection history | Append-only across iterations |

Session Structure

.workflow/.csv-wave/integration-test-phase{N}-{date}/
├── tasks.csv
├── results.csv
├── discoveries.ndjson
├── context.md
├── summary.json
├── reflection-log.md
├── state.json
├── iteration-{N}/
│   ├── wave-{N}.csv (temporary)
│   └── test-results.json
└── wave-{N}.csv (temporary)

Implementation

Session Initialization

const getUtc8ISOString = () => new Date(Date.now() + 8 * 60 * 60 * 1000).toISOString()

// Parse flags
const AUTO_YES = $ARGUMENTS.includes('--yes') || $ARGUMENTS.includes('-y')
const continueMode = $ARGUMENTS.includes('--continue')
const concurrencyMatch = $ARGUMENTS.match(/(?:--concurrency|-c)\s+(\d+)/)
const maxConcurrency = concurrencyMatch ? parseInt(concurrencyMatch[1]) : 4

// Parse integration-test-specific flags
const maxIterMatch = $ARGUMENTS.match(/--max-iterations\s+(\d+)/)
const maxIterations = maxIterMatch ? parseInt(maxIterMatch[1]) : 5
const coverageMatch = $ARGUMENTS.match(/--target-coverage\s+(\d+)/)
const targetCoverage = coverageMatch ? parseInt(coverageMatch[1]) : 95

// Clean phase text
const phaseArg = $ARGUMENTS
  .replace(/--yes|-y|--continue|--concurrency\s+\d+|-c\s+\d+|--max-iterations\s+\d+|--target-coverage\s+\d+/g, '')
  .trim()

const dateStr = getUtc8ISOString().substring(0, 10).replace(/-/g, '')
const sessionId = `integration-test-phase${phaseArg}-${dateStr}`
const sessionFolder = `.workflow/.csv-wave/${sessionId}`

Bash(`mkdir -p ${sessionFolder}`)

// Initialize state.json
const state = {
  phase: phaseArg,
  started_at: getUtc8ISOString(),
  current_iteration: 0,
  max_iterations: maxIterations,
  strategy: "conservative",
  current_layer: "L0",
  pass_rates: [],
  convergence_threshold: targetCoverage,
  status: "running"
}
Write(`${sessionFolder}/state.json`, JSON.stringify(state, null, 2))

// Initialize reflection-log.md
Write(`${sessionFolder}/reflection-log.md`,
  `# Integration Test Reflection Log\nPhase: ${phaseArg}\nStarted: ${getUtc8ISOString()}\n\n## Iterations\n`)

Phase 1: Exploration → CSV

Objective: Explore codebase, discover integration points, design L0-L3 test plan, generate tasks.csv.

Decomposition Rules:

  1. Phase resolution: Resolve {phaseArg} to .workflow/phases/{NN}-{slug}/

  2. Codebase exploration:

    • Cross-module imports and dependencies
    • API endpoints and route definitions
    • Database interactions and queries
    • External service integrations
    • Event flows and message passing
  3. Test infrastructure discovery:

    • Detect frameworks (jest/vitest/pytest, playwright/cypress)
    • Find existing integration and E2E tests
    • Identify test utilities, fixtures, DB seed scripts
  4. Pre-generated test loading: Check {phase_dir}/.tests/test-gen-report.json for tests from quality-test-gen. Merge integration/e2e tests into plan (execute but don't re-generate).

  5. Layer design:

| Layer | Wave | Tasks | Content |
|-------|------|-------|---------|
| L0 | 1 | 1-2 | Type check + lint commands |
| L1 | 2 | 1 per module | Unit tests per discovered module (parallel) |
| L2 | 3 | 1-3 | Integration tests (API, DB, cross-module) |
| L3 | 4 | 1-2 | E2E tests (user flows) |
  6. Dependency wiring: L1 depends on L0, L2 depends on L1, L3 depends on L2.

  7. CSV generation: Rows for all layers with correct wave assignments and deps (see the wiring sketch below).

User validation: Display layer breakdown with test counts (skip if AUTO_YES).
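The dependency wiring and CSV generation (steps 6-7) can be sketched as follows. wireLayerDeps is a hypothetical helper; the actual task content comes from the exploration steps above:

// Sketch: wire wave/deps/context_from so every task in layer N depends on
// all tasks of layer N-1.
function wireLayerDeps(tasksByLayer) { // e.g. { L0: [...], L1: [...], L2: [...], L3: [...] }
  const layerOrder = ['L0', 'L1', 'L2', 'L3']
  let prevIds = []
  for (const [i, layer] of layerOrder.entries()) {
    const tasks = tasksByLayer[layer] ?? []
    for (const task of tasks) {
      task.wave = String(i + 1)          // 1=L0, 2=L1, 3=L2, 4=L3
      task.deps = prevIds.join(';')
      task.context_from = prevIds.join(';')
      task.status = 'pending'
    }
    prevIds = tasks.map(t => t.id)
  }
}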


Phase 2: Wave Execution Engine

Objective: Execute test layers wave-by-wave via spawn_agents_on_csv. Progressive: each layer requires the previous one to pass.

Wave 1: L0 Static Analysis

  1. Read master tasks.csv
  2. Filter rows where wave == 1 AND status == pending
  3. No prev_context needed (first wave)
  4. Write wave-1.csv
  5. Execute:
spawn_agents_on_csv({
  csv_path: `${sessionFolder}/wave-1.csv`,
  id_column: "id",
  instruction: buildL0Instruction(sessionFolder),
  max_concurrency: maxConcurrency,
  max_runtime_seconds: 300,
  output_csv_path: `${sessionFolder}/wave-1-results.csv`,
  output_schema: {
    type: "object",
    properties: {
      id: { type: "string" },
      status: { type: "string", enum: ["completed", "failed"] },
      findings: { type: "string" },
      tests_passed: { type: "string" },
      tests_failed: { type: "string" },
      coverage: { type: "string" },
      error: { type: "string" }
    },
    required: ["id", "status", "findings"]
  }
})
  6. Read wave-1-results.csv, merge into master tasks.csv
  7. Delete wave-1.csv
  8. Gate check: If all L0 tasks failed, skip remaining waves for this iteration
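The spawn call above passes buildL0Instruction(sessionFolder), which this document does not define. One plausible shape, as a hedged sketch only (the real instruction text is an implementation detail):

// Sketch: each agent gets its row's description plus the discovery-board protocol.
function buildL0Instruction(sessionFolder) {
  return [
    'You are a static-analysis agent processing one row of wave-1.csv.',
    'Run the command described in your row\'s description field from the repo root.',
    'Report: status ("completed" if the check ran, "failed" on tool/setup errors),',
    'findings (errors/warnings with file:line references, max 500 chars),',
    'and tests_passed/tests_failed counts where applicable.',
    `Read ${sessionFolder}/discoveries.ndjson first and skip work already covered;`,
    'append new discoveries (tech_stack, blocker, ...) as NDJSON lines.'
  ].join('\n')
}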

Wave 2: L1 Unit Tests (Parallel per Module)

  1. Read master tasks.csv
  2. Filter rows where wave == 2 AND status == pending
  3. Check deps — all L0 tasks must be completed (not failed)
  4. Build prev_context from L0 findings:
    [Task 1: L0 Type Check] Clean — 0 type errors
    [Task 2: L0 Lint] 3 warnings in auth module (non-blocking)
    
  5. Write wave-2.csv with prev_context column
  6. Execute spawn_agents_on_csv for L1 agents (parallel per module)
  7. Merge results into master tasks.csv
  8. Delete wave-2.csv
  9. Gate check: If all L1 tasks failed, skip L2 and L3
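Merging a wave's results back into the master CSV (step 7 here, and the equivalent step in every wave) might look like this sketch, again over parsed row objects; mergeWaveResults is a hypothetical name:

// Sketch: copy output columns from wave results onto the master rows;
// input columns are left untouched.
function mergeWaveResults(masterRows, resultRows) {
  const resultById = new Map(resultRows.map(r => [r.id, r]))
  for (const row of masterRows) {
    const res = resultById.get(row.id)
    if (!res) continue
    row.status = res.status
    row.findings = res.findings ?? ''
    row.tests_passed = res.tests_passed ?? ''
    row.tests_failed = res.tests_failed ?? ''
    row.coverage = res.coverage ?? ''
    row.error = res.error ?? ''
  }
  return masterRows
}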

Wave 3: L2 Integration Tests

  1. Read master tasks.csv
  2. Filter rows where wave == 3 AND status == pending
  3. Build prev_context from L1 findings (test commands used, failures found, coverage gaps)
  4. Write wave-3.csv with prev_context
  5. Execute spawn_agents_on_csv for L2 agents
  6. Merge results, delete temp CSV
  7. Gate check: If all L2 tasks failed, skip L3

Wave 4: L3 E2E Tests

  1. Read master tasks.csv
  2. Filter rows where wave == 4 AND status == pending
  3. Build prev_context from L2 findings (integration points tested, coverage levels)
  4. Write wave-4.csv with prev_context
  5. Execute spawn_agents_on_csv for L3 agents
  6. Merge results, delete temp CSV

Phase 3: Reflect + Iterate

Objective: Evaluate results, reflect, adjust strategy, iterate or finalize.

Step 3a: Calculate Pass Rate

Aggregate across all layers:

overall_pass_rate = total_passed / (total_passed + total_failed) * 100

Record in state.json.pass_rates[].
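As a sketch of the aggregation (L0 rows carry no test counts, so they contribute nothing to the total):

// Sketch: aggregate pass rate across all rows with test counts.
function overallPassRate(masterRows) {
  let passed = 0
  let failed = 0
  for (const row of masterRows) {
    passed += Number(row.tests_passed) || 0
    failed += Number(row.tests_failed) || 0
  }
  const total = passed + failed
  return total === 0 ? 0 : (passed / total) * 100
}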

Step 3b: Reflect

Analyze iteration results:

  • Which tests failed and why?
  • Is pass rate improving, plateauing, or regressing?
  • Are failures clustered in one layer/module or spread out?
  • Is the current strategy working?

Append to reflection-log.md:

## Iteration {N}
Strategy: {strategy_name}
Pass rate: {rate}% (previous: {prev_rate}%)
Delta: {+/-}%

### What worked
- {observation}

### What failed
- {test}: {reason}

### Pattern detected
- {pattern, e.g., "all failures in auth module"}

### Strategy assessment
- Current strategy: {effective|ineffective|partially_effective}
- Recommendation: {keep|switch_to_X}

Step 3c: Adjust Strategy (Adaptive Strategy Engine)

| Condition | Strategy | Behavior |
|-----------|----------|----------|
| Iteration 1-2 | Conservative | Fix obvious failures, don't refactor |
| Pass rate >80% AND failures similar to previous | Aggressive | Batch-fix related failures together |
| New regressions appeared | Surgical | Revert last changes, fix regression only |
| Stuck 3+ iterations (rate not improving) | Reflective | Step back, re-analyze root-cause pattern |

Strategy transitions:

Conservative -> (pass rate >80%) -> Aggressive
Aggressive -> (regression) -> Surgical
Surgical -> (regression fixed) -> Aggressive
Any -> (stuck 3+ iters) -> Reflective
Reflective -> (new insight) -> Conservative (restart approach)

Update state.json with new strategy and iteration count.
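A sketch of these transition rules, assuming state.pass_rates holds one entry per completed iteration and that regression/similarity detection happens upstream when comparing result CSVs; nextStrategy is a hypothetical name:

// Sketch: adaptive strategy selection per the table and transitions above.
function nextStrategy(state, { regressionDetected, failuresSimilarToPrev }) {
  const rates = state.pass_rates
  const n = rates.length
  if (n >= 3 && rates[n - 1] <= rates[n - 3]) return 'reflective' // stuck 3+ iterations
  if (regressionDetected) return 'surgical'
  if (state.strategy === 'surgical') return 'aggressive'          // regression fixed
  if (state.strategy === 'reflective') return 'conservative'      // restart approach
  if (n >= 2 && rates[n - 1] > 80 && failuresSimilarToPrev) return 'aggressive'
  return 'conservative'                                           // iterations 1-2 default
}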

Step 3d: Convergence Check

  • If overall_pass_rate >= target_coverage: CONVERGED → finalize
  • If iteration >= max_iterations: MAX_ITER_REACHED → finalize
  • Otherwise: ITERATE → reset pending tasks for failing layers, go back to Phase 2
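A compact sketch of this decision plus the reset step (decideNext is a hypothetical helper):

// Sketch: convergence check; failed tasks are reset to pending so the
// next iteration re-runs only the failing layers' work.
function decideNext(state, passRate, masterRows) {
  if (passRate >= state.convergence_threshold) return 'CONVERGED'
  if (state.current_iteration >= state.max_iterations) return 'MAX_ITER_REACHED'
  for (const row of masterRows) {
    if (row.status === 'failed') {
      row.status = 'pending'
      row.error = ''
    }
  }
  return 'ITERATE'
}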

Step 3e: Finalize

  1. Read final master tasks.csv
  2. Export as results.csv
  3. Build summary.json:
{
  "phase": "<phase>",
  "completed_at": "<ISO>",
  "session_id": "<session-id>",
  "iterations": 3,
  "final_pass_rate": 97.5,
  "converged": true,
  "convergence_threshold": 95,
  "strategy_history": ["conservative", "conservative", "aggressive"],
  "layers": {
    "L0": { "status": "pass" },
    "L1": { "total": 15, "passed": 15, "failed": 0, "pass_rate": 100.0 },
    "L2": { "total": 8, "passed": 7, "failed": 1, "pass_rate": 87.5 },
    "L3": { "total": 4, "passed": 4, "failed": 0, "pass_rate": 100.0 }
  },
  "bugs_discovered": [],
  "regressions_fixed": []
}
  4. Generate context.md:
# Integration Test Report — Phase {phase}

## Summary
- Iterations: {N}/{max_iter}
- Converged: {yes/no} (threshold: {threshold}%)
- Final pass rate: {rate}%
- Strategy: {final_strategy} (transitioned {N} times)

## Layer Results
| Layer | Status | Passed | Failed | Pass Rate | Coverage |
|-------|--------|--------|--------|-----------|----------|
| L0 Static | {pass/fail} |||||
| L1 Unit | {status} | {P} | {F} | {rate}% | {cov}% |
| L2 Integration | {status} | {P} | {F} | {rate}% | {cov}% |
| L3 E2E | {status} | {P} | {F} | {rate}% | {cov}% |

## Iteration History
| Iter | Strategy | Pass Rate | Delta | Action |
|------|----------|-----------|-------|--------|
| 1 | conservative | 72.0% || fixed 3 type errors |
| 2 | conservative | 85.5% | +13.5% | fixed auth test fixtures |
| 3 | aggressive | 97.5% | +12.0% | batch-fixed API tests |

## Reflection Summary
{key insights from reflection-log.md}

## Bugs Discovered
{list of bugs found during testing}

## Next Steps
{suggested_next_command}
  5. Copy summary.json to the phase .tests/integration/ directory.

  6. Update index.json with integration test status.

  7. Display summary.

Next step routing:

| Result | Suggestion |
|--------|------------|
| Converged (>= target%) | maestro-verify {phase} to update validation |
| Max iterations, >80% | quality-test {phase} for manual UAT on remaining gaps |
| Max iterations, <80% | quality-debug for deep investigation |
| Bugs discovered | maestro-plan {phase} --gaps to plan fixes |

Shared Discovery Board Protocol

Standard Discovery Types

| Type | Dedup Key | Data Schema | Description |
|------|-----------|-------------|-------------|
| code_pattern | data.name | {name, file, description} | Reusable code pattern found |
| integration_point | data.file | {file, description, exports[]} | Module connection point |
| convention | singleton | {naming, imports, formatting} | Project code conventions |
| blocker | data.issue | {issue, severity, impact} | Blocking issue found |
| tech_stack | singleton | {framework, language, tools[]} | Technology stack info |

Domain Discovery Types

| Type | Dedup Key | Data Schema | Description |
|------|-----------|-------------|-------------|
| test_command | data.layer | {layer, command, flags, cwd} | Working test command for a layer |
| test_fixture | data.name | {name, file, setup, teardown} | Shared test fixture or DB seed |
| coverage_gap | data.module | {module, layer, uncovered_areas[]} | Coverage gap in a module |
| regression | data.test | {test, file, previous_status, current_status} | Test that regressed |
| flaky_test | data.test | {test, file, fail_rate, pattern} | Intermittently failing test |

Protocol

  1. Read {session_folder}/discoveries.ndjson before executing your own tests
  2. Skip covered: If discovery of same type + dedup key exists, skip
  3. Write immediately: Append findings as found
  4. Append-only: Never modify or delete
  5. Deduplicate: Check before writing
echo '{"ts":"<ISO>","worker":"{id}","type":"test_command","data":{"layer":"L1","command":"npx vitest run --reporter=verbose","flags":"--testPathPattern=unit","cwd":"."}}' >> {session_folder}/discoveries.ndjson

Error Handling

| Error | Resolution |
|-------|------------|
| Phase directory not found | Abort with error: "Phase {N} not found" |
| No test framework detected | Abort with error: "No test framework detected (E003)" |
| L0 static analysis fails | Record failures, proceed to L1 (type errors are informational) |
| All tasks in a layer failed | Gate check: skip subsequent layers for this iteration |
| Agent timeout | Mark as failed, continue with remaining agents in wave |
| Max iterations without convergence | Finalize with current results, warn (W001) |
| Regression detected | Switch to Surgical strategy (W002) |
| Stuck 3+ iterations | Switch to Reflective strategy (W003) |
| CSV parse error | Validate format, show line number |
| discoveries.ndjson corrupt | Ignore malformed lines |
| Continue mode: no session found | List available sessions |
| state.json missing on resume | Rebuild from tasks.csv status column |
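For the last row, a hedged sketch of rebuilding a minimal state.json from the master CSV. Iteration count and strategy history are not recoverable from the status column, so conservative defaults are assumed; rebuildState is a hypothetical name:

// Sketch: reconstruct state.json on resume from parsed tasks.csv rows.
function rebuildState(masterRows, phase, targetCoverage, maxIterations) {
  const done = masterRows.filter(r => r.status && r.status !== 'pending')
  return {
    phase,
    current_iteration: done.length > 0 ? 1 : 0, // at least one pass happened
    max_iterations: maxIterations,
    strategy: 'conservative',
    current_layer: 'L' + Math.max(0, ...done.map(r => Number(r.wave) - 1)),
    pass_rates: [],
    convergence_threshold: targetCoverage,
    status: 'running'
  }
}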

Core Rules

  1. Start Immediately: First action is session initialization, then Phase 1
  2. Wave Order is Sacred: Never execute wave N+1 before wave N completes and results are merged
  3. Progressive Layers: L0 → L1 → L2 → L3 — each layer gates the next
  4. CSV is Source of Truth: Master tasks.csv holds all state
  5. Context Propagation: prev_context built from master CSV, not from memory
  6. Discovery Board is Append-Only: Never clear, modify, or recreate discoveries.ndjson
  7. Self-Iterating: Loop until convergence or max iterations — do not stop after one pass
  8. Strategy is Adaptive: Apply the strategy engine rules for transitions, never stay on a failing strategy
  9. Reflect Before Adjusting: Always log reflection before changing strategy
  10. Cleanup Temp Files: Remove wave-{N}.csv after results are merged
  11. DO NOT STOP: Continuous execution until convergence or max iterations reached