quality-integration-test
Auto Mode
When --yes or -y: Auto-confirm test plan, skip interactive validation, use defaults for layer detection.
Maestro Integration Test (CSV Wave)
Usage
$quality-integration-test "3"
$quality-integration-test -c 4 "3 --max-iterations 8"
$quality-integration-test -y "3 --target-coverage 90"
$quality-integration-test --continue "integration-test-phase3-20260318"
Flags:
- `-y, --yes`: Skip all confirmations (auto mode)
- `-c, --concurrency N`: Max concurrent agents within each wave (default: 4)
- `--continue`: Resume existing session
Output Directory: .workflow/.csv-wave/{session-id}/
Core Output: tasks.csv (master state) + results.csv (final) + discoveries.ndjson (shared exploration) + context.md (human-readable report) + summary.json (structured output for downstream)
Overview
Linear pipeline test execution using spawn_agents_on_csv. Progressive L0 → L1 → L2 → L3 layers where each layer depends on the previous passing. Self-iterating 6-phase cycle (Explore → Design → Develop → Test → Reflect → Adjust) with adaptive strategy engine.
Core workflow: Explore Codebase → Design Test Plan → Progressive Layer Execution → Reflect → Adjust Strategy → Iterate
┌─────────────────────────────────────────────────────────────────────────┐
│ INTEGRATION TEST CSV WAVE WORKFLOW │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: Exploration → CSV │
│ ├─ Resolve phase directory from arguments │
│ ├─ Explore codebase for integration points │
│ ├─ Discover test infrastructure and existing tests │
│ ├─ Load pre-generated tests from quality-test-gen │
│ ├─ Design L0-L3 test plan │
│ ├─ Generate tasks.csv with rows per layer + module │
│ └─ User validates test plan (skip if -y) │
│ │
│ Phase 2: Wave Execution Engine (Linear Pipeline) │
│ ├─ Wave 1: L0 Static Analysis │
│ │ ├─ Type checking (tsc --noEmit) │
│ │ ├─ Linting (eslint / ruff) │
│ │ └─ Results: pass/fail per check │
│ ├─ Wave 2: L1 Unit Tests (parallel per module) │
│ │ ├─ Each module agent runs unit tests independently │
│ │ ├─ Discoveries shared (test commands, fixtures) │
│ │ └─ Results: tests_passed + tests_failed per module │
│ ├─ Wave 3: L2 Integration Tests │
│ │ ├─ Cross-module + API + DB tests │
│ │ ├─ Uses L1 context for test commands and patterns │
│ │ └─ Results: tests_passed + tests_failed + coverage │
│ ├─ Wave 4: L3 E2E Tests │
│ │ ├─ Full user flow tests │
│ │ ├─ Uses L2 context for integration points │
│ │ └─ Results: tests_passed + tests_failed + coverage │
│ └─ discoveries.ndjson shared across all waves (append-only) │
│ │
│ Phase 3: Reflect + Iterate │
│ ├─ Calculate overall pass rate │
│ ├─ Reflect on results (what worked, what failed, patterns) │
│ ├─ Adjust strategy (conservative/aggressive/surgical/reflective) │
│ ├─ If pass_rate < target: iterate (back to Phase 2) │
│ ├─ If pass_rate >= target OR max_iterations: finalize │
│ ├─ Export results.csv + summary.json │
│ ├─ Generate context.md + reflection-log.md │
│ └─ Display summary with next steps │
│ │
└─────────────────────────────────────────────────────────────────────────┘
CSV Schema
tasks.csv (Master State)
id,title,description,test_layer,test_scope,deps,context_from,wave,status,findings,tests_passed,tests_failed,coverage,error
"1","L0 Type Check","Run TypeScript type checking with tsc --noEmit. Report all type errors with file:line references.","L0-static","src/**/*.ts","","","1","","","","","",""
"2","L0 Lint","Run ESLint on all source files. Report errors and warnings with file:line references.","L0-static","src/**/*.ts","","","1","","","","","",""
"3","L1 Auth Module","Run unit tests for auth module: token verification, session management, password hashing. Isolated tests with mocked dependencies.","L1-unit","src/auth/**/*.ts","1;2","1;2","2","","","","","",""
"4","L1 API Module","Run unit tests for API module: route handlers, middleware, validators. Isolated tests with mocked DB.","L1-unit","src/api/**/*.ts","1;2","1;2","2","","","","","",""
"5","L1 Utils Module","Run unit tests for utility functions: validation, formatting, helpers. Pure function tests.","L1-unit","src/utils/**/*.ts","1;2","1;2","2","","","","","",""
"6","L2 API Integration","Run integration tests: API endpoints with real middleware chain, DB fixtures, cross-module data flow.","L2-integration","src/api/**/*.ts;src/auth/**/*.ts","3;4;5","3;4;5","3","","","","","",""
"7","L2 DB Integration","Run integration tests: database queries, migrations, transaction handling with test DB.","L2-integration","src/db/**/*.ts","3;4;5","3;4;5","3","","","","","",""
"8","L3 User Flows","Run E2E tests: login flow, CRUD operations, error handling. Full browser/process execution.","L3-e2e","src/**/*.ts","6;7","6;7","4","","","","","",""
Columns:
| Column | Phase | Description |
|---|---|---|
| `id` | Input | Unique task identifier (string) |
| `title` | Input | Short task title |
| `description` | Input | Detailed test execution instructions for this layer/scope |
| `test_layer` | Input | Test layer: `L0-static` / `L1-unit` / `L2-integration` / `L3-e2e` |
| `test_scope` | Input | Semicolon-separated file/module globs to test |
| `deps` | Input | Semicolon-separated dependency task IDs (previous-layer tasks) |
| `context_from` | Input | Semicolon-separated task IDs whose findings this task needs |
| `wave` | Computed | Wave number: 1=L0, 2=L1, 3=L2, 4=L3 |
| `status` | Output | `pending` → `completed` / `failed` / `skipped` |
| `findings` | Output | Key findings summary: failures, patterns, coverage notes (max 500 chars) |
| `tests_passed` | Output | Count of passing tests |
| `tests_failed` | Output | Count of failing tests |
| `coverage` | Output | Coverage percentage for this scope (e.g., 87.5%) |
| `error` | Output | Error message if failed |
Per-Wave CSV (Temporary)
Each wave generates wave-{N}.csv with extra prev_context column populated from predecessor findings.
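A minimal sketch of that step, assuming the master CSV has already been parsed into plain row objects (`buildWaveRows` is an illustrative helper name, not part of the spec):

```javascript
// Sketch: derive wave-{N} rows from the master task list, attaching a
// prev_context column assembled from each row's context_from predecessors.
// Only the documented CSV columns are taken from the spec; the helper
// name and in-memory representation are assumptions.
function buildWaveRows(tasks, wave) {
  const byId = new Map(tasks.map(t => [t.id, t]));
  return tasks
    .filter(t => t.wave === String(wave) &&
                 (t.status === '' || t.status === 'pending'))
    .map(t => ({
      ...t,
      prev_context: t.context_from
        .split(';')
        .filter(Boolean)
        .map(id => {
          const p = byId.get(id);
          return p ? `[Task ${p.id}: ${p.title}] ${p.findings}` : '';
        })
        .filter(Boolean)
        .join('\n'),
    }));
}
```

The returned rows are then serialized to `wave-{N}.csv` before the wave is spawned.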
Output Artifacts
| File | Purpose | Lifecycle |
|---|---|---|
| `tasks.csv` | Master state — all tasks with status/findings | Updated after each wave |
| `wave-{N}.csv` | Per-wave input (temporary) | Created before wave, deleted after |
| `results.csv` | Final export of all task results | Created in Phase 3 |
| `discoveries.ndjson` | Shared exploration board | Append-only, carries across waves |
| `context.md` | Human-readable integration test report | Created in Phase 3 |
| `summary.json` | Structured output for downstream commands | Created in Phase 3 |
| `reflection-log.md` | Per-iteration reflection history | Append-only across iterations |
Session Structure
.workflow/.csv-wave/integration-test-{phase}-{date}/
├── tasks.csv
├── results.csv
├── discoveries.ndjson
├── context.md
├── summary.json
├── reflection-log.md
├── state.json
├── iteration-{N}/
│ ├── wave-{N}.csv (temporary)
│ └── test-results.json
└── wave-{N}.csv (temporary)
Implementation
Session Initialization
const getUtc8ISOString = () => new Date(Date.now() + 8 * 60 * 60 * 1000).toISOString()
// Parse flags
const AUTO_YES = $ARGUMENTS.includes('--yes') || $ARGUMENTS.includes('-y')
const continueMode = $ARGUMENTS.includes('--continue')
const concurrencyMatch = $ARGUMENTS.match(/(?:--concurrency|-c)\s+(\d+)/)
const maxConcurrency = concurrencyMatch ? parseInt(concurrencyMatch[1]) : 4
// Parse integration-test-specific flags
const maxIterMatch = $ARGUMENTS.match(/--max-iterations\s+(\d+)/)
const maxIterations = maxIterMatch ? parseInt(maxIterMatch[1]) : 5
const coverageMatch = $ARGUMENTS.match(/--target-coverage\s+(\d+)/)
const targetCoverage = coverageMatch ? parseInt(coverageMatch[1]) : 95
// Clean phase text
const phaseArg = $ARGUMENTS
.replace(/--yes|-y|--continue|--concurrency\s+\d+|-c\s+\d+|--max-iterations\s+\d+|--target-coverage\s+\d+/g, '')
.trim()
const dateStr = getUtc8ISOString().substring(0, 10).replace(/-/g, '')
const sessionId = `integration-test-phase${phaseArg}-${dateStr}`
const sessionFolder = `.workflow/.csv-wave/${sessionId}`
Bash(`mkdir -p ${sessionFolder}`)
// Initialize state.json
const state = {
phase: phaseArg,
started_at: getUtc8ISOString(),
current_iteration: 0,
max_iterations: maxIterations,
strategy: "conservative",
current_layer: "L0",
pass_rates: [],
convergence_threshold: targetCoverage,
status: "running"
}
Write(`${sessionFolder}/state.json`, JSON.stringify(state, null, 2))
// Initialize reflection-log.md
Write(`${sessionFolder}/reflection-log.md`,
`# Integration Test Reflection Log\nPhase: ${phaseArg}\nStarted: ${getUtc8ISOString()}\n\n## Iterations\n`)
Phase 1: Exploration → CSV
Objective: Explore codebase, discover integration points, design L0-L3 test plan, generate tasks.csv.
Decomposition Rules:
- Phase resolution: Resolve `{phaseArg}` to `.workflow/phases/{NN}-{slug}/`
- Codebase exploration:
  - Cross-module imports and dependencies
  - API endpoints and route definitions
  - Database interactions and queries
  - External service integrations
  - Event flows and message passing
- Test infrastructure discovery:
  - Detect frameworks (jest/vitest/pytest, playwright/cypress)
  - Find existing integration and E2E tests
  - Identify test utilities, fixtures, DB seed scripts
- Pre-generated test loading: Check `{phase_dir}/.tests/test-gen-report.json` for tests from `quality-test-gen`. Merge integration/e2e tests into the plan (execute but don't re-generate).
- Layer design:

| Layer | Wave | Tasks | Content |
|---|---|---|---|
| L0 | 1 | 1-2 | Type check + lint commands |
| L1 | 2 | 1 per module | Unit tests per discovered module (parallel) |
| L2 | 3 | 1-3 | Integration tests (API, DB, cross-module) |
| L3 | 4 | 1-2 | E2E tests (user flows) |

- Dependency wiring: L1 depends on L0, L2 depends on L1, L3 depends on L2.
- CSV generation: Rows for all layers with correct wave assignments and deps.
- User validation: Display layer breakdown with test counts (skip if AUTO_YES).
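The dependency-wiring and wave-assignment rules can be sketched as follows (assuming an in-memory plan; `wireDependencies` and `LAYER_WAVE` are illustrative names):

```javascript
// Sketch: assign wave numbers and dependency IDs from the layer plan.
// Layers map to waves 1-4; every task depends on all tasks of the
// previous layer, and context_from mirrors deps per the schema examples.
const LAYER_WAVE = { 'L0-static': 1, 'L1-unit': 2, 'L2-integration': 3, 'L3-e2e': 4 };

function wireDependencies(tasks) {
  return tasks.map(t => {
    const wave = LAYER_WAVE[t.test_layer];
    const prevIds = tasks
      .filter(p => LAYER_WAVE[p.test_layer] === wave - 1)
      .map(p => p.id);
    return {
      ...t,
      wave: String(wave),
      deps: prevIds.join(';'),
      context_from: prevIds.join(';'),
    };
  });
}
```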
Phase 2: Wave Execution Engine
Objective: Execute test layers wave-by-wave via spawn_agents_on_csv. Progressive — each layer requires previous to pass.
Wave 1: L0 Static Analysis
- Read master `tasks.csv`
- Filter rows where `wave == 1` AND `status == pending`
- No `prev_context` needed (first wave)
- Write `wave-1.csv`
- Execute:
spawn_agents_on_csv({
csv_path: `${sessionFolder}/wave-1.csv`,
id_column: "id",
instruction: buildL0Instruction(sessionFolder),
max_concurrency: maxConcurrency,
max_runtime_seconds: 300,
output_csv_path: `${sessionFolder}/wave-1-results.csv`,
output_schema: {
type: "object",
properties: {
id: { type: "string" },
status: { type: "string", enum: ["completed", "failed"] },
findings: { type: "string" },
tests_passed: { type: "string" },
tests_failed: { type: "string" },
coverage: { type: "string" },
error: { type: "string" }
},
required: ["id", "status", "findings"]
}
})
- Read `wave-1-results.csv`, merge into master `tasks.csv`
- Delete `wave-1.csv`
- Gate check: If all L0 tasks failed, skip remaining waves for this iteration
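The merge step can be sketched as below, assuming both CSVs are already parsed into arrays of row objects (`mergeWaveResults` is an illustrative name):

```javascript
// Sketch: overwrite the output columns of master rows with matching
// wave results. Rows not present in the results keep their state.
function mergeWaveResults(master, results) {
  const byId = new Map(results.map(r => [r.id, r]));
  return master.map(t => {
    const r = byId.get(t.id);
    if (!r) return t;
    return {
      ...t,
      status: r.status,
      findings: r.findings ?? '',
      tests_passed: r.tests_passed ?? '',
      tests_failed: r.tests_failed ?? '',
      coverage: r.coverage ?? '',
      error: r.error ?? '',
    };
  });
}
```

The same merge applies after every wave; only the wave number changes.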
Wave 2: L1 Unit Tests (Parallel per Module)
- Read master `tasks.csv`
- Filter rows where `wave == 2` AND `status == pending`
- Check deps — all L0 tasks must be completed (not failed)
- Build `prev_context` from L0 findings:
  [Task 1: L0 Type Check] Clean — 0 type errors
  [Task 2: L0 Lint] 3 warnings in auth module (non-blocking)
- Write `wave-2.csv` with `prev_context` column
- Execute `spawn_agents_on_csv` for L1 agents (parallel per module)
- Merge results into master `tasks.csv`
- Delete `wave-2.csv`
- Gate check: If all L1 tasks failed, skip L2 and L3
Wave 3: L2 Integration Tests
- Read master `tasks.csv`
- Filter rows where `wave == 3` AND `status == pending`
- Build `prev_context` from L1 findings (test commands used, failures found, coverage gaps)
- Write `wave-3.csv` with `prev_context`
- Execute `spawn_agents_on_csv` for L2 agents
- Merge results, delete temp CSV
- Gate check: If all L2 tasks failed, skip L3
Wave 4: L3 E2E Tests
- Read master `tasks.csv`
- Filter rows where `wave == 4` AND `status == pending`
- Build `prev_context` from L2 findings (integration points tested, coverage levels)
- Write `wave-4.csv` with `prev_context`
- Execute `spawn_agents_on_csv` for L3 agents
- Merge results, delete temp CSV
Phase 3: Reflect + Iterate
Objective: Evaluate results, reflect, adjust strategy, iterate or finalize.
Step 3a: Calculate Pass Rate
Aggregate across all layers:
overall_pass_rate = total_passed / (total_passed + total_failed) * 100
Record the rate in `pass_rates[]` in `state.json`.
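A sketch of the aggregation, assuming the master rows are parsed into objects (`computePassRate` is an illustrative name; counts are stored as strings in the CSV):

```javascript
// Sketch: aggregate the pass rate across all tasks that reported counts.
function computePassRate(tasks) {
  let passed = 0;
  let failed = 0;
  for (const t of tasks) {
    passed += parseInt(t.tests_passed, 10) || 0;
    failed += parseInt(t.tests_failed, 10) || 0;
  }
  const total = passed + failed;
  return total === 0 ? 0 : (passed / total) * 100;
}
```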
Step 3b: Reflect
Analyze iteration results:
- Which tests failed and why?
- Is pass rate improving, plateauing, or regressing?
- Are failures clustered in one layer/module or spread out?
- Is the current strategy working?
Append to reflection-log.md:
## Iteration {N}
Strategy: {strategy_name}
Pass rate: {rate}% (previous: {prev_rate}%)
Delta: {+/-}%
### What worked
- {observation}
### What failed
- {test}: {reason}
### Pattern detected
- {pattern, e.g., "all failures in auth module"}
### Strategy assessment
- Current strategy: {effective|ineffective|partially_effective}
- Recommendation: {keep|switch_to_X}
Step 3c: Adjust Strategy (Adaptive Strategy Engine)
| Condition | Strategy | Behavior |
|---|---|---|
| Iteration 1-2 | Conservative | Fix obvious failures, don't refactor |
| Pass rate >80% AND failures similar to previous | Aggressive | Batch-fix related failures together |
| New regressions appeared | Surgical | Revert last changes, fix regression only |
| Stuck 3+ iterations (rate not improving) | Reflective | Step back, re-analyze root cause pattern |
Strategy transitions:
Conservative -> (pass rate >80%) -> Aggressive
Aggressive -> (regression) -> Surgical
Surgical -> (regression fixed) -> Aggressive
Any -> (stuck 3+ iters) -> Reflective
Reflective -> (new insight) -> Conservative (restart approach)
Update state.json with new strategy and iteration count.
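The transition rules can be sketched as a pure function over the recorded history. This is a simplification: the "failures similar to previous" condition for Aggressive is reduced to the pass-rate check, and `pickStrategy` / `hasRegression` are illustrative names:

```javascript
// Sketch of the adaptive strategy engine. passRates is the history
// recorded in state.json; hasRegression flags newly failing tests
// that previously passed.
function pickStrategy(current, passRates, hasRegression) {
  const n = passRates.length;
  const rate = n ? passRates[n - 1] : 0;
  // Stuck: 3+ iterations with no improvement -> step back and re-analyze
  const stuck = n >= 3 && passRates[n - 1] <= passRates[n - 3];
  if (stuck) return 'reflective';
  if (hasRegression) return 'surgical';
  if (current === 'surgical') return 'aggressive';     // regression fixed
  if (current === 'reflective') return 'conservative'; // restart approach
  if (rate > 80) return 'aggressive';
  return 'conservative';
}
```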
Step 3d: Convergence Check
- If `overall_pass_rate >= target_coverage`: CONVERGED → finalize
- If `iteration >= max_iterations`: MAX_ITER_REACHED → finalize
- Otherwise: ITERATE → reset pending tasks for failing layers, go back to Phase 2
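The three rules reduce to a small decision function (sketch; `decideNext` is an illustrative name):

```javascript
// Sketch: map the convergence rules to a next-action decision.
function decideNext(passRate, iteration, maxIterations, target) {
  if (passRate >= target) return 'CONVERGED';
  if (iteration >= maxIterations) return 'MAX_ITER_REACHED';
  return 'ITERATE';
}
```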
Step 3e: Finalize
- Read final master `tasks.csv`
- Export as `results.csv`
- Build `summary.json`:
{
"phase": "<phase>",
"completed_at": "<ISO>",
"session_id": "<session-id>",
"iterations": 3,
"final_pass_rate": 97.5,
"converged": true,
"convergence_threshold": 95,
"strategy_history": ["conservative", "conservative", "aggressive"],
"layers": {
"L0": { "status": "pass" },
"L1": { "total": 15, "passed": 15, "failed": 0, "pass_rate": 100.0 },
"L2": { "total": 8, "passed": 7, "failed": 1, "pass_rate": 87.5 },
"L3": { "total": 4, "passed": 4, "failed": 0, "pass_rate": 100.0 }
},
"bugs_discovered": [],
"regressions_fixed": []
}
- Generate `context.md`:
# Integration Test Report — Phase {phase}
## Summary
- Iterations: {N}/{max_iter}
- Converged: {yes/no} (threshold: {threshold}%)
- Final pass rate: {rate}%
- Strategy: {final_strategy} (transitioned {N} times)
## Layer Results
| Layer | Status | Passed | Failed | Pass Rate | Coverage |
|-------|--------|--------|--------|-----------|----------|
| L0 Static | {pass/fail} | — | — | — | — |
| L1 Unit | {status} | {P} | {F} | {rate}% | {cov}% |
| L2 Integration | {status} | {P} | {F} | {rate}% | {cov}% |
| L3 E2E | {status} | {P} | {F} | {rate}% | {cov}% |
## Iteration History
| Iter | Strategy | Pass Rate | Delta | Action |
|------|----------|-----------|-------|--------|
| 1 | conservative | 72.0% | — | fixed 3 type errors |
| 2 | conservative | 85.5% | +13.5% | fixed auth test fixtures |
| 3 | aggressive | 97.5% | +12.0% | batch-fixed API tests |
## Reflection Summary
{key insights from reflection-log.md}
## Bugs Discovered
{list of bugs found during testing}
## Next Steps
{suggested_next_command}
- Copy `summary.json` to the phase `.tests/integration/` directory.
- Update `index.json` with integration test status.
- Display summary.
Next step routing:
| Result | Suggestion |
|---|---|
| Converged (>=target%) | maestro-verify {phase} to update validation |
| Max iter, >80% | quality-test {phase} for manual UAT on remaining gaps |
| Max iter, <80% | quality-debug for deep investigation |
| Bugs discovered | maestro-plan {phase} --gaps to plan fixes |
Shared Discovery Board Protocol
Standard Discovery Types
| Type | Dedup Key | Data Schema | Description |
|---|---|---|---|
| `code_pattern` | `data.name` | `{name, file, description}` | Reusable code pattern found |
| `integration_point` | `data.file` | `{file, description, exports[]}` | Module connection point |
| `convention` | singleton | `{naming, imports, formatting}` | Project code conventions |
| `blocker` | `data.issue` | `{issue, severity, impact}` | Blocking issue found |
| `tech_stack` | singleton | `{framework, language, tools[]}` | Technology stack info |
Domain Discovery Types
| Type | Dedup Key | Data Schema | Description |
|---|---|---|---|
| `test_command` | `data.layer` | `{layer, command, flags, cwd}` | Working test command for a layer |
| `test_fixture` | `data.name` | `{name, file, setup, teardown}` | Shared test fixture or DB seed |
| `coverage_gap` | `data.module` | `{module, layer, uncovered_areas[]}` | Coverage gap in a module |
| `regression` | `data.test` | `{test, file, previous_status, current_status}` | Test that regressed |
| `flaky_test` | `data.test` | `{test, file, fail_rate, pattern}` | Intermittently failing test |
Protocol
- Read `{session_folder}/discoveries.ndjson` before own test execution
- Skip covered: If a discovery with the same type + dedup key exists, skip it
- Write immediately: Append findings as soon as they are found
- Append-only: Never modify or delete entries
- Deduplicate: Check before writing
echo '{"ts":"<ISO>","worker":"{id}","type":"test_command","data":{"layer":"L1","command":"npx vitest run --reporter=verbose","flags":"--testPathPattern=unit","cwd":"."}}' >> {session_folder}/discoveries.ndjson
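The dedup check can be sketched as below, assuming the board file has been read into a string. The `DEDUP_KEYS` map mirrors the tables above; singleton types dedup on type alone, and the helper name is illustrative:

```javascript
// Sketch: check the discovery board for an existing entry with the
// same type + dedup key before appending a new line.
const DEDUP_KEYS = {
  code_pattern: 'name', integration_point: 'file', blocker: 'issue',
  test_command: 'layer', test_fixture: 'name', coverage_gap: 'module',
  regression: 'test', flaky_test: 'test',
};

function isDuplicate(ndjsonText, candidate) {
  const key = DEDUP_KEYS[candidate.type]; // undefined -> singleton type
  return ndjsonText
    .split('\n')
    .filter(Boolean)
    .some(line => {
      let d;
      try { d = JSON.parse(line); } catch { return false; } // ignore malformed lines
      if (d.type !== candidate.type) return false;
      return key ? d.data?.[key] === candidate.data?.[key] : true;
    });
}
```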
Error Handling
| Error | Resolution |
|---|---|
| Phase directory not found | Abort with error: "Phase {N} not found" |
| No test framework detected | Abort with error: "No test framework detected (E003)" |
| L0 static analysis fails | Record failures, proceed to L1 (type errors are informational) |
| All tasks in a layer failed | Gate check: skip subsequent layers for this iteration |
| Agent timeout | Mark as failed, continue with remaining agents in wave |
| Max iterations without convergence | Finalize with current results, warn (W001) |
| Regression detected | Switch to Surgical strategy (W002) |
| Stuck 3+ iterations | Switch to Reflective strategy (W003) |
| CSV parse error | Validate format, show line number |
| discoveries.ndjson corrupt | Ignore malformed lines |
| Continue mode: no session found | List available sessions |
| state.json missing on resume | Rebuild from tasks.csv status column |
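The last resolution can be sketched as follows, assuming `tasks.csv` rows are parsed into objects. Treating failed tasks as unfinished is an assumption here (they are re-run on resume), and `rebuildState` is an illustrative name:

```javascript
// Sketch: rebuild minimal resume state from the master CSV when
// state.json is missing. The current layer is the first wave that
// still has unfinished (not completed/skipped) tasks.
function rebuildState(tasks) {
  const waves = [1, 2, 3, 4];
  const currentWave = waves.find(w =>
    tasks.some(t => t.wave === String(w) &&
                    t.status !== 'completed' && t.status !== 'skipped')
  ) ?? 4;
  return {
    current_layer: ['L0', 'L1', 'L2', 'L3'][currentWave - 1],
    status: 'running',
  };
}
```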
Core Rules
- Start Immediately: First action is session initialization, then Phase 1
- Wave Order is Sacred: Never execute wave N+1 before wave N completes and results are merged
- Progressive Layers: L0 → L1 → L2 → L3 — each layer gates the next
- CSV is Source of Truth: Master tasks.csv holds all state
- Context Propagation: prev_context built from master CSV, not from memory
- Discovery Board is Append-Only: Never clear, modify, or recreate discoveries.ndjson
- Self-Iterating: Loop until convergence or max iterations — do not stop after one pass
- Strategy is Adaptive: Apply the strategy engine rules for transitions, never stay on a failing strategy
- Reflect Before Adjusting: Always log reflection before changing strategy
- Cleanup Temp Files: Remove wave-{N}.csv after results are merged
- DO NOT STOP: Continuous execution until convergence or max iterations reached