# workflow-lite-test-review
Test review and fix engine, invoked either from the workflow-lite-execute chain or standalone.

**Project Context**: Run `ccw spec load --category test` for test framework conventions, coverage targets, and fixtures.
## Usage

| Argument / Flag | Description |
|---|---|
| `<session-path\|--last>` | Session path, or auto-detect the last session (required for standalone) |
| `--in-memory` | Mode 1: chain from workflow-lite-execute via the `testReviewContext` global variable |
| `--skip-fix` | Review only; do not auto-fix failures |
## Input Modes

### Mode 1: In-Memory Chain (from workflow-lite-execute)

- **Trigger**: `--in-memory` flag, or the `testReviewContext` global variable is available
- **Input Source**: `testReviewContext` global variable set by workflow-lite-execute Step 4
- **Behavior**: Skip session discovery, inherit `convergenceReviewTool` from the execution chain, proceed directly to TR-Phase 1.
- **Note**: workflow-lite-execute Step 5 is the chain gate. A Mode 1 invocation means execution and code review are already complete; proceed with convergence verification and tests.

### Mode 2: Standalone

- **Trigger**: User calls with a session path or `--last`
- **Behavior**: Discover session → load plan + tasks → `convergenceReviewTool = 'agent'` → proceed to TR-Phase 1.
```javascript
let sessionPath, plan, taskFiles, convergenceReviewTool

if (testReviewContext) {
  // Mode 1: from workflow-lite-execute chain
  sessionPath = testReviewContext.session.folder
  plan = testReviewContext.planObject
  taskFiles = testReviewContext.taskFiles.map(tf => JSON.parse(Read(tf.path)))
  convergenceReviewTool = testReviewContext.convergenceReviewTool || 'agent'
} else {
  // Mode 2: standalone — find last session or use provided path
  sessionPath = resolveSessionPath($ARGUMENTS) // Glob('.workflow/.lite-plan/*/plan.json'), take last
  plan = JSON.parse(Read(`${sessionPath}/plan.json`))
  taskFiles = plan.task_ids.map(id => JSON.parse(Read(`${sessionPath}/.task/${id}.json`)))
  convergenceReviewTool = 'agent'
}

const skipFix = $ARGUMENTS?.includes('--skip-fix') || false
```
## Phase Summary
| Phase | Core Action | Output |
|---|---|---|
| TR-Phase 1 | Detect test framework + gather changes | testConfig, changedFiles |
| TR-Phase 2 | Convergence verification against plan criteria | reviewResults[] |
| TR-Phase 3 | Run tests + generate checklist | test-checklist.json |
| TR-Phase 4 | Auto-fix failures (iterative, max 3 rounds) | Fixed code + updated checklist |
| TR-Phase 5 | Output report + chain to session:sync | test-review.md |
## TR-Phase 0: Initialize

Set `sessionId` from `sessionPath`. Create a TodoWrite list with 5 phases (Phase 1 = `in_progress`, rest = `pending`).
## TR-Phase 1: Detect Test Framework & Gather Changes

**Test framework detection** (check in order, first match wins):

| File | Framework | Command |
|---|---|---|
| `package.json` with `scripts.test` | jest/vitest | `npm test` |
| `package.json` with `scripts['test:unit']` | jest/vitest | `npm run test:unit` |
| `pyproject.toml` | pytest | `python -m pytest -v --tb=short` |
| `Cargo.toml` | cargo-test | `cargo test` |
| `go.mod` | go-test | `go test ./...` |

**Gather git changes**: `git diff --name-only HEAD~5..HEAD` → `changedFiles[]`

**Output**: `testConfig = { command, framework, type }` + `changedFiles[]`
// TodoWrite: Phase 1 → completed, Phase 2 → in_progress
## TR-Phase 2: Convergence Verification

**Skip if**: `convergenceReviewTool === 'skip'` — set all tasks to PASS, proceed to Phase 3.

Verify that each task's convergence criteria are met in the implementation, and identify test gaps.

**Agent Convergence Review** (`convergenceReviewTool === 'agent'`, default):
For each task in `taskFiles`:

- Extract `convergence.criteria[]` from the task
- Match `task.files[].path` against `changedFiles` to find actually-changed files
- Read each matched file; verify each convergence criterion with file:line evidence
- Check test coverage gaps:
  - If `task.test.unit` is defined but no matching test files appear in `changedFiles` → mark as test gap
  - If `task.test.integration` is defined but no integration test appears in `changedFiles` → mark as test gap
- Build `reviewResult = { taskId, title, criteria_met[], criteria_unmet[], test_gaps[], files_reviewed[] }`

**Verdict logic**:

- PASS = all `convergence.criteria` met + no test gaps
- PARTIAL = some criteria met OR has test gaps
- FAIL = no criteria met
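The verdict rules reduce to a small pure function; a sketch, with field names following the `reviewResult` shape above:

```javascript
// PASS needs every criterion met and zero test gaps; FAIL means nothing
// was met at all; everything in between is PARTIAL.
function verdict({ criteria_met, criteria_unmet, test_gaps }) {
  if (criteria_unmet.length === 0 && test_gaps.length === 0) return 'PASS';
  if (criteria_met.length === 0) return 'FAIL';
  return 'PARTIAL';
}
```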
**CLI Convergence Review** (`convergenceReviewTool === 'gemini'` or `'codex'`):

```javascript
const reviewId = `${sessionId}-convergence`
const taskCriteria = taskFiles.map(t => `${t.id}: [${(t.convergence?.criteria || []).join(' | ')}]`).join('\n')

Bash(`ccw cli -p "PURPOSE: Convergence verification — check each task's completion criteria against actual implementation
TASK: • For each task below, verify every convergence criterion is satisfied in the changed files • Mark each criterion as MET (with file:line evidence) or UNMET (with what's missing) • Identify test coverage gaps (planned tests not found in changes)
TASK CRITERIA:
${taskCriteria}
CHANGED FILES: ${changedFiles.join(', ')}
MODE: analysis
CONTEXT: @${sessionPath}/plan.json @${sessionPath}/.task/*.json @**/* | Memory: workflow-lite-execute completed
EXPECTED: Per-task verdict (PASS/PARTIAL/FAIL) with per-criterion evidence + test gap list
CONSTRAINTS: Read-only | Focus strictly on convergence criteria verification, NOT code quality (code review already done in workflow-lite-execute)" --tool ${convergenceReviewTool} --mode analysis --id ${reviewId}`, { run_in_background: true })

// STOP - wait for hook callback, then parse CLI output into reviewResults format
```
// TodoWrite: Phase 2 → completed, Phase 3 → in_progress
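Before Phase 3 can consume them, the hook callback's raw text has to be mapped into the same `reviewResult` shape the agent path builds. A minimal sketch, assuming (this is not a documented contract) the CLI emits one `TASK-xxx: VERDICT` line per task:

```javascript
// The line format matched here ("TASK-001: PARTIAL ...") is an assumption
// about the CLI output — adjust the regex to the real response shape.
function parseCliVerdicts(rawOutput, taskFiles) {
  const byId = {};
  for (const m of rawOutput.matchAll(/^(TASK-\d+):\s*(PASS|PARTIAL|FAIL)/gm)) {
    byId[m[1]] = m[2];
  }
  return taskFiles.map(t => ({
    taskId: t.id,
    title: t.title,
    status: byId[t.id] || 'PARTIAL', // unparsed tasks default to PARTIAL for manual review
  }));
}
```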
## TR-Phase 3: Run Tests & Generate Checklist

**Build checklist from `reviewResults`**:

- Per task: status = `PASS` (all criteria met) / `PARTIAL` (some met) / `FAIL` (none met)
- Collect test_items from `task.test.unit[]`, `task.test.integration[]`, `task.test.success_metrics[]` + review `test_gaps`

**Run tests if `testConfig.command` exists**:

- Execute with a 5-minute timeout
- Parse output: detect passed/failed patterns → `overall: 'PASS' | 'FAIL' | 'UNKNOWN'`

Write `${sessionPath}/test-checklist.json`
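Pass/fail pattern detection might look like the following. The regexes are illustrative guesses, since each framework prints its own summary (jest/pytest use lowercase "N failed", go and cargo print "FAIL"/"FAILED", and cargo's success line includes "0 failed", which must not count as a failure):

```javascript
// Classify raw test-runner output into the checklist's `overall` field.
function classifyTestRun(rawOutput) {
  // Nonzero "N failed" (jest, pytest, cargo) or uppercase FAIL/FAILED (go, cargo)
  if (/([1-9]\d*\s+failed|\bFAIL(ED)?\b)/.test(rawOutput)) return 'FAIL';
  // "N passed" (jest, pytest), "ok" (go, cargo), or PASS/PASSED
  if (/(\d+\s+passed|\bok\b|\bPASS(ED)?\b)/.test(rawOutput)) return 'PASS';
  return 'UNKNOWN';
}
```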
// TodoWrite: Phase 3 → completed, Phase 4 → in_progress
## TR-Phase 4: Auto-Fix Failures (Iterative)

**Skip if**: `skipFix === true` OR `testChecklist.execution?.overall !== 'FAIL'`

**Max iterations**: 3. Each iteration, first delegate to `test-fix-agent`:
```javascript
Agent({
  subagent_type: "test-fix-agent",
  run_in_background: false,
  description: `Fix tests (iter ${iteration})`,
  prompt: `## Test Fix Iteration ${iteration}/${MAX_ITERATIONS}

**Test Command**: ${testConfig.command}
**Framework**: ${testConfig.framework}
**Session**: ${sessionPath}

### Failing Output (last 3000 chars)
\`\`\`
${testChecklist.execution.raw_output}
\`\`\`

### Plan Context
**Summary**: ${plan.summary}
**Tasks**: ${taskFiles.map(t => `${t.id}: ${t.title}`).join(' | ')}

### Instructions
1. Analyze the test failure output to identify the root cause
2. Fix the SOURCE CODE (not tests) unless the tests themselves are wrong
3. Run \`${testConfig.command}\` to verify the fix
4. If the fix introduces new failures, revert and try an alternative approach
5. Return: what was fixed, which files changed, test result after fix`
})
```
Then:

- Re-run `testConfig.command` → update `testChecklist.execution`
- Write the updated `test-checklist.json`
- Break if tests pass; continue if still failing

If tests still fail after 3 iterations → log "Manual investigation needed"
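The iteration contract above can be sketched with the agent delegation and test re-run abstracted as injected callbacks (a sketch of the control flow, not the actual implementation):

```javascript
// Iterative fix loop: delegate a fix, re-run the suite, stop on green or
// after MAX_ITERATIONS rounds. runFixAgent/runTests stand in for the
// Agent() delegation and test execution described above.
async function autoFixLoop(runFixAgent, runTests, MAX_ITERATIONS = 3) {
  for (let iteration = 1; iteration <= MAX_ITERATIONS; iteration++) {
    await runFixAgent(iteration);
    const execution = await runTests(); // updates testChecklist.execution in the real flow
    if (execution.overall === 'PASS') return { fixed: true, iterations: iteration };
  }
  return { fixed: false, iterations: MAX_ITERATIONS }; // manual investigation needed
}
```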
// TodoWrite: Phase 4 → completed, Phase 5 → in_progress
## TR-Phase 5: Report & Sync

**CHECKPOINT**: This step is MANDATORY. Always generate the report and trigger sync.

Generate `test-review.md` with sections:

- Header: session, summary, timestamp, framework
- Task Verdicts table: task_id | status | convergence (met/total) | test_items | gaps
- Unmet Criteria: per-task checklist of unmet items
- Test Gaps: list of missing unit/integration tests
- Test Execution: command, result, fix iteration (if applicable)

Write `${sessionPath}/test-review.md`

Chain to session:sync:

```javascript
Skill({ skill: "workflow:session:sync", args: `-y "Test review: ${testChecklist.execution?.overall || 'no-test'} — ${plan.summary}"` })
```
// TodoWrite: Phase 5 → completed
Display summary: Per-task verdict with [PASS]/[PARTIAL]/[FAIL] icons, convergence ratio, overall test result.
## Data Structures

### testReviewContext (Input - Mode 1, set by workflow-lite-execute Step 5)

```javascript
{
  planObject: { /* same as executionContext.planObject */ },
  taskFiles: [{ id: string, path: string }],
  convergenceReviewTool: "skip" | "agent" | "gemini" | "codex",
  executionResults: [...],
  originalUserInput: string,
  session: {
    id: string,
    folder: string,
    artifacts: { plan: string, task_dir: string }
  }
}
```
### testChecklist (Output artifact)

```javascript
{
  session: string,
  plan_summary: string,
  generated_at: string,
  test_config: { command, framework, type },
  tasks: [{
    task_id: string,
    title: string,
    status: "PASS" | "PARTIAL" | "FAIL",
    convergence: { met: string[], unmet: string[] },
    test_items: [{ type: "unit"|"integration"|"metric", desc: string, status: "pending"|"missing" }]
  }],
  execution: {
    command: string,
    timestamp: string,
    raw_output: string, // last 3000 chars
    overall: "PASS" | "FAIL" | "UNKNOWN",
    fix_iteration?: number
  } | null
}
```
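A filled-in example of the artifact, with hypothetical values shown only to make the shape concrete:

```javascript
{
  "session": "20240501-1432-auth-refresh",          // hypothetical session id
  "plan_summary": "Add token refresh to the auth client",
  "generated_at": "2024-05-01T14:55:00Z",
  "test_config": { "command": "npm test", "framework": "jest/vitest", "type": "node" },
  "tasks": [{
    "task_id": "TASK-001",
    "title": "Refresh expired tokens transparently",
    "status": "PARTIAL",
    "convergence": { "met": ["401 triggers refresh"], "unmet": ["retry original request once"] },
    "test_items": [{ "type": "unit", "desc": "refresh on 401", "status": "pending" }]
  }],
  "execution": {
    "command": "npm test",
    "timestamp": "2024-05-01T14:54:12Z",
    "raw_output": "Tests: 1 failed, 6 passed, 7 total",
    "overall": "FAIL",
    "fix_iteration": 1
  }
}
```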
## Session Folder Structure (after test-review)

```
.workflow/.lite-plan/{session-id}/
├── exploration-*.json
├── explorations-manifest.json
├── planning-context.md
├── plan.json
├── .task/TASK-*.json
├── test-checklist.json    # structured test results
└── test-review.md         # human-readable report
```
## Error Handling
| Error | Resolution |
|---|---|
| No session found | "No workflow-lite-plan sessions found. Run workflow-lite-plan first." |
| Missing plan.json | "Invalid session: missing plan.json at {path}" |
| No test framework | Skip TR-Phase 3 execution, still generate review report |
| Test timeout | Capture partial output, report as FAIL |
| Fix agent fails | Log iteration, continue to next or stop at max |
| Sync fails | Log warning, do not block report generation |