Auto Mode

No auto mode -- UAT is inherently interactive. --auto-fix only automates gap closure, not test execution.

Test (UAT)

Usage

$quality-test "3"                    # test phase 3
$quality-test "3 --smoke"            # smoke tests first, then UAT
$quality-test "3 --auto-fix"         # auto-trigger gap-fix loop on failures
$quality-test "--session 04-comments"  # resume specific session

Flags:

<phase>: Phase number or scratch task ID
--smoke: Run cold-start smoke tests before UAT
--auto-fix: Auto-trigger gap-fix loop (plan --gaps -> execute -> re-verify) on failures
--session ID: Resume a specific UAT session

Output: {target_dir}/uat.md + .tests/test-plan.json + .tests/test-results.json + .tests/coverage-report.json

Overview

Conversational UAT: present expected behavior one test at a time, user confirms or describes issues. Severity inferred from natural language (never asked). Session persists in uat.md across context resets. Failed tests trigger parallel debug agent diagnosis and optional gap-fix closure.

Philosophy: Show expected, ask if reality matches.

Implementation

Step 1: Resolve Target

Parse $ARGUMENTS for phase number, scratch task ID, or flags
Phase mode: set PHASE_DIR = .workflow/phases/{NN}-{slug}/
Scratch mode: set SCRATCH_DIR = .workflow/scratch/{id}/
Validate target exists and has verification.json -- if missing: E002

Step 2: Check Active Sessions

find .workflow/phases -name "uat.md" -type f 2>/dev/null | head -5
find .workflow/scratch -name "uat.md" -type f 2>/dev/null | head -5

If active sessions exist and no target specified: display session table, ask user to resume or start new
If --session ID specified: resume that session directly (skip to Step 9)
If session exists for target: offer resume or restart

Step 3: Smoke Tests (if --smoke)

Run basic sanity checks (app starts, routes respond, build clean, deps installed). If any smoke fails: E003 -- abort, suggest Skill({ skill: "quality-debug" })

Step 4: Load Verification Context

Read from target directory: verification.json, validation.json, index.json, plan.json, .summaries/TASK-*.md. Build testable list from user-observable outcomes.

Step 5: Design Test Scenarios

Create scenarios from testables (id T-001, name, category, expected behavior, requirement_ref). Focus on USER-OBSERVABLE outcomes. Write {target_dir}/.tests/test-plan.json.

Step 6: Create UAT File

Archive previous uat.md to .history/ if exists. Write {target_dir}/uat.md with frontmatter (status, target, started), Current Test section, Tests section (all pending), Summary counters, empty Gaps section.

Step 7: Present Test (Interactive Loop)

Present one test at a time:

------------------------------------------------------------
  TEST {number}/{total}: {name}
------------------------------------------------------------

Expected behavior:
{expected}

------------------------------------------------------------
> Type "pass" or describe what's wrong
------------------------------------------------------------

Wait for user response (plain text).

Step 8: Process Response

Response	Action
empty, "yes", "y", "ok", "pass", "next"	Mark as pass
"skip", "can't test", "n/a"	Mark as skipped
Anything else	Log as issue, infer severity

Severity inference (never ask):

"crashes", "error", "fails completely" -> blocker
"doesn't work", "wrong behavior", "broken" -> major
"works but...", "slow", "minor issue" -> minor
"color", "spacing", "typo" -> cosmetic
Default: major

On issue: auto-create issue in .workflow/issues/issues.jsonl with back-reference.

Batched writes: write to file on issue, every 5 passes, or completion.

If more tests: update Current Test, loop to Step 7. If done: go to Step 10.

Step 9: Resume From File

Read uat.md, find first result: [pending] test, announce progress, continue from there (go to Step 7).

Step 10: Complete Session

Update uat.md frontmatter: status -> "complete"
Archive previous result artifacts to .history/
Write .tests/test-results.json and .tests/coverage-report.json
Update index.json with UAT results
If no issues: go to Step 13
If issues found: go to Step 11

Step 11: Auto-Diagnose

Cluster related gaps by component/area. Spawn one debug Agent per cluster:

Agent({
  subagent_type: "general-purpose",
  description: "Diagnose UAT gap cluster: {cluster_name}",
  prompt: "Investigate UAT failures. Gaps: {gap list}. Find root cause, fix direction, affected files, evidence (file:line).",
  run_in_background: false
})

Update uat.md gaps with diagnosis results (root_cause, fix_direction, affected_files).

Step 12: Gap Closure Decision

If --auto-fix: execute gap-fix loop directly.

Otherwise: present diagnosis summary and offer options:

Auto-fix (plan --gaps -> execute -> re-verify, max 2 iterations)
Debug deep -- Skill({ skill: "quality-debug" })
Plan fixes -- Skill({ skill: "maestro-plan", args: "--gaps" })
Manual fix

Update issue lifecycle during gap-fix loop (registered -> planning -> executing -> completed/failed).

Step 13: Report

=== UAT RESULTS ===
Target:      {target}
Smoke Tests: {smoke_count} run, {smoke_pass} passed
UAT Tests:   {total} total
  Passed:    {passed}
  Issues:    {issues} ({blocker_count} blockers, {major_count} major)
  Skipped:   {skipped}
Diagnosis:   {diagnosed_count}/{issues} gaps diagnosed
Auto-fix:    {fixed_count} gaps resolved

Next steps:
  {suggested_next_command}

Error Handling

Code	Severity	Condition	Recovery
E001	error	Phase or task target required	Prompt user for phase number
E002	error	Phase not verified (no verification.json)	Suggest Skill({ skill: "maestro-verify" })
E003	error	Smoke test failed (app won't start)	Suggest Skill({ skill: "quality-debug" })
W001	warning	Test scenarios failed	Auto-diagnose, suggest fix options
W002	warning	Coverage below threshold	Suggest Skill({ skill: "quality-test-gen" })

Core Rules

One test at a time -- never batch-present tests
Never ask severity -- always infer from natural language
Session persistence -- uat.md survives context resets, resume from any point
Batched writes -- minimize file I/O (on issue, every 5 passes, completion)
Gap-fix loop max 2 iterations -- prevent infinite loops
Agent calls use run_in_background: false for synchronous execution
Auto-create issues in .workflow/issues/issues.jsonl for every failed test

quality-test