moai-workflow-gan-loop

Implements the Builder-Evaluator GAN loop for iterative design quality improvement. Absorbed from agency constitution Section 11 and Section 12. Integrates Sprint Contract Protocol, 4-dimension scoring, stagnation detection, and Evaluator Leniency Prevention.

All loop parameters are read from .moai/config/sections/design.yaml. Do not hardcode thresholds.

Quick Reference

Loop Parameters (from design.yaml)

design.gan_loop:
  max_iterations: 5          # Maximum Builder-Evaluator cycles
  pass_threshold: 0.75       # Score >= this value to exit loop
  escalation_after: 3        # Escalate to user after N iterations without passing
  improvement_threshold: 0.05  # Minimum score delta per iteration
  strict_mode: false         # If true, each dimension must pass individually
  sprint_contract:
    enabled: true
    required_harness_levels: [thorough]
    optional_harness_levels: [standard]
    artifact_dir: ".moai/sprints"
    max_negotiation_rounds: 2

4-Dimension Scoring Weights

Dimension	Weight	Description
Design Quality	30%	Visual consistency, brand token compliance, WCAG AA
Originality	25%	Not generic, not AI-slop, unique brand expression
Completeness	25%	All BRIEF sections present, copy matches contract
Functionality	20%	Responsive, accessible, all interactions work

Overall score = weighted average of all four dimensions.

Pass condition: overall_score >= pass_threshold AND (if strict_mode: true) each dimension score >= pass_threshold.

Implementation Guide

GAN Loop Execution Flow

Phase 1: Sprint Contract (when required by harness level)

Required when harness_level == thorough. Optional when harness_level == standard and user opts in. Skipped when harness_level == minimal.

Sprint Contract generation:

Evaluator analyzes the BRIEF document and current iteration scope.
Evaluator produces the Sprint Contract document:
- acceptance_checklist: concrete, testable criteria for this iteration
- priority_dimension: which of the 4 dimensions to focus on
- test_scenarios: specific verification steps
- pass_conditions: minimum score per criterion
Builder reviews the contract:
- Accept: proceed with implementation
- Request adjustment: propose alternatives (max max_negotiation_rounds rounds)
Contract is saved to design.gan_loop.sprint_contract.artifact_dir/sprint-N.json

Constraint: Evaluator must not score on criteria outside the Sprint Contract. Builder must not claim criteria as met without evidence.

Phase 2: Builder Execution

Builder implements based on:

Accepted Sprint Contract (if present)
BRIEF document
Copy JSON from moai-domain-copywriting
Design tokens from moai-domain-brand-design or moai-workflow-design-import

Builder outputs: code files, rendered previews (if Playwright available), implementation notes.

Phase 3: Evaluator Scoring

Evaluator scores against the 4 dimensions using the Evaluator Leniency Prevention mechanisms:

Rubric Anchoring: Score each dimension against the rubric (0.25 increments) with explicit justification. Scores without rubric reference are invalid.
Evidence-Only Verdicts: No PASS without concrete evidence (screenshot, test output, code reference).
Anti-Pattern Cross-check: Check known anti-patterns before finalizing. Any detected anti-pattern caps the relevant dimension score at 0.50.
Must-Pass Firewall: Copy integrity, mobile viewport, and WCAG AA are must-pass criteria. Failure in any must-pass = overall FAIL regardless of other scores.

Output: evaluation-report-N.json in sprint_contract.artifact_dir.

Phase 4: Loop Decision

if overall_score >= pass_threshold:
    EXIT LOOP → proceed to next phase
elif iteration >= max_iterations:
    ESCALATE → present failure report to user
elif stagnation_detected:
    ESCALATE → present stagnation options
else:
    ITERATE → pass feedback to Builder, increment N

Phase 5: Iteration Feedback

If looping back:

Evaluator generates targeted feedback per failed criterion.
Builder receives the feedback and previous Sprint Contract.
Previously passed criteria carry forward (no regression allowed).
New Sprint Contract is generated for failed criteria only.

Stagnation Detection

Stagnation is detected when the score improvement between consecutive iterations is below improvement_threshold for 2 or more iterations.

Tracking:

After each iteration, record {iteration: N, score: X} in the sprint artifact.
Calculate delta = score[N] - score[N-1].
If delta < improvement_threshold for the last 2 iterations, flag stagnation.

When stagnation is detected, escalate to user via AskUserQuestion with three options:

Continue with current approach (Evaluator tries a different dimension focus)
Adjust criteria (user provides guidance or relaxes constraints)
Abort loop (accept current output as-is)

The escalation trigger at escalation_after iterations applies independently: if 3 iterations pass without a PASS score, escalate regardless of stagnation state.

Evaluator Leniency Prevention Mechanisms

The following 5 mechanisms prevent score inflation and must be applied on every evaluation:

Mechanism 1: Rubric Anchoring

Score descriptions for each dimension:

0.25: Major defects, fails most criteria
0.50: Partial compliance, notable issues remain
0.75: Solid compliance, minor issues only
1.00: Full compliance, no issues found

Always state which rubric level applies and why before assigning a numeric score.

Mechanism 2: Must-Pass Firewall

The following conditions cause immediate FAIL regardless of other scores:

Copy text differs from the original copy.json or BRIEF copy section
AI slop detected: purple gradient (#8B5CF6-#6D28D9) as primary visual element with generic white cards
Mobile viewport broken at 375px width (content overflow, unreadable text)
Any interactive element returns 404 or broken state
Lighthouse Accessibility < 80

Mechanism 3: Anti-Pattern Penalty

Known anti-patterns that cap dimension score at 0.50:

Generic icon set without brand customization (Originality capped)
Hard-coded spacing values outside the design token scale (Design Quality capped)
Missing alt attributes on non-decorative images (Functionality capped)
Section copy that does not match the contracted copy (Completeness capped)

Mechanism 4: Evidence Requirement

Each dimension score must cite specific evidence:

Design Quality: Reference token file path and WCAG contrast ratio
Originality: Describe what makes the design non-generic
Completeness: List each BRIEF section and its implementation status
Functionality: Reference test result or Playwright output

Mechanism 5: Regression Baseline

If a previous iteration passed a criterion, the current iteration must maintain that criterion. Regression from a previously passed criterion triggers an automatic score reduction in the relevant dimension.

Sprint Contract Structure

Sprint Contract document format (sprint-N.json):

{
  "sprint_id": "sprint-N",
  "iteration": N,
  "priority_dimension": "Design Quality | Originality | Completeness | Functionality",
  "acceptance_checklist": [
    {
      "id": "AC-01",
      "criterion": "Hero headline contrast ratio >= 4.5:1",
      "verification": "Check color pair with contrast calculator",
      "status": "pending | passed | failed"
    }
  ],
  "test_scenarios": [
    {
      "id": "TS-01",
      "description": "Mobile viewport renders without horizontal scroll",
      "tool": "Playwright | visual inspection",
      "command": "playwright test --viewport 375x667"
    }
  ],
  "pass_conditions": {
    "Design Quality": 0.75,
    "Originality": 0.70,
    "Completeness": 0.80,
    "Functionality": 0.75
  },
  "negotiation_history": [],
  "created_at": "ISO-8601"
}

Advanced Patterns

Strict Mode

When strict_mode: true in design.yaml:

Each of the 4 dimension scores must individually meet pass_threshold.
The weighted average alone is not sufficient.
Minimum 2 iterations required even if the first iteration achieves a passing weighted average.
Strict mode is recommended for client-facing deliverables.

Independent Re-evaluation

Every 5th project triggers an independent re-evaluation:

The same build is scored twice with independent prompts.
If scores diverge by more than 0.10, a calibration warning is logged.
Calibration results are stored in sprint_contract.artifact_dir/calibration-log.json.

Playwright Integration

When claude-in-chrome MCP or Playwright is available, the Evaluator uses automated testing:

Desktop screenshot (1280x720): full page
Mobile screenshot (375x667): full page
Interaction test: click all CTAs, verify no 404
Accessibility scan: automated WCAG check

When testing tools are unavailable, fall back to static code analysis only, and note the limitation in the evaluation report.

Works Well With

moai-domain-brand-design: Provides design tokens that Evaluator validates in Design Quality dimension
moai-domain-copywriting: Copy JSON is the reference for Completeness dimension
evaluator-active: The GAN loop orchestrates evaluator-active for each scoring pass
moai-workflow-design-import: Extracted tokens serve as the design reference baseline

Source: Absorbed from agency constitution (Section 11 GAN Loop Contract, Section 12 Evaluator Leniency Prevention) on 2026-04-20. REQ coverage: REQ-SKILL-011, REQ-SKILL-012, REQ-SKILL-012a, REQ-SKILL-013, REQ-SKILL-014, REQ-CONST-004 Version: 1.0.0