# moai-workflow-gan-loop
Implements the Builder-Evaluator GAN loop for iterative design quality improvement. Absorbed from agency constitution Section 11 and Section 12. Integrates Sprint Contract Protocol, 4-dimension scoring, stagnation detection, and Evaluator Leniency Prevention.
All loop parameters are read from `.moai/config/sections/design.yaml`. Do not hardcode thresholds.
## Quick Reference

### Loop Parameters (from design.yaml)
```yaml
design.gan_loop:
  max_iterations: 5            # Maximum Builder-Evaluator cycles
  pass_threshold: 0.75         # Score >= this value to exit loop
  escalation_after: 3          # Escalate to user after N iterations without passing
  improvement_threshold: 0.05  # Minimum score delta per iteration
  strict_mode: false           # If true, each dimension must pass individually
  sprint_contract:
    enabled: true
    required_harness_levels: [thorough]
    optional_harness_levels: [standard]
    artifact_dir: ".moai/sprints"
    max_negotiation_rounds: 2
```
### 4-Dimension Scoring Weights
| Dimension | Weight | Description |
|---|---|---|
| Design Quality | 30% | Visual consistency, brand token compliance, WCAG AA |
| Originality | 25% | Not generic, not AI-slop, unique brand expression |
| Completeness | 25% | All BRIEF sections present, copy matches contract |
| Functionality | 20% | Responsive, accessible, all interactions work |
Overall score = weighted average of all four dimensions.
Pass condition: `overall_score >= pass_threshold`; when `strict_mode: true`, each individual dimension score must also be `>= pass_threshold`.
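The weighting and pass condition above can be sketched as a small function. This is a minimal illustration only: the snake_case dimension keys and the function name are assumptions, and real threshold values must come from `design.yaml`, not hardcoded defaults.

```python
# Weights mirror the 4-Dimension Scoring Weights table above.
# Dimension keys are illustrative; the real config keys may differ.
WEIGHTS = {
    "design_quality": 0.30,
    "originality": 0.25,
    "completeness": 0.25,
    "functionality": 0.20,
}

def passes(scores: dict, pass_threshold: float = 0.75,
           strict_mode: bool = False) -> bool:
    """True when the weighted average meets the threshold and,
    in strict mode, every dimension meets it individually."""
    overall = sum(scores[dim] * w for dim, w in WEIGHTS.items())
    if strict_mode and any(scores[dim] < pass_threshold for dim in WEIGHTS):
        return False
    return overall >= pass_threshold
```

For example, scores of 0.80 / 0.70 / 0.80 / 0.80 give a weighted average of 0.775, which passes in normal mode but fails in strict mode because Originality is below 0.75.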
## Implementation Guide

### GAN Loop Execution Flow
#### Phase 1: Sprint Contract (when required by harness level)

- Required when `harness_level == thorough`.
- Optional when `harness_level == standard` and the user opts in.
- Skipped when `harness_level == minimal`.
Sprint Contract generation:
- Evaluator analyzes the BRIEF document and current iteration scope.
- Evaluator produces the Sprint Contract document with:
  - `acceptance_checklist`: concrete, testable criteria for this iteration
  - `priority_dimension`: which of the 4 dimensions to focus on
  - `test_scenarios`: specific verification steps
  - `pass_conditions`: minimum score per criterion
- Builder reviews the contract:
  - Accept: proceed with implementation
  - Request adjustment: propose alternatives (max `max_negotiation_rounds` rounds)
- The contract is saved to `sprint-N.json` under `design.gan_loop.sprint_contract.artifact_dir`.
Constraint: Evaluator must not score on criteria outside the Sprint Contract. Builder must not claim criteria as met without evidence.
#### Phase 2: Builder Execution
Builder implements based on:
- Accepted Sprint Contract (if present)
- BRIEF document
- Copy JSON from `moai-domain-copywriting`
- Design tokens from `moai-domain-brand-design` or `moai-workflow-design-import`
Builder outputs: code files, rendered previews (if Playwright available), implementation notes.
#### Phase 3: Evaluator Scoring
Evaluator scores against the 4 dimensions using the Evaluator Leniency Prevention mechanisms:
- Rubric Anchoring: Score each dimension against the rubric (0.25 increments) with explicit justification. Scores without rubric reference are invalid.
- Evidence-Only Verdicts: No PASS without concrete evidence (screenshot, test output, code reference).
- Anti-Pattern Cross-check: Check known anti-patterns before finalizing. Any detected anti-pattern caps the relevant dimension score at 0.50.
- Must-Pass Firewall: Copy integrity, mobile viewport, and WCAG AA are must-pass criteria. Failure in any must-pass = overall FAIL regardless of other scores.
Output: `evaluation-report-N.json` in `sprint_contract.artifact_dir`.

#### Phase 4: Loop Decision
```
if overall_score >= pass_threshold:
    EXIT LOOP    → proceed to next phase
elif iteration >= max_iterations:
    ESCALATE     → present failure report to user
elif stagnation_detected:
    ESCALATE     → present stagnation options
else:
    ITERATE      → pass feedback to Builder, increment N
```
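The same decision order can be expressed as a runnable sketch. Names and defaults are illustrative (real values come from `design.yaml`), and the independent `escalation_after` trigger is omitted here for brevity.

```python
def decide(overall_score: float, iteration: int, stagnating: bool,
           pass_threshold: float = 0.75, max_iterations: int = 5) -> str:
    """Mirror the Phase 4 decision order: pass check first, then the
    iteration cap, then stagnation, otherwise iterate again."""
    if overall_score >= pass_threshold:
        return "EXIT"
    if iteration >= max_iterations:
        return "ESCALATE_MAX_ITERATIONS"
    if stagnating:
        return "ESCALATE_STAGNATION"
    return "ITERATE"
```

The ordering matters: a passing score exits even on the final iteration, and the iteration cap is checked before stagnation so the max-iterations failure report takes precedence.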
#### Phase 5: Iteration Feedback
If looping back:
- Evaluator generates targeted feedback per failed criterion.
- Builder receives the feedback and previous Sprint Contract.
- Previously passed criteria carry forward (no regression allowed).
- New Sprint Contract is generated for failed criteria only.
### Stagnation Detection
Stagnation is detected when the score improvement between consecutive iterations is below improvement_threshold for 2 or more iterations.
Tracking:
- After each iteration, record `{iteration: N, score: X}` in the sprint artifact.
- Calculate `delta = score[N] - score[N-1]`.
- If `delta < improvement_threshold` for the last 2 iterations, flag stagnation.
When stagnation is detected, escalate to user via AskUserQuestion with three options:
- Continue with current approach (Evaluator tries a different dimension focus)
- Adjust criteria (user provides guidance or relaxes constraints)
- Abort loop (accept current output as-is)
The escalation trigger at escalation_after iterations applies independently: if 3 iterations pass without a PASS score, escalate regardless of stagnation state.
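One way to implement the stagnation check, assuming the per-iteration records described above; this is a sketch, not the canonical implementation.

```python
def stagnation_detected(history: list[dict],
                        improvement_threshold: float = 0.05) -> bool:
    """Flag stagnation when the score delta stays below the threshold
    for the last two consecutive iterations. `history` is the list of
    per-iteration records, e.g. [{"iteration": 1, "score": 0.55}, ...]."""
    if len(history) < 3:  # two deltas require three recorded scores
        return False
    scores = [h["score"] for h in history[-3:]]
    deltas = [scores[i + 1] - scores[i] for i in range(2)]
    return all(d < improvement_threshold for d in deltas)
```

Note that a score *drop* also counts toward stagnation here, since a negative delta is below the threshold.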
## Evaluator Leniency Prevention Mechanisms
The following 5 mechanisms prevent score inflation and must be applied on every evaluation:
### Mechanism 1: Rubric Anchoring
Score descriptions for each dimension:
- 0.25: Major defects, fails most criteria
- 0.50: Partial compliance, notable issues remain
- 0.75: Solid compliance, minor issues only
- 1.00: Full compliance, no issues found
Always state which rubric level applies and why before assigning a numeric score.
### Mechanism 2: Must-Pass Firewall
The following conditions cause immediate FAIL regardless of other scores:
- Copy text differs from the original `copy.json` or BRIEF copy section
- AI slop detected: purple gradient (#8B5CF6-#6D28D9) as primary visual element with generic white cards
- Mobile viewport broken at 375px width (content overflow, unreadable text)
- Any interactive element returns 404 or broken state
- Lighthouse Accessibility < 80
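A minimal sketch of the firewall verdict, assuming one boolean report field per must-pass criterion. The field names here are illustrative assumptions, not a fixed schema.

```python
# Criterion names mirror Mechanism 2; absent fields count as failed,
# which keeps the firewall conservative by default.
MUST_PASS = ["copy_integrity", "no_ai_slop", "mobile_viewport_ok",
             "interactions_ok", "lighthouse_accessibility_ge_80"]

def firewall_verdict(report: dict) -> str:
    """Any failed must-pass criterion forces an overall FAIL,
    regardless of the weighted dimension scores."""
    failed = [c for c in MUST_PASS if not report.get(c, False)]
    return "FAIL" if failed else "PASS"
```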
### Mechanism 3: Anti-Pattern Penalty
Known anti-patterns that cap dimension score at 0.50:
- Generic icon set without brand customization (Originality capped)
- Hard-coded spacing values outside the design token scale (Design Quality capped)
- Missing `alt` attributes on non-decorative images (Functionality capped)
- Section copy that does not match the contracted copy (Completeness capped)
### Mechanism 4: Evidence Requirement
Each dimension score must cite specific evidence:
- Design Quality: Reference token file path and WCAG contrast ratio
- Originality: Describe what makes the design non-generic
- Completeness: List each BRIEF section and its implementation status
- Functionality: Reference test result or Playwright output
### Mechanism 5: Regression Baseline
If a previous iteration passed a criterion, the current iteration must maintain that criterion. Regression from a previously passed criterion triggers an automatic score reduction in the relevant dimension.
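A possible shape for the regression check, assuming criterion statuses as recorded in the Sprint Contract's acceptance checklist (`pending | passed | failed`). The function name and dict shape are assumptions.

```python
def regressions(previous: dict, current: dict) -> list[str]:
    """Return criterion IDs that were 'passed' in the previous
    iteration but are no longer 'passed' now. A non-empty result
    should trigger the automatic score reduction described above."""
    return [cid for cid, status in previous.items()
            if status == "passed" and current.get(cid) != "passed"]
```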
## Sprint Contract Structure

Sprint Contract document format (`sprint-N.json`):
```json
{
  "sprint_id": "sprint-N",
  "iteration": N,
  "priority_dimension": "Design Quality | Originality | Completeness | Functionality",
  "acceptance_checklist": [
    {
      "id": "AC-01",
      "criterion": "Hero headline contrast ratio >= 4.5:1",
      "verification": "Check color pair with contrast calculator",
      "status": "pending | passed | failed"
    }
  ],
  "test_scenarios": [
    {
      "id": "TS-01",
      "description": "Mobile viewport renders without horizontal scroll",
      "tool": "Playwright | visual inspection",
      "command": "playwright test --viewport 375x667"
    }
  ],
  "pass_conditions": {
    "Design Quality": 0.75,
    "Originality": 0.70,
    "Completeness": 0.80,
    "Functionality": 0.75
  },
  "negotiation_history": [],
  "created_at": "ISO-8601"
}
```
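An evaluation pass might compare report scores against the contract's `pass_conditions` along these lines. This is a sketch: the function name and report shape are illustrative, only the `pass_conditions` field is taken from the format above.

```python
import json

def check_pass_conditions(report_scores: dict, contract_path: str) -> bool:
    """Load a sprint-N.json contract and verify that every dimension's
    reported score meets its minimum from pass_conditions. Missing
    dimensions count as 0.0, i.e. an automatic failure."""
    with open(contract_path) as f:
        contract = json.load(f)
    return all(report_scores.get(dim, 0.0) >= minimum
               for dim, minimum in contract["pass_conditions"].items())
```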
## Advanced Patterns

### Strict Mode
When `strict_mode: true` in `design.yaml`:
- Each of the 4 dimension scores must individually meet `pass_threshold`.
- The weighted average alone is not sufficient.
- Minimum 2 iterations required even if the first iteration achieves a passing weighted average.
- Strict mode is recommended for client-facing deliverables.
### Independent Re-evaluation
Every 5th project triggers an independent re-evaluation:
- The same build is scored twice with independent prompts.
- If scores diverge by more than 0.10, a calibration warning is logged.
- Calibration results are stored in `calibration-log.json` under `sprint_contract.artifact_dir`.
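The divergence check itself is small; this sketch assumes the two independent evaluations are available as raw overall scores.

```python
def calibration_warning(score_a: float, score_b: float,
                        tolerance: float = 0.10) -> bool:
    """True when two independent evaluations of the same build diverge
    by more than the tolerance (0.10 per the rule above), meaning a
    calibration warning should be logged."""
    return abs(score_a - score_b) > tolerance
```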
### Playwright Integration
When claude-in-chrome MCP or Playwright is available, the Evaluator uses automated testing:
- Desktop screenshot (1280x720): full page
- Mobile screenshot (375x667): full page
- Interaction test: click all CTAs, verify no 404
- Accessibility scan: automated WCAG check
When testing tools are unavailable, fall back to static code analysis only, and note the limitation in the evaluation report.
## Works Well With
- `moai-domain-brand-design`: Provides design tokens that the Evaluator validates in the Design Quality dimension
- `moai-domain-copywriting`: Copy JSON is the reference for the Completeness dimension
- `evaluator-active`: The GAN loop orchestrates evaluator-active for each scoring pass
- `moai-workflow-design-import`: Extracted tokens serve as the design reference baseline
Source: Absorbed from agency constitution (Section 11 GAN Loop Contract, Section 12 Evaluator Leniency Prevention) on 2026-04-20.
REQ coverage: REQ-SKILL-011, REQ-SKILL-012, REQ-SKILL-012a, REQ-SKILL-013, REQ-SKILL-014, REQ-CONST-004
Version: 1.0.0