iterate
MANDATORY PREPARATION
Invoke /agent-workflow — it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run /teach-maestro first.
Consult the feedback-loops reference in the agent-workflow skill for evaluation patterns and self-correction strategies.
Set up feedback loops that make workflows self-correcting and continuously improving. Iteration transforms one-shot gambles into convergent, reliable systems.
Feedback Loop Design
Step 1: Define Quality Criteria
What does "good output" look like? Score dimensions:
| Dimension | Weight | Threshold | Measurement |
|---|---|---|---|
| Accuracy | 0.4 | ≥ 0.8 | Factual correctness check |
| Completeness | 0.3 | ≥ 0.7 | Required fields present |
| Format | 0.2 | ≥ 0.9 | Schema compliance |
| Tone | 0.1 | ≥ 0.6 | Appropriate for audience |
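The table can be encoded directly in the workflow's evaluation code. A minimal sketch, assuming per-dimension scores are normalized to [0, 1]; the `Dimension` container and helper names are illustrative, not an existing API:

```python
from dataclasses import dataclass

@dataclass
class Dimension:
    weight: float     # contribution to the overall score
    threshold: float  # minimum acceptable per-dimension score

# Weights and thresholds copied from the table above.
DIMENSIONS = {
    "accuracy":     Dimension(weight=0.4, threshold=0.8),
    "completeness": Dimension(weight=0.3, threshold=0.7),
    "format":       Dimension(weight=0.2, threshold=0.9),
    "tone":         Dimension(weight=0.1, threshold=0.6),
}

def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each expected in [0, 1]."""
    return sum(dim.weight * scores[name] for name, dim in DIMENSIONS.items())

def passes(scores: dict[str, float]) -> bool:
    """Every dimension must also clear its own threshold."""
    return all(scores[name] >= dim.threshold for name, dim in DIMENSIONS.items())
```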
Step 2: Choose Evaluator Type
Match evaluator to requirements:
- Rule-based: Schema validation, field presence, value ranges (fast, free)
- Self-check: Same model evaluates own output (fast, cheap, less reliable)
- Cross-model: Different model evaluates (slower, more reliable)
- Human-in-the-loop: Human review (slowest, most reliable, doesn't scale)
- Hybrid: Rules first, then model check for what rules can't catch
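A minimal sketch of the hybrid pattern, assuming JSON output: cheap rule checks run first, and the model check only runs when the rules pass. Here `model_judge` stands in for whichever self-check or cross-model evaluator the workflow uses, and its `(score, feedback)` return shape is an assumption:

```python
import json

def rule_check(output: str, required_fields: list[str]) -> list[str]:
    """Fast, free checks: valid JSON and required fields present."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    return [f"missing required field: {field}" for field in required_fields if field not in data]

def hybrid_evaluate(output: str, required_fields: list[str], model_judge):
    """Rules first; only pay for a model check when the rules pass."""
    problems = rule_check(output, required_fields)
    if problems:
        return 0.0, problems
    return model_judge(output)  # assumed to return (score, feedback)
```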
Step 3: Design the Correction Loop
generate(input) → evaluate(output) → score
if score ≥ threshold → return output
if score < threshold AND attempts < max →
    enrich input with evaluator feedback
    generate again (with feedback)
if attempts ≥ max → fallback or escalate
Critical: The retry input MUST be different from the original. Include:
- The evaluator's specific feedback
- What was wrong and why
- A suggestion for how to fix it
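A minimal sketch of this loop, assuming `generate` and `evaluate` are placeholders for the workflow's own generation call and evaluator, and that the evaluator returns `(score, feedback)`:

```python
def generate_with_correction(task: str, generate, evaluate,
                             threshold: float = 0.8, max_attempts: int = 3) -> str:
    """generate -> evaluate -> retry, folding evaluator feedback into each retry."""
    prompt = task
    for attempt in range(max_attempts):
        output = generate(prompt)
        score, feedback = evaluate(output)
        if score >= threshold:
            return output
        # Enrich the retry input: what was wrong, why, and how to fix it.
        prompt = (
            f"{task}\n\n"
            f"A previous attempt scored {score:.2f} against a threshold of {threshold}.\n"
            f"Evaluator feedback: {feedback}\n"
            f"Address this feedback and produce a corrected answer."
        )
    # Max attempts exhausted: fall back or escalate rather than looping forever.
    raise RuntimeError(f"quality threshold not reached after {max_attempts} attempts")
```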
Step 4: Set Up Regression Detection
When changing prompts, models, or tools:
- Run golden test set with OLD config → baseline scores
- Run golden test set with NEW config → new scores
- Compare: improvement ≥ 5% → accept; regression ≥ 5% → reject
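A minimal sketch of the comparison, where `score_case` is a placeholder that runs one golden case under a given config and returns its quality score. The "hold" branch for changes inside the ±5% band is an assumption, since the rule above only covers clear improvements and clear regressions:

```python
def regression_check(golden_set, score_case, old_config, new_config,
                     tolerance: float = 0.05) -> str:
    """Compare mean golden-set scores under the old and new configurations."""
    baseline = sum(score_case(old_config, case) for case in golden_set) / len(golden_set)
    candidate = sum(score_case(new_config, case) for case in golden_set) / len(golden_set)
    delta = (candidate - baseline) / baseline
    if delta >= tolerance:
        return "accept"   # improvement of 5% or more
    if delta <= -tolerance:
        return "reject"   # regression of 5% or more
    return "hold"         # within +/-5%: left to judgment
```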
Step 5: Continuous Monitoring
For production workflows:
- Sample 1-5% of outputs for automated evaluation
- Track quality scores over time
- Alert on downward trends
- A/B test changes before full rollout
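A minimal sketch of the sampling step, with `evaluate` and `record_score` as placeholders for the workflow's evaluator and whatever metrics store drives the dashboards and alerts:

```python
import random

SAMPLE_RATE = 0.02  # evaluate roughly 2% of production outputs (anywhere in the 1-5% range)

def maybe_evaluate(output: str, evaluate, record_score) -> None:
    """Randomly sample production outputs for automated evaluation and score tracking."""
    if random.random() < SAMPLE_RATE:
        score, _feedback = evaluate(output)
        record_score(score)  # feeds the quality-over-time trend that alerts watch
```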
Iteration Checklist
- Quality criteria defined with weights and thresholds
- Evaluator selected and configured
- Correction loop has max attempts limit
- Feedback is injected into retries (never an identical retry)
- Golden test set exists with ≥ 10 cases
- Regression detection configured for changes
- Production monitoring in place
Recommended Next Step
After setting up feedback loops, run /evaluate to validate the loop with real scenarios, then /refine for final polish.
NEVER:
- Retry with the exact same input (definition of insanity)
- Use the same weak model to both generate and evaluate
- Skip the max attempts limit (infinite loops are real)
- Deploy changes without regression testing against golden set
- Monitor only errors — track quality scores over time