# Evaluator-Optimizer

One of Anthropic's canonical agent patterns. Generate. Judge. Revise. Repeat until good enough.
## What It Is
One component generates a candidate output. Another component evaluates it against a rubric. Feedback drives revision. The loop ends on a pass, budget exhaustion, or max rounds.
## When to Use
- Quality bar is clear
- Output can improve with feedback
- Eval criteria can be written or coded
- Failure is costly
- Extra loop cost is acceptable
## When Not to Use
- No reliable eval signal
- Feedback vague or subjective
- Fast answer matters more than polished answer
- Output space huge and revisions drift
- Humans should review instead of model judge
## Core Flow

```
candidate
  → evaluator scores against rubric
  → pass? return
  → optimizer revises using feedback
  → re-evaluate
```
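The flow above can be sketched in a few lines of Python. Note that `generate`, `evaluate`, and `revise` are hypothetical stubs standing in for model calls; only the loop structure is the point.

```python
def generate(task):
    # Placeholder generator: in practice, an LLM call producing a draft.
    return f"draft for: {task}"

def evaluate(candidate):
    # Placeholder evaluator: in practice, an LLM or coded check.
    # Returns (score, actionable feedback).
    score = 0.9 if "revised" in candidate else 0.5
    feedback = "add detail" if score < 0.8 else "ok"
    return score, feedback

def revise(candidate, feedback):
    # Placeholder optimizer: folds feedback into a new candidate.
    return f"revised {candidate} ({feedback})"

def evaluator_optimizer(task, threshold=0.8, max_rounds=3):
    """Generate, judge, revise. Stop on pass or max rounds."""
    candidate = generate(task)
    for _ in range(max_rounds):
        score, feedback = evaluate(candidate)
        if score >= threshold:
            return candidate, score
        candidate = revise(candidate, feedback)
    return candidate, evaluate(candidate)[0]
```

The hard cap on rounds is load-bearing: without it, a soft evaluator or a drifting optimizer turns the loop into an open-ended cost sink.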
## Simple Implementation Outline
- Define pass/fail rubric.
- Separate generator and evaluator prompts.
- Keep evaluator stricter than generator.
- Return score + reasons + fix hints.
- Limit revision rounds.
- Stop on plateau.
- Save failing examples for rubric tuning.
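Two of the outline items above benefit from a concrete shape: the evaluator's return value (score plus reasons plus fix hints, not a bare pass/fail) and the plateau stop rule. A minimal sketch, with the field names and `min_gain` threshold chosen for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class EvalReport:
    score: float                                   # 0..1 against the rubric
    passed: bool
    reasons: list = field(default_factory=list)    # why it failed
    fix_hints: list = field(default_factory=list)  # concrete edits to try

def plateaued(score_history, min_gain=0.02):
    # Stop revising when the last round improved the score
    # by less than min_gain: further loops are likely wasted.
    if len(score_history) < 2:
        return False
    return score_history[-1] - score_history[-2] < min_gain
```

Returning reasons and fix hints, rather than just a score, is what makes the optimizer's revisions directed instead of random.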
## Good Eval Signals
- Schema validity
- Test pass rate
- Grounding to source facts
- Style or policy compliance
- Ranking score
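The first signal, schema validity, is the cheapest to code. A minimal sketch using only the standard library; the required keys here are a hypothetical rubric, not a fixed schema:

```python
import json

REQUIRED_KEYS = {"title", "summary"}  # hypothetical required fields

def schema_score(raw_output):
    # 1.0 for valid JSON containing every required key,
    # partial credit per key present, 0.0 for unparseable output.
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    present = REQUIRED_KEYS & data.keys()
    return len(present) / len(REQUIRED_KEYS)
```

Coded signals like this are strictly more reliable than a model judge, so use them for the hard gate and reserve the model judge for qualities code cannot check.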
## Failure Modes
- Evaluator too soft. Bad outputs pass.
- Evaluator and optimizer share same blind spots.
- Feedback vague. Revisions random.
- Loop overfits rubric, hurts real quality.
- No stop rule. Cost climbs.
- Generator never sees concrete failure examples.
## Practical Checklist
- Rubric explicit
- Pass threshold set
- Generator and evaluator separated
- Feedback actionable
- Max rounds set
- Plateau stop rule set
- Offline eval set exists
- Human review path for high-risk cases
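The tunable items in the checklist can live in one config object so they are set explicitly rather than scattered as magic numbers. A hypothetical shape, with illustrative defaults:

```python
from dataclasses import dataclass

@dataclass
class LoopConfig:
    pass_threshold: float = 0.8        # pass threshold set
    max_rounds: int = 3                # max rounds set
    plateau_min_gain: float = 0.02     # plateau stop rule set
    escalate_to_human: bool = True     # review path for high-risk cases
```

Keeping these in one place also makes offline evaluation easier: rerun the same eval set under different configs and compare.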
## Decision Rule
Use evaluator-optimizer when you can state the quality bar clearly and iterate cheaply. If you cannot judge quality reliably, do not build this loop.