# Evaluator-Optimizer

One of Anthropic's canonical agent patterns. Generate. Judge. Revise. Repeat until good enough.
## What It Is
One component generates a candidate output. Another component evaluates it against a rubric. Feedback drives revision. The loop ends on a pass, budget exhaustion, or max rounds.
## When to Use
- Quality bar is clear
- Output can improve with feedback
- Eval criteria can be written or coded
- Failure is costly
- Extra loop cost is acceptable
## When Not to Use
- No reliable eval signal
- Feedback vague or subjective
- Fast answer matters more than polished answer
- Output space huge and revisions drift
- Humans should review instead of model judge
## Core Flow

```
candidate
  → evaluator scores against rubric
  → pass? return
  → optimizer revises using feedback
  → re-evaluate
```
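The flow above can be sketched in a few lines of Python. Note that `generate`, `evaluate`, and `revise` are hypothetical stubs standing in for model calls; only the loop structure is the point.

```python
def generate(task):
    # Placeholder generator: in practice, an LLM call producing a draft.
    return f"draft for: {task}"

def evaluate(candidate):
    # Placeholder evaluator: in practice, an LLM or coded check.
    # Returns (score, actionable feedback).
    score = 0.9 if "revised" in candidate else 0.5
    feedback = "add detail" if score < 0.8 else "ok"
    return score, feedback

def revise(candidate, feedback):
    # Placeholder optimizer: folds feedback into a new candidate.
    return f"revised {candidate} ({feedback})"

def evaluator_optimizer(task, threshold=0.8, max_rounds=3):
    """Generate, judge, revise. Stop on pass or max rounds."""
    candidate = generate(task)
    for _ in range(max_rounds):
        score, feedback = evaluate(candidate)
        if score >= threshold:
            return candidate, score
        candidate = revise(candidate, feedback)
    return candidate, evaluate(candidate)[0]
```

The hard cap on rounds is load-bearing: without it, a soft evaluator or a drifting optimizer turns the loop into an open-ended cost sink.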
## Simple Implementation Outline
- Define pass/fail rubric.
- Separate generator and evaluator prompts.
- Keep evaluator stricter than generator.
- Return score + reasons + fix hints.
- Limit revision rounds.
- Stop on plateau.
- Save failing examples for rubric tuning.
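Two of the outline items above benefit from a concrete shape: the evaluator's return value (score plus reasons plus fix hints, not a bare pass/fail) and the plateau stop rule. A minimal sketch, with the field names and `min_gain` threshold chosen for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class EvalReport:
    score: float                                   # 0..1 against the rubric
    passed: bool
    reasons: list = field(default_factory=list)    # why it failed
    fix_hints: list = field(default_factory=list)  # concrete edits to try

def plateaued(score_history, min_gain=0.02):
    # Stop revising when the last round improved the score
    # by less than min_gain: further loops are likely wasted.
    if len(score_history) < 2:
        return False
    return score_history[-1] - score_history[-2] < min_gain
```

Returning reasons and fix hints, rather than just a score, is what makes the optimizer's revisions directed instead of random.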
## Good Eval Signals
- Schema validity
- Test pass rate
- Grounding to source facts
- Style or policy compliance
- Ranking score
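The first signal, schema validity, is the cheapest to code. A minimal sketch using only the standard library; the required keys here are a hypothetical rubric, not a fixed schema:

```python
import json

REQUIRED_KEYS = {"title", "summary"}  # hypothetical required fields

def schema_score(raw_output):
    # 1.0 for valid JSON containing every required key,
    # partial credit per key present, 0.0 for unparseable output.
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    present = REQUIRED_KEYS & data.keys()
    return len(present) / len(REQUIRED_KEYS)
```

Coded signals like this are strictly more reliable than a model judge, so use them for the hard gate and reserve the model judge for qualities code cannot check.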
## Failure Modes
- Evaluator too soft. Bad outputs pass.
- Evaluator and optimizer share same blind spots.
- Feedback vague. Revisions random.
- Loop overfits rubric, hurts real quality.
- No stop rule. Cost climbs.
- Generator never sees concrete failure examples.
## Practical Checklist
- Rubric explicit
- Pass threshold set
- Generator and evaluator separated
- Feedback actionable
- Max rounds set
- Plateau stop rule set
- Offline eval set exists
- Human review path for high-risk cases
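The tunable items in the checklist can live in one config object so they are set explicitly rather than scattered as magic numbers. A hypothetical shape, with illustrative defaults:

```python
from dataclasses import dataclass

@dataclass
class LoopConfig:
    pass_threshold: float = 0.8        # pass threshold set
    max_rounds: int = 3                # max rounds set
    plateau_min_gain: float = 0.02     # plateau stop rule set
    escalate_to_human: bool = True     # review path for high-risk cases
```

Keeping these in one place also makes offline evaluation easier: rerun the same eval set under different configs and compare.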
## Decision Rule
Use evaluator-optimizer when you can state the quality bar clearly and iterate cheaply. If you cannot judge quality reliably, do not build this loop.