recursive-improvement
Recursive Self-Improvement Loop
A pattern for generating higher-quality output by iterating against explicit scoring criteria.
The Pattern
generate → evaluate → diagnose → improve → repeat (until passing)
Never ship first-draft output for important content. Run the loop.
How It Works
1. Generate
Create the initial output as you normally would.
2. Evaluate
Score the output against each criterion on a 1-10 scale. Be brutally honest: an inflated score ends the loop before the weak spots are fixed.
3. Diagnose
For each criterion that scores below the threshold, ask:
- What specifically is weak?
- Why does it fail?
- What would "passing" look like?
4. Improve
Rewrite addressing each diagnosed weakness. Don't patch — rebuild the weak sections.
5. Repeat
Re-evaluate the revised output. Keep looping until every criterion meets the threshold (usually 8/10 minimum).
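The five steps above can be sketched as a small control loop. This is a minimal illustration, not a reference implementation: `run_loop`, its callables, and the `max_rounds` safety cap are hypothetical names introduced here, and the demo scorer is a toy that stands in for a real evaluation (human or model).

```python
from typing import Callable, Dict

Scores = Dict[str, int]  # criterion name -> score from 1 to 10


def run_loop(
    generate: Callable[[], str],
    evaluate: Callable[[str], Scores],
    improve: Callable[[str, Scores], str],
    threshold: int = 8,
    max_rounds: int = 5,  # assumption: a cap so the loop always terminates
) -> str:
    """Iterate until every criterion scores at or above the threshold."""
    draft = generate()
    for _ in range(max_rounds):
        scores = evaluate(draft)
        # Diagnose: keep only the criteria that fall below the threshold.
        failing = {name: s for name, s in scores.items() if s < threshold}
        if not failing:
            return draft
        # Improve: hand the weak criteria to the rewrite step.
        draft = improve(draft, failing)
    return draft  # best effort once the round cap is reached


# Toy stand-ins so the sketch runs end to end.
def demo_generate() -> str:
    return "draft v1"


def demo_evaluate(text: str) -> Scores:
    # The toy score simply rises with the version number in the text.
    version = int(text.rsplit("v", 1)[1])
    return {"hook": min(10, 5 + version), "clarity": min(10, 7 + version)}


def demo_improve(text: str, failing: Scores) -> str:
    version = int(text.rsplit("v", 1)[1])
    return f"draft v{version + 1}"


final = run_loop(demo_generate, demo_evaluate, demo_improve)
```

In a real setup, `evaluate` and `improve` would wrap whatever does the scoring and rewriting; only the stopping rule (all criteria at or above the threshold) comes from the pattern itself.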
Adversarial Pressure (Optional but Powerful)
Once the output passes all criteria, attack it from a hostile perspective:
- Skeptical customer: "Why should I believe this? What's the catch?"
- Distracted scroller: "Would I stop for this? In 2 seconds?"
- Competitor: "How would a rival tear this apart?"
If it survives, ship it. If not, iterate.
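One way to make the adversarial pass repeatable is to store the personas as data and collect their objections. The sketch below assumes a hypothetical `critique` callable that returns an objection string or `None`; the demo critic is a toy rule, not a real reviewer.

```python
from typing import Callable, Dict, List, Optional

# Personas and their questions, taken from the list above.
PERSONAS: Dict[str, str] = {
    "Skeptical customer": "Why should I believe this? What's the catch?",
    "Distracted scroller": "Would I stop for this? In 2 seconds?",
    "Competitor": "How would a rival tear this apart?",
}

Critic = Callable[[str, str, str], Optional[str]]


def adversarial_pass(text: str, critique: Critic) -> List[str]:
    """Return every objection raised; an empty list means the text survived."""
    objections = []
    for persona, question in PERSONAS.items():
        objection = critique(persona, question, text)
        if objection:
            objections.append(f"{persona}: {objection}")
    return objections


def demo_critic(persona: str, question: str, text: str) -> Optional[str]:
    # Toy rule standing in for a real reviewer (human or model).
    if persona == "Skeptical customer" and not any(ch.isdigit() for ch in text):
        return "no concrete number backs the claim"
    return None


weak = adversarial_pass("We save you time.", demo_critic)
strong = adversarial_pass("Cut invoicing time by 40%.", demo_critic)
```

An empty result corresponds to "ship it"; anything else feeds back into the diagnose step.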
Example Criteria by Use Case
Social Content
| Criterion | What to evaluate |
|---|---|
| Hook strength | First line grabs attention? Pattern interrupt? |
| Curiosity gap | Creates urge to keep reading? |
| Clarity | One clear idea? No confusion? |
| Voice match | Sounds like the target voice/brand? |
| Engagement potential | People will reply/share/save? |
| Thumb-stop power | Scroller would pause? |
| Value density | Every line earns its place? |
| CTA clarity | Clear what reader should do next? |
Adversarial test: Would a distracted, skeptical user at 11pm engage with this?
Landing Page / Web Copy
| Criterion | What to evaluate |
|---|---|
| Headline clarity | Instantly clear what this business does? |
| Value prop strength | Why choose them over competitors? |
| Benefit focus | Features translated to customer benefits? |
| CTA effectiveness | Clear, compelling action? Low friction? |
| Trust signals | Credibility established? Social proof? |
| Readability | Scannable? Short paragraphs? Clear hierarchy? |
| Objection handling | Common concerns addressed? |
| Specificity | Concrete details vs vague claims? |
Adversarial test: Would someone searching on their phone take action within 30 seconds?
Email Copy
| Criterion | What to evaluate |
|---|---|
| Subject line | Would this get opened? Stands out in inbox? |
| Opening hook | First sentence earns the second? |
| Single focus | One clear ask per email? |
| Skimmability | Can get the gist in 5 seconds? |
| CTA prominence | Action is obvious and easy? |
| Voice consistency | Matches brand/sender personality? |
| Appropriate length | No fluff, nothing missing? |
| Mobile friendly | Works on small screens? |
Adversarial test: Would a busy person with 200 unread emails act on this?
Ad Copy
| Criterion | What to evaluate |
|---|---|
| Thumb-stop power | Pattern interrupt in first 2 seconds? |
| Curiosity gap | Creates need to know more? |
| Emotional trigger | Hits a real pain point or desire? |
| Credibility | Believable? Not too good to be true? |
| CTA strength | Clear next step with low friction? |
| Persona match | Speaks directly to target audience? |
| Differentiation | Stands out from competitor ads? |
| Platform native | Fits the platform's style/format? |
Adversarial test: Would this stop YOUR scroll? Would you click?
When to Use
Always use for:
- Headlines and hooks
- CTAs and value props
- Key landing page sections
- Social posts (especially threads)
- Ad copy
- Important emails
Can skip for:
- Internal notes
- First-pass brainstorming
- Technical documentation
- Boilerplate content
Building Your Own Criteria
- Pick one task you do repeatedly
- Write down how YOU evaluate that output — what makes "good" vs "mid"?
- Turn each into a pass/fail threshold — be specific ("9/10 minimum" not "make it good")
- Add adversarial pressure — who would attack this? What would they say?
- Save and reuse — now you have a system, not just a prompt
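As a rough illustration of "save and reuse", criteria can be kept as plain data with an explicit threshold each. The `Criterion` class, the helper names, and the JSON layout are assumptions made for this sketch; the two sample rows come from the Email Copy table.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Criterion:
    name: str
    question: str       # what to evaluate
    threshold: int = 8  # minimum passing score out of 10


def failing_criteria(criteria: List[Criterion], scores: Dict[str, int]) -> List[str]:
    """Names of criteria below their threshold; a missing score counts as 0."""
    return [c.name for c in criteria if scores.get(c.name, 0) < c.threshold]


def dumps(criteria: List[Criterion]) -> str:
    """Serialize the criteria so they can be stored and reused."""
    return json.dumps([asdict(c) for c in criteria], indent=2)


def loads(raw: str) -> List[Criterion]:
    """Rebuild the criteria from their saved JSON form."""
    return [Criterion(**item) for item in json.loads(raw)]


EMAIL_CRITERIA = [
    Criterion("Subject line", "Would this get opened? Stands out in inbox?"),
    Criterion("Single focus", "One clear ask per email?", threshold=9),
]

weak_spots = failing_criteria(EMAIL_CRITERIA, {"Subject line": 9, "Single focus": 8})
```

Keeping the threshold next to each criterion lets a stricter bar apply to the criteria that matter most for a given task.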
Quick Loop Template
## Output v1
[Initial generation]
## Evaluation v1
- Hook strength: 6/10 — Opens weak, no pattern interrupt
- Clarity: 8/10 — Clear enough
- Voice match: 7/10 — Too formal
[... score all criteria]
## Diagnosis
1. Hook needs a surprising stat or contrarian take
2. Voice should be more casual, shorter sentences
3. [...]
## Output v2
[Revised version addressing weaknesses]
## Evaluation v2
[Re-score — continue until all pass]
The loop typically adds 2-3 iterations. Worth it for anything that matters.
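If the evaluations are tracked in code, the template can be filled in mechanically. The helper below is a hypothetical sketch: `render_round` and the `Evaluation` type are invented for illustration, and it separates the score from the note with a plain hyphen.

```python
from typing import Dict, Tuple

# criterion name -> (score out of 10, short note)
Evaluation = Dict[str, Tuple[int, str]]


def render_round(version: int, output: str, evaluation: Evaluation) -> str:
    """Render one round of the template: the output, then its scores."""
    lines = [f"## Output v{version}", output, f"## Evaluation v{version}"]
    for name, (score, note) in evaluation.items():
        lines.append(f"- {name}: {score}/10 - {note}")
    return "\n".join(lines)


round_one = render_round(
    1,
    "[Initial generation]",
    {
        "Hook strength": (6, "Opens weak, no pattern interrupt"),
        "Clarity": (8, "Clear enough"),
    },
)
```

Appending each rendered round to a running log gives a record of how the scores moved between versions.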