# Autoresearch Skill
Karpathy-style optimization loops for any conversion-focused content. No traffic needed. Simulated expert panel. Minutes, not weeks.
When to use this: Pre-launch content optimization. Generate 50+ variants, score with 5 simulated experts, evolve winners, output the best version + full experiment log.
When NOT to use this: Post-launch real-traffic A/B testing — that requires real analytics, not simulated scoring.
The sequence: Run autoresearch FIRST to hit 85+ simulated score. Then deploy. Then validate with real traffic.
## What You'll Produce
Every run outputs 3 files:
| File | Purpose |
|---|---|
| `{name}-optimized.{ext}` | The winning optimized content |
| `data/{name}-experiments.json` | Full experiment log — all variants + all scores |
| `data/{name}-optimization-report.md` | Human-readable summary with winner rationale |
## Expert Panel (5 Personas)
Score every variant against all 5. Batch all variants into a single API call per round.
| # | Persona | Scoring Lens |
|---|---|---|
| 1 | CMO at a mid-market B2B company (50M+ revenue) | "Would this make me stop and engage?" |
| 2 | Skeptical founder | "Do I believe this? Would I trust this company?" |
| 3 | Conversion rate optimizer | "Is this clear, specific, and action-driving?" |
| 4 | Senior copywriter | "Is this compelling, differentiated, and well-crafted?" |
| 5 | Your CEO/founder | "Direct, ROI-obsessed, no BS. Would I put this on my site?" |
Customization: Replace persona #5 with your own CEO/founder voice. Define their priorities and communication style in a `references/founder-voice.md` file.
Each judge scores 0–100. Final score = average across all 5 judges.
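As a quick illustration, the averaging step is just a mean over the judge map. A minimal sketch (the function name and score values are illustrative, not part of the skill):

```python
def final_score(judge_scores: dict[str, int]) -> float:
    """Average a variant's 0-100 scores across all five judges."""
    return sum(judge_scores.values()) / len(judge_scores)

# Example scores for one variant (values are illustrative)
scores = {"cmo": 72, "skeptical_founder": 68, "cro": 75, "copywriter": 70, "founder": 65}
print(final_score(scores))  # 70.0
```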
## Round Structure (Per Content Element)
```
Round 1:
  → Generate 10 variants of the element
  → Batch-score all 10 with the 5-expert panel (1 API call)
  → Rank by average score
  → Keep top 3

Round 2 (Evolution):
  → Analyze what the top 3 did right
  → Generate 10 new variants that push those winning patterns further
  → Batch-score all 10 (1 API call)
  → Keep top 3

Round 3 (If score < threshold):
  → Identify weakest scoring dimension
  → Generate 10 variants optimized for that dimension
  → Batch-score → keep top 1

Multi-element cross-breeding:
  → Take top 1 winner from each element
  → Generate 5 combinations that mix winning elements
  → Score holistically as complete units
  → Output the single best combination
```
Stop condition: Top variant hits minimum score threshold (default: 80) OR 3 rounds complete.
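The per-element loop above can be sketched in a few lines. This is a hedged sketch, assuming you supply `generate` (variant generation) and `score_batch` (one batched judging call) as callables; none of these names come from the skill itself:

```python
def run_rounds(element, generate, score_batch,
               variants_per_round=10, min_score=80, max_rounds=3):
    """Per-element loop: generate, batch-score once per round, keep the
    top 3, evolve from the survivors, stop at threshold or round cap."""
    survivors = []
    for _ in range(max_rounds):
        variants = generate(element, seeds=survivors, n=variants_per_round)
        scored = score_batch(variants)  # exactly one API call per round
        scored.sort(key=lambda v: v["avg_score"], reverse=True)
        survivors = scored[:3]
        if survivors[0]["avg_score"] >= min_score:
            break
    return survivors[0]
```

In a real run, `generate` would seed round 2+ prompts with the surviving variants so the winning patterns get pushed further.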
## Content Types & Score Dimensions

### Landing Pages
Elements to optimize: Hero headline, subheadline, CTA text, problem section, social proof
Score dimensions:
- `first_impression` — Does it grab immediately?
- `clarity` — Is the offer instantly understood?
- `trust` — Does it feel credible?
- `urgency` — Is there a reason to act now?
- `would_convert` — Would the judge actually click?
### Email Sequences
Elements to optimize: Subject line, opening line, body copy, CTA, PS line
Score dimensions:
- `would_open` — Subject line pass rate
- `would_read` — Does the opening hook?
- `would_click` — Is the CTA compelling?
- `would_reply` — Does it feel personal enough to respond to?
- `spam_risk` — Does it feel spammy? (lower = better; invert for final score)
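Because `spam_risk` is scored lower-is-better, it must be inverted before averaging into the final score. A minimal sketch (the function name and values are illustrative):

```python
def email_final_score(dims: dict[str, int]) -> float:
    """Average the email dimensions, inverting spam_risk so a
    low-spam variant scores high overall."""
    adjusted = {k: (100 - v if k == "spam_risk" else v) for k, v in dims.items()}
    return sum(adjusted.values()) / len(adjusted)

print(email_final_score({"would_open": 80, "would_read": 70,
                         "would_click": 60, "would_reply": 50,
                         "spam_risk": 20}))  # 68.0
```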
### Ad Copy
Elements to optimize: Headline, description, CTA
Score dimensions:
- `scroll_stopping` — Does it interrupt the scroll?
- `clarity` — Is the value prop clear in 3 seconds?
- `click_worthiness` — Does the judge want to click?
- `relevance` — Does it match likely audience intent?
- `differentiation` — Does it stand out from competitors?
### Form Pages
Elements to optimize: Headline, subtext, value prop bullets, button text, field order, thank-you copy
Score dimensions:
- `first_impression` — Does it feel worth filling out?
- `trust` — Do they believe their info is safe and the offer is real?
- `completion_likelihood` — Would the judge start filling it out?
- `lead_quality` — Would this attract serious prospects (not tire-kickers)?
- `would_fill_out` — Final gut check: would they submit?
## Step-by-Step Execution Protocol

### Step 1: Intake & Parse
Read the source content. Identify content type automatically or confirm with user:
- HTML file → landing page or form page
- Markdown / plain text → email or ad copy
- If ambiguous, ask: "Is this a landing page, email sequence, ad copy, or form page?"
Extract all optimizable elements. List them back to user:
```
Found 5 elements to optimize:

1. Hero headline: "We help B2B companies grow"
2. Subheadline: "Full-service digital marketing..."
3. CTA: "Get Started"
4. Problem statement: [excerpt]
5. Social proof: [excerpt]

Optimizing: all | Variants per round: 10 | Min score: 80
```
### Step 2: Get API Key
Check for an Anthropic API key in the `$ANTHROPIC_API_KEY` environment variable. If it isn't set, ask the user to export it:

```bash
export ANTHROPIC_API_KEY="your-api-key-here"
```
### Step 3: Run Optimization Rounds
For each element, run the round structure above.
Critical API efficiency rule: ALWAYS batch all variants into a single prompt. Never call the API once per variant. A round with 10 variants = 1 API call.
Model preference (in order):

1. `claude-sonnet-4-5` (preferred — fast + smart)
2. `claude-opus-4` (if highest quality needed)
3. Any Claude 3.5+ model if the above aren't available
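One way to honor the batching rule is to pack every variant and every persona into a single judging prompt, so a 10-variant round costs exactly one API call (e.g. one `messages.create` call in the Anthropic SDK). A sketch, with illustrative prompt wording and function name:

```python
def build_batch_prompt(variants: list[str], personas: list[str]) -> str:
    """Pack all variants and all judge personas into ONE scoring prompt,
    so a 10-variant round costs exactly one API call, not ten."""
    numbered = "\n".join(f"{i}. {v}" for i, v in enumerate(variants, 1))
    return (
        "Score each variant 0-100 from each persona's point of view.\n"
        f"Personas: {', '.join(personas)}\n"
        "Variants:\n"
        f"{numbered}\n"
        'Reply as JSON: [{"id": 1, "scores": {"<persona>": <int>}}, ...]'
    )
```

The returned string would then be sent once per round, and the JSON reply parsed into per-judge scores.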
### Step 4: Cross-Breed (Multi-Element)
After all elements have winners:
- Assemble the top winner from each element into a complete unit
- Generate 5 holistic variants that naturally combine the winning elements
- Score the complete units (not just individual parts)
- Pick the winner with the highest holistic score
### Step 5: Write Output Files
```bash
# Create output directory
mkdir -p data

# Write optimized content
# Write experiments JSON
# Write optimization report
```
Experiments JSON structure:
```json
{
  "run_id": "autoresearch-{name}-{timestamp}",
  "content_type": "landing_page",
  "source_file": "path/to/original",
  "min_score_threshold": 80,
  "rounds": [
    {
      "round": 1,
      "element": "hero_headline",
      "variants": [
        {
          "id": 1,
          "text": "...",
          "scores": {
            "cmo": 72,
            "skeptical_founder": 68,
            "cro": 75,
            "copywriter": 70,
            "founder": 65
          },
          "avg_score": 70
        }
      ],
      "top_3": [1, 4, 7],
      "winner_score": 82
    }
  ],
  "final_winner": {
    "hero_headline": "...",
    "subheadline": "...",
    "cta": "...",
    "holistic_score": 87
  }
}
```
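Persisting the experiment log can be as simple as a pretty-printed `json.dumps`. A minimal sketch (the helper name is illustrative, not part of the skill):

```python
import json
from pathlib import Path

def write_experiment_log(name: str, log: dict, out_dir: str = "data") -> Path:
    """Write the full experiment log to data/{name}-experiments.json."""
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    out = path / f"{name}-experiments.json"
    out.write_text(json.dumps(log, indent=2))
    return out
```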
### Step 6: Report Back
Summarize results to user:
- Final winning score
- Biggest score jump (which element improved most)
- Top 2 runner-up alternatives (in case winner doesn't feel right)
- Path to all 3 output files
- Clear next step
## User Options
| Option | Default | Description |
|---|---|---|
| `elements` | all | Which elements to optimize |
| `variants_per_round` | 10 | How many variants to generate per round |
| `min_score` | 80 | Stop when this score is hit |
| `rounds` | 3 | Max rounds before stopping |
| `auto_apply` | false | Whether to overwrite the source file with winners |
| `content_type` | auto-detect | Force a content type if auto-detect is wrong |
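Resolving user options against these defaults amounts to a dict merge with validation. A sketch, transcribing the table above (the function name is illustrative):

```python
DEFAULTS = {
    "elements": "all",
    "variants_per_round": 10,
    "min_score": 80,
    "rounds": 3,
    "auto_apply": False,
    "content_type": "auto-detect",
}

def resolve_options(user_options: dict) -> dict:
    """Overlay user-supplied options on the defaults, rejecting unknowns."""
    unknown = set(user_options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")
    return {**DEFAULTS, **user_options}
```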
## Quality Gates
- < 70: Don't ship. Something fundamental is broken.
- 70-79: Marginal. One more round targeting the lowest-scoring dimension.
- 80-84: Good. Shippable. Validate with real traffic.
- 85-89: Strong. Ship with confidence.
- 90+: Rare. Ship immediately.
## Anti-Patterns to Avoid
- Never call the API once per variant. Always batch. A 10-variant round = 1 call.
- Don't over-optimize for one dimension. If you're hitting 95 on clarity but 45 on trust, the overall score is misleading.
- Don't run more than 5 rounds. If you're not hitting 80 after 3 rounds, the problem is strategic (wrong positioning), not tactical (wrong words).
- Don't cross-breed until each element has its own winner. Premature cross-breeding creates incoherent combinations.