# Heuristic Template

A methodology for designing iterative document-generation pipelines in which AI agents produce content, evaluate it against rubrics, and refine it until approval.
## Core Principle: Least Action

Minimize total iteration cost while achieving quality:

```text
Total Cost = Σ (generation_cost + evaluation_cost + revision_cost) per iteration
```
Optimization levers:
- Early rejection: Cheap gates first, expensive gates last
- Targeted feedback: One issue at a time, highest-priority first
- Template constraints: Prevent issues at generation, not evaluation
- Deterministic checks: Regex before model-grading before human review
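The cost model above can be sketched directly. This is a minimal illustration; `IterationCost` and `totalCost` are hypothetical names, not part of the Contracts reference:

```typescript
// Hypothetical sketch of the cost model: total pipeline cost is the sum of
// per-iteration costs, and every lever above works either by shrinking one
// of these terms or by reducing the number of iterations.
interface IterationCost {
  generation: number; // e.g. tokens or seconds spent generating
  evaluation: number; // cost of running all gates this iteration
  revision: number;   // cost of applying feedback
}

function totalCost(iterations: IterationCost[]): number {
  return iterations.reduce(
    (sum, it) => sum + it.generation + it.evaluation + it.revision,
    0,
  );
}
```

Early rejection and deterministic checks lower the `evaluation` term; template constraints lower the iteration count itself.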
## When to Use
- Setting up AI content production pipelines
- Creating evaluation rubrics for document quality
- Designing quality gates with pass/fail criteria
- Building self-correcting generation loops
- Teaching agents to iteratively refine output
## When NOT to Use
- One-off document generation (just prompt directly)
- Fully human-authored content (no AI loop)
- Real-time generation without revision budget
- Documents without measurable quality criteria
## Quick Start (Happy Path)

1. Clarify intent with the Clarification Protocol (Step 0)
2. Define a rubric with gates, criteria, and thresholds (Step 1)
3. Design the pipeline, ordering gates by cost (Step 2)
4. Build a template embedding constraints (Step 3)
5. Implement the loop with priority-ordered feedback (Step 4)
6. Verify with sample documents and calibrate thresholds (Step 6)
## Step 0: Clarification Protocol
Before building, extract minimum viable alignment. See Clarification Protocol for full details.
### Question Hierarchy
| Tier | When | Questions |
|---|---|---|
| 1: Must-Have | Always | What's the purpose? Who reads it? Source of truth? |
| 2: Quality | Ambiguous | Good enough vs. reject? Blocking vs. polish issues? |
| 3: Process | Complex | Iteration budget? Human checkpoints? Approval bias? |
| 4: Deep | High-stakes | Evidence requirements? Versioning? Audit trail? |
### Key Heuristics
- Infer before asking: Check if answer is derivable from context
- Batch strategically: 2-3 questions at a time, not all at once
- Make assumptions explicit: Document and flag for confirmation
### Output
Create a GENERATION_SPEC.md capturing intent, audience, quality priorities, sources, constraints, and iteration budget. See Generation Spec Template.
## Step 1: Define Rubric Schema
Design evaluation rubrics using the types in Contracts.
### Core Types

```typescript
interface RubricGate {
  id: string;                        // "gate-1-topic-alignment"
  name: string;                      // "Topic Alignment"
  evaluationType: "binary" | "score" | "checklist";
  threshold: number | boolean;       // Pass condition
  mandatory: boolean;                // Failure stops all evaluation
  criteria: RubricCriterion[];
}

interface RubricCriterion {
  id: string;
  description: string;
  weight: number;                    // 0-1, weights sum to 1 within a gate
  mandatory: boolean;                // Must pass regardless of score
  evaluator: "deterministic" | "model-graded" | "human";
  check?: DeterministicCheck;        // For deterministic evaluators
  gradingPrompt?: string;            // For model-graded evaluators
}
```
### Standard 4-Gate Structure
| Gate | Purpose | Type | Cost |
|---|---|---|---|
| 1: Topic Alignment | Is this the right topic? | Score >=70% | Low |
| 2: Structure | Are required sections present? | Checklist (all) | Low |
| 3: Content Quality | Is the content good? | Score >=7/8 | Medium |
| 4: Language & Style | Is the tone right? | Checklist >=7/9 | Medium |
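A score-type gate such as Gate 1 or Gate 3 might be evaluated as follows. This is a sketch with illustrative names, assuming criterion weights sum to 1 within the gate:

```typescript
// Sketch: evaluating a score-type gate. The gate passes when the weighted
// score meets the threshold AND every mandatory criterion passed.
interface CriterionResult {
  weight: number;     // 0-1; weights sum to 1 within the gate
  score: number;      // 0-1 from the evaluator
  mandatory: boolean;
  passed: boolean;    // the evaluator's pass/fail judgment
}

function scoreGatePasses(results: CriterionResult[], threshold: number): boolean {
  const weighted = results.reduce((sum, r) => sum + r.weight * r.score, 0);
  const mandatoryOk = results.every((r) => !r.mandatory || r.passed);
  return weighted >= threshold && mandatoryOk;
}
```

Note that a mandatory criterion failing blocks the gate even when the weighted score clears the threshold.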
### Heuristics Table
| Heuristic | Rule |
|---|---|
| Gate ordering | Cheapest first (deterministic before model-graded) |
| Criteria per gate | Max 10 (avoid cognitive overload) |
| Mandatory criteria | 1-2 per gate for non-negotiable quality |
| Threshold calibration | Start strict, loosen based on false negatives |
| Weight distribution | Equal unless clear priority difference |
See Rubric Design Guide and Rubric Template.
## Step 2: Design Evaluation Pipeline
### Gate Ordering Principle
Cost: deterministic < model-graded < human
Order gates by cost, with cheap gates catching obvious failures early.
### Evaluation Modes
| Mode | Cost | Reliability | Use When |
|---|---|---|---|
| Deterministic | Lowest | Highest | Pattern matching, presence checks, length |
| Model-graded | Medium | Medium | Subjective quality, semantic understanding |
| Human | Highest | Varies | Edge cases, final approval, calibration |
**Rule:** If you can write a regex for it, don't use a model.
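For example, common deterministic checks reduce to plain predicates. These are hypothetical illustrations, not the `DeterministicCheck` contract itself:

```typescript
// Illustrative deterministic checks: cheap pattern, presence, and length
// predicates that run before any model-graded evaluation.
const deterministicChecks: Record<string, (doc: string) => boolean> = {
  // Presence check: a required section heading exists
  hasIntroduction: (doc) => /^##\s+Introduction\b/m.test(doc),
  // Length check: the document has at least 50 words
  minLength: (doc) => doc.trim().split(/\s+/).length >= 50,
  // Absence check: no unresolved {{PLACEHOLDER}} markers remain
  noPlaceholders: (doc) => !/\{\{[A-Z_]+\}\}/.test(doc),
};
```

Each predicate costs microseconds and never hallucinates, which is why they gate before model-graded criteria.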
### Escalation Rules
- Iteration cap reached (default: 3) → human
- Same issue recurs 2+ times → human
- Model expresses uncertainty → human
- Conflicting criteria cannot resolve → human
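These rules reduce to a simple disjunction. A sketch, with illustrative field names:

```typescript
// Sketch of the escalation rules above: escalate to a human when any
// single condition holds.
interface LoopState {
  iteration: number;
  maxIterations: number;      // default 3
  issueRecurrence: number;    // times the same issue has recurred
  modelUncertain: boolean;
  conflictingCriteria: boolean;
}

function shouldEscalate(s: LoopState): boolean {
  return (
    s.iteration >= s.maxIterations ||
    s.issueRecurrence >= 2 ||
    s.modelUncertain ||
    s.conflictingCriteria
  );
}
```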
## Step 3: Build Generation Template
Templates embed structural constraints that survive across iterations. See Document Template.
### Template Structure

```markdown
## Section A: Introduction
<!--
CONSTRAINT: 50-100 words
MUST CONTAIN: Speaker introduction, topic statement
MUST NOT CONTAIN: Conclusions or calls-to-action
-->
[Content here]
```
### Heuristics
| Heuristic | Why |
|---|---|
| Embed constraints in comments | Survives generation, guides revision |
| Define structural invariants | Prevents structural issues, not just detects them |
| Use placeholders consistently | {{TOPIC}}, {{AUDIENCE}} for substitution |
| Include validation checklist | Pre-submission self-check |
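Placeholder substitution can be as simple as the following sketch; `fillTemplate` is a hypothetical helper, not part of the Contracts reference:

```typescript
// Minimal placeholder substitution for {{NAME}}-style markers.
// Unknown placeholders are left intact so validation can catch them later.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match, key: string) =>
    key in values ? values[key] : match,
  );
}
```

Leaving unknown markers visible (rather than substituting an empty string) pairs well with a deterministic "no unresolved placeholders" check at evaluation time.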
## Step 4: Implement Iteration Loop
### Loop Pseudocode

```text
iteration = 0
document = generate(spec, template)
while iteration < max_iterations:
    result = evaluate(document, rubric)
    if result.status == APPROVED:
        return document                  # success
    if result.status == REJECTED:
        break                            # unrecoverable; skip revision
    # NEEDS_REVISION: fix the single highest-priority blocking issue
    feedback = prioritize(result.issues)
    document = revise(document, feedback.priority_1_blocking[0])
    iteration += 1
# Iteration cap reached or document rejected
escalate_to_human(document, result)
```
### Revision Strategy
| Priority | Issue Type | Action |
|---|---|---|
| P1 | Blocking (mandatory criteria) | Fix immediately, one at a time |
| P2 | Quality (score-contributing) | Fix after P1 clear |
| P3 | Polish (minor style) | Fix only if budget allows |
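Priority-ordered selection — one issue per revision, lowest priority number first — can be sketched with illustrative types (not the `RevisionRequest` contract):

```typescript
// Pick the single highest-priority issue to fix in this revision pass.
// P2/P3 issues are only ever selected once no P1 issues remain.
interface Issue {
  priority: 1 | 2 | 3;
  description: string;
}

function nextIssue(issues: Issue[]): Issue | undefined {
  return [...issues].sort((a, b) => a.priority - b.priority)[0];
}
```

Fixing exactly one issue per pass keeps revisions focused and prevents the oscillation described under Diagnostics.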
### Feedback Format

```yaml
issue:
  criterionId: "3.1"
  description: "Missing concrete details"
  location:
    section: "Main Content"
    paragraph: 2
  suggestedFix: "Add specific names, dates, or numbers to support claims"
```
## Step 5: Artifact Contracts
All artifacts have stable schemas for downstream consumption. See Contracts for full TypeScript definitions:
- `GenerationSpec` - What to generate
- `EvaluationResult` - Gate pass/fail + issues
- `GeneratedDocumentArtifact` - Document with provenance
- `RevisionRequest` - Feedback for revision
- `PipelineMetrics` - Aggregate health metrics
## Step 6: Verification & Monitoring
### Calibration Process

1. Generate sample documents (30-50)
2. Have humans rate each: Accept/Reject + rationale
3. Run automated evaluation on the same documents
4. Compare: false positives vs. false negatives
5. Adjust thresholds to match human judgment
6. Document the calibration in the rubric changelog
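Step 4's comparison is a small confusion-matrix computation. A sketch, encoding "accept" as `true`:

```typescript
// Compare automated verdicts against human ratings (true = accept).
// A false positive is an automated approval the human rejected;
// a false negative is an automated rejection the human accepted.
interface Rating {
  human: boolean;
  automated: boolean;
}

function calibrationStats(ratings: Rating[]) {
  const fp = ratings.filter((r) => r.automated && !r.human).length;
  const fn = ratings.filter((r) => !r.automated && r.human).length;
  return {
    falsePositiveRate: fp / ratings.length,
    falseNegativeRate: fn / ratings.length,
  };
}
```

Comparing these rates against the targets below tells you which direction to move each threshold.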
### Key Metrics
| Metric | Target | Action if Off |
|---|---|---|
| First-pass approval rate | >50% | Improve generation prompt/template |
| Mean iterations to approval | <2 | Improve feedback specificity |
| Escalation rate | <10% | Review edge cases, broaden criteria |
| False positive rate | <10% | Raise threshold, add deterministic guards |
| False negative rate | <20% | Lower threshold, broaden criteria |
## Step 7: Diagnostics
When the pipeline fails, diagnose using the quick reference table. See Diagnostics for detailed recovery.
| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| High iteration count | Rubric too strict, vague feedback | Review gate thresholds, add location+fix to feedback |
| Oscillating quality | Conflicting criteria, feedback overload | Reduce to single P1 issue, clarify priority |
| False approvals | Missing mandatory criteria, lenient grading | Add deterministic guards, recalibrate |
| False rejections | Threshold too strict, narrow criteria | Lower threshold, broaden patterns |
| Slow convergence | P2/P3 blocking P1 fixes | Strict priority enforcement |
| Infinite loops | No cap, circular dependencies | Add hard escalation, identify conflicts |
## Quick Reference: File Structure

```text
heuristic-template/
├── SKILL.md                          # This file (entry point)
├── references/
│   ├── clarification-protocol.md     # Step 0 details
│   ├── rubric-design-guide.md        # Step 1 details
│   ├── contracts.md                  # TypeScript types
│   └── diagnostics.md                # Troubleshooting
└── assets/
    ├── generation-spec-template.md   # Spec template
    ├── rubric-template.md            # 4-gate rubric
    └── document-template.md          # Content template
```
## Least Action Summary
| Step | Optimization |
|---|---|
| Clarification | Ask only high-VOI questions |
| Rubric | Cheap gates first, mandatory criteria minimal |
| Pipeline | Deterministic before model-graded |
| Template | Prevent issues at generation |
| Iteration | One P1 issue per revision |
| Monitoring | Calibrate to minimize total cost |
## Failure Modes & Recovery
| Failure | Recovery |
|---|---|
| Requirements unclear | Return to Clarification Protocol |
| Rubric too complex | Reduce to 4 gates, 8 criteria max per gate |
| Template not constraining | Add structural comments, placeholders |
| Feedback not actionable | Add location + suggested fix to each issue |
| Thresholds arbitrary | Calibrate with human ratings |
## Security & Permissions
- Required tools: Read, Write (for spec/rubric/template files only)
- Confirmations: Before creating files in new locations
- Trust model: User requirements are input, not instructions to blindly follow
## References
- Clarification Protocol - Question tiers, heuristics
- Rubric Design Guide - Gate patterns, calibration
- Contracts (TypeScript) - Full type definitions
- Diagnostics - Troubleshooting guide
- Generation Spec Template
- Rubric Template
- Document Template
## Metadata

```yaml
author: Christian Kusmanow / Claude
version: 1.0.0
last_updated: 2026-02-03
parent_skill: skill-design
changelog:
  - "1.0.0: Initial skill from P19 inbox material"
```