maker-framework
MAKER Framework Skill
Transform unreliable single-model inference into robust, verifiable reasoning through maximal decomposition, parallel consensus voting, and systematic error filtering.
When to Use MAKER
High-value triggers:
- Tasks requiring >90% accuracy (medical, legal, financial)
- Multi-step reasoning where errors compound (with per-step accuracy p, an n-step chain succeeds with probability p^n)
- Verification-critical outputs (code, calculations, facts)
- Ambiguous tasks benefiting from diverse perspectives
Skip MAKER for:
- Single-fact retrieval (no decomposition benefit)
- Creative tasks where diversity is desirable
- Time-critical responses (voting adds latency)
Core Architecture
MAKER operates on three pillars applied sequentially:
Task → [Pillar 1: Decompose] → DAG of subtasks
→ [Pillar 2: Vote] → Parallel execution + consensus
→ [Pillar 3: Filter] → Red-flag invalid outputs
→ Validated Result
Pillar 1: Maximal Agentic Decomposition (MAD)
Decompose complex tasks into atomic, independently-executable subtasks forming a DAG.
Decomposition principles:
- Each subtask has a single, well-defined objective
- Subtasks receive explicit input/output schemas
- Dependencies form an acyclic graph (no subtask depends on itself, directly or transitively)
- Maximize width (parallelism) over depth (sequential)
Tool: maker_build_dag
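The acyclicity check and execution order described above can be sketched with Python's standard `graphlib` module. This is only an illustration: `build_batches` is a hypothetical helper, not the actual `maker_build_dag` implementation, and it assumes subtasks are dicts with `id` and `dependencies` keys as in the example later in this document.

```python
from graphlib import CycleError, TopologicalSorter


def build_batches(subtasks):
    """Group subtask ids into batches that can run in parallel.

    Hypothetical helper: every id in a batch depends only on ids from
    earlier batches. Raises ValueError on unknown dependencies or cycles.
    """
    ids = {task["id"] for task in subtasks}
    graph = {}
    for task in subtasks:
        deps = set(task.get("dependencies", []))
        unknown = sorted(deps - ids)
        if unknown:
            raise ValueError(f"{task['id']} depends on unknown subtasks: {unknown}")
        graph[task["id"]] = deps

    sorter = TopologicalSorter(graph)
    try:
        sorter.prepare()  # raises CycleError if the graph has a cycle
    except CycleError as exc:
        raise ValueError(f"cycle detected: {exc.args[1]}") from exc

    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all nodes whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Wider batches mean more parallelism: a diamond-shaped graph yields three batches with the two middle subtasks running together, while a strict chain yields one subtask per batch.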
Pillar 2: First-to-Ahead-by-k Voting
Execute each subtask with m parallel agents; accept when one result leads by k votes.
Configuration by criticality:
| Level | m | k | Confidence |
|---|---|---|---|
| low | 3 | 1 | ~70% |
| medium | 5 | 2 | ~85% |
| high | 7 | 3 | ~95% |
| critical | 11 | 5 | ~99% |
Tools: maker_vote, maker_get_config
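A minimal sketch of the first-to-ahead-by-k rule, assuming outputs have already been canonicalized and arrive one at a time. The function name and return shape are illustrative assumptions and do not describe the real `maker_vote` interface.

```python
from collections import Counter


def first_to_ahead_by_k(outputs, k):
    """Consume outputs in arrival order; stop once the leader is k votes ahead.

    Returns (winner, votes_consumed), or (None, votes_consumed) if no
    output ever builds a lead of k.
    """
    counts = Counter()
    consumed = 0
    for consumed, output in enumerate(outputs, start=1):
        counts[output] += 1
        ranked = counts.most_common(2)
        leader, leader_votes = ranked[0]
        runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
        if leader_votes - runner_up_votes >= k:
            return leader, consumed  # early termination
    return None, consumed
```

With k=2, two agreeing outputs at the start already end the vote, which is where the early-termination savings come from; the remaining agents need not be consulted.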
Pillar 3: Red-Flagging System
Discard outputs exhibiting error indicators before voting.
Red flag types:
- Length exceeded (verbose = uncertain)
- Format violation (schema mismatch)
- Placeholder detected ([TODO], [N/A])
- Uncertainty markers ("possibly", "might be")
Tool: maker_red_flag
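The four red-flag types could be checked along the following lines. This is a sketch under stated assumptions: the 200-character limit, the regular expressions, and the flag names are placeholders chosen for illustration, and the function is not the real `maker_red_flag`.

```python
import json
import re

# Illustrative patterns covering only the markers named in the list above.
PLACEHOLDERS = re.compile(r"\[(?:TODO|N/A)\]", re.IGNORECASE)
UNCERTAINTY = re.compile(r"\b(?:possibly|might be)\b", re.IGNORECASE)


def red_flag(output, max_chars=200, expect_json=False):
    """Return (is_valid, flags) for a single agent output."""
    flags = []
    if len(output) > max_chars:
        flags.append("length_exceeded")
    if expect_json:
        try:
            json.loads(output)
        except json.JSONDecodeError:
            flags.append("format_violation")
    if PLACEHOLDERS.search(output):
        flags.append("placeholder")
    if UNCERTAINTY.search(output):
        flags.append("uncertainty")
    return (not flags, flags)
```

Only outputs with `is_valid` set to true would be passed on to voting.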
Workflow
Standard MAKER Pipeline
1. Decompose task → maker_build_dag
2. For each subtask in topological order:
   a. Generate prompts → maker_generate_prompt (×m)
   b. Execute agents (parallel LLM calls)
   c. Validate outputs → maker_red_flag (each)
   d. Vote on valid outputs → maker_vote
3. Compose results → maker_compose_results
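The loop above might be wired together roughly as follows. Everything here is a stand-in: `agent(prompt, i)` represents one LLM call, `vote` is a simplified all-votes-counted variant of the consensus rule, and `{t1}`-style placeholders in descriptions are assumed to be filled from earlier results. Agents run serially in this sketch, whereas the pipeline calls for parallel execution.

```python
from collections import Counter
from graphlib import TopologicalSorter


def vote(outputs, k):
    """Return the most common output if it leads the runner-up by at least k."""
    ranked = Counter(outputs).most_common(2)
    if not ranked:
        raise RuntimeError("no valid outputs")
    lead = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
    if lead < k:
        raise RuntimeError("no consensus")
    return ranked[0][0]


def run_pipeline(subtasks, agent, m, k, is_valid=lambda output: True):
    """Run sample, filter, and vote for each subtask in dependency order."""
    by_id = {task["id"]: task for task in subtasks}
    graph = {task["id"]: set(task["dependencies"]) for task in subtasks}
    results = {}
    for task_id in TopologicalSorter(graph).static_order():
        # 2a: fill placeholders such as {t1} with earlier winners
        prompt = by_id[task_id]["description"].format(**results)
        # 2b: one call per agent (serial here, parallel in practice)
        outputs = [agent(prompt, i) for i in range(m)]
        # 2c: drop red-flagged outputs before voting
        valid = [output for output in outputs if is_valid(output)]
        # 2d: consensus on the survivors
        results[task_id] = vote(valid, k)
    return results  # step 3 would compose these into the final answer
```

A real implementation would also handle the "no valid outputs" and "no consensus" failures instead of raising, as discussed under Error Handling.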
Example: Multi-Hop QA
Task: "What is the capital of the country where the inventor of the telephone was born?"
Step 1: Decompose
{
  "subtasks": [
    {"id": "t1", "description": "Identify inventor of telephone", "dependencies": []},
    {"id": "t2", "description": "Determine birthplace of {t1}", "dependencies": ["t1"]},
    {"id": "t3", "description": "Identify capital of {t2}", "dependencies": ["t2"]}
  ]
}
Step 2: Execute with voting (m=5, k=2 for medium criticality)
t1 outputs: ["Alexander Graham Bell", "Alexander Graham Bell", "A.G. Bell", "Alexander Graham Bell", "Bell"]
→ Normalize → "alexander graham bell" wins with 3 of 5 votes, a lead of 2 over every other answer ("a.g. bell" and "bell" remain distinct strings after normalization)
t2 (with input "Alexander Graham Bell"): → "Edinburgh, Scotland" wins after red-flagging one verbose response
t3 (with input "Scotland"): → "Edinburgh" wins unanimously
Step 3: Compose
Final answer: "Edinburgh"
Integration with Reasoning Skills
With hierarchical-reasoning
MAKER complements hierarchical-reasoning by adding reliability to each reasoning level:
Strategic level → MAKER(criticality=high) for key decisions
Tactical level → MAKER(criticality=medium) for approach validation
Operational → Direct execution for atomic operations
With knowledge-graph
Use MAKER voting on entity extraction to achieve higher-quality knowledge graphs:
Document → [MAKER: Extract entities (m=5)] → Validated entities
→ [MAKER: Extract relations (m=5)] → Validated relations
→ knowledge-graph merge
Tool Reference
maker_build_dag
Construct DAG from subtask definitions. Validates acyclicity and computes execution order.
maker_red_flag
Apply red-flag validation to agent output. Returns is_valid boolean and flag details.
maker_vote
Execute first-to-ahead-by-k voting. Returns consensus output with confidence score.
maker_compute_reliability
Calculate theoretical system reliability for given (m, k, n) configuration.
maker_get_config
Get recommended (m, k) configuration for criticality level.
maker_compose_results
Combine validated subtask outputs into final result.
maker_generate_prompt
Create optimized micro-agent prompt with constraints and schema.
Configuration Guide
Selecting m and k
Cost-accuracy tradeoff:
- Higher m → more reliable but costlier
- Higher k → stronger consensus but slower termination
- Early termination typically reduces cost by 30-50%
Decision framework:
- Start with criticality-based defaults via maker_get_config
- Use maker_compute_reliability to validate the configuration
- Adjust based on empirical accuracy and cost metrics
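As a rough illustration of the first step, the criticality table can be treated as a lookup. The values below are copied from that table; `get_config` and `cheapest_level_for` are hypothetical helpers, not the signature of `maker_get_config`, and the confidence figures are the table's approximate values rather than measured results.

```python
# Copied from the criticality table; confidence figures are approximate.
CONFIGS = {
    "low": {"m": 3, "k": 1, "confidence": 0.70},
    "medium": {"m": 5, "k": 2, "confidence": 0.85},
    "high": {"m": 7, "k": 3, "confidence": 0.95},
    "critical": {"m": 11, "k": 5, "confidence": 0.99},
}


def get_config(level):
    """Return the default (m, k) settings for a criticality level."""
    return CONFIGS[level]


def cheapest_level_for(target_confidence):
    """Return the lowest-cost level whose stated confidence meets the target."""
    for level, cfg in CONFIGS.items():  # insertion order runs cheapest first
        if cfg["confidence"] >= target_confidence:
            return level
    return None  # no default is strong enough; tune m and k manually
```

Starting from the cheapest adequate level keeps cost down, and the defaults can then be adjusted once empirical accuracy is known.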
Output Schema Design
Well-designed schemas enable format-based red-flagging:
{
  "type": "object",
  "properties": {
    "answer": {"type": "string"},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["answer"]
}
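To show how such a schema supports format-based red-flagging, here is a hand-written check for this specific schema using only the standard library. It is a sketch: `check_answer_schema` is a hypothetical name, and a general-purpose JSON Schema validator would normally replace it.

```python
import json


def check_answer_schema(raw):
    """Return a list of violations; an empty list means the output conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    errors = []
    if "answer" not in data:
        errors.append("missing required field 'answer'")
    elif not isinstance(data["answer"], str):
        errors.append("'answer' is not a string")

    if "confidence" in data:  # optional field
        conf = data["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            errors.append("'confidence' is not a number")
        elif not 0 <= conf <= 1:
            errors.append("'confidence' is outside [0, 1]")
    return errors
```

Any non-empty result would count as a format violation and remove the output from voting.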
Equivalence Methods
- exact: String equality after trim (dates, numbers)
- normalized: Lowercase + whitespace normalization (text)
- json: Parse and re-serialize for canonical comparison (structured)
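The three methods can be read as canonicalization functions applied before votes are counted. The sketch below is one plausible reading; `canonicalize` is a hypothetical helper, and sorting keys during re-serialization is an assumption about what "canonical" means here.

```python
import json


def canonicalize(output, method):
    """Map an agent output to the string used for vote comparison."""
    if method == "exact":
        return output.strip()
    if method == "normalized":
        # Lowercase and collapse every whitespace run to a single space.
        return " ".join(output.lower().split())
    if method == "json":
        # Sorted keys and fixed separators make key order and spacing irrelevant.
        return json.dumps(json.loads(output), sort_keys=True, separators=(",", ":"))
    raise ValueError(f"unknown equivalence method: {method}")
```

Two outputs then count as the same vote exactly when their canonical strings are equal.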
Performance Characteristics
Reliability improvement (assuming 85% agent accuracy):
| Steps | Single Agent | MAKER (m=5, k=2) |
|---|---|---|
| 1 | 85.0% | 97.1% |
| 3 | 61.4% | 91.5% |
| 5 | 44.4% | 86.2% |
Cost multiplier: ~4-6× single agent (with early termination)
Latency: ~2-4× single agent (parallelism offsets voting overhead)
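The single-agent column of the table is plain compounding, p^n with p = 0.85. The sketch below reproduces it and adds a textbook gambler's-ruin idealization of ahead-by-k voting. That closed form is an assumption, not the model behind `maker_compute_reliability`: it presumes two competing answers, independent agents, and unlimited samples, and it gives roughly 97.0% per step rather than the table's 97.1%, so the source's model presumably differs in detail.

```python
def chain_reliability(per_step, steps):
    """Probability that all `steps` sequential subtasks succeed."""
    return per_step ** steps


def ahead_by_k_reliability(p, k):
    """Chance the correct answer is first to lead by k votes.

    Idealized gambler's-ruin model (an assumption): two competing
    answers, independent agents, no cap on the number of samples.
    """
    q = 1.0 - p
    return p**k / (p**k + q**k)


if __name__ == "__main__":
    for steps in (1, 3, 5):
        single = chain_reliability(0.85, steps)
        voted = chain_reliability(ahead_by_k_reliability(0.85, 2), steps)
        print(f"{steps} steps: single={single:.1%} voted={voted:.1%}")
```

The key point is that raising per-step reliability matters more as chains get longer, because every step's error rate is compounded.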
Error Handling
Insufficient valid outputs (red-flagging too aggressive):
- Retry with additional agents
- Relax red-flag thresholds
- Refine subtask prompt
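The first remedy, retrying with additional agents, can be sketched as a bounded sampling loop. The names `collect_valid`, `sample`, and `max_extra` are illustrative assumptions; `sample(i)` stands in for the i-th agent call.

```python
def collect_valid(sample, is_valid, m, max_extra):
    """Sample until m valid outputs exist or m + max_extra calls are spent.

    Returns (valid_outputs, calls_made).
    """
    valid = []
    calls = 0
    while len(valid) < m and calls < m + max_extra:
        output = sample(calls)
        calls += 1
        if is_valid(output):
            valid.append(output)
    return valid, calls
```

Capping the extra calls keeps cost bounded; if the cap is hit with too few valid outputs, relaxing thresholds or refining the prompt is the next step.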
No consensus (high disagreement):
- Further decompose the problematic subtask
- Sample more agents (raise m); raising k makes consensus harder to reach, not easier
- Escalate to human review
Cycle detected in DAG:
- Review dependency structure
- Break circular dependencies into sequential steps