maker-framework
MAKER Framework Skill
Transform unreliable single-model inference into robust, verifiable reasoning through maximal decomposition, parallel consensus voting, and systematic error filtering.
When to Use MAKER
High-value triggers:
- Tasks requiring >90% accuracy (medical, legal, financial)
- Multi-step reasoning where errors compound (p^n problem: with per-step accuracy p, an n-step chain succeeds with probability p^n)
- Verification-critical outputs (code, calculations, facts)
- Ambiguous tasks benefiting from diverse perspectives
Skip MAKER for:
- Single-fact retrieval (no decomposition benefit)
- Creative tasks where diversity is desirable
- Time-critical responses (voting adds latency)
Core Architecture
MAKER operates on three pillars applied sequentially:
Task → [Pillar 1: Decompose] → DAG of subtasks
→ [Pillar 2: Vote] → Parallel execution + consensus
→ [Pillar 3: Filter] → Red-flag invalid outputs
→ Validated Result
Pillar 1: Maximal Agentic Decomposition (MAD)
Decompose complex tasks into atomic, independently-executable subtasks forming a DAG.
Decomposition principles:
- Each subtask has a single, well-defined objective
- Each subtask receives explicit input/output schemas
- Dependencies form a directed acyclic graph (no cycles)
- Maximize width (parallelism) over depth (sequential chains)
Tool: maker_build_dag
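According to the Tool Reference, maker_build_dag validates acyclicity and computes the execution order. The sketch below shows that kind of check using Python's standard-library `graphlib` (Python 3.9+). It is not the tool's actual implementation. The function name `build_dag` is an assumption, and so is the dict layout (`id`, `dependencies`), which mirrors the multi-hop QA example. Subtasks are grouped into "waves": every member of a wave has its dependencies satisfied by earlier waves, so the width of each wave is the parallelism the decomposition buys.

```python
from graphlib import CycleError, TopologicalSorter


def build_dag(subtasks: list[dict]) -> list[list[str]]:
    """Validate a subtask list and return it as ordered waves.

    Subtasks within one wave have all dependencies satisfied by earlier
    waves, so they can run in parallel. Raises ValueError for unknown
    dependencies or cycles.
    """
    ids = {t["id"] for t in subtasks}
    graph = {}
    for t in subtasks:
        missing = set(t["dependencies"]) - ids
        if missing:
            raise ValueError(f"{t['id']} depends on unknown subtasks {sorted(missing)}")
        graph[t["id"]] = set(t["dependencies"])  # node -> its predecessors

    sorter = TopologicalSorter(graph)
    try:
        sorter.prepare()  # raises CycleError if the dependencies loop
    except CycleError as exc:
        raise ValueError(f"cycle detected: {exc.args[1]}") from exc

    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


diamond = [
    {"id": "t1", "dependencies": []},
    {"id": "t2", "dependencies": ["t1"]},
    {"id": "t3", "dependencies": ["t1"]},
    {"id": "t4", "dependencies": ["t2", "t3"]},
]
print(build_dag(diamond))  # [['t1'], ['t2', 't3'], ['t4']]
```

The multi-hop QA example is a pure chain, so it yields one subtask per wave. The diamond shows two subtasks that can share a wave.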
Pillar 2: First-to-Ahead-by-k Voting
Execute each subtask with up to m parallel agents; accept a result as soon as its vote count leads every competing result by at least k votes.
Configuration by criticality:
| Criticality | m (agents) | k (lead margin) | Confidence |
|---|---|---|---|
| low | 3 | 1 | ~70% |
| medium | 5 | 2 | ~85% |
| high | 7 | 3 | ~95% |
| critical | 11 | 5 | ~99% |
Tools: maker_vote, maker_get_config
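Below is a sketch of first-to-ahead-by-k voting as described above. It assumes agent outputs arrive one at a time from an iterable, so stopping early actually saves calls. A real deployment may instead launch agents in parallel batches. `vote` and its `key` argument are hypothetical and are not the signature of maker_vote. The `key` argument maps each output to its canonical form before counting.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def vote(samples: Iterable[str], k: int, m: int,
         key: Callable[[str], str] = str.strip) -> tuple[Optional[str], int]:
    """First-to-ahead-by-k voting capped at m samples.

    Returns (winner, samples_used). winner is the canonical form of the first
    answer whose count exceeds every other answer's count by at least k, or
    None if no answer reaches that lead within m samples.
    """
    votes = Counter()
    used = 0
    for sample in samples:
        if used == m:
            break
        used += 1
        votes[key(sample)] += 1
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= k:
            return ranked[0][0], used  # early termination
    return None, used


print(vote(["Edinburgh", "Edinburgh", "Glasgow"], k=2, m=5))  # ('Edinburgh', 2)
print(vote(["A", "B", "A", "B", "A"], k=2, m=5))              # (None, 5)
```

The second call never establishes a 2-vote lead within the budget, which is the "No consensus" case covered under Error Handling.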
Pillar 3: Red-Flagging System
Discard outputs exhibiting error indicators before voting.
Red flag types:
- Length exceeded (verbose = uncertain)
- Format violation (schema mismatch)
- Placeholder detected ([TODO], [N/A])
- Uncertainty markers ("possibly", "might be")
Tool: maker_red_flag
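The red-flag checks can be implemented with the standard library, as in the sketch below. Several choices are illustrative assumptions: the character limit (`max_chars=500`), the regexes, and the name `red_flag`. The placeholder and uncertainty patterns cover only the examples in the list above and would need extending in practice. An empty result means the output may enter the vote, which plays the role of the is_valid flag that maker_red_flag returns.

```python
import json
import re

# Hypothetical patterns built only from the examples in the red-flag list.
PLACEHOLDER_RE = re.compile(r"\[(?:TODO|N/A)\]", re.IGNORECASE)
UNCERTAINTY_RE = re.compile(r"\b(?:possibly|might be)\b", re.IGNORECASE)


def red_flag(output: str, max_chars: int = 500, expect_json: bool = False) -> list[str]:
    """Return the red flags raised by one agent output (empty list = valid)."""
    flags = []
    if len(output) > max_chars:
        flags.append("length_exceeded")
    if expect_json:
        try:
            json.loads(output)
        except json.JSONDecodeError:
            flags.append("format_violation")
    if PLACEHOLDER_RE.search(output):
        flags.append("placeholder")
    if UNCERTAINTY_RE.search(output):
        flags.append("uncertainty")
    return flags


candidates = ["Edinburgh", "It might be Edinburgh", "[TODO]"]
valid = [c for c in candidates if not red_flag(c)]
print(valid)  # ['Edinburgh']
```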
Workflow
Standard MAKER Pipeline
1. Decompose task → maker_build_dag
2. For each subtask in topological order:
a. Generate prompts → maker_generate_prompt (×m)
b. Execute agents (parallel LLM calls)
c. Validate outputs → maker_red_flag (each)
d. Vote on valid outputs → maker_vote
3. Compose results → maker_compose_results
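The pipeline steps can be strung together roughly as follows. This is a simplified sketch, not the tool chain itself, and it makes several assumptions:

- `agent` is a hypothetical stand-in for one LLM call (prompt in, answer out).
- `is_valid` stands in for red-flagging.
- Agents are called sequentially rather than in parallel.
- Upstream winners fill `{t1}`-style slots via `str.format`.
- Red-flagged outputs still count against the budget of m calls.

```python
from collections import Counter
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(subtasks: list[dict], agent: Callable[[str], str],
                 m: int = 5, k: int = 2,
                 is_valid: Callable[[str], bool] = lambda s: bool(s)) -> dict[str, str]:
    """Execute subtasks in topological order with per-subtask voting."""
    by_id = {t["id"]: t for t in subtasks}
    order = TopologicalSorter({t["id"]: t["dependencies"] for t in subtasks}).static_order()
    results = {}
    for tid in order:
        # Build the prompt, filling {t1}-style slots with upstream winners.
        prompt = by_id[tid]["description"].format(**results)
        votes = Counter()
        winner = None
        for _ in range(m):
            out = agent(prompt).strip()
            if not is_valid(out):
                continue  # red-flagged: discarded before voting
            votes[out.lower()] += 1
            ranked = votes.most_common(2)
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            if ranked[0][1] - runner_up >= k:
                winner = ranked[0][0]
                break
        if winner is None:
            raise RuntimeError(f"no consensus for subtask {tid}")
        results[tid] = winner
    return results


# Toy agent with canned answers, standing in for a real model.
canned = {
    "Identify inventor of telephone": "Alexander Graham Bell",
    "Determine birthplace of alexander graham bell": "Scotland",
    "Identify capital of scotland": "Edinburgh",
}
qa = [
    {"id": "t1", "description": "Identify inventor of telephone", "dependencies": []},
    {"id": "t2", "description": "Determine birthplace of {t1}", "dependencies": ["t1"]},
    {"id": "t3", "description": "Identify capital of {t2}", "dependencies": ["t2"]},
]
print(run_pipeline(qa, canned.__getitem__))
# {'t1': 'alexander graham bell', 't2': 'scotland', 't3': 'edinburgh'}
```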
Example: Multi-Hop QA
Task: "What is the capital of the country where the inventor of the telephone was born?"
Step 1: Decompose
{
"subtasks": [
{"id": "t1", "description": "Identify inventor of telephone", "dependencies": []},
{"id": "t2", "description": "Determine birthplace of {t1}", "dependencies": ["t1"]},
{"id": "t3", "description": "Identify capital of {t2}", "dependencies": ["t2"]}
]
}
Step 2: Execute with voting (m=5, k=2 for medium criticality)
t1 outputs: ["Alexander Graham Bell", "Alexander Graham Bell", "A.G. Bell", "Alexander Graham Bell", "Bell"] → Normalize → "alexander graham bell" wins with 3 of 5 votes, leading "a.g. bell" and "bell" (1 vote each) by 2 = k
t2 (with input "Alexander Graham Bell"): → "Edinburgh, Scotland" wins after red-flagging one verbose response
t3 (with the country from t2, "Scotland"): → "Edinburgh" wins unanimously
Step 3: Compose
Final answer: "Edinburgh"
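The t1 tally can be checked with a few lines of Python, assuming the normalized equivalence method (lowercase plus whitespace collapsing) described under Equivalence Methods. Normalization alone does not merge "A.G. Bell" or "Bell" into the full name. The full name therefore wins on 3 votes, with a lead of exactly 2 = k over each variant.

```python
from collections import Counter


def normalize(text: str) -> str:
    """'normalized' equivalence: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


t1_outputs = ["Alexander Graham Bell", "Alexander Graham Bell", "A.G. Bell",
              "Alexander Graham Bell", "Bell"]
tally = Counter(normalize(o) for o in t1_outputs)
(leader, top), (_, second) = tally.most_common(2)
print(tally.most_common())  # [('alexander graham bell', 3), ('a.g. bell', 1), ('bell', 1)]
print(leader, top - second)  # alexander graham bell 2
```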
Integration with Reasoning Skills
With hierarchical-reasoning
MAKER complements hierarchical-reasoning by adding reliability to each reasoning level:
Strategic level → MAKER(criticality=high) for key decisions
Tactical level → MAKER(criticality=medium) for approach validation
Operational level → Direct execution for atomic operations
With knowledge-graph
Use MAKER voting on entity extraction to achieve higher-quality knowledge graphs:
Document → [MAKER: Extract entities (m=5)] → Validated entities
→ [MAKER: Extract relations (m=5)] → Validated relations
→ knowledge-graph merge
Tool Reference
maker_build_dag
Construct DAG from subtask definitions. Validates acyclicity and computes execution order.
maker_red_flag
Apply red-flag validation to agent output. Returns is_valid boolean and flag details.
maker_vote
Execute first-to-ahead-by-k voting. Returns consensus output with confidence score.
maker_compute_reliability
Calculate theoretical system reliability for given (m, k, n) configuration.
maker_get_config
Get recommended (m, k) configuration for criticality level.
maker_compose_results
Combine validated subtask outputs into final result.
maker_generate_prompt
Create optimized micro-agent prompt with constraints and schema.
Configuration Guide
Selecting m and k
Cost-accuracy tradeoff:
- Higher m → more reliable but costlier
- Higher k → stronger consensus but slower termination
- Early termination typically reduces cost by 30-50%
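How much early termination saves depends on agent accuracy and on how wrong answers are spread, so it is worth estimating for a given setup. The Monte Carlo sketch below makes two assumptions. Each sample is independently correct with probability `p`. Otherwise it picks one of `n_wrong` distinct wrong answers uniformly at random. The name `simulate` and all parameter values are hypothetical. The sketch reports three numbers:

- the mean number of samples drawn (compare with m to get the saving)
- the fraction of trials in which the correct answer wins
- the fraction of trials that end without consensus

```python
import random


def simulate(p: float, k: int, m: int, n_wrong: int = 3,
             trials: int = 20_000, seed: int = 0) -> tuple[float, float, float]:
    """Monte Carlo estimate of first-to-ahead-by-k voting capped at m samples.

    Requires m >= 1 and n_wrong >= 1.
    Returns (mean samples used, P(correct answer wins), P(no consensus)).
    """
    if m < 1 or n_wrong < 1:
        raise ValueError("m and n_wrong must be at least 1")
    rng = random.Random(seed)
    total_used = correct = no_consensus = 0
    for _ in range(trials):
        counts = [0] * (n_wrong + 1)  # index 0 is the correct answer
        winner = None
        for used in range(1, m + 1):
            idx = 0 if rng.random() < p else rng.randint(1, n_wrong)
            counts[idx] += 1
            top, second = sorted(counts, reverse=True)[:2]
            if top - second >= k:
                winner = counts.index(top)  # the leader is unique once ahead by k
                break
        total_used += used
        if winner == 0:
            correct += 1
        elif winner is None:
            no_consensus += 1
    return total_used / trials, correct / trials, no_consensus / trials


mean_used, acc, stuck = simulate(p=0.85, k=2, m=5)
print(f"mean samples {mean_used:.2f} of 5, saving {1 - mean_used / 5:.0%}, "
      f"accuracy {acc:.3f}, no consensus {stuck:.3f}")
```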
Decision framework:
- Start with criticality-based defaults via maker_get_config
- Use maker_compute_reliability to validate the configuration
- Adjust based on empirical accuracy and cost metrics
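For a rough cross-check of a configuration, a common simplified model treats voting as a race between the correct answer and a single wrong answer. In this pessimistic case, every erring agent agrees on the same wrong answer. If each sample is correct with probability p and sampling is uncapped, the gambler's-ruin result gives the chance that the correct answer is first to lead by k: p^k / (p^k + (1-p)^k). The reliabilities of n independent steps multiply. This is an assumption, not the formula used by maker_compute_reliability. It ignores the m cap and red-flagging, so its numbers will not exactly match the tool's or the Performance Characteristics table. The function names are hypothetical.

```python
from typing import Optional


def step_reliability(p: float, k: int) -> float:
    """P(correct answer leads by k first), two-candidate model, no sample cap."""
    return p**k / (p**k + (1 - p) ** k)


def system_reliability(p: float, k: int, n: int) -> float:
    """Independent steps compound multiplicatively."""
    return step_reliability(p, k) ** n


def smallest_k(p: float, n: int, target: float, k_max: int = 20) -> Optional[int]:
    """Smallest lead margin k whose modeled n-step reliability meets target."""
    for k in range(1, k_max + 1):
        if system_reliability(p, k, n) >= target:
            return k
    return None


print(f"{step_reliability(0.85, 2):.4f}")  # 0.9698
print(smallest_k(p=0.85, n=5, target=0.99))  # 4
```

At p = 0.5 the model gives no benefit from voting at any k, so `smallest_k` returns None. That matches the intuition that voting only amplifies agents that are better than chance against their main competitor.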
Output Schema Design
Well-designed schemas enable format-based red-flagging:
{
"type": "object",
"properties": {
"answer": {"type": "string"},
"confidence": {"type": "number", "minimum": 0, "maximum": 1}
},
"required": ["answer"]
}
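The sketch below is a hand-rolled, standard-library check against this particular schema. It shows how a schema turns into concrete format red flags. `check_answer_format` is a hypothetical name. For general schemas, a full validator such as the third-party `jsonschema` package would be the usual choice. JSON booleans are rejected for `confidence` because JSON Schema does not treat them as numbers, whereas Python's `bool` is a subclass of `int`.

```python
import json


def check_answer_format(raw: str) -> list[str]:
    """Check one agent output against the example schema above.

    Expects a JSON object with a required string "answer" and an optional
    numeric "confidence" between 0 and 1. Returns a list of problems
    (empty list = well-formed).
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(obj, dict):
        return ["top level must be an object"]
    problems = []
    if not isinstance(obj.get("answer"), str):
        problems.append("'answer' missing or not a string")
    if "confidence" in obj:
        c = obj["confidence"]
        if isinstance(c, bool) or not isinstance(c, (int, float)) or not 0 <= c <= 1:
            problems.append("'confidence' must be a number in [0, 1]")
    return problems


print(check_answer_format('{"answer": "Edinburgh", "confidence": 0.9}'))  # []
```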
Equivalence Methods
- exact: String equality after trimming (dates, numbers)
- normalized: Lowercase + whitespace normalization (text)
- json: Parse and re-serialize for canonical comparison (structured)
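Each equivalence method maps naturally onto a canonicalization function. Two outputs are equivalent when their canonical forms are equal, and votes are counted on the canonical form. The sketch below assumes that `json` canonicalization means sorted keys with compact separators. `canonical` and `equivalent` are hypothetical names.

```python
import json


def canonical(output: str, method: str) -> str:
    """Map an output to the form that is compared during voting."""
    if method == "exact":
        return output.strip()
    if method == "normalized":
        return " ".join(output.lower().split())
    if method == "json":
        # Assumed canonical form: sorted keys, no insignificant whitespace.
        return json.dumps(json.loads(output), sort_keys=True, separators=(",", ":"))
    raise ValueError(f"unknown equivalence method: {method!r}")


def equivalent(a: str, b: str, method: str) -> bool:
    return canonical(a, method) == canonical(b, method)


print(equivalent('{"a": 1, "b": [1, 2]}', '{"b":[1,2],"a":1}', "json"))  # True
```

Under this sketch, `1` and `1.0` remain distinct in the json method, because Python's `json` module parses them as `int` and `float` respectively.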
Performance Characteristics
Reliability improvement (assuming 85% per-step single-agent accuracy):
| Steps | Single Agent | MAKER (m=5, k=2) |
|---|---|---|
| 1 | 85.0% | 97.1% |
| 3 | 61.4% | 91.5% |
| 5 | 44.4% | 86.2% |
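The single-agent column is the p^n compounding from the trigger list: with p = 0.85, an n-step chain succeeds with probability 0.85^n. The MAKER column is consistent with applying the same compounding to a per-step voted reliability of about 97.1%. Using that rounded figure gives 91.5% for three steps and 86.3% for five, within rounding of the tabulated 86.2%. The sketch only reproduces this arithmetic and does not derive the 97.1% itself.

```python
def chained(p_step: float, n: int) -> float:
    """End-to-end success of n independent steps, each correct with p_step."""
    return p_step**n


for n in (1, 3, 5):
    single = chained(0.85, n)
    maker = chained(0.971, n)  # per-step figure taken from the table's first row
    print(f"{n} steps: single {single:.1%}, MAKER {maker:.1%}")
```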
Cost multiplier: ~4-6× single agent (with early termination)
Latency: ~2-4× single agent (parallelism offsets voting overhead)
Error Handling
Insufficient valid outputs (red-flagging too aggressive):
- Retry with additional agents
- Relax red-flag thresholds
- Refine subtask prompt
No consensus (high disagreement):
- Further decompose the problematic subtask
- Increase k threshold
- Escalate to human review
Cycle detected in DAG:
- Review dependency structure
- Break circular dependencies into sequential steps