llm-council
LLM Council Skill
LIBRARY-FIRST PROTOCOL (MANDATORY)
Before writing ANY code, you MUST check:
Step 1: Library Catalog
- Location: .claude/library/catalog.json
- If match >70%: REUSE or ADAPT
Step 2: Patterns Guide
- Location: .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
- If pattern exists: FOLLOW documented approach
Step 3: Existing Projects
- Location: D:\Projects\*
- If found: EXTRACT and adapt
Decision Matrix
| Match | Action |
|---|---|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
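Read top to bottom, the decision matrix is a first-match rule. The sketch below encodes it with hypothetical names (`chooseLibraryAction`, `LookupResult`); match scores are percentages, and it follows the table's 70-90% band, so exactly 70 counts as ADAPT even though Step 1 says >70%.

```typescript
type LibraryAction = "REUSE" | "ADAPT" | "FOLLOW" | "EXTRACT" | "BUILD";

interface LookupResult {
  libraryMatch: number;    // best catalog.json match, 0-100 (%)
  patternExists: boolean;  // documented in LIBRARY-PATTERNS-GUIDE.md
  foundInProject: boolean; // found in an existing project
}

// Rows are checked in table order; the first matching row wins.
// Assumed boundaries: >90 is REUSE, 70..90 inclusive is ADAPT.
function chooseLibraryAction(r: LookupResult): LibraryAction {
  if (r.libraryMatch > 90) return "REUSE";
  if (r.libraryMatch >= 70) return "ADAPT";
  if (r.patternExists) return "FOLLOW";
  if (r.foundInProject) return "EXTRACT";
  return "BUILD"; // then add the new code to the library
}
```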
Purpose
Run a 3-stage multi-model consensus process for critical decisions where:
- Single-model hallucination risk is unacceptable
- Multiple perspectives improve decision quality
- High-stakes choices need validation
Architecture (Karpathy Pattern)
STAGE 1: COLLECT
        +---> Claude ---> Response A
        |
Query --+---> Gemini ---> Response B
        |
        +---> Codex ----> Response C
STAGE 2: RANK
Each model reviews the others' responses (anonymized)
Produces rankings with rationale
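Stage 2 depends on rankers not knowing which model wrote which answer. A minimal sketch of the anonymization step, using letter labels like the `"model": "A"` entry in the output format; `anonymize` and its optional `order` parameter (for shuffling) are hypothetical, not part of the skill.

```typescript
interface Anonymized {
  labeled: Record<string, string>;      // label -> response text (sent to rankers)
  labelToModel: Record<string, string>; // label -> model (kept by the orchestrator)
}

// Assign letter labels so rankers never see model names.
// `order` lets the caller shuffle models; defaults to input order.
function anonymize(
  responses: Record<string, string>,
  order: string[] = Object.keys(responses),
): Anonymized {
  const labeled: Record<string, string> = {};
  const labelToModel: Record<string, string> = {};
  order.forEach((model, i) => {
    if (!(model in responses)) throw new Error(`no response from ${model}`);
    const label = String.fromCharCode(65 + i); // 0 -> "A", 1 -> "B", ...
    labeled[label] = responses[model];
    labelToModel[label] = model;
  });
  return { labeled, labelToModel };
}
```

Passing a freshly shuffled `order` for each query keeps a model from learning that, say, "A" is always Claude.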
STAGE 3: SYNTHESIZE
Chairman aggregates rankings
Produces final answer with consensus score
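The skill does not document how the chairman turns rankings into a consensus score. One plausible scheme, offered purely as an assumption, is a Borda count: each ranker's list awards points by position, the label with the most points wins, and the score is the winner's share of the maximum possible points. `aggregateRankings` is a hypothetical name.

```typescript
// One ranker's ordering of anonymized labels, best first, e.g. ["B", "A", "C"].
type Ranking = string[];

interface Aggregate {
  points: Record<string, number>;
  winner: string;
  consensusScore: number; // 0..1
}

// Borda count: in a list of n labels, position p earns n - 1 - p points.
// Assumed consensus score: winner's points / maximum points it could earn.
function aggregateRankings(rankings: Ranking[]): Aggregate {
  if (rankings.length === 0) throw new Error("no rankings to aggregate");
  const points: Record<string, number> = {};
  let maxPossible = 0;
  for (const ranking of rankings) {
    const n = ranking.length;
    maxPossible += n - 1;
    ranking.forEach((label, p) => {
      points[label] = (points[label] ?? 0) + (n - 1 - p);
    });
  }
  // Ties go to the label that first appeared in the rankings.
  let winner = Object.keys(points)[0];
  for (const label of Object.keys(points)) {
    if (points[label] > points[winner]) winner = label;
  }
  const consensusScore = maxPossible === 0 ? 1 : points[winner] / maxPossible;
  return { points, winner, consensusScore };
}
```

For rankings `[A,B,C]`, `[A,C,B]` and `[B,A,C]`, A collects 5 of 6 possible points, a score of about 0.83, which would fall in the strong-consensus band.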
When to Use
Perfect For:
- Architecture decisions
- Technology selection
- Critical bug triage
- Security assessment
- High-risk deployments
- Contentious design choices
Don't Use When:
- Simple, low-risk decisions
- Time-critical responses
- The question has a single, verifiable correct answer
- Cost is a concern (at least 3x the API usage of a single-model query)
Usage
Basic Council
/llm-council "Should we use microservices or monolith for this system?"
With Threshold
/llm-council "Which auth approach is best?" --threshold 0.75
With Chairman Override
/llm-council "Architecture decision" --chairman gemini
Command Pattern
bash scripts/multi-model/llm-council.sh "<query>" "<threshold>" "<chairman>"
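The slash-command forms above map onto the script's three positional arguments. A minimal sketch of that mapping, assuming the command has already been split into tokens and that omitted flags take the defaults from the Configuration table; `toScriptArgs` is a hypothetical helper, not part of the skill.

```typescript
// Defaults copied from the Configuration table.
const DEFAULT_THRESHOLD = "0.67";
const DEFAULT_CHAIRMAN = "claude";

// Turns tokens like ["<query>", "--threshold", "0.75", "--chairman", "gemini"]
// into the script's positional form: <script> <query> <threshold> <chairman>.
function toScriptArgs(tokens: string[]): string[] {
  let query: string | undefined;
  let threshold = DEFAULT_THRESHOLD;
  let chairman = DEFAULT_CHAIRMAN;
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (tok === "--threshold" || tok === "--chairman") {
      const value = tokens[i + 1];
      if (value === undefined) throw new Error(`${tok} needs a value`);
      if (tok === "--threshold") threshold = value;
      else chairman = value;
      i++; // skip the value we just consumed
    } else if (query === undefined) {
      query = tok;
    } else {
      throw new Error(`unexpected argument: ${tok}`);
    }
  }
  if (query === undefined) throw new Error("missing query");
  return ["scripts/multi-model/llm-council.sh", query, threshold, chairman];
}
```

The returned array can be passed to Node's `child_process.execFile("bash", args, callback)`, which hands each argument to the script without shell re-quoting.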
Configuration
| Parameter | Default | Description |
|---|---|---|
| threshold | 0.67 | Minimum consensus score |
| chairman | claude | Model that synthesizes final answer |
| models | [claude, gemini, codex] | Participating models |
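The defaults above can be captured in one object, with overrides such as `--chairman gemini` merged on top. The checks here are assumptions rather than documented behavior: the threshold must lie in [0, 1], and the chairman must be one of the participating models. `resolveConfig` is a hypothetical name.

```typescript
interface CouncilConfig {
  threshold: number;
  chairman: string;
  models: string[];
}

const DEFAULT_CONFIG: CouncilConfig = {
  threshold: 0.67,
  chairman: "claude",
  models: ["claude", "gemini", "codex"],
};

// Merge overrides onto the defaults and reject combinations that cannot work.
function resolveConfig(overrides: Partial<CouncilConfig> = {}): CouncilConfig {
  const cfg: CouncilConfig = { ...DEFAULT_CONFIG, ...overrides };
  // Written as a negated range check so NaN is rejected too.
  if (!(cfg.threshold >= 0 && cfg.threshold <= 1)) {
    throw new Error(`threshold must be in [0, 1], got ${cfg.threshold}`);
  }
  if (!cfg.models.includes(cfg.chairman)) {
    throw new Error(`chairman "${cfg.chairman}" is not a participating model`);
  }
  return cfg;
}
```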
Consensus Scoring
- >0.80: Strong consensus - proceed with confidence
- 0.67-0.80: Moderate consensus - consider minority views
- <0.67: Weak consensus - escalate to human review
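The bands translate into a small lookup. In this sketch `classifyConsensus` is a hypothetical name; the boundary values 0.80 and 0.67 both land in the moderate band, which is how the ranges above read.

```typescript
type ConsensusLevel = "strong" | "moderate" | "weak";

// >0.80 strong, 0.67-0.80 moderate, <0.67 weak.
function classifyConsensus(score: number): { level: ConsensusLevel; action: string } {
  if (score > 0.8) return { level: "strong", action: "proceed with confidence" };
  if (score >= 0.67) return { level: "moderate", action: "consider minority views" };
  return { level: "weak", action: "escalate to human review" };
}
```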
Memory Integration
Results stored to Memory-MCP:
- Key: multi-model/council/decisions/{query_id}
- Tags: WHO=llm-council, WHY=consensus-decision
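A sketch of the record layout described above. It only builds the payload: the Memory-MCP write call and how `{query_id}` is generated are not specified in this skill, so `buildMemoryRecord` and its single-path-segment check on the ID are hypothetical.

```typescript
interface MemoryRecord {
  key: string;
  tags: Record<string, string>;
  value: string; // serialized council result
}

function buildMemoryRecord(queryId: string, result: unknown): MemoryRecord {
  // Assumption: the ID must not contain "/" so it cannot alter the key path.
  if (!/^[\w-]+$/.test(queryId)) {
    throw new Error(`query_id should be a single path segment, got "${queryId}"`);
  }
  return {
    key: `multi-model/council/decisions/${queryId}`,
    tags: { WHO: "llm-council", WHY: "consensus-decision" },
    value: JSON.stringify(result),
  };
}
```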
Output Format
{
  "query": "Original question",
  "final_answer": {
    "synthesis": "Combined answer...",
    "chairman": "claude"
  },
  "consensus_score": 0.85,
  "responses": {
    "claude": "...",
    "gemini": "...",
    "codex": "..."
  },
  "rankings": [
    {"model": "A", "rank": 1, "rationale": "..."}
  ]
}
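For callers consuming this output, the shape can be expressed as TypeScript types plus a minimal runtime guard. The field names come from the example above; the interface names and `isCouncilResult` are illustrative, and the guard checks only the top-level structure.

```typescript
interface CouncilRanking {
  model: string; // anonymized label, e.g. "A"
  rank: number;  // 1 = best
  rationale: string;
}

interface CouncilResult {
  query: string;
  final_answer: { synthesis: string; chairman: string };
  consensus_score: number;
  responses: Record<string, string>;
  rankings: CouncilRanking[];
}

// Shallow runtime check for JSON parsed from the council's output.
function isCouncilResult(x: unknown): x is CouncilResult {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  const fa = r.final_answer as Record<string, unknown> | null | undefined;
  return (
    typeof r.query === "string" &&
    typeof fa === "object" && fa !== null &&
    typeof fa.synthesis === "string" &&
    typeof fa.chairman === "string" &&
    typeof r.consensus_score === "number" &&
    typeof r.responses === "object" && r.responses !== null &&
    Array.isArray(r.rankings)
  );
}
```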
Failure Modes
Deadlock (No Consensus)
- All models disagree
- Consensus < threshold
- Action: Store for human review
Model Unavailable
- One model times out
- Action: Continue with 2 models (2/3 quorum)
Chairman Failure
- Synthesis fails
- Action: Fall back to the highest-ranked response
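The three failure modes can be applied in one post-processing step. This sketch assumes a timed-out model is recorded as `null`, that the quorum generalizes to two-thirds of participants (2 of 3 by default), and that the caller stores `needs_human_review` outcomes; `applyFailureModes` is a hypothetical name.

```typescript
interface Outcome {
  status: "accepted" | "needs_human_review";
  answer: string;
  note?: string;
}

function applyFailureModes(
  responses: Record<string, string | null>, // null = model timed out
  topRankedModel: string,                   // winner of the Stage 2 rankings
  synthesis: string | null,                 // null = chairman synthesis failed
  consensusScore: number,
  threshold: number,
): Outcome {
  // Model Unavailable: require a two-thirds quorum.
  const total = Object.keys(responses).length;
  const available = Object.keys(responses).filter((m) => responses[m] !== null);
  const quorum = Math.ceil((total * 2) / 3);
  if (available.length < quorum) {
    throw new Error(`quorum not met: ${available.length}/${total} models responded`);
  }

  // Chairman Failure: fall back to the highest-ranked response.
  let answer = synthesis;
  let note: string | undefined;
  if (answer === null) {
    const fallback = responses[topRankedModel];
    if (fallback === null || fallback === undefined) {
      throw new Error(`top-ranked model ${topRankedModel} has no response`);
    }
    answer = fallback;
    note = `chairman failed; using ${topRankedModel}'s response`;
  }

  // Deadlock: below threshold, keep the answer but route it to a human.
  const status: Outcome["status"] =
    consensusScore >= threshold ? "accepted" : "needs_human_review";
  return { status, answer, note };
}
```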
Integration Examples
Architecture Decision
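const THRESHOLD = 0.75;
const decision = await runCouncil(
  "Microservices vs Monolith for our scale?",
  { threshold: THRESHOLD }
);
// Act only when consensus clears the same threshold the council was given
if (decision.consensus_score >= THRESHOLD) {
  proceed(decision.final_answer);
} else {
  escalateToHuman(decision);
}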
Security Assessment
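// Higher threshold for security decisions
const SECURITY_THRESHOLD = 0.80;
const assessment = await runCouncil(
  "Is this authentication approach secure?",
  { threshold: SECURITY_THRESHOLD }
);
if (assessment.consensus_score < SECURITY_THRESHOLD) {
  escalateToHuman(assessment);
}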