llm-council

LLM Council Skill

LIBRARY-FIRST PROTOCOL (MANDATORY)

Before writing ANY code, you MUST check:

Step 1: Library Catalog

Location: .claude/library/catalog.json
If match >70%: REUSE or ADAPT

Step 2: Patterns Guide

Location: .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
If pattern exists: FOLLOW documented approach

Step 3: Existing Projects

Location: D:\Projects\*
If found: EXTRACT and adapt

Decision Matrix

Match	Action
Library >90%	REUSE directly
Library 70-90%	ADAPT minimally
Pattern exists	FOLLOW pattern
In project	EXTRACT
No match	BUILD (add to library after)
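
The matrix above can be read as a lookup checked from top to bottom. The sketch below is one possible encoding, not part of the skill itself: the function name `chooseAction` and its input fields are hypothetical, and it assumes that a library match of exactly 70% or 90% falls into the ADAPT band.

```javascript
// Hypothetical encoding of the decision matrix; names and input shape are assumptions.
// libraryMatch is a percentage (0-100); the other two fields are booleans.
function chooseAction({ libraryMatch = 0, patternExists = false, inProject = false } = {}) {
  if (libraryMatch > 90) return "REUSE";
  if (libraryMatch >= 70) return "ADAPT"; // assumed: 70 and 90 are inclusive bounds
  if (patternExists) return "FOLLOW";
  if (inProject) return "EXTRACT";
  return "BUILD"; // remember to add the result to the library afterwards
}

console.log(chooseAction({ libraryMatch: 95 })); // REUSE
console.log(chooseAction({ libraryMatch: 80 })); // ADAPT
console.log(chooseAction({ libraryMatch: 40, patternExists: true })); // FOLLOW
console.log(chooseAction({})); // BUILD
```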

Purpose

Run a three-stage, multi-model consensus process for critical decisions where:

Single-model hallucination risk is unacceptable
Multiple perspectives improve decision quality
High-stakes choices need validation

Architecture (Karpathy Pattern)

STAGE 1: COLLECT
        +---> Claude ---> Response A
        |
Query --+---> Gemini ---> Response B
        |
        +---> Codex ----> Response C

STAGE 2: RANK
  Each model reviews others (anonymized)
  Produces rankings with rationale

STAGE 3: SYNTHESIZE
  Chairman aggregates rankings
  Produces final answer with consensus score
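
As a rough illustration of Stages 1 and 2, the sketch below fans a query out to all models and then prepares an anonymized review set for one reviewer. It is not the skill's actual implementation: `collectResponses`, `anonymizeFor`, the letter labels, and the stub model functions are all assumptions made for this example. How rankings are aggregated into a consensus score is not specified here, so that step is left out.

```javascript
// Stage 1 (sketch): send the same query to every model in parallel.
// `models` maps a model name to a hypothetical async function (prompt) => string.
async function collectResponses(models, query) {
  const entries = await Promise.all(
    Object.entries(models).map(async ([name, ask]) => [name, await ask(query)])
  );
  return Object.fromEntries(entries);
}

// Stage 2 input (sketch): replace model names with letter labels and
// leave out the reviewer's own response, so each model only ranks the others.
function anonymizeFor(reviewer, responses) {
  return Object.keys(responses)
    .map((name, i) => ({ name, label: String.fromCharCode(65 + i), text: responses[name] }))
    .filter((entry) => entry.name !== reviewer)
    .map(({ label, text }) => ({ label, text }));
}

// Stub models stand in for real API clients in this example.
const stubModels = {
  claude: async (q) => `claude on: ${q}`,
  gemini: async (q) => `gemini on: ${q}`,
  codex: async (q) => `codex on: ${q}`,
};

collectResponses(stubModels, "Monolith or microservices?").then((responses) => {
  // Logs the entries labeled A and C; gemini's own answer (B) is left out.
  console.log(anonymizeFor("gemini", responses));
});
```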

When to Use

Perfect For:

Architecture decisions
Technology selection
Critical bug triage
Security assessment
High-risk deployments
Contentious design choices

Don't Use When:

The decision is simple and low-risk
The response is time-critical
A single correct answer exists
Cost is a concern (at least 3x API usage, since every query goes to all three models before ranking and synthesis)

Usage

Basic Council

/llm-council "Should we use microservices or monolith for this system?"

With Threshold

/llm-council "Which auth approach is best?" --threshold 0.75

With Chairman Override

/llm-council "Architecture decision" --chairman gemini

Command Pattern

bash scripts/multi-model/llm-council.sh "<query>" "<threshold>" "<chairman>"

Configuration

Parameter	Default	Description
threshold	0.67	Minimum consensus score
chairman	claude	Model that synthesizes final answer
models	[claude, gemini, codex]	Participating models
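
One way to apply these defaults is sketched below. The helper names `resolveConfig` and `buildCommand` are hypothetical, and two validation rules are assumptions rather than documented behavior: that the threshold lies between 0 and 1, and that the chairman must be one of the participating models.

```javascript
// Defaults copied from the configuration table.
const DEFAULTS = { threshold: 0.67, chairman: "claude", models: ["claude", "gemini", "codex"] };

// Hypothetical helper: merge overrides into the defaults and validate them.
function resolveConfig(overrides = {}) {
  const config = { ...DEFAULTS, ...overrides };
  // Assumed rule: the threshold is a finite number in [0, 1].
  if (!Number.isFinite(config.threshold) || config.threshold < 0 || config.threshold > 1) {
    throw new RangeError(`threshold must be a number between 0 and 1, got ${config.threshold}`);
  }
  // Assumed rule: the chairman is one of the participating models.
  if (!config.models.includes(config.chairman)) {
    throw new Error(`chairman "${config.chairman}" is not one of: ${config.models.join(", ")}`);
  }
  return config;
}

// Build the argument list for the script shown under "Command Pattern".
function buildCommand(query, overrides) {
  const { threshold, chairman } = resolveConfig(overrides);
  return ["bash", "scripts/multi-model/llm-council.sh", query, String(threshold), chairman];
}

console.log(buildCommand("Which auth approach is best?", { threshold: 0.75 }));
```

Returning an argument array rather than a single string avoids shell-quoting problems when the query itself contains quotes.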

Consensus Scoring

>0.80: Strong consensus - proceed with confidence
0.67-0.80: Moderate consensus - consider minority views
<0.67: Weak consensus - escalate to human review
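
The bands above translate into a small classifier. This is a minimal sketch with a hypothetical function name; it assumes the boundary values 0.67 and 0.80 both count as moderate, which is how the ranges read.

```javascript
// Map a consensus score to the three bands listed above.
// Assumption: the boundary values 0.67 and 0.80 both count as "moderate".
function classifyConsensus(score) {
  if (score > 0.8) return "strong";     // proceed with confidence
  if (score >= 0.67) return "moderate"; // consider minority views
  return "weak";                        // escalate to human review
}

console.log(classifyConsensus(0.85)); // strong
console.log(classifyConsensus(0.7));  // moderate
console.log(classifyConsensus(0.5));  // weak
```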

Memory Integration

Results are stored in Memory-MCP:

Key: multi-model/council/decisions/{query_id}
Tags: WHO=llm-council, WHY=consensus-decision
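
For illustration, the key and tags can be assembled as shown below. The helper `buildMemoryRecord`, the record shape, and the sample ID `q-001` are hypothetical; how the record is actually written to Memory-MCP is outside this sketch.

```javascript
// Hypothetical helper: build the record described above.
// The { key, tags, value } shape is an assumption, not a Memory-MCP requirement.
function buildMemoryRecord(queryId, result) {
  return {
    key: `multi-model/council/decisions/${queryId}`,
    tags: { WHO: "llm-council", WHY: "consensus-decision" },
    value: result,
  };
}

const record = buildMemoryRecord("q-001", { consensus_score: 0.85 });
console.log(record.key); // multi-model/council/decisions/q-001
```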

Output Format

{
  "query": "Original question",
  "final_answer": {
    "synthesis": "Combined answer...",
    "chairman": "claude"
  },
  "consensus_score": 0.85,
  "responses": {
    "claude": "...",
    "gemini": "...",
    "codex": "..."
  },
  "rankings": [
    {"model": "A", "rank": 1, "rationale": "..."}
  ]
}
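
A consumer may want to confirm that a result has this shape before acting on it. The check below is a sketch inferred from the sample above, not an official schema; `isCouncilResult` is a hypothetical name.

```javascript
// Sketch of a structural check for the output format; not an official schema.
function isCouncilResult(value) {
  if (value === null || typeof value !== "object") return false;
  const { query, final_answer, consensus_score, responses, rankings } = value;
  return (
    typeof query === "string" &&
    final_answer !== null && typeof final_answer === "object" &&
    typeof final_answer.synthesis === "string" &&
    typeof final_answer.chairman === "string" &&
    typeof consensus_score === "number" &&
    responses !== null && typeof responses === "object" &&
    Array.isArray(rankings) &&
    rankings.every(
      (r) => r !== null && typeof r === "object" &&
        typeof r.model === "string" && Number.isInteger(r.rank)
    )
  );
}

const sample = {
  query: "Original question",
  final_answer: { synthesis: "Combined answer...", chairman: "claude" },
  consensus_score: 0.85,
  responses: { claude: "...", gemini: "...", codex: "..." },
  rankings: [{ model: "A", rank: 1, rationale: "..." }],
};
console.log(isCouncilResult(sample)); // true
console.log(isCouncilResult({ query: "missing fields" })); // false
```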

Failure Modes

Deadlock (No Consensus)

All models disagree
Consensus < threshold
Action: Store for human review

Model Unavailable

One model times out
Action: Continue with 2 models (2/3 quorum)

Chairman Failure

Synthesis fails
Action: Fall back to the highest-ranked response
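
The three failure modes can be handled roughly as sketched below. All function names are hypothetical. The sketch assumes that a model client rejects its promise when it times out, that rankings refer to responses by their anonymized label, and that falling below the 2-model quorum is treated as an error, which is not stated above.

```javascript
// Hypothetical sketch of the three failure modes; all names are assumptions.
const QUORUM = 2; // "2/3 quorum" from the Model Unavailable case

// Model Unavailable: keep whatever responses arrived, as long as quorum holds.
// Each entry in `models` is assumed to be an async function that rejects on timeout.
async function collectWithQuorum(models, query) {
  const names = Object.keys(models);
  const settled = await Promise.allSettled(names.map((name) => models[name](query)));
  const responses = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") responses[names[i]] = outcome.value;
  });
  if (Object.keys(responses).length < QUORUM) {
    throw new Error(`Quorum lost: fewer than ${QUORUM} models responded`); // assumed behavior
  }
  return responses;
}

// Chairman Failure: fall back to the response ranked first.
function fallbackAnswer(rankings, responsesByLabel) {
  const best = [...rankings].sort((a, b) => a.rank - b.rank)[0];
  return responsesByLabel[best.model];
}

// Deadlock: a score below the threshold is stored for human review.
function nextStep(consensusScore, threshold = 0.67) {
  return consensusScore < threshold ? "store-for-human-review" : "proceed";
}

console.log(nextStep(0.5));  // store-for-human-review
console.log(nextStep(0.85)); // proceed
```

`Promise.allSettled` is used instead of `Promise.all` so that a single rejected call does not discard the responses that did arrive.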

Integration Examples

Architecture Decision

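// Use one constant so the requested and enforced thresholds cannot drift apart.
const THRESHOLD = 0.75;

const decision = await runCouncil(
  "Microservices vs Monolith for our scale?",
  { threshold: THRESHOLD }
);

// Act on the synthesis only when the council agrees strongly enough.
if (decision.consensus_score >= THRESHOLD) {
  proceed(decision.final_answer);
} else {
  escalateToHuman(decision);
}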

Security Assessment

// Security decisions use a higher threshold than the 0.67 default.
const assessment = await runCouncil(
  "Is this authentication approach secure?",
  { threshold: 0.80 }
);

More from dnyoussef/context-cascade

Repository

dnyoussef/conte…-cascade

First Seen

Feb 8, 2026

Security Audits

Gen Agent Trust Hub: Fail
Socket: Pass
Snyk: Pass