LLM Council (Fireworks AI)
This skill implements Karpathy's LLM Council concept where multiple open-weight LLMs deliberate on a query, powered entirely by Fireworks AI:
- Phase 1: All models respond to the query independently (parallel)
- Phase 2: Models rank each other's anonymized responses
- Phase 3: A Chairman LLM synthesizes the final answer
All inference runs through Fireworks AI using open-weight models. Fireworks' speed and pricing make it practical to run multi-model deliberation that would be slow or expensive on other providers.
CRITICAL RULES
- ALWAYS use AskUserQuestion to let the user select council models (multiselect) and the Chairman model
- ALWAYS save raw responses to files - never summarize or truncate API outputs
- ALWAYS show full transparency - display all individual responses, all rankings, AND the final synthesis
- NEVER skip the ranking phase - it is essential to the council deliberation process
- Read from files for display - ensures content is shown unmodified
- ALWAYS display the final output to the user after Phase 3 completes
Pre-flight Check
Before running any phase, verify the Fireworks API key is set:
if [ -z "$FIREWORKS_API_KEY" ]; then
echo "ERROR: FIREWORKS_API_KEY is not set."
echo "Create a Fireworks AI account at: https://fireworks.ai/"
echo "Then export it in your shell profile (~/.zshrc or ~/.bashrc):"
echo ' export FIREWORKS_API_KEY="your_api_key_here"'
exit 1
fi
echo "FIREWORKS_API_KEY is set."
Available Models
Present these options to the user via AskUserQuestion (multiselect):
| Model | Fireworks ID | Provider |
|---|---|---|
| GLM 5 | accounts/fireworks/models/glm-5 | Z.ai |
| DeepSeek V3.1 | accounts/fireworks/models/deepseek-v3p1 | DeepSeek |
| DeepSeek V3.2 | accounts/fireworks/models/deepseek-v3p2 | DeepSeek |
| MiniMax M2.1 | accounts/fireworks/models/minimax-m2p1 | MiniMax |
| Kimi K2.5 | accounts/fireworks/models/kimi-k2p5 | Moonshot |
| Qwen3 235B | accounts/fireworks/models/qwen3-235b-a22b | Alibaba |
| Llama 4 Maverick | accounts/fireworks/models/llama4-maverick-instruct-basic | Meta |
Workflow
Step 1: Gather User Input
Use AskUserQuestion to get:
- The query/question for the council (or accept it from the conversation)
- Which models to include (multiselect, recommend 3-5 models)
- Which model should be the Chairman (single select)
Note: AskUserQuestion supports at most 4 options per question. Since there are 7 models, either split model selection across two questions, or show 4 models and let the user pick the rest via "Other". A good default is to show 4 models in the first question and note in the question text which others are available via "Other", varying which 4 appear across sessions so every model gets exposure.
Example AskUserQuestion for model selection (show 4, mention others):
question: "Which models should participate in the LLM Council? (Also available via Other: Llama 4 Maverick, Qwen3 235B, GLM 5)"
header: "Models"
multiSelect: true
options:
- label: "DeepSeek V3.2"
description: "DeepSeek's newest and most capable model"
- label: "MiniMax M2.1"
description: "MiniMax's strong open-weight model"
- label: "Kimi K2.5"
description: "Moonshot's strong open-weight model"
- label: "DeepSeek V3.1"
description: "DeepSeek's proven reasoning model"
Example AskUserQuestion for the Chairman:
question: "Which model should be the Chairman (synthesizes the final answer)?"
header: "Chairman"
multiSelect: false
options:
- label: "DeepSeek V3.2 (Recommended)"
description: "Newest DeepSeek, strong at comprehensive analysis"
- label: "GLM 5"
description: "Strong reasoning for synthesis"
- label: "Kimi K2.5"
description: "Strong at structured synthesis"
- label: "MiniMax M2.1"
description: "Strong open-weight model for synthesis"
Model Name to ID Mapping
Use this mapping to convert user selections to Fireworks model IDs:
MODEL_MAP = {
"GLM 5": "accounts/fireworks/models/glm-5",
"DeepSeek V3.1": "accounts/fireworks/models/deepseek-v3p1",
"DeepSeek V3.2": "accounts/fireworks/models/deepseek-v3p2",
"MiniMax M2.1": "accounts/fireworks/models/minimax-m2p1",
"Kimi K2.5": "accounts/fireworks/models/kimi-k2p5",
"Qwen3 235B": "accounts/fireworks/models/qwen3-235b-a22b",
"Llama 4 Maverick": "accounts/fireworks/models/llama4-maverick-instruct-basic",
}
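For example, a minimal sketch of turning the user's selections into the MODELS JSON that the Phase 1 script below expects (the selected names here are just an illustration):

python3 << 'PYEOF'
import json

# Subset of MODEL_MAP above, repeated so the snippet is self-contained
MODEL_MAP = {
    "DeepSeek V3.2": "accounts/fireworks/models/deepseek-v3p2",
    "Kimi K2.5": "accounts/fireworks/models/kimi-k2p5",
    "GLM 5": "accounts/fireworks/models/glm-5",
}

# Hypothetical AskUserQuestion result
selected = ["DeepSeek V3.2", "Kimi K2.5", "GLM 5"]

# Prints a JSON array ready to paste into: export MODELS='...'
print(json.dumps([MODEL_MAP[name] for name in selected]))
PYEOF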
Step 2: Run Phase 1 - Individual Responses
After gathering input, run this script to get responses from all selected models in parallel:
# Export these so the Python heredoc below can read them via os.environ
export QUERY="USER_QUERY_HERE"
export MODELS='["accounts/fireworks/models/glm-5", "accounts/fireworks/models/deepseek-v3p1"]'
python3 << 'PYEOF'
import os
import json
import requests
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
QUERY = os.environ.get("QUERY", "")
MODELS = json.loads(os.environ.get("MODELS", "[]"))
# Create session directory
timestamp = time.strftime("%Y%m%d-%H%M%S")
SESSION_DIR = f"/tmp/llm-council/{timestamp}"
os.makedirs(SESSION_DIR, exist_ok=True)
# Save config
config = {"query": QUERY, "models": MODELS, "timestamp": timestamp}
with open(f"{SESSION_DIR}/config.json", "w") as f:
json.dump(config, f, indent=2)
def call_model(model_id, query):
"""Call a single model via Fireworks AI"""
try:
start = time.time()
response = requests.post(
API_URL,
headers={
"Authorization": f"Bearer {FIREWORKS_API_KEY}",
"Content-Type": "application/json"
},
json={
"model": model_id,
"messages": [
{"role": "system", "content": "You are participating in an LLM council deliberation. Provide your best, most thoughtful response to the query. Be comprehensive but focused."},
{"role": "user", "content": query}
],
"max_tokens": 4000,
"temperature": 1
},
timeout=120
)
response.raise_for_status()
elapsed = time.time() - start
data = response.json()
usage = data.get("usage", {})
return {
"success": True,
"content": data["choices"][0]["message"]["content"],
"model": model_id,
"latency_seconds": round(elapsed, 2),
"tokens": {
"prompt": usage.get("prompt_tokens", 0),
"completion": usage.get("completion_tokens", 0),
"total": usage.get("total_tokens", 0)
}
}
except Exception as e:
return {
"success": False,
"content": f"[ERROR: {str(e)}]",
"model": model_id,
"latency_seconds": 0,
"tokens": {"prompt": 0, "completion": 0, "total": 0}
}
print(f"\n{'='*60}")
print("PHASE 1: Collecting Individual Responses")
print(f"{'='*60}")
print(f"Query: {QUERY[:200]}...")
print(f"Models: {', '.join([m.split('/')[-1] for m in MODELS])}")
print(f"Session: {SESSION_DIR}")
print()
# Parallel execution
results = {}
with ThreadPoolExecutor(max_workers=len(MODELS)) as executor:
futures = {executor.submit(call_model, m, QUERY): m for m in MODELS}
for future in as_completed(futures):
model = futures[future]
result = future.result()
results[model] = result
status = "OK" if result["success"] else "FAILED"
latency = f"{result['latency_seconds']}s" if result["success"] else "N/A"
print(f" [{status}] {model.split('/')[-1]} ({latency})")
# Save raw results
with open(f"{SESSION_DIR}/phase1_responses.json", "w") as f:
json.dump(results, f, indent=2)
print(f"\nPhase 1 complete. Results saved to: {SESSION_DIR}/phase1_responses.json")
print(f"SESSION_DIR={SESSION_DIR}")
PYEOF
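The script's last line prints the session directory. One way to capture it for the later phases (a sketch; it simply picks the most recently created session under /tmp/llm-council):

# Newest session directory, trailing slash stripped
SESSION_DIR=$(ls -td /tmp/llm-council/*/ | head -1)
export SESSION_DIR="${SESSION_DIR%/}"
echo "Using session: $SESSION_DIR"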
Step 3: Run Phase 2 - Cross-Model Ranking
Each model reviews and ranks the anonymized responses from Phase 1:
SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE"
python3 << 'PYEOF'
import os
import json
import requests
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
SESSION_DIR = os.environ.get("SESSION_DIR")
# Load Phase 1 results
with open(f"{SESSION_DIR}/config.json") as f:
config = json.load(f)
with open(f"{SESSION_DIR}/phase1_responses.json") as f:
phase1_results = json.load(f)
QUERY = config["query"]
MODELS = config["models"]
# Create anonymized mapping
labels = ["A", "B", "C", "D", "E", "F", "G"][:len(MODELS)]
model_to_label = dict(zip(MODELS, labels))
label_to_model = {v: k for k, v in model_to_label.items()}
# Format anonymized responses
anonymized_responses = []
for model_id in MODELS:
label = model_to_label[model_id]
content = phase1_results[model_id]["content"]
anonymized_responses.append(f"=== Response {label} ===\n{content}")
anonymized_text = "\n\n".join(anonymized_responses)
def get_rankings(model_id, query, anonymized, own_label):
"""Get rankings from a single model"""
ranking_prompt = f"""You are evaluating responses from multiple AI models to this query:
QUERY: {query}
Here are the anonymized responses:
{anonymized}
Please rank these responses from BEST to WORST. For each ranking:
1. State the response letter (A, B, C, etc.)
2. Give a brief reason (1-2 sentences)
3. You may skip ranking your own response (labeled {own_label}) or rank it fairly
Format your response EXACTLY as:
RANKINGS:
1. [Letter] - [Brief reason]
2. [Letter] - [Brief reason]
3. [Letter] - [Brief reason]
..."""
try:
start = time.time()
response = requests.post(
API_URL,
headers={
"Authorization": f"Bearer {FIREWORKS_API_KEY}",
"Content-Type": "application/json"
},
json={
"model": model_id,
"messages": [
{"role": "system", "content": f"You are ranking AI responses objectively. Your own response is labeled '{own_label}'."},
{"role": "user", "content": ranking_prompt}
],
"max_tokens": 1000,
"temperature": 1
},
timeout=90
)
response.raise_for_status()
elapsed = time.time() - start
return {
"success": True,
"content": response.json()["choices"][0]["message"]["content"],
"model": model_id,
"latency_seconds": round(elapsed, 2)
}
except Exception as e:
return {
"success": False,
"content": f"[ERROR: {str(e)}]",
"model": model_id,
"latency_seconds": 0
}
print(f"\n{'='*60}")
print("PHASE 2: Cross-Model Ranking")
print(f"{'='*60}")
print(f"Label mapping: {json.dumps({v: k.split('/')[-1] for k, v in model_to_label.items()})}")
print()
# Collect rankings from all models in parallel
rankings = {}
with ThreadPoolExecutor(max_workers=len(MODELS)) as executor:
futures = {
executor.submit(get_rankings, mid, QUERY, anonymized_text, model_to_label[mid]): mid
for mid in MODELS
}
for future in as_completed(futures):
model = futures[future]
result = future.result()
rankings[model] = result
status = "OK" if result["success"] else "FAILED"
latency = f"{result['latency_seconds']}s" if result["success"] else "N/A"
print(f" [{status}] {model.split('/')[-1]} ({latency})")
# Save rankings
output = {
"label_mapping": label_to_model,
"model_to_label": model_to_label,
"rankings": rankings
}
with open(f"{SESSION_DIR}/phase2_rankings.json", "w") as f:
json.dump(output, f, indent=2)
print(f"\nPhase 2 complete. Rankings saved to: {SESSION_DIR}/phase2_rankings.json")
PYEOF
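Before moving on, a quick sanity check (a sketch) that reads the file just written and confirms every model's ranking call succeeded:

python3 << 'PYEOF'
import json
import os

SESSION_DIR = os.environ.get("SESSION_DIR")

# phase2_rankings.json stores per-model results under the "rankings" key
with open(f"{SESSION_DIR}/phase2_rankings.json") as f:
    data = json.load(f)

for model_id, result in data["rankings"].items():
    status = "OK" if result["success"] else "FAILED"
    print(f"  [{status}] {model_id.split('/')[-1]}")
PYEOF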
Step 4: Run Phase 3 - Chairman Synthesis
The Chairman model receives all responses and rankings, then produces the final synthesis:
SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE"
CHAIRMAN_MODEL="accounts/fireworks/models/glm-5"
python3 << 'PYEOF'
import os
import json
import requests
import time
FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
SESSION_DIR = os.environ.get("SESSION_DIR")
CHAIRMAN_MODEL = os.environ.get("CHAIRMAN_MODEL")
# Load all previous results
with open(f"{SESSION_DIR}/config.json") as f:
config = json.load(f)
with open(f"{SESSION_DIR}/phase1_responses.json") as f:
phase1 = json.load(f)
with open(f"{SESSION_DIR}/phase2_rankings.json") as f:
phase2 = json.load(f)
QUERY = config["query"]
label_to_model = phase2["label_mapping"]
model_to_label = phase2["model_to_label"]
# Format responses with model names revealed
responses_text = []
for model_id, result in phase1.items():
label = model_to_label.get(model_id, "?")
model_name = model_id.split("/")[-1]
responses_text.append(f"=== {label}: {model_name} ===\n{result['content']}")
# Format rankings
rankings_text = []
for model_id, result in phase2["rankings"].items():
model_name = model_id.split("/")[-1]
rankings_text.append(f"[{model_name}'s Rankings]\n{result['content']}")
synthesis_prompt = f"""You are the Chairman of an LLM Council. Your task is to synthesize the best possible answer from multiple AI responses.
ORIGINAL QUERY:
{QUERY}
INDIVIDUAL RESPONSES:
{chr(10).join(responses_text)}
MODEL RANKINGS:
{chr(10).join(rankings_text)}
As Chairman, produce a FINAL SYNTHESIS that:
1. Incorporates the strongest elements from the best-ranked responses
2. Resolves any contradictions between responses
3. Addresses aspects that multiple models agreed on
4. Corrects any errors identified through cross-ranking
5. Provides the most complete, accurate, and helpful answer
Begin your synthesis:"""
print(f"\n{'='*60}")
print("PHASE 3: Chairman Synthesis")
print(f"{'='*60}")
print(f"Chairman: {CHAIRMAN_MODEL.split('/')[-1]}")
print()
try:
start = time.time()
response = requests.post(
API_URL,
headers={
"Authorization": f"Bearer {FIREWORKS_API_KEY}",
"Content-Type": "application/json"
},
json={
"model": CHAIRMAN_MODEL,
"messages": [
{"role": "system", "content": "You are the Chairman of an LLM Council. Synthesize multiple AI perspectives into a definitive, comprehensive response."},
{"role": "user", "content": synthesis_prompt}
],
"max_tokens": 4000,
"temperature": 1
},
timeout=180
)
response.raise_for_status()
elapsed = time.time() - start
synthesis = response.json()["choices"][0]["message"]["content"]
with open(f"{SESSION_DIR}/phase3_synthesis.txt", "w") as f:
f.write(synthesis)
print(f"Phase 3 complete ({elapsed:.2f}s). Synthesis saved to: {SESSION_DIR}/phase3_synthesis.txt")
except Exception as e:
print(f"ERROR: {e}")
synthesis = f"[ERROR: {str(e)}]"
with open(f"{SESSION_DIR}/phase3_synthesis.txt", "w") as f:
f.write(synthesis)
# Update config with chairman
config["chairman"] = CHAIRMAN_MODEL
with open(f"{SESSION_DIR}/config.json", "w") as f:
json.dump(config, f, indent=2)
PYEOF
Step 5: Display Full Results
Read all saved files and display the complete council deliberation:
SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE"
python3 << 'PYEOF'
import os
import json
SESSION_DIR = os.environ.get("SESSION_DIR")
# Load all data
with open(f"{SESSION_DIR}/config.json") as f:
config = json.load(f)
with open(f"{SESSION_DIR}/phase1_responses.json") as f:
phase1 = json.load(f)
with open(f"{SESSION_DIR}/phase2_rankings.json") as f:
phase2 = json.load(f)
with open(f"{SESSION_DIR}/phase3_synthesis.txt") as f:
synthesis = f.read()
model_to_label = phase2["model_to_label"]
label_to_model = phase2["label_mapping"]
# Build formatted output
output = []
output.append("=" * 70)
output.append(" LLM COUNCIL DELIBERATION")
output.append(" Powered by Fireworks AI")
output.append("=" * 70)
output.append("")
output.append(f"QUERY: {config['query']}")
output.append(f"COUNCIL: {', '.join([m.split('/')[-1] for m in config['models']])}")
output.append(f"CHAIRMAN: {config.get('chairman', 'N/A').split('/')[-1]}")
output.append("")
# Phase 1: Individual Responses
output.append("-" * 70)
output.append(" PHASE 1: INDIVIDUAL RESPONSES")
output.append("-" * 70)
output.append("")
for model_id, result in phase1.items():
model_name = model_id.split("/")[-1]
label = model_to_label.get(model_id, "?")
latency = result.get("latency_seconds", "N/A")
tokens = result.get("tokens", {})
output.append(f"[{label}] {model_name} (latency: {latency}s, tokens: {tokens.get('total', 'N/A')})")
output.append("-" * 40)
output.append(result["content"])
output.append("")
# Phase 2: Cross-Model Rankings
output.append("-" * 70)
output.append(" PHASE 2: CROSS-MODEL RANKINGS")
output.append("-" * 70)
output.append("")
output.append(f"Label mapping: {json.dumps({v: k.split('/')[-1] for k, v in model_to_label.items()}, indent=2)}")
output.append("")
for model_id, result in phase2["rankings"].items():
model_name = model_id.split("/")[-1]
output.append(f"[{model_name}'s Rankings]")
output.append(result["content"])
output.append("")
# Phase 3: Chairman Synthesis
output.append("-" * 70)
output.append(" PHASE 3: CHAIRMAN'S SYNTHESIS")
output.append("-" * 70)
output.append("")
chairman_name = config.get("chairman", "Chairman").split("/")[-1]
output.append(f"[{chairman_name} - Chairman]")
output.append("")
output.append(synthesis)
output.append("")
output.append("=" * 70)
output.append(f"Session files: {SESSION_DIR}/")
# Save formatted output
final_output = "\n".join(output)
with open(f"{SESSION_DIR}/final_output.md", "w") as f:
f.write(final_output)
print(final_output)
print(f"\nFull output saved to: {SESSION_DIR}/final_output.md")
PYEOF
Important Notes
- Session Directory: Each run creates a unique session in /tmp/llm-council/{timestamp}/
- Raw Data Preserved: All API responses are saved as-is to JSON files for full transparency
- Cost: Fireworks pricing is per-token. More models and longer queries cost more. Check current pricing at https://fireworks.ai/pricing
- Latency Tracking: Each API call tracks latency so you can see Fireworks' speed in action
- Token Usage: Phase 1 responses include token counts for cost awareness
- Rate Limits: If you hit rate limits, wait briefly and retry; see the backoff sketch after this list
- Model Availability: Check https://app.fireworks.ai/ for current model status
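If rate limits recur, a minimal retry-with-backoff sketch you could wrap around the requests.post calls in the scripts above (the helper name and retry counts are illustrative, not part of the Fireworks API):

import time
import requests

def post_with_retry(url, max_retries=3, **kwargs):
    """Retry on HTTP 429 with exponential backoff; defaults are illustrative."""
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        time.sleep(2 ** (attempt + 1))  # back off 2s, 4s, 8s
    response.raise_for_status()  # still rate-limited after all retries
    return response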
Setup
- Create a Fireworks AI account at https://fireworks.ai/ and grab your API key from the dashboard
- Export it in your shell profile: export FIREWORKS_API_KEY="your_api_key_here"
- Restart your terminal or run source ~/.zshrc
- Invoke this skill when you want multiple open-weight AI perspectives on a question