LLM Council (Fireworks AI)
This skill implements Karpathy's LLM Council concept where multiple open-weight LLMs deliberate on a query, powered entirely by Fireworks AI:
- Phase 1: All models respond to the query independently (parallel)
- Phase 2: Models rank each other's anonymized responses
- Phase 3: A Chairman LLM synthesizes the final answer
All inference runs through Fireworks AI using open-weight models. Fireworks' speed and pricing make it practical to run multi-model deliberation that would be slow or expensive on other providers.
CRITICAL RULES
- ALWAYS use AskUserQuestion to let the user select council models (multiselect) and the Chairman model
- ALWAYS save raw responses to files - never summarize or truncate API outputs
- ALWAYS show full transparency - display all individual responses, all rankings, AND the final synthesis
- NEVER skip the ranking phase - it is essential to the council deliberation process
- Read from files for display - ensures content is shown unmodified
- ALWAYS display the final output to the user after Phase 3 completes
Pre-flight Check
Before running any phase, verify the Fireworks API key is set:
if [ -z "$FIREWORKS_API_KEY" ]; then
echo "ERROR: FIREWORKS_API_KEY is not set."
echo "Create a Fireworks AI account at: https://fireworks.ai/"
echo "Then export it in your shell profile (~/.zshrc or ~/.bashrc):"
echo ' export FIREWORKS_API_KEY="your_api_key_here"'
exit 1
fi
echo "FIREWORKS_API_KEY is set."
Available Models
Present these options to the user via AskUserQuestion (multiselect):
| Model | Fireworks ID | Provider |
|---|---|---|
| GLM 5 | accounts/fireworks/models/glm-5 | Z.ai |
| DeepSeek V3.1 | accounts/fireworks/models/deepseek-v3p1 | DeepSeek |
| DeepSeek V3.2 | accounts/fireworks/models/deepseek-v3p2 | DeepSeek |
| MiniMax M2.1 | accounts/fireworks/models/minimax-m2p1 | MiniMax |
| Kimi K2.5 | accounts/fireworks/models/kimi-k2p5 | Moonshot |
| Qwen3 235B | accounts/fireworks/models/qwen3-235b-a22b | Alibaba |
| Llama 4 Maverick | accounts/fireworks/models/llama4-maverick-instruct-basic | Meta |
Workflow
Step 1: Gather User Input
Use AskUserQuestion to get:
- The query/question for the council (or accept it from the conversation)
- Which models to include (multiselect, recommend 3-5 models)
- Which model should be the Chairman (single select)
Note: AskUserQuestion supports at most 4 options per question. Since there are 7 models, either split model selection across two questions, or show 4 models and let the user pick the rest via "Other". A good default is to show 4 models in the first question and note in the question text which others are available via "Other", varying which 4 appear across sessions so every model gets exposure.
Example AskUserQuestion for model selection (show 4, mention others):
question: "Which models should participate in the LLM Council? (Also available via Other: Llama 4 Maverick, Qwen3 235B, GLM 5)"
header: "Models"
multiSelect: true
options:
- label: "DeepSeek V3.2"
description: "DeepSeek's newest and most capable model"
- label: "MiniMax M2.1"
description: "MiniMax's strong open-weight model"
- label: "Kimi K2.5"
description: "Moonshot's strong open-weight model"
- label: "DeepSeek V3.1"
description: "DeepSeek's proven reasoning model"
Example AskUserQuestion for the Chairman:
question: "Which model should be the Chairman (synthesizes the final answer)?"
header: "Chairman"
multiSelect: false
options:
- label: "DeepSeek V3.2 (Recommended)"
description: "Newest DeepSeek, strong at comprehensive analysis"
- label: "GLM 5"
description: "Strong reasoning for synthesis"
- label: "Kimi K2.5"
description: "Strong at structured synthesis"
- label: "MiniMax M2.1"
description: "Strong open-weight model for synthesis"
Model Name to ID Mapping
Use this mapping to convert user selections to Fireworks model IDs:
MODEL_MAP = {
"GLM 5": "accounts/fireworks/models/glm-5",
"DeepSeek V3.1": "accounts/fireworks/models/deepseek-v3p1",
"DeepSeek V3.2": "accounts/fireworks/models/deepseek-v3p2",
"MiniMax M2.1": "accounts/fireworks/models/minimax-m2p1",
"Kimi K2.5": "accounts/fireworks/models/kimi-k2p5",
"Qwen3 235B": "accounts/fireworks/models/qwen3-235b-a22b",
"Llama 4 Maverick": "accounts/fireworks/models/llama4-maverick-instruct-basic",
}
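For example, a minimal sketch of turning the user's selections into the MODELS JSON that the Phase 1 script below expects (the selected names here are just an illustration):

python3 << 'PYEOF'
import json

# Subset of MODEL_MAP above, repeated so the snippet is self-contained
MODEL_MAP = {
    "DeepSeek V3.2": "accounts/fireworks/models/deepseek-v3p2",
    "Kimi K2.5": "accounts/fireworks/models/kimi-k2p5",
    "GLM 5": "accounts/fireworks/models/glm-5",
}

# Hypothetical AskUserQuestion result
selected = ["DeepSeek V3.2", "Kimi K2.5", "GLM 5"]

# Prints a JSON array ready to paste into: export MODELS='...'
print(json.dumps([MODEL_MAP[name] for name in selected]))
PYEOF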
Step 2: Run Phase 1 - Individual Responses
After gathering input, run this script to get responses from all selected models in parallel:
# Export these so the Python heredoc below can read them via os.environ
export QUERY="USER_QUERY_HERE"
export MODELS='["accounts/fireworks/models/glm-5", "accounts/fireworks/models/deepseek-v3p1"]'
python3 << 'PYEOF'
import os
import json
import requests
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
QUERY = os.environ.get("QUERY", "")
MODELS = json.loads(os.environ.get("MODELS", "[]"))
# Create session directory
timestamp = time.strftime("%Y%m%d-%H%M%S")
SESSION_DIR = f"/tmp/llm-council/{timestamp}"
os.makedirs(SESSION_DIR, exist_ok=True)
# Save config
config = {"query": QUERY, "models": MODELS, "timestamp": timestamp}
with open(f"{SESSION_DIR}/config.json", "w") as f:
json.dump(config, f, indent=2)
def call_model(model_id, query):
"""Call a single model via Fireworks AI"""
try:
start = time.time()
response = requests.post(
API_URL,
headers={
"Authorization": f"Bearer {FIREWORKS_API_KEY}",
"Content-Type": "application/json"
},
json={
"model": model_id,
"messages": [
{"role": "system", "content": "You are participating in an LLM council deliberation. Provide your best, most thoughtful response to the query. Be comprehensive but focused."},
{"role": "user", "content": query}
],
"max_tokens": 4000,
"temperature": 1
},
timeout=120
)
response.raise_for_status()
elapsed = time.time() - start
data = response.json()
usage = data.get("usage", {})
return {
"success": True,
"content": data["choices"][0]["message"]["content"],
"model": model_id,
"latency_seconds": round(elapsed, 2),
"tokens": {
"prompt": usage.get("prompt_tokens", 0),
"completion": usage.get("completion_tokens", 0),
"total": usage.get("total_tokens", 0)
}
}
except Exception as e:
return {
"success": False,
"content": f"[ERROR: {str(e)}]",
"model": model_id,
"latency_seconds": 0,
"tokens": {"prompt": 0, "completion": 0, "total": 0}
}
print(f"\n{'='*60}")
print("PHASE 1: Collecting Individual Responses")
print(f"{'='*60}")
print(f"Query: {QUERY[:200]}...")
print(f"Models: {', '.join([m.split('/')[-1] for m in MODELS])}")
print(f"Session: {SESSION_DIR}")
print()
# Parallel execution
results = {}
with ThreadPoolExecutor(max_workers=len(MODELS)) as executor:
futures = {executor.submit(call_model, m, QUERY): m for m in MODELS}
for future in as_completed(futures):
model = futures[future]
result = future.result()
results[model] = result
status = "OK" if result["success"] else "FAILED"
latency = f"{result['latency_seconds']}s" if result["success"] else "N/A"
print(f" [{status}] {model.split('/')[-1]} ({latency})")
# Save raw results
with open(f"{SESSION_DIR}/phase1_responses.json", "w") as f:
json.dump(results, f, indent=2)
print(f"\nPhase 1 complete. Results saved to: {SESSION_DIR}/phase1_responses.json")
print(f"SESSION_DIR={SESSION_DIR}")
PYEOF
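The script's last line prints the session directory. One way to capture it for the later phases (a sketch; it simply picks the most recently created session under /tmp/llm-council):

# Newest session directory, trailing slash stripped
SESSION_DIR=$(ls -td /tmp/llm-council/*/ | head -1)
export SESSION_DIR="${SESSION_DIR%/}"
echo "Using session: $SESSION_DIR"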
Step 3: Run Phase 2 - Cross-Model Ranking
Each model reviews and ranks the anonymized responses from Phase 1:
SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE"
python3 << 'PYEOF'
import os
import json
import requests
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
SESSION_DIR = os.environ.get("SESSION_DIR")
# Load Phase 1 results
with open(f"{SESSION_DIR}/config.json") as f:
config = json.load(f)
with open(f"{SESSION_DIR}/phase1_responses.json") as f:
phase1_results = json.load(f)
QUERY = config["query"]
MODELS = config["models"]
# Create anonymized mapping
labels = ["A", "B", "C", "D", "E", "F", "G"][:len(MODELS)]
model_to_label = dict(zip(MODELS, labels))
label_to_model = {v: k for k, v in model_to_label.items()}
# Format anonymized responses
anonymized_responses = []
for model_id in MODELS:
label = model_to_label[model_id]
content = phase1_results[model_id]["content"]
anonymized_responses.append(f"=== Response {label} ===\n{content}")
anonymized_text = "\n\n".join(anonymized_responses)
def get_rankings(model_id, query, anonymized, own_label):
"""Get rankings from a single model"""
ranking_prompt = f"""You are evaluating responses from multiple AI models to this query:
QUERY: {query}
Here are the anonymized responses:
{anonymized}
Please rank these responses from BEST to WORST. For each ranking:
1. State the response letter (A, B, C, etc.)
2. Give a brief reason (1-2 sentences)
3. You may skip ranking your own response (labeled {own_label}) or rank it fairly
Format your response EXACTLY as:
RANKINGS:
1. [Letter] - [Brief reason]
2. [Letter] - [Brief reason]
3. [Letter] - [Brief reason]
..."""
try:
start = time.time()
response = requests.post(
API_URL,
headers={
"Authorization": f"Bearer {FIREWORKS_API_KEY}",
"Content-Type": "application/json"
},
json={
"model": model_id,
"messages": [
{"role": "system", "content": f"You are ranking AI responses objectively. Your own response is labeled '{own_label}'."},
{"role": "user", "content": ranking_prompt}
],
"max_tokens": 1000,
"temperature": 1
},
timeout=90
)
response.raise_for_status()
elapsed = time.time() - start
return {
"success": True,
"content": response.json()["choices"][0]["message"]["content"],
"model": model_id,
"latency_seconds": round(elapsed, 2)
}
except Exception as e:
return {
"success": False,
"content": f"[ERROR: {str(e)}]",
"model": model_id,
"latency_seconds": 0
}
print(f"\n{'='*60}")
print("PHASE 2: Cross-Model Ranking")
print(f"{'='*60}")
print(f"Label mapping: {json.dumps({v: k.split('/')[-1] for k, v in model_to_label.items()})}")
print()
# Collect rankings from all models in parallel
rankings = {}
with ThreadPoolExecutor(max_workers=len(MODELS)) as executor:
futures = {
executor.submit(get_rankings, mid, QUERY, anonymized_text, model_to_label[mid]): mid
for mid in MODELS
}
for future in as_completed(futures):
model = futures[future]
result = future.result()
rankings[model] = result
status = "OK" if result["success"] else "FAILED"
latency = f"{result['latency_seconds']}s" if result["success"] else "N/A"
print(f" [{status}] {model.split('/')[-1]} ({latency})")
# Save rankings
output = {
"label_mapping": label_to_model,
"model_to_label": model_to_label,
"rankings": rankings
}
with open(f"{SESSION_DIR}/phase2_rankings.json", "w") as f:
json.dump(output, f, indent=2)
print(f"\nPhase 2 complete. Rankings saved to: {SESSION_DIR}/phase2_rankings.json")
PYEOF
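Before moving on, a quick sanity check (a sketch) that reads the file just written and confirms every model's ranking call succeeded:

python3 << 'PYEOF'
import json
import os

SESSION_DIR = os.environ.get("SESSION_DIR")

# phase2_rankings.json stores per-model results under the "rankings" key
with open(f"{SESSION_DIR}/phase2_rankings.json") as f:
    data = json.load(f)

for model_id, result in data["rankings"].items():
    status = "OK" if result["success"] else "FAILED"
    print(f"  [{status}] {model_id.split('/')[-1]}")
PYEOF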
Step 4: Run Phase 3 - Chairman Synthesis
The Chairman model receives all responses and rankings, then produces the final synthesis:
SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE"
CHAIRMAN_MODEL="accounts/fireworks/models/glm-5"
python3 << 'PYEOF'
import os
import json
import requests
import time
FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
SESSION_DIR = os.environ.get("SESSION_DIR")
CHAIRMAN_MODEL = os.environ.get("CHAIRMAN_MODEL")
# Load all previous results
with open(f"{SESSION_DIR}/config.json") as f:
config = json.load(f)
with open(f"{SESSION_DIR}/phase1_responses.json") as f:
phase1 = json.load(f)
with open(f"{SESSION_DIR}/phase2_rankings.json") as f:
phase2 = json.load(f)
QUERY = config["query"]
label_to_model = phase2["label_mapping"]
model_to_label = phase2["model_to_label"]
# Format responses with model names revealed
responses_text = []
for model_id, result in phase1.items():
label = model_to_label.get(model_id, "?")
model_name = model_id.split("/")[-1]
responses_text.append(f"=== {label}: {model_name} ===\n{result['content']}")
# Format rankings
rankings_text = []
for model_id, result in phase2["rankings"].items():
model_name = model_id.split("/")[-1]
rankings_text.append(f"[{model_name}'s Rankings]\n{result['content']}")
synthesis_prompt = f"""You are the Chairman of an LLM Council. Your task is to synthesize the best possible answer from multiple AI responses.
ORIGINAL QUERY:
{QUERY}
INDIVIDUAL RESPONSES:
{chr(10).join(responses_text)}
MODEL RANKINGS:
{chr(10).join(rankings_text)}
As Chairman, produce a FINAL SYNTHESIS that:
1. Incorporates the strongest elements from the best-ranked responses
2. Resolves any contradictions between responses
3. Addresses aspects that multiple models agreed on
4. Corrects any errors identified through cross-ranking
5. Provides the most complete, accurate, and helpful answer
Begin your synthesis:"""
print(f"\n{'='*60}")
print("PHASE 3: Chairman Synthesis")
print(f"{'='*60}")
print(f"Chairman: {CHAIRMAN_MODEL.split('/')[-1]}")
print()
try:
start = time.time()
response = requests.post(
API_URL,
headers={
"Authorization": f"Bearer {FIREWORKS_API_KEY}",
"Content-Type": "application/json"
},
json={
"model": CHAIRMAN_MODEL,
"messages": [
{"role": "system", "content": "You are the Chairman of an LLM Council. Synthesize multiple AI perspectives into a definitive, comprehensive response."},
{"role": "user", "content": synthesis_prompt}
],
"max_tokens": 4000,
"temperature": 1
},
timeout=180
)
response.raise_for_status()
elapsed = time.time() - start
synthesis = response.json()["choices"][0]["message"]["content"]
with open(f"{SESSION_DIR}/phase3_synthesis.txt", "w") as f:
f.write(synthesis)
print(f"Phase 3 complete ({elapsed:.2f}s). Synthesis saved to: {SESSION_DIR}/phase3_synthesis.txt")
except Exception as e:
print(f"ERROR: {e}")
synthesis = f"[ERROR: {str(e)}]"
with open(f"{SESSION_DIR}/phase3_synthesis.txt", "w") as f:
f.write(synthesis)
# Update config with chairman
config["chairman"] = CHAIRMAN_MODEL
with open(f"{SESSION_DIR}/config.json", "w") as f:
json.dump(config, f, indent=2)
PYEOF
Step 5: Display Full Results
Read all saved files and display the complete council deliberation:
SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE"
python3 << 'PYEOF'
import os
import json
SESSION_DIR = os.environ.get("SESSION_DIR")
# Load all data
with open(f"{SESSION_DIR}/config.json") as f:
config = json.load(f)
with open(f"{SESSION_DIR}/phase1_responses.json") as f:
phase1 = json.load(f)
with open(f"{SESSION_DIR}/phase2_rankings.json") as f:
phase2 = json.load(f)
with open(f"{SESSION_DIR}/phase3_synthesis.txt") as f:
synthesis = f.read()
model_to_label = phase2["model_to_label"]
label_to_model = phase2["label_mapping"]
# Build formatted output
output = []
output.append("=" * 70)
output.append(" LLM COUNCIL DELIBERATION")
output.append(" Powered by Fireworks AI")
output.append("=" * 70)
output.append("")
output.append(f"QUERY: {config['query']}")
output.append(f"COUNCIL: {', '.join([m.split('/')[-1] for m in config['models']])}")
output.append(f"CHAIRMAN: {config.get('chairman', 'N/A').split('/')[-1]}")
output.append("")
# Phase 1: Individual Responses
output.append("-" * 70)
output.append(" PHASE 1: INDIVIDUAL RESPONSES")
output.append("-" * 70)
output.append("")
for model_id, result in phase1.items():
model_name = model_id.split("/")[-1]
label = model_to_label.get(model_id, "?")
latency = result.get("latency_seconds", "N/A")
tokens = result.get("tokens", {})
output.append(f"[{label}] {model_name} (latency: {latency}s, tokens: {tokens.get('total', 'N/A')})")
output.append("-" * 40)
output.append(result["content"])
output.append("")
# Phase 2: Cross-Model Rankings
output.append("-" * 70)
output.append(" PHASE 2: CROSS-MODEL RANKINGS")
output.append("-" * 70)
output.append("")
output.append(f"Label mapping: {json.dumps({v: k.split('/')[-1] for k, v in model_to_label.items()}, indent=2)}")
output.append("")
for model_id, result in phase2["rankings"].items():
model_name = model_id.split("/")[-1]
output.append(f"[{model_name}'s Rankings]")
output.append(result["content"])
output.append("")
# Phase 3: Chairman Synthesis
output.append("-" * 70)
output.append(" PHASE 3: CHAIRMAN'S SYNTHESIS")
output.append("-" * 70)
output.append("")
chairman_name = config.get("chairman", "Chairman").split("/")[-1]
output.append(f"[{chairman_name} - Chairman]")
output.append("")
output.append(synthesis)
output.append("")
output.append("=" * 70)
output.append(f"Session files: {SESSION_DIR}/")
# Save formatted output
final_output = "\n".join(output)
with open(f"{SESSION_DIR}/final_output.md", "w") as f:
f.write(final_output)
print(final_output)
print(f"\nFull output saved to: {SESSION_DIR}/final_output.md")
PYEOF
Important Notes
- Session Directory: Each run creates a unique session in /tmp/llm-council/{timestamp}/
- Raw Data Preserved: All API responses are saved as-is to JSON files for full transparency
- Cost: Fireworks pricing is per-token. More models and longer queries cost more. Check current pricing at https://fireworks.ai/pricing
- Latency Tracking: Each API call tracks latency so you can see Fireworks' speed in action
- Token Usage: Phase 1 responses include token counts for cost awareness
- Rate Limits: If you hit rate limits, wait briefly and retry; see the backoff sketch after this list
- Model Availability: Check https://app.fireworks.ai/ for current model status
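If rate limits recur, a minimal retry-with-backoff sketch you could wrap around the requests.post calls in the scripts above (the helper name and retry counts are illustrative, not part of the Fireworks API):

import time
import requests

def post_with_retry(url, max_retries=3, **kwargs):
    """Retry on HTTP 429 with exponential backoff; defaults are illustrative."""
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        time.sleep(2 ** (attempt + 1))  # back off 2s, 4s, 8s
    response.raise_for_status()  # still rate-limited after all retries
    return response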
Setup
- Create a Fireworks AI account at https://fireworks.ai/ and grab your API key from the dashboard
- Export it in your shell profile: export FIREWORKS_API_KEY="your_api_key_here"
- Restart your terminal or run source ~/.zshrc
- Invoke this skill when you want multiple open-weight AI perspectives on a question