# Ask Many Models (`ask-many-models`)

Send the same prompt to multiple AI models in parallel and synthesise their responses into a unified analysis.
## When this skill is invoked
IMPORTANT: When this skill is triggered (via /ask-many-models or natural language), follow the execution steps below. Do NOT just describe what the skill does.
## Execution Steps

### Step 1: Get or draft the prompt
**A) Cold start** (conversation just began, no prior discussion): If the user provided a prompt/question, use it. Otherwise ask: "What question would you like to send to multiple AI models?"
**B) Mid-conversation** (there's been substantive discussion before this): When invoked after a conversation, ALWAYS draft a comprehensive prompt that:
- **Captures the full context** - Include relevant background, constraints, and goals discussed
- **Includes substantive content** - Don't just summarise files; include actual excerpts, code snippets, or data that other models need to answer well
- **States the core question clearly** - What specific insight/decision/analysis is needed
- **Notes any constraints or preferences** - Technical requirements, style preferences, etc.
Prompt drafting checklist:
- Background context (2-4 paragraphs minimum)
- Any relevant file contents or code (include actual content, not just "see attached")
- The specific question(s) to answer
- What format/depth of response is useful
IMPORTANT: Err on the side of including MORE context than seems necessary. Other models don't have access to this conversation; they only see the prompt you write. A prompt that seems "too long" to you is usually about right.
Save the drafted prompt to a file and show it to the user for approval before proceeding:
```sh
echo "<prompt>" > /tmp/amm-prompt-draft.md && open /tmp/amm-prompt-draft.md
```
Ask: "I've drafted a prompt capturing our discussion. Please review and let me know if you'd like any changes, or say 'go' to proceed."
### Step 2: Model selection
Do NOT use AskUserQuestion for model selection (it has a 4-option limit which is too restrictive). Instead, print this menu and wait for user input:
```text
Which models should I query?

1. ⚡ Defaults - GPT-5.4 Thinking, Claude 4.6 Opus Thinking, Gemini 3.1 Pro, Grok 4.1 (Recommended)
2. 🚀 Quick - Gemini 3 Flash, Grok 4.1 Fast, Claude 4.5 Sonnet (~10s)
3. 🔍 Comprehensive - Defaults + GPT-5.4 Pro (slow, extra compute)
4. 🔬 Deep Research - OpenAI/Gemini deep research + GPT-5.4 Pro (10-20 min)
5. 🔧 Pick models - Choose individual models

Enter a number (1-5):
```
If user selects 5 (Pick models), print this list and ask for comma-separated numbers:
```text
Available models:

 1. gpt-5.4-thinking (default)
 2. claude-4.6-opus-thinking (default)
 3. gemini-3.1-pro (default)
 4. grok-4.1 (default)
 5. gemini-3-flash
 6. grok-4.1-non-reasoning
 7. claude-4.5-sonnet
 8. gpt-5.4
 9. gpt-5.4-pro (slow, extra compute)
10. claude-4.6-opus
11. openai-deep-research (10-20 min)
12. gemini-deep-research (10-20 min)

Enter numbers (e.g. 1,2,5):
```
Then map user's numbers to model IDs.
### Step 3: Check for images
If an image is in the conversation, save it to:
```text
/Users/ph/.claude/skills/ask-many-models/data/model-outputs/image-TIMESTAMP.png
```
### Step 4: Run the query
Map selection to model IDs:
- Defaults: `gpt-5.4-thinking,claude-4.6-opus-thinking,gemini-3.1-pro,grok-4.1`
- Quick: `gemini-3-flash,grok-4.1-non-reasoning,claude-4.5-sonnet`
- Comprehensive: `gpt-5.4-thinking,claude-4.6-opus-thinking,gemini-3.1-pro,grok-4.1,gpt-5.4-pro`
- Deep Research: `openai-deep-research,gemini-deep-research,gpt-5.4-pro`
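The mapping above amounts to a simple lookup table. A TypeScript sketch (illustrative only; the actual script loads its presets from `models.json`):

```typescript
// Hypothetical preset lookup; names and IDs mirror the list above,
// but the real skill reads presets from models.json.
const PRESETS: Record<string, string[]> = {
  defaults: ["gpt-5.4-thinking", "claude-4.6-opus-thinking", "gemini-3.1-pro", "grok-4.1"],
  quick: ["gemini-3-flash", "grok-4.1-non-reasoning", "claude-4.5-sonnet"],
  comprehensive: [
    "gpt-5.4-thinking", "claude-4.6-opus-thinking", "gemini-3.1-pro", "grok-4.1", "gpt-5.4-pro",
  ],
  "deep-research": ["openai-deep-research", "gemini-deep-research", "gpt-5.4-pro"],
};

// Produce the comma-separated value expected by --models.
function modelsForPreset(name: string): string {
  const models = PRESETS[name];
  if (!models) throw new Error(`Unknown preset: ${name}`);
  return models.join(",");
}
```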
Generate slug from prompt (lowercase, non-alphanumeric → hyphens, max 50 chars).
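The slug rule can be sketched as:

```typescript
// Sketch of the slug rule: lowercase, runs of non-alphanumeric characters
// become single hyphens, leading/trailing hyphens trimmed, capped at 50 chars.
function slugify(prompt: string): string {
  return prompt
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, 50);
}
```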
```sh
cd /Users/ph/.claude/skills/ask-many-models && yarn query \
  --models "<model-ids>" \
  --synthesise \
  --output-format both \
  [--image "<path>"] \
  "<prompt>"
```
The script auto-generates an output directory at `data/model-outputs/<timestamp>-<slug>/` containing `results.md`, `results.html`, and individual model responses.
### Step 5: Open results
Say "Querying: [models]" and open the results file. Check `data/user-defaults.json` for `open_preference`:

- `"html"` → `open "<output-dir>/results.html"`
- `"markdown"` (or absent) → `open "<output-dir>/results.md"`
## Reference Documentation

### Terminal CLI (Fastest)

Run `amm` directly from your terminal for instant model selection:

```sh
amm "What are the key considerations for X?"
```
Options:
- `--quick` or `-q` - Skip model selection, use defaults
- `--no-synthesise` - Skip the synthesis step
Default models are configured in `data/user-defaults.json`.
### Output format
Results can be output as markdown, HTML, or both. The preference is stored in `data/user-defaults.json` under `output_format`. On first run via `amm`, you'll be prompted to choose. The HTML version uses serif typography optimised for long-form reading.
- `--output-format markdown` → markdown only (default for script invocation)
- `--output-format html` → HTML only
- `--output-format both` → both markdown and HTML
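Reading the stored preference could look like this (a sketch; only the `output_format` field described above is assumed):

```typescript
import { readFileSync } from "node:fs";

type OutputFormat = "markdown" | "html" | "both";

// Sketch: load output_format from user-defaults.json, falling back to
// "markdown" when the file is missing or the field is absent/unrecognised.
function loadOutputFormat(path: string): OutputFormat {
  try {
    const defaults = JSON.parse(readFileSync(path, "utf8"));
    if (defaults.output_format === "html" || defaults.output_format === "both") {
      return defaults.output_format;
    }
  } catch {
    // No defaults file yet (first run): use the default.
  }
  return "markdown";
}
```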
### Image Support
Paste an image into your message along with your question to have vision-capable models analyse it:
```text
/amm "What's in this image?" [paste image]
```
Vision-capable models: GPT-5.4 Thinking, Claude 4.6 Opus Thinking, Claude 4.5 Sonnet, Gemini 3.1 Pro, Gemini 3 Flash
Models without vision support will receive just the text prompt with a note that an image was provided.
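That fallback could be sketched as follows (`promptFor` is a hypothetical helper, not part of the skill's API):

```typescript
// Hypothetical helper: models without vision get the text prompt plus a
// note that an image was attached; vision models get the prompt unchanged
// (the image itself travels separately in the API request).
function promptFor(modelHasVision: boolean, prompt: string, imagePath?: string): string {
  if (imagePath && !modelHasVision) {
    return `${prompt}\n\n[Note: the user attached an image, but this model cannot view it.]`;
  }
  return prompt;
}
```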
### Direct Script Invocation
Run the query script directly:
```sh
cd /Users/ph/.claude/skills/ask-many-models
yarn query "Your question here"
```
Options:
- `--preset <name>` - Use a preset: `quick`, `comprehensive`
- `--models <list>` - Specify models, e.g. `gpt-5.4-thinking,gemini-3-flash`
- `--timeout <seconds>` - Timeout per model (default: 180)
- `--image <path>` - Include an image file for vision models
### Available Commands
```sh
yarn query presets            # List available presets
yarn query models             # List available models
yarn query list               # List recent queries
yarn query show <dir>         # Display responses from a query
yarn query synthesise <dir>   # Generate synthesis prompt
```
### Workflow

#### Step 1: Query Models

```sh
yarn query --preset comprehensive "What are the key considerations for..."
```
This will:
- Query all models in the preset in parallel
- Save responses to `data/model-outputs/<timestamp>-<slug>/`
- Print a summary of successful/failed queries
#### Step 2: Synthesise Responses
The skill generates a synthesis prompt. To synthesise:
1. Generate the prompt: `yarn query synthesise data/model-outputs/<your-query-dir>`
2. Copy the output and send it to Claude
3. Save Claude's synthesis to the query directory as `synthesis.md`
Alternatively, read the individual responses from the `individual/` subdirectory and ask Claude directly to synthesise them.
### Model Presets
| Preset | Models | Use Case |
|---|---|---|
| `quick` | Gemini 3 Flash, Grok 4.1 (Fast), Claude 4.5 Sonnet | Fast responses (~10s) |
| `comprehensive` | Defaults + GPT-5.4 Pro | Thorough coverage (~60s) |
| `deep-research` | OpenAI Deep Research, Gemini Deep Research | In-depth research (API, 10-20 min) |
| `comprehensive-deep` | Quick models + deep research | Best of both worlds |
### Deep Research Mode
Deep research models (OpenAI o3-deep-research and Gemini Deep Research) conduct comprehensive web research and take 10-20 minutes per model.
#### Using Deep Research

From the `amm` CLI, select "🔬 Deep Research" or "🔬🔍 Comprehensive + Deep Research":

```sh
amm "What are the latest developments in quantum computing?"
```
When deep research is selected:
- Duration warning is shown (10-20 minutes expected)
- Context picker lets you add files/folders as background context
- Quick models return results in ~30 seconds with preliminary synthesis
- Deep research shows progress updates every 10 seconds
- Final synthesis updates when deep research completes
- Desktop notification fires on completion
#### Context Files
Add context to your deep research queries:
- When prompted, select "Add context file/folder..."
- Choose a file (`.md`, `.txt`) or folder
- Context is prepended to the prompt for all models
This is useful for:
- Research related to a specific project
- Questions about documents you've written
- Follow-up research with prior findings
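Prepending context might be as simple as the following sketch (the section labels are assumptions; the real script's format may differ):

```typescript
// Sketch: prepend the selected context to the prompt sent to every model.
// The heading labels here are illustrative, not the skill's actual format.
function withContext(context: string, prompt: string): string {
  return `# Background context\n\n${context.trim()}\n\n# Question\n\n${prompt.trim()}`;
}
```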
#### How It Works
- Quick models (GPT, Claude, Gemini, Grok) query in parallel → results in ~30s
- Deep research models start in background with progress polling
- Preliminary synthesis runs with quick model responses
- Deep research updates show status every 10 seconds
- Final synthesis incorporates deep research findings when complete
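The background polling in the steps above can be sketched like this (the interval default and status-check shape are assumptions):

```typescript
// Sketch: poll an async job until it reports completion, invoking a
// progress callback after each unfinished check (the real script prints
// a status line roughly every 10 seconds).
async function pollUntilDone(
  isDone: () => Promise<boolean>,
  onProgress: (checks: number) => void,
  intervalMs = 10_000,
): Promise<number> {
  let checks = 0;
  while (!(await isDone())) {
    checks += 1;
    onProgress(checks);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return checks; // number of unfinished checks before completion
}
```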
### Synthesis Approach
The synthesis identifies:
- **Consensus** - Points where multiple models agree (high confidence)
- **Unique insights** - Valuable points only one model mentioned
- **Disagreements** - Contradictions with pros/cons analysis
- **Confidence assessment** - Overall reliability based on agreement
#### Synthesis Depths

| Depth | Output | Use Case |
|---|---|---|
| `brief` | 2-3 sentences | Quick sanity check |
| `executive` | 1-2 paragraphs + bullets | Default, most queries |
| `full` | Multi-section document | Important decisions |
## Configuration

### API Keys

Create `.env` from `.env.example`:

```sh
cp .env.example .env
```
Required keys:
- `OPENAI_API_KEY` - For GPT models
- `ANTHROPIC_API_KEY` - For Claude models
- `GOOGLE_GENERATIVE_AI_API_KEY` - For Gemini models
- `XAI_API_KEY` - For Grok models
### Model Configuration
Model definitions and presets are in `models.json` (shipped with the skill). To customise, create a `config.json` with just the keys you want to override; it merges on top of `models.json`. See `config.example.json` for the format.
When updating model IDs, also update the `VISION_MODELS` array in `scripts/query.ts`; it has a hardcoded list of vision-capable model keys that must match `models.json`.
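The override merge can be sketched as a spread of `config.json` over `models.json` (a one-level sketch; the actual merge may be deeper):

```typescript
// Sketch: keys present in the override (config.json) win over the base
// (models.json); everything else passes through unchanged.
function mergeConfig<T extends Record<string, unknown>>(base: T, override: Partial<T>): T {
  return { ...base, ...override };
}
```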
## Output Structure

```text
data/model-outputs/
└── 2026-01-12-1430-your-question/
    ├── results.md        # Live results + synthesis (markdown)
    ├── results.html      # Live results + synthesis (HTML)
    ├── responses.json    # Raw API responses
    └── individual/
        ├── gpt-5.4-thinking.md
        ├── claude-4.6-opus-thinking.md
        ├── gemini-3.1-pro.md
        └── grok-4.1.md
```
## Available Models

### Quick/Standard Models
| Model ID | Display Name | Provider | Vision |
|---|---|---|---|
| gpt-5.4-thinking | GPT-5.4 Thinking | OpenAI | ✓ |
| claude-4.6-opus-thinking | Claude 4.6 Opus Thinking | Anthropic | ✓ |
| grok-4.1 | Grok 4.1 (Reasoning) | xAI | |
| gemini-3.1-pro | Gemini 3.1 Pro | Google | ✓ |
| gemini-3-flash | Gemini 3 Flash | Google | ✓ |
| gpt-5.4 | GPT-5.4 | OpenAI | ✓ |
| gpt-5.4-pro | GPT-5.4 Pro | OpenAI | ✓ |
| claude-4.6-opus | Claude 4.6 Opus | Anthropic | ✓ |
| claude-4.5-sonnet | Claude 4.5 Sonnet | Anthropic | ✓ |
| grok-4.1-non-reasoning | Grok 4.1 (Fast) | xAI | |
### Deep Research Models

| Model ID | Display Name | Provider | Duration |
|---|---|---|---|
| openai-deep-research | OpenAI Deep Research | OpenAI | 10-20 min |
| gemini-deep-research | Gemini Deep Research | Google | 10-20 min |
## Notifications

Desktop notifications via `terminal-notifier`:

- Install: `brew install terminal-notifier`
- Notifications sent when:
  - Query completes
  - Async request (deep research) completes
  - Errors occur
## Slow Models & Progressive Synthesis
Some models (like GPT-5.4 Pro) use extra compute and can take 10-60 minutes for complex queries. These are marked as "slow" in the config.
When slow models are included:
- Progress display shows real-time per-model status icons for all models
- Fast models complete first → preliminary synthesis runs immediately
- Slow models continue in background with "(slow)" indicator
- Final synthesis replaces preliminary when all models complete
The live markdown file updates continuously so you can read responses as they arrive.
## Error Handling
- **Model timeout** - Marked as failed, other responses still synthesised
- **API error** - Retries with exponential backoff (3 attempts)
- **Partial failure** - Synthesis proceeds with available responses
- **Browser not available** - Warns user to restart with `--chrome`
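The retry behaviour above can be sketched as follows (the delay values are assumptions; only the 3-attempt exponential backoff comes from this document):

```typescript
// Sketch: retry an async call up to `attempts` times, doubling the delay
// after each failure, and rethrow the last error if every attempt fails.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```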
## Tips
- Start with the `quick` preset for rapid iteration
- Use defaults for important questions where quality matters
- Save synthesis prompts for consistent formatting
- Check individual responses when synthesis seems off
- Override model IDs via `config.json` as providers release new models