/deep-research -- Iterative Deep Research via Codex + tmux
Decompose a research question using MECE principles, spawn parallel Codex agents in tmux panes for web research, synthesize findings in-context, identify gaps, spawn deeper agents, and repeat until coverage is sufficient.
Architecture: Claude = brain (orchestrator, decomposer, synthesizer).
Codex = legs (parallel web research in tmux panes). Zero claude -p automation.
Research Question
        |
  [MECE Decompose]            Claude thinks
        |
   +----+----+
   v    v    v
[Codex][Codex][Codex]         Level 0: broad sweep
   +----+----+
        |
  [Synthesis L0]              Claude thinks, finds gaps
        |
     +--+--+
     v     v
 [Codex] [Codex]              Level 1: targeted deep dives
     +--+--+
        |
  [Synthesis L1]              Claude thinks, enough?
     |       |
[Level 2...] [Final Report]
Argument
A research topic or question. Example: "state of production LLM applications best practices 2026"
Procedure
Phase 1 -- Understand
Clarify the research question. Identify:
- The core topic and scope
- The timeframe (if any)
- How many dimensions / angles the user wants covered
- Desired depth (surface scan vs exhaustive)
- Whether adversarial / contrarian coverage is wanted
If unclear, ask the user before proceeding.
Phase 2 -- MECE Decompose (Level 0)
Split the research into 3-6 independent domains. For each domain, create 1-2 search variants (a/b) that approach from different angles.
Decomposition rules:
- Divide by research domain, NOT by source type
- Each variant should use different search terms and angles
- All domains together must be collectively exhaustive -- no gaps
- Prioritize exhaustiveness over exclusivity
Present the decomposition table and wait for user approval:
| # | Node ID | Domain | Angle |
|---|-------------------|------------|------------------------|
| 1 | L0_frameworks_a | Frameworks | Production-ready tools |
| 2 | L0_frameworks_b | Frameworks | Benchmark comparisons |
| 3 | L0_reliability_a | Reliability| Failure postmortems |
| 4 | L0_security_a | Security | OWASP LLM top 10 |
| 5 | L0_security_b | Security | Compliance & legal |
After the table, generate an ASCII DAG diagram using the actual node IDs to show the planned research structure. The diagram must show which nodes run in parallel (Level 0) and the sequential synthesis/extension flow:
Research Question: "{topic}"
          |
   [MECE Decompose]
          |
  +----+--+--+----+----+
  v    v     v    v    v
[L0_a][L0_b][L0_c][L0_d][L0_e]   Level 0 (parallel)
  +----+--+--+----+----+
          |
   [Synthesis L0]
          |
      +---+---+
      v       v
   [L1_a]  [L1_b]                Level 1 (after gaps found)
      +---+---+
          |
   [Synthesis L1]
          |
    [Final Report]
Replace the placeholders with the actual node IDs from the decomposition table. L1 nodes are speculative at this stage -- label them as "(spawned after L0 synthesis if gaps found)".
Present both the table and the DAG together. Ask: "Is this the right DAG?"
Do NOT proceed until the user approves this decomposition.
Phase 2b -- Configure Codex Agents
Before launching agents, discover available models and let the user choose the model and reasoning effort level.
Step 1: Discover available models.
Run:
codex --help
Check if there is a model listing command or flag. If not, fetch the current model list from the OpenAI Codex models page via web search: https://developers.openai.com/codex/models
Present the discovered models in a table with a recommended default:
| Model | Notes |
|--------------------|-------------------------------------------|
| gpt-5.4 | Flagship, recommended default |
| gpt-5.4-mini | Faster, lower cost |
| gpt-5.3-codex | Strong for complex tasks |
| ... | (other models discovered at runtime) |
Step 2: Present reasoning effort levels.
| Level   | Use case                                         |
|---------|--------------------------------------------------|
| minimal | Fastest, least thorough                          |
| low     | Quick scans                                      |
| medium  | Balanced speed and quality (recommended default) |
| high    | Thorough research                                |
| xhigh   | Maximum depth, slowest                           |
Step 3: Confirm with user.
Model: gpt-5.4
Reasoning effort: medium
Accept defaults? Or specify model and/or reasoning effort.
Record the chosen values. These are referenced as {model} and {reasoning_flag} in all codex exec commands below.
- If reasoning effort is set: {reasoning_flag} = ` -c model_reasoning_effort={effort}` (note the leading space, so it concatenates cleanly after {model})
- If the user declines reasoning config: {reasoning_flag} = `` (empty)
Do NOT proceed until the user confirms model and reasoning effort.
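The flag assembly above can be sketched in shell. This is illustrative only: the model and effort values are the defaults from this phase, not discovered at runtime.

```shell
# Sketch: assemble the codex invocation from the user's Phase 2b choices.
# model/effort are example values; effort="" means the user declined.
model="gpt-5.4"
effort="medium"

if [ -n "$effort" ]; then
  reasoning_flag=" -c model_reasoning_effort=$effort"   # leading space matters
else
  reasoning_flag=""
fi

# The command template used in Phase 3:
echo "codex exec -m ${model}${reasoning_flag} --full-auto -"
```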
Phase 3 -- Create Prompts + Launch Codex Agents
3a. Create output directories and prompts
For each agent, create the directory and write the prompt:
mkdir -p output/{node_id}
Write output/{node_id}/prompt.md with this structure:
You are a research agent. DO NOT modify any files in any codebase.
Your ONLY job is to search the web and compile findings.
# Mission: [Specific research goal for this agent]
## Search Strategy
[10+ specific search queries, tailored to this agent's angle]
[Mix of site:-restricted queries for authoritative sources and broad queries]
## What to Capture
For each useful source:
1. URL
2. Title
3. Key points (3-5 bullets)
4. Confidence: high (official/peer-reviewed), medium (reputable blog), low (forum/anecdotal)
5. Relevant excerpt
## Output Format
### Findings
**Source: [Title](URL)**
- Confidence: high/medium/low
- Key points:
- ...
- Excerpt: "..."
[Repeat for each source]
### Summary
[2-3 paragraphs answering the research question based on findings]
### Sources List
[Flat list of all URLs cited]
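Scaffolding one agent's prompt file can be sketched as below. The node ID and mission text are examples taken from the Level 0 decomposition table; real prompts get the full search strategy and output format sections.

```shell
# Sketch: create the directory and a prompt skeleton for one node.
# L0_frameworks_a and the mission line are example values.
node="L0_frameworks_a"
mkdir -p "output/$node"

cat > "output/$node/prompt.md" <<'EOF'
You are a research agent. DO NOT modify any files in any codebase.
Your ONLY job is to search the web and compile findings.

# Mission: Survey production-ready LLM frameworks
EOF

head -1 "output/$node/prompt.md"
```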
For Level 1+ agents (deeper research), enrich the prompt:
You are a research agent. DO NOT modify any files in any codebase.
Your ONLY job is to search the web and compile findings.
# Context from prior research
[Inject relevant findings from previous synthesis]
# What we already know
[Summary so agent doesn't re-discover known facts]
# Mission: [Targeted question addressing a specific gap]
# What we need
[Specific evidence or sources that would resolve the gap]
## Search Strategy
[Targeted queries based on identified gaps]
## What to Capture
[Same structured format as above]
## Output Format
[Same structured format as above]
Prompt design rules:
- Always start with "DO NOT modify any files"
- 10+ search queries per agent for breadth
- Structured output format for aggregation
- Variance between a/b agents -- different terms for same domain
3b. Build tmux windows and launch
Open new windows inside the current tmux session -- do NOT create a
new session. Name each window deep_research_{domain} using a short
slug derived from the domain column in the decomposition table
(e.g. deep_research_frameworks, deep_research_security).
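One way to derive the slug, assuming the convention of lowercasing and replacing runs of non-alphanumerics with underscores (the domain name here is a made-up example):

```shell
# Sketch: turn a domain name into a window slug.
# Lowercase, collapse non-alphanumeric runs to "_", trim edge underscores.
domain="Reliability & Ops"
slug=$(echo "$domain" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '_' | sed 's/^_//;s/_$//')
echo "deep_research_$slug"
```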
Determine the current session:
SESSION=$(tmux display-message -p '#S')
Calculate layout: ceil(total_agents / 4) windows, up to 4 panes each. Group agents by domain -- one window per domain where possible.
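The ceiling division can be done with integer arithmetic (agent count is an example):

```shell
# Sketch: ceil(total_agents / 4) via (n + 3) / 4 integer division
TOTAL_AGENTS=5
WINDOWS=$(( (TOTAL_AGENTS + 3) / 4 ))
echo "$WINDOWS windows for $TOTAL_AGENTS agents"
```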
For each window:
# Create a named window in the current session
tmux new-window -t $SESSION -n deep_research_{domain}
# Split into up to 4 panes (first pane exists by default)
tmux split-window -t $SESSION:deep_research_{domain}
tmux split-window -t $SESSION:deep_research_{domain}
tmux split-window -t $SESSION:deep_research_{domain}
tmux select-layout -t $SESSION:deep_research_{domain} tiled
# VALIDATE before launching
tmux list-panes -t $SESSION:deep_research_{domain} -F "#{pane_index}"
CRITICAL: Always validate pane count before sending commands.
Launch each agent:
tmux send-keys -t $SESSION:deep_research_{domain}.{pane} -l -- "cat output/{node_id}/prompt.md | /opt/homebrew/bin/codex exec -m {model}{reasoning_flag} --full-auto - 2>&1 | tee output/{node_id}/output.md ; echo '=== RESEARCH AGENT COMPLETE ==='"
sleep 0.2
tmux send-keys -t $SESSION:deep_research_{domain}.{pane} Enter
After launching, verify processes:
ps aux | grep "codex exec" | grep -v grep | wc -l
Print monitoring instructions:
Switch windows: Ctrl-b n / Ctrl-b p
Switch panes: Ctrl-b + arrows
Zoom pane: Ctrl-b z
List research windows: tmux list-windows -t $SESSION | grep deep_research
Phase 4 -- Monitor + Collect
Poll all panes for the completion marker:
for pane in {list of session:window.pane targets}; do
while true; do
if tmux capture-pane -t $pane -p -S -10 2>/dev/null | grep -q "RESEARCH AGENT COMPLETE"; then
echo "$pane: DONE"
break
fi
dead=$(tmux display-message -t $pane -p '#{pane_dead}' 2>/dev/null)
if [ "$dead" = "1" ]; then
echo "$pane: PROCESS DIED"
break
fi
sleep 10
done
done
After all agents complete, capture scrollback as backup (codex tee truncation workaround):
tmux capture-pane -t $SESSION:{w}.{p} -p -S - > output/{node_id}/scrollback.txt
Read all output files. If output.md is truncated (fewer than 20 lines),
use scrollback.txt instead.
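The fallback check can be sketched as follows; the node ID is an example, and a missing output.md is treated the same as a truncated one:

```shell
# Sketch: prefer output.md, fall back to scrollback.txt when it has
# fewer than 20 lines (or does not exist at all).
node="L0_frameworks_a"
out="output/$node/output.md"
lines=$(wc -l < "$out" 2>/dev/null || echo 0)
if [ "${lines:-0}" -lt 20 ]; then
  out="output/$node/scrollback.txt"
fi
echo "$out"
```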
Phase 5 -- Intermediate Synthesis (Level N)
Read every output/{L{N}_*}/output.md file from the current level.
Cross-reference the findings and write output/synthesis_L{N}.md:
# Synthesis Level {N}: {topic}
## Key Findings by Domain
### {Domain 1}
- Finding with [source attribution](url)
...
## Cross-Domain Patterns
[Themes appearing across multiple domains -- highest value findings]
## Contradictions and Tensions
[Where sources disagree. Note which evidence is stronger and why]
## Evidence Gaps
[Areas with weak or missing coverage that need deeper investigation]
- Gap 1: [description] -- suggested queries: [...]
- Gap 2: [description] -- suggested queries: [...]
## New Questions Spawned
[Questions raised by the findings that weren't in the original decomposition]
Present the synthesis to the user.
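Gathering the current level's output files (the read step at the top of this phase) can be sketched with a glob over the node-ID naming scheme; the demo node created here stands in for real agent output:

```shell
# Sketch: list every Level-N output file for synthesis (N=0 here).
# The demo node exists only so the glob has something to match.
N=0
mkdir -p output/L0_demo_a
: > output/L0_demo_a/output.md

for f in output/L${N}_*/output.md; do
  [ -e "$f" ] && echo "$f"     # guard against an unmatched glob
done
```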
Phase 6 -- DAG Extension (if gaps exist)
After presenting the synthesis, assess whether deeper research is needed. Consider:
- Are there significant evidence gaps?
- Are there unresolved contradictions?
- Did findings raise important new questions?
- Has the user requested specific deeper investigation?
If deeper research IS needed (and current level < max depth, between 2-5 as appropriate):
- Generate new sub-questions targeting gaps/contradictions
- Assign node IDs: L{N+1}_{domain}_{variant}
- Present the new agents table for user approval
- Loop back to Phase 3
If deeper research is NOT needed (or user says "enough"): Proceed to Phase 7.
Phase 7 -- Final Synthesis
Produce output/synthesis_final.md combining all levels:
# Final Research Synthesis: {topic}
## Executive Summary
[3-5 sentences: the answer to the original question]
## Key Themes
1. Theme (confidence: high/medium/low)
- Supporting evidence from L0, L1, etc.
2. ...
## Detailed Findings
[Organized by theme, with full source attribution]
## Contradictions
[How they were resolved across levels, or why they remain open]
## Remaining Gaps
[What this research did NOT cover adequately]
## Research DAG Executed
[Mermaid diagram -- see Phase 8]
## Source Index
| Level | Agent | Domain | Key Sources |
|-------|-------|--------|-------------|
| L0 | {id} | {domain} | [URLs] |
| L1 | {id} | {domain} | [URLs] |
Phase 8 -- Write Final Report
Write output/report.md -- a polished, publishable research report:
- Executive summary (readable standalone)
- Research landscape (mermaid diagram of the DAG executed)
- Detailed findings by theme with confidence levels and citations
- Contradictions and open questions
- Conclusions and recommendations
- Full source list grouped by level and domain
Generate the mermaid DAG diagram showing the actual research path:
graph TD
Q[Research Question] --> L0_a[...]
Q --> L0_b[...]
L0_a --> S0[Synthesis L0]
L0_b --> S0
S0 --> L1_a[...]
L1_a --> S1[Synthesis L1]
S1 --> Report[Final Report]
Troubleshooting
Agent stuck / no output after 5+ minutes:
tmux capture-pane -t $SESSION:deep_research_{domain}.{p} -p -S -20
Look for errors. Common: codex rate limit, model not supported, network issues.
tee output truncated (known codex issue):
tmux capture-pane -t $SESSION:deep_research_{domain}.{p} -p -S - > output/{node_id}/scrollback.txt
Use scrollback as the authoritative output.
Re-run a single failed agent:
tmux send-keys -t $SESSION:deep_research_{domain}.{p} -l -- "cat output/{node_id}/prompt.md | /opt/homebrew/bin/codex exec -m {model}{reasoning_flag} --full-auto - 2>&1 | tee output/{node_id}/output.md ; echo '=== RESEARCH AGENT COMPLETE ==='"
sleep 0.2
tmux send-keys -t $SESSION:deep_research_{domain}.{p} Enter
All agents rate-limited:
Stagger launches with sleep 30 between send-keys, or reduce parallel count.
Important
- NEVER use `claude -p` for research agents. Use `/opt/homebrew/bin/codex exec` only.
- Always decompose by research domain, never by source type.
- Always wait for user approval of decomposition before launching agents.
- Always validate tmux panes exist before sending commands.
- Always capture scrollback as backup after agent completion.
- Max depth: 2-5 levels as appropriate for the topic's complexity. User can override with "go deeper" or "enough".
- Use the current tmux session. Name windows `deep_research_{domain}` to avoid collisions.
- Default Codex model: `gpt-5.4`. Default reasoning effort: `medium`. Always confirm with the user in Phase 2b.
- Discover available models dynamically (`codex --help` or web search) -- do not rely on a hardcoded list.