deep-research
Deep Research
Deep research activated — running a local-first → web-multi-source pipeline. I'll plan, disambiguate, search across diverse angles, cross-reference, and deliver a structured summary with cited sources.
## Overview
Multi-source research that earns its name through breadth of angle, depth of cross-reference, and discipline of citation. Run 5-10+ web searches across diverse angles, prefer primary sources, surface disagreements, and synthesize a structured summary returned in the conversation.
Core principle: Research depth comes from query diversity, source cross-referencing, and an honest accounting of gaps. Always cite. Never fabricate. Surface disagreement; don't paper over it.
## The Five Search Angles
Every substantive research run covers these angles. State which you'll use in the plan; tag each search with one.
| Angle | Looks for |
|---|---|
| Official | Primary docs, specs, vendor sources, peer-reviewed papers |
| Comparative | "X vs Y", alternatives, head-to-head analyses |
| Criticism | Limitations, known issues, failure stories, public criticism |
| Currency | Recent developments (current and previous year), changelogs, trend pieces |
| Community | HN, Reddit, blog posts, Stack Overflow — signal not authority |
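For instance, a query set for a hypothetical topic (htmx, the same subject used in Example 1 below) might be tagged like this; the queries themselves are illustrative, not prescribed:

```
Angles for "htmx" (illustrative):
- [Official]     "htmx documentation", "htmx changelog"
- [Comparative]  "htmx vs react server-rendered apps"
- [Criticism]    "htmx limitations large applications", "problems with htmx"
- [Currency]     "htmx <current year> state of"
- [Community]    "htmx experience site:news.ycombinator.com"
```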
## Pipeline
```
┌─────────┐   ┌──────┐   ┌────────────┐   ┌────────┐   ┌─────────┐   ┌────────────┐   ┌──────────────┐
│ Phase 0 │ → │  P1  │ → │     P2     │ → │   P3   │ → │   P4    │ → │     P5     │ → │      P6      │
│  Local  │   │ Plan │   │ Disambig.  │   │ Search │   │ Reflect │   │ Synthesize │   │ Cite-verify  │
└─────────┘   └──────┘   └────────────┘   └────────┘   └─────────┘   └────────────┘   └──────────────┘
```
Every phase has a MUST gate before moving to the next.
## Modes
| Mode | When | Searches | Output |
|---|---|---|---|
| `quick` | Fast lookup, low-stakes context | 3-5 | One-paragraph answer + 3-5 sources |
| `default` | Standard topic, single-domain | 5-10+ | Full synthesis template |
| `comparison` | 3+ items being compared | 10-15 | Mandatory matrix at top of report |
| `landscape` | Broad survey of an unfamiliar/wide space | 10+ | Parallel research subagents → consensus report |
## Usage
| Invocation | What happens |
|---|---|
| `/deep-research <topic>` | Default mode — full synthesis with sources in chat |
| "research X for me" | Auto-activates default mode |
| "compare X and Y" | Auto-activates comparison mode |
| "survey the landscape of X" | Auto-activates landscape mode |
Default behavior: Output stays in the conversation. No files are created. If the user wants the report saved, that's a follow-up step (see Save-as-Note Handoff).
## When to Use
Always use when:
- User asks to "research", "investigate", "look into", "deep dive on", "find out about" a topic
- User needs current information before a decision (tool choice, framework selection, product comparison)
- User asks "what's the current state of X" or "is Y still relevant"
- User asks to compare 3+ things or survey an unfamiliar space
Useful for:
- Pre-implementation research (libraries, patterns, trade-offs)
- Comparative analysis (Tool A vs Tool B vs Tool C)
- Catching up on a fast-moving space
- Sanity-checking assumptions against authoritative sources
Avoid when:
- Question can be answered from the codebase (use Read/Grep first; that's Phase 0 anyway)
- User wants a quick fact, not a synthesis (single search is enough)
- Topic is internal to the user's project (no public sources will exist)
- User has already done the research and just wants implementation help
## Quick-Answer-Before-Research
For opinion-shaped or judgment questions ("should I use X?", "is Y any good?", "why is Z popular?") where the model already has a confident grounded answer:
- Give a brief direct answer first (2-4 sentences)
- Offer to escalate: "Want me to run full research on this?"
- Only escalate to the full pipeline on user pushback ("you sure?") or explicit ask ("yes, research it")
This prevents launching a 10-search investigation when the user just wanted a confident take.
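A minimal sketch of what that looks like in chat, with a made-up question:

```
user: is SQLite any good for a small internal tool?
assistant: Short answer: yes. For a single-writer internal tool, SQLite is
usually more than enough and removes a whole class of operational work; the
main caveat is heavy concurrent writes. Want me to run full research on this
(trade-offs, current tooling, criticisms)?
```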
## Process
### Phase 0: Local-first Check
Before any web search, you MUST:
- Check the working directory for existing material on this topic — `rg -i "<topic>" .` — and Read any obvious matches
- If existing local material covers the question, default to UPDATE not CREATE — surface what exists and ask whether to extend it
- State what you found (or "no local hits") in chat before moving to Phase 1
This step is mandatory even for purely web-shaped topics — it costs little and prevents duplicating existing knowledge.
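As an illustration, a Phase 0 post might read (the topic, hit count, and path are hypothetical):

```
[P0] Ran `rg -i "litestream" .`: 2 hits, both in docs/notes/databases.md.
The existing note covers backups but not replication, so this defaults to
UPDATE rather than CREATE. I'll extend that note unless you'd rather start
fresh. Proceeding to Phase 1.
```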
### Phase 1: Plan
Before searching, you MUST post:
- The interpretation of the topic ("Interpreted as: …") — especially if ambiguous
- The 3-5 search angles you'll use, drawn from the Five Search Angles table
- The mode you're operating in (`quick` / `default` / `comparison` / `landscape`)
Wait one beat after posting. Don't start searching until the plan is in chat.
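A plan post satisfying this gate can be short; the topic and angle choices below are illustrative only:

```
[P1] Interpreted as: Litestream, the SQLite streaming-replication tool.
Mode: default. Angles: Official (docs, release notes), Comparative (vs LiteFS,
vs managed Postgres), Criticism (failure stories), Currency (activity this
year), Community (operator reports).
```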
### Phase 2: Disambiguate (when needed)
Before deep search, you MUST:
- If the subject is a proper noun, acronym, product name, or otherwise ambiguous, run one broad search to fix the referent
- State the resolved subject explicitly ("Confirmed: X = the Y from Z")
- If still ambiguous, ask the user to disambiguate before continuing
Skip this phase when the subject is unambiguous (a well-known concept, a clear technical term).
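When disambiguation is needed, the resolved-subject post can be brief; the subject here is only an illustration:

```
[P2] "Drizzle" is ambiguous: the TypeScript ORM, an older MySQL fork, or the
weather term. One broad search plus the conversation context points to the
ORM. Confirmed: Drizzle = drizzle-orm (the TypeScript ORM). If you meant
something else, say so and I'll redirect before searching further.
```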
### Phase 3: Execute Searches
Hard floor: 5+ web searches minimum for default and above. Less than that is not deep research.
Query strategy:
- Start broad ("what is X", "X overview")
- Narrow into specifics ("X best practices", "X performance characteristics")
- Look for tension ("X criticism", "problems with X", "X vs alternatives")
- Look for currency (include the year in queries when relevant)
- Use WebFetch to read full articles when search snippets are insufficient — every source you cite substantively should be fetched, not skimmed
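One possible broad-to-narrow progression, with placeholder queries rather than a fixed recipe:

```
1. "what is <topic>"                             (broad framing)
2. "<topic> architecture overview"               (how it works)
3. "<topic> best practices production"           (practical detail)
4. "problems with <topic>", "<topic> criticism"  (tension)
5. "<topic> vs <main alternative>"               (comparative)
6. "<topic> changelog <current year>"            (currency)
```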
Source quality priorities (CRAAP-ish):
- Official docs, specifications, primary sources, peer-reviewed papers
- Reputable industry publications and recent technical blogs
- Comparative analyses from credible authors
- Community discussion (treat as signal, not authority)
Prefer primary > secondary > tertiary. SEO-optimized content farms often outrank authoritative sources in search results — actively look past them.
Cross-reference rule: If two sources disagree, note the disagreement explicitly in the output. Do not silently pick one.
### Phase 4: Sufficiency Check (Reflection Gate)
Before synthesizing, you MUST post a one-line check to chat:
- Searches run: N | angles covered: <list> | full reads (WebFetch): M
- Material for each planned section: yes/no
- Open questions / gaps identified: <list, or "none non-trivial">
- Reflection: "what would change my conclusion?" — answer in one sentence
If a planned section has no material, either run more searches or remove the section. Do not pad with speculation.
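Filled in, the check might look like this (all numbers and gaps are illustrative):

```
[P4] Searches run: 8 | angles covered: Official, Comparative, Criticism,
Currency, Community | full reads (WebFetch): 4
Material for each planned section: yes, except Practical Details (thin).
Open questions / gaps: production auth-flow patterns under-documented.
Reflection: a primary source showing the maintainers plan to deprecate the
core API would change the recommendation.
```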
### Phase 5: Synthesize and Report
Use the synthesis template below. Adapt section names to fit the topic — technical, product, conceptual.
```markdown
# [Topic] — Research Summary
> Interpreted as: [if ambiguity was present]
## Tl;dr
[2-3 sentences directly answering the original question.]
*Confidence: high/medium/low — [why]*
## Overview
Definition, why this matters, key concepts.
*Confidence: ...*
## Core Information
How it works, main features, technical details.
*Confidence: ...*
## Comparison Matrix [REQUIRED when comparing 3+ items]
| Item | Dimension 1 | Dimension 2 | Dimension 3 | Notes |
|------|-------------|-------------|-------------|-------|
| ... | ... | ... | ... | ... |
## Practical Details
Tools, libraries, products in this space. Best practices. Common pitfalls.
*Confidence: ...*
## Trade-offs and Criticism
Limitations, valid criticisms, when not to use this.
*Confidence: ...*
## Current Landscape
Recent developments, adoption signal, where things are heading.
Note any source disagreements explicitly.
*Confidence: ...*
## What we still don't know [include when gaps are non-trivial]
- Open question 1
- Open question 2
## Sources
### Official Documentation [group when sources >5]
- [Title (YYYY-MM-DD)](url) — what this contributed
### Comparative Analyses
- [Title (YYYY-MM-DD)](url) — what this contributed
### Community Guides
- ...
```
Source list rules:
- Every claim in the summary should be traceable to a listed source
- Include 8-15+ sources for a substantive topic
- Annotate each source briefly (one phrase: what it contributed)
- Include the source's publication date in the link text when available — `(YYYY-MM-DD)` or `(Mar 2026)`
- When sources >5, group under sub-headers (Official Documentation / Comparative Analyses / Community Guides / etc.)
- Mix official, industry, and community sources
### Phase 6: Cite-Verify
Before delivering, you MUST:
- Walk every non-trivial claim and verify it traces to a fetched source
- Confirm every cited URL was actually opened in this run (no plausible-sounding fabrications)
- Restate the original question silently and check that every section serves it
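One lightweight way to run this gate is a quick claim-to-source trace before sending, for example:

```
Claim: "write concurrency is the main limitation"  -> fetched: official FAQ (read #2)
Claim: "replication story changed recently"        -> fetched: project changelog (read #3)
Claim: "community sentiment is split"              -> fetched: HN thread (read #4)
URLs cited: 9 | URLs actually opened this run: 9 | unverifiable: 0
```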
## Modes — Detail
### quick
Triggered by: "quick research on X", "give me a fast read on Y", small/low-stakes asks.
- 3-5 searches, 1-2 angles
- Output: one paragraph + 3-5 annotated sources
- Skip the full template; keep the Phase 0, 1, 4, and 6 gates, but run them lighter
### comparison
Triggered by: "compare X and Y", "X vs Y vs Z", "pros and cons of X".
- 10-15 searches, 5+ angles
- The Comparison Matrix is required at the top of the report (right after the Tl;dr)
- Cover each item across the same dimensions; do not give one item more dimensions than another
### landscape
Triggered by: "survey the landscape of X", "what's the current state of the Y space", broad/unfamiliar domains.
- Use parallel research subagents (see Multi-Agent Sub-Mode below)
- Synthesize their findings into a single consensus report
- Surface disagreements between subagent findings as first-class output
## Multi-Agent Sub-Mode
For landscape mode, or when a topic genuinely spans multiple distinct sub-domains:
- Decompose the topic into 3-5 angles or sub-topics
- Dispatch one research subagent per angle, in parallel — each gets its own brief, search budget, and angle assignment
- Each subagent returns structured findings + sources
- The orchestrator synthesizes a consensus report:
- What all subagents agreed on (high confidence)
- Where they diverged (call out explicitly)
- What no subagent found (gaps)
This pattern outperforms a single-agent run for breadth-heavy topics but uses substantially more tokens — reserve for landscape-scope work.
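A sketch of how a landscape topic might decompose into subagent briefs; the topic, split, and budgets are assumptions, not a fixed recipe:

```
Topic: "the current state of local-first sync engines"
- Subagent A (Official):     specs, protocol docs, maintainer writeups        (budget: 4 searches)
- Subagent B (Comparative):  CRDT-based vs server-reconciliation designs      (4)
- Subagent C (Criticism):    failure stories, scaling limits, abandoned tools (4)
- Subagent D (Community):    adoption signal from HN, Reddit, conference talks (3)
Orchestrator: merge into agreements (high confidence), divergences (flagged),
and gaps (listed), then write the consensus report.
```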
## Save-as-Note Handoff
When the user explicitly asks to save the research as a note (after the in-chat report), use this generic template. Ask for the path before writing.
```markdown
---
title: <Topic>
created_at: YYYY-MM-DD
source: web-research
tags: [topic-tag, area-tag]
---
# <Topic>
[Same body as the in-chat report — Tl;dr, Overview, Core Information, etc.]
## Sources
[Grouped + annotated source list, with publication dates]
```
Adapt the frontmatter to the user's project conventions if they have an established schema — read one or two existing notes in the target directory first to match the style.
## Quality Signals
A good deep-research output has:
- Plan posted before searching — interpretation, angles, mode all stated upfront
- Local-first check completed — even when the answer was unambiguously a web question
- 5-10+ searches with diverse angles — covering at least 3 of the Five Search Angles
- Cross-referenced claims — disagreements between sources surfaced, not hidden
- Current information prioritized — recent dates appear in queries and citations
- Every claim is sourced — no orphan assertions, no fabricated URLs
- Trade-offs section is real — limitations and criticisms appear, not just the rosy view
- Confidence labels per section — reader knows what's solid vs thin
- Source list is grouped + annotated — each entry says what it contributed, with publication date when available
- Section names fit the topic — technical topics get technical sections, product topics get matrices
## Failure Modes
Defenses for the well-documented failure modes of LLM web research:
### Sycophancy
Don't change a sourced claim under user pushback without new evidence. "I might be wrong" is fine; flipping the answer because the user disagreed is not. If the user pushes back, the next move is another search, not capitulation.
### Anchoring
Before synthesizing, ask: "what would change my conclusion?" If recent searches gave evidence that contradicts the early framing, update — don't paper over it. The reflection step in Phase 4 is the place to catch this.
### Source laundering / SEO bias
Prefer primary > secondary > tertiary. For technical claims prefer official docs, .gov, .edu, peer-reviewed papers. Top search hits are not the same as authoritative sources — actively look past content farms to find the primary source.
### Fabricated citations
Every cited URL must have been actually fetched in this run. If you didn't open it, don't cite it. Plausible-sounding URLs that don't exist are the dominant hallucination class in long-form research.
### Drift
Restate the original question at synthesis time and check every section serves it. Multi-agent runs are especially prone to drifting toward the majority subagent view and away from the user's actual ask.
## Troubleshooting
Problem: Sources disagree and I'm not sure which is right
Solution: Surface the disagreement explicitly. Cite both, note the date and authority of each, and let the user judge. "Source A (official docs, 2026) says X. Source B (popular blog, 2024) says Y. The discrepancy may be due to a recent API change."
Problem: Topic is too obscure — searches return little
Solution: Try alternative phrasings and adjacent terms. Search for the broader category and narrow from there. If sources are genuinely thin, say so. Mark sections "Limited sources available." Do not pad with speculation.
Problem: Output is becoming too long
Solution: Default to a single, well-structured summary. If the topic genuinely needs more depth, offer the user a choice: "I have material for a deeper section on X — want me to expand?" Don't dump everything by default.
Problem: Research keeps drifting from the user's actual question
Solution: Re-anchor. Restate the original ask at the top of the summary and check every section serves it. Cut sections that don't. The Cite-Verify gate (Phase 6) is the last chance to catch this.
Problem: Tempted to call it done after 1-2 searches
Solution: That's not deep research — that's a search snippet. The 5+ floor exists because single-search answers consistently miss criticism, currency, and counter-examples. If the user genuinely wants a one-shot answer, they should ask for a quick fact, not invoke this skill.
Problem: A cited URL doesn't actually exist or returns 404
Solution: Stop. Walk every cited URL against your tool history — every citation must trace to an actual WebSearch hit or WebFetch read. Drop fabricated URLs entirely; do not "fix" them by guessing a working URL.
Problem: User asks to save the research to a file
Solution: Use the Save-as-Note Handoff template. Ask for the target path. Don't auto-save without being asked.
## Examples
### Example 1: Technical Topic (default mode)
<Good>
user: research htmx for me
assistant: [P0] No local material found. [P1] Interpreted as: htmx for a side project. Mode: default. Angles: Official, Comparative, Criticism, Currency, Community. [P3: 8 searches; WebFetch on htmx.org, two recent comparison posts, HN thread.] [P4: 8 searches, 5 angles, 4 full reads. Gap: htmx + auth flows thin.]
htmx — Research Summary
Tl;dr
htmx fits server-rendered side projects with light-to-moderate interactivity. Poor fit for stateful client-rich UX. Confidence: high — official docs + three recent analyses converge.
Overview
Hypermedia-driven library extending HTML with AJAX/SSE/WS attributes...
Trade-offs
- [Real limitations from sources]
- Sentiment splits "elegant return to basics" vs "not a SPA replacement" — note the disagreement
What we still don't know
- Production-scale auth-flow patterns are under-documented
- Long-term maintenance signal post-1.x is thin
Sources
Official Documentation
- htmx.org Documentation (2026) — official reference
- htmx essays (2025) — author's design rationale
Comparative Analyses
- htmx vs React (Mar 2026) — recent benchmarks
Community Guides
- HN discussion thread (Jan 2026) — sentiment + counter-examples
**Why this is good:** Local-first checked, plan posted, 5+ angles, sufficiency check posted, Tl;dr leads, confidence labels per section, gaps section honest, sources grouped + annotated + dated, every claim traces to a cited source.
</Good>
<Bad>
user: research htmx for me
assistant: [Runs one search "htmx"]
htmx is a JavaScript library that lets you do AJAX with HTML attributes. It's lightweight and an alternative to React. Some people like it.
Sources: htmx.org
**Why this is bad:** No Phase 0, no plan, single search, no cross-referencing, no Tl;dr, no confidence, vague unsourced claims, one source.
</Bad>
### Example 2: Comparison Mode
<Good>
user: should I use Postgres or SQLite for a small SaaS?
assistant: [P0] No local notes. [P1] Interpreted as: small SaaS = single-region, modest scale. Mode: comparison. All five angles. Matrix required. [P3: 12 searches; WebFetch on official docs, Litestream/LiteFS, migration writeups.]
Postgres vs SQLite for Small SaaS — Research Summary
Tl;dr
SQLite + Litestream is a viable default for single-region small SaaS unless concurrent writers exceed ~1k/sec. Postgres remains the safe choice once you anticipate horizontal scaling or multi-region. Confidence: high.
Comparison Matrix
| Dimension | SQLite (+ Litestream) | Postgres |
|---|---|---|
| Operational cost | Near-zero | Managed service or ops |
| Write concurrency | Limited (WAL helps) | High |
| Replication | Litestream / LiteFS | Native streaming |
| Migration cost | Schema-flexible | Strict, mature tooling |
| Backup story | File-level | pg_dump + WAL |
When Each Fits
[Concrete criteria with sources, per side]
Current Landscape
Litestream and LiteFS reset the conversation in 2024-2026...
Sources
[8+ grouped, dated, annotated]
**Why this is good:** Frames around the user's actual context, mandatory matrix at the top, covers recent developments that change the answer, confidence labels, addresses migration cost (the real decision risk).
</Good>
<Bad>
user: should I use Postgres or SQLite
assistant: Postgres is more powerful, SQLite is simpler. Use Postgres for production, SQLite for prototypes.
Sources: Wikipedia
**Why this is bad:** No matrix in comparison mode, ignores user's context, oversimplified, one weak source.
</Bad>
---
## Integration
**This skill works with:**
- **track-session** — For long research projects, save the topic plan and source list to SESSION_PROGRESS.md so it's resumable
- **track-roadmap** — Use research to inform roadmap decisions; cite the research summary in the roadmap entry
**Pairs naturally with:**
- Pre-implementation planning — research first, then design
- Tool/framework selection — research the candidates before recommending
- Catching up on a fast-moving space before contributing to it
- Parallel-agent dispatch (`landscape` mode) when a topic genuinely spans multiple domains
**Does not replace:**
- Reading the codebase (Phase 0 is the codebase pass)
- Asking the user (use AskUserQuestion when you need their preference, not facts)
- Architecture or design skills (research informs them; it doesn't make decisions)