research

Technical Research

Conduct evidence-driven technical research. Default output is a Formal Report (Path A) — a persistent artifact with evidence files. Other paths require explicit user signals:

Path A — Formal Report (DEFAULT): Persistent artifact in the resolved reports directory with evidence files. This is the default unless the user explicitly opts out.
Path B — Direct Answer: Findings delivered in conversation only. Requires explicit user request (e.g., "just tell me", "no report needed", "quick answer").
Path C — Update Existing Report: Surgical additions/corrections to an existing report. Triggered when the user references an existing report or says "update/refresh/extend."

Autonomy Mode

Research supports two execution modes, plus a `--fanout` modifier that applies only to headless runs:

Mode	Behavior	How entered
Supervised (default)	Pause at scoping gate for user rubric confirmation; present routing options interactively	Default when invoked by a user in an interactive session
Headless	Auto-confirm rubric after proposing it; auto-select routing decisions; skip interactive follow-up prompts. All other quality gates (worldmodel + routing, validation, evidence standards) remain enforced.	`$ARGUMENTS` includes `--headless`, OR container environment detected (`/.dockerenv` exists or `CONTAINER=true`), OR invoked via `-p` non-interactive mode
`--fanout`	At Step 3 or Step 6, use nested fanout whenever the routing heuristic says it's warranted. At Step 3: if rubric has 5+ P0/Deep independent dimensions, fanout replaces standard research. At Step 6: auto-select all follow-ups assessed as "heavy." Lighter items handled by subagents. Requires `--headless`.	`$ARGUMENTS` includes `--fanout` (must also include `--headless`)
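The entry conditions in the table can be read as a small decision function. The following is a minimal sketch under stated assumptions: `$ARGUMENTS` has already been split into tokens, and the `/.dockerenv` check and `-p` detection happen elsewhere. `resolveMode` and `InvocationContext` are hypothetical names, not part of any real harness API. Rejecting `--fanout` when `--headless` was not passed explicitly is one possible reading of "Requires `--headless`".

```typescript
type ExecutionMode = "supervised" | "headless";

// Hypothetical input shape; a real harness would fill these in from its own APIs.
interface InvocationContext {
  args: string[];                          // tokens parsed from $ARGUMENTS
  env: Record<string, string | undefined>; // e.g. process.env
  dockerenvExists: boolean;                // result of checking for /.dockerenv
  nonInteractive: boolean;                 // true when invoked via -p
}

interface ResolvedMode {
  mode: ExecutionMode;
  fanout: boolean;
}

function resolveMode(ctx: InvocationContext): ResolvedMode {
  const explicitHeadless = ctx.args.includes("--headless");
  // Any one signal is enough to switch to headless.
  const headless =
    explicitHeadless ||
    ctx.dockerenvExists ||
    ctx.env.CONTAINER === "true" ||
    ctx.nonInteractive;
  const fanout = ctx.args.includes("--fanout");
  // Assumption: --fanout needs the explicit flag, not just a detected container.
  if (fanout && !explicitHeadless) {
    throw new Error("--fanout requires --headless");
  }
  return { mode: headless ? "headless" : "supervised", fanout };
}
```

In a Node or Bun script, the inputs could come from `process.argv.slice(2)`, `process.env`, and `fs.existsSync("/.dockerenv")`.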

Headless mode adjustments:

Worldmodel + Routing: Still mandatory. Worldmodel scans for existing research via its reports channel. But instead of presenting options to the user, auto-select: fully covered → proceed to new report (assume the caller wants fresh research on this specific angle), partially covered → start new report, not covered → start new report.
Scoping (Step 1): Propose the rubric AND proceed immediately — do not stop and wait for confirmation. The rubric is derived from the prompt/arguments provided. If the prompt includes explicit dimensions or questions, use those as the rubric.
Step 6 (Recap + Follow-up): Write the recap into the report or output. Skip the interactive "where we could go from here" prompt — there is no user to respond.
--fanout (Step 3): After scoping, assess rubric dimensions using the routing heuristic in Step 3.0. If there are 5+ P0/Deep independent dimensions, use nested fanout mode (it replaces Steps 3-4). Otherwise, use standard deep research mode.
--fanout (Step 6): After the recap, assess all follow-up directions using the routing heuristic in Step 6.3. Auto-select those assessed as "heavy" (3+ facets, multi-source) for nested fanout. Handle "light" follow-ups via subagents. If zero follow-ups qualify as "heavy," complete normally without fanout.
Tasks: Still created for structural enforcement and progress tracking, but no blocking on user input.

The --headless flag is the standard mechanism for orchestrating skills (e.g., /nest-claude, /ship) to signal "you're running non-interactively." It follows the same convention as --delegated in /debug and /qa.

Mandatory Execution Order

When this skill is invoked, execute these steps in order. Steps marked ⛔ are hard gates — you MUST complete each one before proceeding to the next. Do NOT skip ahead.

Step 0: Create workflow checkpoint tasks — Immediately create tasks for each step below (see Step 0 section). This is always the first action.
⛔ Worldmodel + Routing — Run /worldmodel --depth light to discover the topic landscape AND check existing research (worldmodel's Prior Research section replaces the catalogue scan). Route based on coverage. (Light depth is sufficient — research provides its own depth in Steps 3-4. Pass --depth full only when the topic is a deep 1P codebase investigation and the user explicitly needs comprehensive code tracing.)
⛔ Step 1: Collaborative Scoping — Propose a research rubric. STOP and WAIT for user confirmation. Do NOT proceed to research until the user explicitly confirms the rubric.
Step 2: Create Report Directory — Set up the report structure (Path A only).
Step 3: Research + Evidence Capture — Conduct research and capture evidence.
Step 4: Write REPORT.md — Synthesize findings (Path A only).
Step 5: Validate — Run the validation checklist.
Step 5b: Audit — Run /audit on the completed report (skip for Path B and simple reports).
Step 6: Recap + Follow-up — Present findings and offer follow-up directions.

Path B shortcut: If the user explicitly requests a direct answer (Step 1 determines this), skip Steps 2, 4, 5, 5b — but Worldmodel + Routing and Scoping are still mandatory.

Reports directory

Reports are stored in a configurable directory. Resolution priority:

Priority	Source	Example
1	User says so in the current session	"Put the report in `docs/research/`"
2	Env var `CLAUDE_REPORTS_DIR` (pre-resolved by SessionStart hook — check `resolved-reports-dir` in your context)	`CLAUDE_REPORTS_DIR=./my-reports` → `./my-reports/<report-name>/REPORT.md`
3	AI repo config (`CLAUDE.md`, `AGENTS.md`, `.cursor/rules/`, etc.) declares a reports directory	`reports-dir: .ai-dev/reports`
4	Default (in a repo)	`<repo-root>/reports/<report-name>/REPORT.md`
5	Default (no repo)	`~/reports/<report-name>/REPORT.md`

Resolution rules:

If CLAUDE_REPORTS_DIR is set, treat it as the parent directory (create <report-name>/REPORT.md inside it).
Relative paths resolve from the repo root (or cwd if no repo).
When inside a git repo, reports default to the repo-local reports/ directory. When not inside a git repo, fall back to ~/reports/.
Do not scan for existing directories automatically — only use them when explicitly configured via one of the sources above.
When in doubt, use the default and tell the user where the report landed.
Worldmodel's reports channel should check both project-level and user-level catalogues when scanning for existing research.

Throughout this skill, <reports-dir> refers to the resolved reports directory. The --reports-dir flag on catalogue/normalize scripts also overrides all of the above.
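The priority table and resolution rules can be condensed into a single resolver. This is a minimal sketch under two assumptions: each source has already been read into a plain string, and the caller knows whether it is inside a git repo. `resolveReportsDir`, `reportFile`, and the `ReportsDirSources` shape are hypothetical, and `~` expansion is not handled.

```typescript
import * as path from "node:path";

// One optional field per row of the priority table; all names are hypothetical.
interface ReportsDirSources {
  cliFlag?: string;         // --reports-dir on catalogue/normalize scripts (overrides all)
  sessionOverride?: string; // priority 1: user said so in this session
  envDir?: string;          // priority 2: CLAUDE_REPORTS_DIR
  configDir?: string;       // priority 3: reports dir declared in CLAUDE.md, AGENTS.md, etc.
  repoRoot?: string;        // set only when inside a git repo
  cwd: string;
  homeDir: string;
}

// Returns the parent directory; each report lives at <dir>/<report-name>/REPORT.md.
function resolveReportsDir(s: ReportsDirSources): string {
  // Relative paths resolve from the repo root, or from cwd when there is no repo.
  const base = s.repoRoot ?? s.cwd;
  // First non-empty explicit source wins; existing directories are never scanned for.
  const explicit = s.cliFlag || s.sessionOverride || s.envDir || s.configDir;
  if (explicit) return path.resolve(base, explicit); // "~" is not expanded here
  if (s.repoRoot) return path.join(s.repoRoot, "reports"); // priority 4
  return path.join(s.homeDir, "reports");                 // priority 5
}

function reportFile(reportsDir: string, reportName: string): string {
  return path.join(reportsDir, reportName, "REPORT.md");
}
```

Returning the parent directory and joining `<report-name>/REPORT.md` separately follows the rule that `CLAUDE_REPORTS_DIR` is treated as a parent, not as the report folder itself.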

When to Use This Skill

Condition	Use This Skill	Output Format
Investigating a technology for adoption	✅ Yes	Report or Direct
Comparing two or more systems/approaches	✅ Yes	Report or Direct
Documenting architecture from codebase analysis	✅ Yes	Report (persistent)
Gathering context for system design decisions	✅ Yes	Usually Direct
Research that another agent will consume	✅ Yes	Report (persistent)
Updating / extending an existing report	✅ Yes	Report update (surgical) or Direct
Quick technical question (< 5 min research)	⚠️ Maybe	Direct (no skill needed)
Creating procedural "how to do X" guidance	❌ No → Use a skill	—
Defining an operational role/persona	❌ No → Use a subagent	—
Setting always-on project constraints	❌ No → Use CLAUDE.md	—

When to produce a formal report:

Findings need to be referenced later
Multiple agents or people will consume the research
Evidence trail is important for audit/verification
Research is complex enough to warrant structured documentation

When to deliver findings directly:

User just needs an answer now
Research is for immediate decision-making
Findings won't be referenced again
Speed matters more than persistence

When to update an existing report:

User says "update/refresh/extend/add to the report" or references an existing <reports-dir>/<report-name>/
New questions/dimensions need to be added without redoing everything
Corrections are needed (new evidence changes prior findings)

Report Framing Default: External / Third-Party Sources

Research reports default to 3P/external framing — investigating third-party topics, technologies, concepts, authorities, repos, and public sources. Reports should NOT include analysis of the user's own codebase, internal systems, or company-specific content unless the user explicitly asks for it.

Why: When agents mix first-party (1P) codebase analysis into research reports, the findings drift from factual synthesis toward opinion-forming applied to the company. This reduces factual fidelity — the agent starts trying to "help decide" instead of providing the factual landscape. Research reports should be a body of factual external knowledge that a downstream agent or reader can use to make their own judgment calls.

In practice:

Default: Research external sources (web, OSS repos, public docs, papers, official APIs). Do not investigate or reference the user's own codebase/repos in the report.
Exception: If the user explicitly requests 1P analysis (e.g., "research how our auth system compares to X", "include analysis of our codebase"), include it — but clearly separate 1P observations from 3P findings in the report so the reader can distinguish externally-verifiable facts from company-specific assessments.
Source code research (references/source-code-research.md) applies to both 1P and 3P repos; the investigation methodology is identical. The difference is whether the findings belong in the report (3P: yes by default; 1P: only when explicitly requested).

This framing complements the existing follow-up constraint (Step 6.2): follow-ups must be standalone research directions investigable via external sources, not actions on the user's own assets.

Step 0: Create Workflow Checkpoint Tasks

⛔ ALWAYS THE FIRST ACTION. Before doing anything else, create workflow checkpoint tasks. These provide structural enforcement, persist across context compaction, and are visible to the user.

Create these tasks immediately upon invocation:

TaskCreate: "Research: Worldmodel + Routing — landscape + existing research" → start as in_progress
TaskCreate: "Research: Scoping — propose rubric + get confirmation"          → pending, blocked by #1
TaskCreate: "Research: Conduct research + capture evidence"                  → pending, blocked by #2
TaskCreate: "Research: Write REPORT.md"                                      → pending, blocked by #3
TaskCreate: "Research: Validate"                                             → pending, blocked by #4
TaskCreate: "Research: Audit — quality verification via /audit"              → pending, blocked by #5
TaskCreate: "Research: Recap + follow-up"                                    → pending, blocked by #6

Use addBlockedBy to enforce ordering. As you complete each step, mark its task completed and mark the next task in_progress.

Path B variant: If scoping determines Path B (direct answer), mark tasks #4, #5, and #6 as deleted (they don't apply).

Path C variant: If worldmodel + routing sends you to Path C (update existing), mark tasks #2-#7 as deleted and create Path C-specific tasks per references/updating-existing-reports.md.
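The ordering that `TaskCreate` and `addBlockedBy` enforce can be modeled with plain data. The following sketch is illustrative only: it does not call the real task tool, the subjects are shortened, and one thing is an assumption. When a path variant deletes tasks, the sketch re-chains the remaining ones so that no task waits on a deleted blocker.

```typescript
type TaskStatus = "pending" | "in_progress" | "completed" | "deleted";
type ResearchPath = "A" | "B" | "C";

interface CheckpointTask {
  id: number;
  subject: string;
  status: TaskStatus;
  blockedBy: number[];
}

const SUBJECTS = [
  "Research: Worldmodel + Routing",
  "Research: Scoping",
  "Research: Conduct research + capture evidence",
  "Research: Write REPORT.md",
  "Research: Validate",
  "Research: Audit",
  "Research: Recap + follow-up",
];

// Task ids each path variant marks as deleted.
const DELETED_BY_PATH: Record<ResearchPath, number[]> = {
  A: [],
  B: [4, 5, 6],
  C: [2, 3, 4, 5, 6, 7],
};

// Task 1 starts in_progress; every later task is blocked by its predecessor.
function createCheckpointTasks(): CheckpointTask[] {
  return SUBJECTS.map((subject, i): CheckpointTask => ({
    id: i + 1,
    subject,
    status: i === 0 ? "in_progress" : "pending",
    blockedBy: i === 0 ? [] : [i],
  }));
}

// Delete tasks that don't apply and re-chain the survivors in order.
function applyPathVariant(tasks: CheckpointTask[], route: ResearchPath): CheckpointTask[] {
  const drop = new Set(DELETED_BY_PATH[route]);
  let previous: number | undefined;
  return tasks.map((t): CheckpointTask => {
    if (drop.has(t.id)) return { ...t, status: "deleted", blockedBy: [] };
    const rechained = { ...t, blockedBy: previous === undefined ? [] : [previous] };
    previous = t.id;
    return rechained;
  });
}

// Mark a task completed and move the task it was blocking to in_progress.
function completeTask(tasks: CheckpointTask[], id: number): CheckpointTask[] {
  const next = tasks.find((t) => t.status === "pending" && t.blockedBy.includes(id));
  return tasks.map((t): CheckpointTask => {
    if (t.id === id) return { ...t, status: "completed" };
    if (next !== undefined && t.id === next.id) return { ...t, status: "in_progress" };
    return t;
  });
}
```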

Why tasks? The observed failure mode is the agent skipping worldmodel + routing and scoping, jumping straight to web searches. Tasks provide a persistent, user-visible structural enforcement layer that survives context compaction and makes skipped steps immediately obvious.

Worldmodel + Routing

⛔ MANDATORY FIRST STEP. Before any web searches, before any research, before any analysis — complete this step. If you find yourself about to run a web search or read a codebase directly, STOP — you have skipped this step.

Phase 1: Check existing knowledge

Before scoping new research, scan what already exists.

If the user explicitly references an existing report (names it, links it, or says "update/refresh/extend"): → Skip the scan. Go directly to Path C — load references/updating-existing-reports.md.

Otherwise, always check the catalogue first:

Regenerate the catalogue (fast — takes seconds):

bun ~/.claude/skills/research/scripts/generate-catalogue.ts

Read <reports-dir>/CATALOGUE.md. This is a structured index of all reports with title, description, topics, subjects, evidence count, and last-updated date.
Scan the summary table and detail cards for semantic overlap with the user's topic — match on title, description, topics, and subjects. You are looking for conceptual relevance, not just keyword matches.
For the 1–3 most promising candidates, read the REPORT.md Executive Summary and Research Rubric sections to assess actual coverage depth and relevance.

Classify the user's topic against existing reports:

Coverage	What it means	Example
Fully covered	An existing report directly answers the user's question with evidence	User asks about MCP connectivity patterns; `mcp-connectivity-provider/` report exists with that dimension covered
Partially covered	An existing report covers related ground but not the specific question, or the question is a natural extension	User asks about MCP auth; `mcp-connectivity-provider/` covers architecture but not auth specifically
Not covered	No existing report has meaningful overlap	User asks about container orchestration; nothing relevant exists

Phase 2: Route based on coverage

Fully covered → Present what's already known before proposing new work:

"We already have research on this. Here's what the existing report found:

[2–4 key findings from the report, with confidence levels]

Options:

Use as-is — this answers your question. I can elaborate on any finding.

Verify / refresh — the report is from [date]. Want me to spot-check whether findings still hold?

Go deeper — the existing report covers [scope]. Want to add [specific dimension] or investigate [angle not covered]?

New angle entirely — if your question is actually about [different framing], a new report may be cleaner."

Let the user choose. Do not start new research when existing research already answers the question.

Headless mode (fully covered): Note the existing report's coverage in the routing log, then proceed to a new report — the caller is requesting fresh research on a specific angle even if prior work exists.

Partially covered → Surface the overlap and propose how to build on it:

"We have related research in <reports-dir>/<name>/ that covers [what it covers]. Your question about [topic] isn't directly answered, but it's a natural extension.

Options:

Extend the existing report (Path C) — add [new dimension/facet] to <name>/. Makes sense if the topics are coherent together.

Start a new report (Path A) — if this is a distinct enough topic that it deserves its own report.

I'd recommend [extending / a new report] because [reason]."

Headless mode (partially covered): Start a new report (Path A). Do not prompt for extend vs. new.

Not covered → Proceed to Path A (default) or Path B:

Default to Path A (formal report, Steps 1–6) unless the user explicitly asks for a quick answer, says "just tell me," or the question is clearly trivial (< 5 min research).

If the user signals they want a direct answer → Path B (lighter scoping, no evidence files unless complex)
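The Phase 2 outcomes, together with the headless auto-selection described earlier, can be collapsed into one routing function. This is a sketch, not a definitive implementation. The boolean inputs are hypothetical stand-ins for judgments the agent makes, and checking the explicit report reference before the mode is an assumed ordering.

```typescript
type Coverage = "fully" | "partially" | "none";
type Route = "path-a" | "path-b" | "path-c" | "ask-user";

interface RoutingInput {
  mode: "supervised" | "headless";
  coverage: Coverage;
  referencesExistingReport: boolean; // named/linked a report, or said "update/refresh/extend"
  wantsDirectAnswer: boolean;        // explicit signal such as "just tell me"
  clearlyTrivial: boolean;           // < 5 min of research
}

function route(input: RoutingInput): Route {
  // An explicit reference to an existing report skips the scan and goes to Path C.
  if (input.referencesExistingReport) return "path-c";
  // Headless: every coverage level starts a new report; nobody is there to choose.
  if (input.mode === "headless") return "path-a";
  // Supervised, fully or partially covered: surface what exists and let the user pick.
  if (input.coverage !== "none") return "ask-user";
  // Not covered: formal report unless there is a clear signal for a direct answer.
  return input.wantsDirectAnswer || input.clearlyTrivial ? "path-b" : "path-a";
}
```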

Routing principles

Do not skip the scan. Even a 30-second skim prevents duplicate research and gives the user valuable context on what's already known.
Bias toward extending when topics are semantically coherent. One comprehensive report is more useful than two overlapping ones.
Bias toward new reports when the framing, audience, or primary question differs materially — even if some evidence overlaps.
Do not downgrade to Path B without a clear signal. Persistent, evidence-backed reports are the default.

⚠️ Avoid: Starting a new report on a topic that's already well-covered. The user may not know what prior research exists — surfacing it is part of the value.

Success criteria: (1) Existing knowledge surfaced before new work begins, (2) Rubric confirmed before research starts, (3) Every finding links to evidence, (4) Output format matches user intent.

Using worldmodel output for scoping

The worldmodel output is a topology map (surfaces, connections, entities, patterns) — not pre-built dimensions. Research builds its own rubric dimensions from this topology:

Entities & Terminology → concept inventory for the rubric; reveals what the domain's vocabulary is
Surfaces → system topology; helps identify which areas warrant investigation dimensions
Patterns Observed → convergences/divergences across channels; divergences often point to high-value investigation dimensions
Prior Research → existing coverage; helps scope what's already answered vs what needs new work
3P Landscape → OSS source identification for code-first research
Connections & Dependencies → blast radius and propagation chains; useful for scoping investigation boundaries
Unresolved/Adjacent → potential dimensions or facets the topology couldn't resolve at survey depth

See references/scoping-protocol.md §2A for how the worldmodel topology feeds into rubric construction.

Step 1: Collaborative Scoping (for Path A/B)

⛔ HARD GATE (Supervised mode). Do NOT start any research (web searches, code analysis, evidence gathering) until the user explicitly confirms the rubric. After proposing the rubric, STOP and WAIT for user response. Mark the Scoping task as completed only after receiving user confirmation.

Headless mode: Worldmodel already ran in the previous step (Worldmodel + Routing). Use its output — Entities & Terminology for vocabulary and concept inventory, Surfaces for system topology, Patterns for convergences/divergences, Prior Research for existing coverage, 3P Landscape for OSS sources (see scoping-protocol.md §2A). Propose the rubric and proceed immediately without waiting for confirmation. If the prompt includes explicit dimensions, worldmodel ran in context-aware mode (supplements the user's dimensions with discovered gaps). Mark the Scoping task as completed after proposing.

Load: references/scoping-protocol.md

For Formal Report (Path A): Output a complete research rubric and get explicit user confirmation before proceeding (Supervised) or auto-confirm (Headless).
For Direct Answer (Path B): Scoping is still required; keep it appropriately sized, but make dimensions/stance explicit and confirm (Supervised) or auto-confirm (Headless).

If the user is updating an existing report, skip this step. Use Path C and load references/updating-existing-reports.md.

Step 2: Create the Report Directory

Mark the "Conduct research" task as in_progress after completing this step.

mkdir -p <reports-dir>/<report-name>/evidence
mkdir -p <reports-dir>/<report-name>/meta

Load: references/report-directory-conventions.md for naming rules, directory structure, and frontmatter schema.

Naming: <scope>-<aspect>, kebab-case, max ~5 segments. E.g., claude-skills-architecture, devops-practices, openhands-vs-openclaw.
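The naming rule can be checked mechanically. This is a small sketch. The text treats "~5 segments" as a soft limit, so the hard cutoff of 5 used here is an assumption. `isValidReportName` and `toReportName` are illustrative helpers.

```typescript
// Lowercase alphanumeric segments joined by single hyphens.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Assumed hard cutoff for the "max ~5 segments" guideline.
function isValidReportName(name: string, maxSegments = 5): boolean {
  return KEBAB_CASE.test(name) && name.split("-").length <= maxSegments;
}

// Derive a candidate name from free text, e.g. a rubric title.
function toReportName(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every non-alphanumeric run to one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
}
```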

2.0 Persist the worldmodel snapshot

Write the worldmodel output (captured during the Worldmodel + Routing step) to <reports-dir>/<report-name>/meta/worldmodel.md. This makes the topology readable by subagents (Step 3.2) and nested fanout instances (Step 6 / Step 3 fanout) without each one re-running /worldmodel.

The file is the canonical landscape snapshot for this report. On Path C update passes that re-run worldmodel, overwrite this file and record the change in meta/_changelog.md.

Skip this for Path B (no report directory) — worldmodel stays in conversation context only.
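For Path A, Step 2 and Step 2.0 amount to a few filesystem calls. The following is a minimal Node/Bun sketch that mirrors the two `mkdir -p` commands, writes the snapshot, and handles the Path C overwrite. The function names and the changelog line format are hypothetical. The Path B skip is left to the caller.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Step 2 + 2.0: create evidence/ and meta/, then persist the worldmodel snapshot.
function createReportDir(reportsDir: string, reportName: string, worldmodelMarkdown: string): string {
  const root = path.join(reportsDir, reportName);
  fs.mkdirSync(path.join(root, "evidence"), { recursive: true }); // mkdir -p <root>/evidence
  fs.mkdirSync(path.join(root, "meta"), { recursive: true });     // mkdir -p <root>/meta
  fs.writeFileSync(path.join(root, "meta", "worldmodel.md"), worldmodelMarkdown);
  return root;
}

// Path C pass that re-ran worldmodel: overwrite the snapshot and log the change.
function refreshWorldmodel(root: string, worldmodelMarkdown: string, isoDate: string): void {
  fs.writeFileSync(path.join(root, "meta", "worldmodel.md"), worldmodelMarkdown);
  fs.appendFileSync(
    path.join(root, "meta", "_changelog.md"),
    `- ${isoDate}: worldmodel re-run; meta/worldmodel.md overwritten\n`,
  );
}
```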

2.1 Run-scoped coordination (create only when needed)

Coordination artifacts exist to reduce redundancy and manage parallel work. They are process, not proof.

Core rule: When you do a coordinated research pass (especially with subagents), treat it as a run with a single run context file:

meta/runs/<run-id>/RUN.md

This prevents stale coordination context from one pass bleeding into the next.

When to create a run

Deep research mode (using subagents): Create a run by default.
Solo mode: Create a run only when at least one of these is true:
- Multi-session work: you expect follow-on updates and want durable pass context
- High-stakes verification: you will run a verification pass or need an audit trail for coordination decisions
- Large rubric / explicit gap-closure: many P0 facets and coverage tracking would otherwise be error-prone

Placement (proof vs process separation)

evidence/ is proof only — primary-source snippets, citations, negative searches
Run coordination lives in: <reports-dir>/<report-name>/meta/runs/<run-id>/RUN.md

Create directories only when needed:

mkdir -p <reports-dir>/<report-name>/meta/runs/<run-id>

Run ID convention

Format: YYYY-MM-DD-<short-label>

Examples:

2026-02-02-initial
2026-02-03-add-sso
2026-02-04-corrective-mfa-fix
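A run ID can be generated from a date and a free-text label and then mapped to its `RUN.md` path. This is a sketch: `makeRunId` and `runFilePath` are hypothetical helpers, and using the UTC date is an assumption.

```typescript
import * as path from "node:path";

// YYYY-MM-DD-<short-label>, label in kebab-case.
const RUN_ID_PATTERN = /^\d{4}-\d{2}-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*$/;

function makeRunId(date: Date, label: string): string {
  const ymd = date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  const slug = label.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  if (!slug) throw new Error("run label must contain at least one letter or digit");
  return `${ymd}-${slug}`;
}

// <report-root>/meta/runs/<run-id>/RUN.md
function runFilePath(reportRoot: string, runId: string): string {
  if (!RUN_ID_PATTERN.test(runId)) throw new Error(`invalid run id: ${runId}`);
  return path.join(reportRoot, "meta", "runs", runId, "RUN.md");
}
```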

RUN.md ownership + lifecycle

Ownership: Only the parent/orchestrator writes RUN.md.
Workers: Read RUN.md at task start; return findings via their responses; do not write files to the run folder.
Lifecycle:
1. Run start: create RUN.md with Status: Active
2. During run: update RUN.md as needed for coordination (owners, anchors, delta rubric)
3. Run close: set Status: Closed and treat RUN.md as immutable
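The lifecycle can be captured as a small in-memory model that becomes immutable once closed. This is a sketch only. `RunFile` and its section layout are hypothetical, and workers would receive only the rendered text, never the object itself.

```typescript
type RunStatus = "Active" | "Closed";

class RunFile {
  private status: RunStatus = "Active"; // 1. run start
  private readonly sections = new Map<string, string>();
  private readonly runId: string;
  private readonly purpose: string;

  constructor(runId: string, purpose: string) {
    this.runId = runId;
    this.purpose = purpose;
  }

  // 2. During run: orchestrator-only updates (owners, anchors, delta rubric).
  update(section: string, body: string): void {
    if (this.status === "Closed") {
      throw new Error(`RUN.md for ${this.runId} is closed and immutable`);
    }
    this.sections.set(section, body);
  }

  // 3. Run close: freeze the file.
  close(): void {
    this.status = "Closed";
  }

  render(): string {
    const parts = [`# Run: ${this.runId}`, `Status: ${this.status}`, `Purpose: ${this.purpose}`];
    this.sections.forEach((body, name) => parts.push(`## ${name}\n\n${body}`));
    return parts.join("\n\n");
  }
}
```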

Coverage tracking (via tasks, not files)

Do not create a persistent _coverage.md.

Instead, the parent/orchestrator:

creates a task for each P0 dimension (or P0 facet cluster) at run start
marks the task complete when coverage is confirmed (evidence captured + conflicts resolved)
reviews task status during gap analysis to identify missing coverage

Use natural "task" terminology; do not assume any specific task tool.
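The completion rule for a P0 task ("evidence captured + conflicts resolved") and the gap-analysis query can be sketched without committing to any task tool. `P0CoverageTracker` is a hypothetical name. Counting open conflicts per dimension is an assumed way to represent "conflicts resolved".

```typescript
interface CoverageState {
  evidenceCaptured: boolean;
  openConflicts: number;
}

class P0CoverageTracker {
  private readonly tasks = new Map<string, CoverageState>();

  // One task per P0 dimension (or facet cluster), created at run start.
  constructor(p0Dimensions: string[]) {
    for (const d of p0Dimensions) this.tasks.set(d, { evidenceCaptured: false, openConflicts: 0 });
  }

  private stateOf(dimension: string): CoverageState {
    const state = this.tasks.get(dimension);
    if (!state) throw new Error(`unknown P0 dimension: ${dimension}`);
    return state;
  }

  recordEvidence(dimension: string): void {
    this.stateOf(dimension).evidenceCaptured = true;
  }

  flagConflict(dimension: string): void {
    this.stateOf(dimension).openConflicts += 1;
  }

  resolveConflict(dimension: string): void {
    const s = this.stateOf(dimension);
    s.openConflicts = Math.max(0, s.openConflicts - 1);
  }

  // Complete only when evidence is captured AND no conflicts remain open.
  isComplete(dimension: string): boolean {
    const s = this.stateOf(dimension);
    return s.evidenceCaptured && s.openConflicts === 0;
  }

  // Gap analysis: every P0 dimension whose task is not yet complete.
  gaps(): string[] {
    return Array.from(this.tasks.keys()).filter((d) => !this.isComplete(d));
  }
}
```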

What stays at the meta/ level (cross-run)

Some artifacts are not run-scoped:

Artifact	Purpose	Notes
`meta/worldmodel.md`	Current-state landscape snapshot (topology, entities, 3P)	Written at Step 2.0 for Path A. Re-overwritten on Path C update passes; logged in `_changelog.md`.
`meta/_changelog.md`	Append-only history	Parent/orchestrator-owned only

meta/ is created at Step 2 (for Path A). For Path B (no report directory), it is not needed.

What not to create (new model)

Do not create these as part of the current coordination model:

meta/_shared-context.md (merge its contents into RUN.md)
meta/_coverage.md (use tasks instead)

2.2 Evidence standards and reusing existing context

Reports must be standalone. Evidence must rely on 1st-party sources (source code, official docs, API references, research papers, direct observations) — not on other reports' conclusions. A reader should be able to verify every claim in the report without consulting another report.

Evidence can overlap across reports. Two reports may cite the same 1st-party source independently. This is expected — each report contextualizes the evidence for its own scope and primary question. Do not avoid citing a source just because another report already cited it.

Cross-references to other reports are allowed as navigation aids only. When a dimension in report A is covered in more depth by a different report B, and repeating that depth wouldn't fit naturally in report A's scope, you may include a "Related Research" pointer. These are "see also" links for the reader's benefit — not evidence citations. The current report must not depend on the cross-referenced report for any of its claims.

If prior research exists on similar topics in the resolved reports directory (during the research process):

Extract only what is still relevant to the rubric.
Treat prior reports as secondary unless you can re-verify key claims.
Do not copy claims forward without evidence or a staleness caveat.

Step 3: Conduct Research + Capture Evidence

Mark the "Conduct research" task as in_progress (if not already). Mark it completed when all P0 dimensions have evidence.

Load: references/citation-formats.md for evidence substantiation standards
Load: references/web-search-guidance.md for web source quality standards (applies to all research)
Load: references/source-code-research.md if source code is available (OSS repos, user-provided repos, or CWD)

3.0 Choose research execution mode

Condition	Mode	What to do
Small scope (≤2 dimensions), no parallelism needed	Solo mode	Work through rubric dimension-by-dimension
Moderate scope (3-5 dimensions), shared sources	Deep research mode	Load orchestration reference
Many dimensions, shared sources likely	Deep research mode	Load orchestration reference
Large scope (5+ P0/Deep, independent dimensions)	Nested fanout mode	Load `references/nested-fanout.md`

Routing heuristic (for choosing between deep research and nested fanout):

Assess the rubric dimensions on two axes:

Facet count per dimension: How many independent sub-questions? (1-2 = light, 3+ = heavy)
Source diversity per dimension: Same codebase/domain, or multiple external repos/ecosystems?

Rubric shape	Mode
Most dimensions are light (1-2 facets, single source)	Deep research mode (subagents)
5+ dimensions are heavy (3+ facets, multi-source) and independent	Nested fanout mode

When uncertain, bias toward fanout — the cost of over-fanouting (thin sub-reports, wasted tokens) is lower than under-fanouting (shallow coverage of deep topics).
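The two-axis heuristic can be written as a classifier over rubric dimensions. The following is a sketch under stated assumptions. Facet and source counts are estimates the agent supplies, "multi-source" is read as more than one distinct source, and "no parallelism needed" for solo mode is not modeled. Because the threshold here is hard, the "bias toward fanout" judgment for borderline rubrics remains with the agent.

```typescript
interface RubricDimension {
  name: string;
  p0OrDeep: boolean;    // P0 priority or Deep depth in the rubric
  facets: number;       // independent sub-questions
  sourceCount: number;  // distinct repos/ecosystems/domains
  independent: boolean; // can be researched without the other dimensions
}

type ExecutionMode = "solo" | "deep-research" | "nested-fanout";

// Heavy = 3+ facets drawn from more than one source; everything else is light.
function isHeavy(d: RubricDimension): boolean {
  return d.facets >= 3 && d.sourceCount > 1;
}

function chooseExecutionMode(dims: RubricDimension[]): ExecutionMode {
  const heavyIndependent = dims.filter((d) => d.p0OrDeep && d.independent && isHeavy(d)).length;
  if (heavyIndependent >= 5) return "nested-fanout";
  if (dims.length <= 2) return "solo";
  return "deep-research";
}
```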

State what you're doing and why before proceeding. Not as a question — as a transparent assessment the user can redirect if they disagree.

When nested fanout mode is chosen at Step 3, it replaces Steps 3 AND 4 — the fanout sub-instances each produce their own research, and the consolidation produces the final REPORT.md. Proceed to Step 5 (Validate) after consolidation completes.

3.1 Solo mode (no subagents)

Work through the rubric dimension-by-dimension:

Research (OSS: code-first; closed: web-first; partial: hybrid).
Web search checkpoint (see references/web-search-guidance.md):
- P0: Always check open issues, official docs, CVEs, maintenance signals
- P1: Check dimension-relevant categories (see dimension-aware table)
- Prioritize T1/T2 sources; cross-reference T3/T4
Capture evidence to evidence/<dimension>.md as you find it.
Track gaps (especially P0 facets).

3.2 Deep research mode (using subagents)

Load: references/subagent-orchestration.md

Subtree-aware dispatch

Subagents and /research --headless fanout instances are children — project hooks (UserPromptSubmit, PreToolUse) DO NOT fire for them. Any subtree-context injection that the orchestrator session normally enjoys is invisible to the child. Inline subtree rules in the dispatch prompt instead.

Before dispatching workers or fanout sub-instances:

If the research target lives under a nested subtree (look for nested package.json + lockfile + AGENTS.md), state plainly: "Code lives in <subtree>. cd there before running any build/install/test command. Read <subtree>/AGENTS.md first; its conventions override root." This applies to source-code research (references/source-code-research.md) — the worker may need to grep, run scripts, or check lockfiles inside the subtree.
Tell the worker which package manager governs the subtree (bun.lock → Bun, pnpm-lock.yaml → pnpm). Don't let it default to npm.
If the worker is reading multiple repos or subtrees, label each path with its owning subtree so the worker doesn't apply one subtree's rules to another's files.
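Before dispatch, the orchestrator can detect a nested subtree and build the inline preamble itself. This is a minimal sketch. The `bun.lock` and `pnpm-lock.yaml` mappings come from the text. The other lockfile rows are common conventions added for illustration, and `inspectSubtree` and `subtreePreamble` are hypothetical helpers.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// First match wins; bun.lock and pnpm-lock.yaml are from the text, the rest are common conventions.
const LOCKFILES: Array<[string, string]> = [
  ["bun.lock", "bun"],
  ["bun.lockb", "bun"],
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["package-lock.json", "npm"],
];

interface SubtreeInfo {
  dir: string;
  packageManager?: string;
  isNestedSubtree: boolean; // package.json + lockfile + AGENTS.md all present
}

function inspectSubtree(dir: string): SubtreeInfo {
  const has = (file: string) => fs.existsSync(path.join(dir, file));
  const packageManager = LOCKFILES.find(([file]) => has(file))?.[1];
  const isNestedSubtree = has("package.json") && packageManager !== undefined && has("AGENTS.md");
  return { dir, packageManager, isNestedSubtree };
}

// Inline the subtree rules into the dispatch prompt, since hooks won't fire for the child.
function subtreePreamble(info: SubtreeInfo): string {
  if (!info.isNestedSubtree) return "";
  return [
    `Code lives in ${info.dir}. cd there before running any build/install/test command.`,
    `Read ${info.dir}/AGENTS.md first; its conventions override root.`,
    `Use ${info.packageManager} in this subtree; do not default to npm.`,
  ].join("\n");
}
```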

This mode uses a 5-phase pattern to reduce redundant discovery, control verbosity, and produce primary-source-grounded evidence:

Foundation pass — establish shared context for this run (minimal if overlap risk is low)
Parallel subagent work — dispatch 4–6 workers with strict Markdown output contracts
Gap analysis checkpoint — verify P0 coverage before writing evidence
Evidence writing — orchestrator creates evidence files from worker findings + primary sources
Targeted verification (conditional) — spot-check only when high-stakes, weak evidence, or conflicts

Run coordination surface (recommended)

For each coordinated pass, create a run folder and run file:

<reports-dir>/<report-name>/meta/runs/<run-id>/RUN.md

RUN.md should contain everything workers need to understand what this pass is doing (purpose, delta rubric, source anchors, canonical sources + owners).

Coverage tracking (via tasks)

At run start, the parent/orchestrator should:

create a task for each P0 dimension (or P0 facet cluster)
mark tasks complete as evidence-backed coverage is confirmed
use task status during gap analysis to identify missing P0 coverage

Key constraints

Workers return structured Markdown findings (not final evidence files).
Workers read RUN.md at task start.
Workers do not write files to the run folder.
The orchestrator owns judgment calls (conflict resolution, sufficiency, scope).
Verification is conditional—only when needed (see orchestration reference for criteria).

See the orchestration reference for phase details, prompt templates, output contracts, and run coordination guidance.

3.3 Evidence capture (applies to all modes)

Evidence files are primary proof. Keep them reproducible.

⚠️ Avoid: Claims without evidence links. Every finding must trace to a source.

Naming convention:

Use kebab-case matching the dimension: evidence/<dimension-name>.md
For updates/corrections to existing reports, see references/updating-existing-reports.md for guidance on when to:
- append new evidence files (e.g., evidence/<dimension>-update-YYYY-MM-DD.md), vs
- edit existing evidence files in place (surgically) to maintain a clean current-state proof surface.
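Both naming forms can be derived from the rubric's dimension name. This is a sketch. `evidenceFile` and `evidenceUpdateFile` are hypothetical helpers, and the kebab-case conversion is one reasonable choice, not a mandated algorithm.

```typescript
function toKebab(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// evidence/<dimension-name>.md
function evidenceFile(dimension: string): string {
  return `evidence/${toKebab(dimension)}.md`;
}

// evidence/<dimension>-update-YYYY-MM-DD.md for append-style updates
function evidenceUpdateFile(dimension: string, isoDate: string): string {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(isoDate)) throw new Error(`expected YYYY-MM-DD, got ${isoDate}`);
  return `evidence/${toKebab(dimension)}-update-${isoDate}.md`;
}
```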

Note: This template is for evidence files (orchestrator-authored). For subagent output contracts, see references/subagent-orchestration.md.

Default evidence file structure:

````markdown
# Evidence: <Dimension>

**Dimension:** <Dimension name from rubric>
**Date:** YYYY-MM-DD
**Sources:** <repos/urls searched>

---

## Key files / pages referenced (top 5–15)
- <file path or URL> — why relevant
- ...

---

## Findings

### Finding: <Declarative claim>
**Confidence:** CONFIRMED | INFERRED | UNCERTAIN | NOT FOUND
**Evidence:** <file:line-range OR URL>

```text
<short snippet / quote / output>
```

**Implications:**
- <implication>

## Negative searches (for NOT FOUND)
- Searched: <query or pattern> in <location> → <result>

## Gaps / follow-ups
- <gap> — what to check next
````

Evidence capture rules:
* Include file path + line numbers (or URL + access date)
* Capture the minimum snippet needed to justify the claim
* Document negative searches for "NOT FOUND"
* Label confidence consistently
* When a finding comes from a vendor's own data about their own product/feature, flag it in the evidence file: note the vendor name, what they sell, and that product-incentive bias is possible. This prevents post-hoc audit catches and ensures downstream synthesis includes appropriate caveats.
* When a new finding contradicts or tensions with an earlier finding (from a different dimension or earlier in the same pass), note the tension in the evidence file's "Gaps / follow-ups" section. Do not stop to resolve — continue researching. Resolution happens during synthesis (Step 4) or a coherence audit.
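Several of these rules can be enforced when a finding is written out. The following is a sketch. `renderFinding` is hypothetical, and so is the `**Vendor source:**` label for the product-incentive flag. Negative searches are listed under the finding for brevity, whereas the template keeps them in their own section.

```typescript
type Confidence = "CONFIRMED" | "INFERRED" | "UNCERTAIN" | "NOT FOUND";

interface Finding {
  claim: string;               // declarative claim
  confidence: Confidence;
  evidence: string;            // file:line-range, or URL + access date
  snippet?: string;            // minimum snippet needed to justify the claim
  negativeSearches?: string[]; // required for NOT FOUND
  vendorBias?: { vendor: string; sells: string }; // vendor data about its own product
}

function renderFinding(f: Finding): string {
  if (f.confidence === "NOT FOUND" && (f.negativeSearches ?? []).length === 0) {
    throw new Error(`NOT FOUND finding "${f.claim}" must document its negative searches`);
  }
  const fence = "`".repeat(3);
  const lines = [
    `### Finding: ${f.claim}`,
    `**Confidence:** ${f.confidence}`,
    `**Evidence:** ${f.evidence}`,
  ];
  if (f.vendorBias) {
    // Hypothetical label for the vendor-bias flag the rules ask for.
    lines.push(`**Vendor source:** ${f.vendorBias.vendor} (sells ${f.vendorBias.sells}); product-incentive bias is possible.`);
  }
  if (f.snippet) lines.push("", `${fence}text`, f.snippet, fence);
  for (const search of f.negativeSearches ?? []) lines.push(`- Searched: ${search}`);
  return lines.join("\n");
}
```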

---

## Step 4: Write REPORT.md

Mark the "Write REPORT.md" task as `in_progress`. Mark it `completed` when REPORT.md is written.

**Load:** `references/section-templates.md` for report structure patterns
**Load:** `references/diagram-patterns.md` if architecture diagrams are needed

If your draft REPORT.md is coming out repetitive (mirrors evidence files), or if the user's goal is decision-making:

**Load:** `references/report-synthesis-patterns.md`

**Core rule:** REPORT.md is synthesis. Evidence files are proof. Do not dump raw evidence in REPORT.md.

⚠️ **Avoid:** REPORT.md that mirrors evidence files. Add implications, decision triggers, trade-offs, and uncertainty—not just restated facts.

**Coherence cross-check (as you write, not after):**

When synthesizing findings across dimensions, actively cross-check:

* **Cross-finding consistency:** If two dimensions address the same capability or topic, ensure they don't contradict without explanation. Reconcile (one is conditional) or acknowledge the tension explicitly.
* **Stat consistency across sections:** When the same statistic appears in multiple sections (e.g., exec summary + detail section + benchmark table), verify all instances use the same value. Stats drift when copied between sections.
* **Arithmetic and claim fidelity:** When synthesizing quantitative claims from evidence, verify the math (ratios, percentages, multiples) and preserve the exact population, metric type, and qualifiers from the source. Do not mutate "40% of star performers" into "40% more deals" or "74% of respondents" into "74% of teams" — these are different claims.
* **Confidence-prose alignment:** Match prose certainty to evidence strength — declarative statements ("X does Y") for CONFIRMED findings, hedged language ("evidence suggests") for INFERRED. See `report-synthesis-patterns.md` §5.2.
* **Conditionality:** If a finding is version-bound, config-dependent, or context-specific, state the conditions. Do not flatten into unconditional claims. See `report-synthesis-patterns.md` §5.3.
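The first two cross-checks are partly mechanical. This sketch collects every percentage reported for one named statistic so that drift between sections stands out, and it checks a claimed multiple against the underlying numbers. The function names, the 5% tolerance and the regex heuristic are assumptions. The regex only finds stats written as "<label> ... <number>%" within one sentence. Population and qualifier fidelity still need a human read.

```typescript
// Collect every value reported for one named statistic across the report.
function findStatValues(report: string, label: string): number[] {
  // Escape regex metacharacters so the label is matched literally.
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Label, then characters up to the end of the sentence, then "<number>%".
  const re = new RegExp(`${escaped}[^.\\n]*?(\\d+(?:\\.\\d+)?)%`, "gi");
  const values: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(report)) !== null) {
    values.push(Number(m[1]));
  }
  return values;
}

// True when the same stat appears with more than one value.
function hasStatDrift(report: string, label: string): boolean {
  const values = findStatValues(report, label);
  return values.some((v) => v !== values[0]);
}

// Check a claimed multiple ("A is 3x B") with a relative tolerance for rounding.
function multipleMatches(a: number, b: number, claimed: number, tolerance = 0.05): boolean {
  if (b === 0) return false;
  return Math.abs(a / b - claimed) <= tolerance * claimed;
}
```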

Default report structure:

```markdown
---
title: "[Report Title]"
description: "[1-3 sentence summary: what this report covers, what questions it answers, key domain terms for AI discoverability]"
createdAt: YYYY-MM-DD
updatedAt: YYYY-MM-DD
subjects:       # optional — proper nouns (companies, technologies, frameworks)
  - [Subject 1]
topics:         # optional — qualitative areas, <=3 words each
  - [topic area]
---

# [Report Title]

**Purpose:** [1-2 sentences: why this report exists and what the reader cares about — from rubric]

---

## Executive Summary

[2-4 paragraphs: Lead with the answer. Key findings. Critical caveats.]

**Key Findings:**
- **[Finding 1]:** One-line summary
- **[Finding 2]:** One-line summary
- **[Finding 3]:** One-line summary

---

## Research Rubric

[Include the agreed rubric for transparency]

---

## Detailed Findings

### [Dimension 1 from Rubric]

**Finding:** [Declarative statement]

**Evidence:** [evidence/<file>.md](evidence/<file>.md)

**Implications:**
- [What this means for the decision]

**Decision triggers (when this matters):**
- [If condition, this finding becomes critical]
- [If condition, this finding is less relevant]

**Remaining uncertainty (if any):**
- [What we could not confirm and why]

### [Dimension 2 from Rubric]
...

---

## Limitations & Open Questions

### Dimensions Not Fully Covered
- [Dimension]: [What couldn't be confirmed, what was searched]

### Out of Scope (per Rubric)
- [Items explicitly excluded]

---

## References

### Evidence Files
- [evidence/<file1>.md](evidence/<file1>.md) - [What it contains]

### External Sources
- [Title](URL) - [Brief description]

### Related Research (optional)
- [<reports-dir>/<report-name>/](<reports-dir>/<report-name>/) - [What it covers that goes deeper on a relevant topic]
```

**Inline source citations:** When a specific claim maps cleanly to 1–2 identifiable sources, cite inline using named references: `[Proper Noun](URL)`. The link text is just the proper noun; sample sizes and caveats go in the surrounding prose. Claims drawing on 3+ sources are synthesis: omit inline refs and let the **Evidence:** link carry attribution. Most sentences need no inline citation. See `report-synthesis-patterns.md` §5.5.
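The citation rule above reduces to a source-count decision. In this sketch, `attributeClaim` and its types are hypothetical. Claims with no identifiable source fall back to the section's **Evidence:** link, which is an assumption.

```typescript
interface Source {
  name: string; // proper noun used as the link text
  url: string;
}

type Attribution =
  | { kind: "inline"; markdown: string } // cite in the sentence
  | { kind: "evidence-link" }; // rely on the section's Evidence link

function attributeClaim(sources: Source[]): Attribution {
  // 3+ sources means synthesis; zero identifiable sources also defers to the Evidence link.
  if (sources.length === 0 || sources.length >= 3) return { kind: "evidence-link" };
  // 1-2 sources: named references, link text is only the proper noun.
  const links = sources.map((s) => `[${s.name}](${s.url})`);
  return { kind: "inline", markdown: links.join(" and ") };
}
```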

Honor stance (see `scoping-protocol.md` §G for details). Factual reports avoid recommendations; Conclusions reports link recommendations to evidence.

## Step 5: Validate

Mark the "Validate" task as `in_progress`. Mark it `completed` when all checklist items pass.

**Load:** `references/citation-formats.md` to verify evidence citations meet standards

Before delivering:

If validation fails, fix and re-check.
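Some validation can be automated. One example is checking the frontmatter fields defined in the Step 4 default report structure. This sketch assumes the frontmatter has already been parsed into a plain object with dates kept as strings. Note that some YAML parsers turn `YYYY-MM-DD` into date objects. The function name is hypothetical.

```typescript
// Parsed frontmatter, mirroring the default report structure. Values are unknown until checked.
interface ReportFrontmatter {
  title?: unknown;
  description?: unknown;
  createdAt?: unknown;
  updatedAt?: unknown;
  subjects?: unknown; // optional list of proper nouns
  topics?: unknown; // optional list, each entry <= 3 words
}

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function validateFrontmatter(fm: ReportFrontmatter): string[] {
  const errors: string[] = [];
  for (const key of ["title", "description"] as const) {
    const v = fm[key];
    if (typeof v !== "string" || v.trim() === "") errors.push(`${key} is required`);
  }
  for (const key of ["createdAt", "updatedAt"] as const) {
    const v = fm[key];
    if (typeof v !== "string" || !ISO_DATE.test(v)) errors.push(`${key} must be YYYY-MM-DD`);
  }
  // Zero-padded ISO dates compare correctly as strings; only compare well-formed ones.
  const c = fm.createdAt;
  const u = fm.updatedAt;
  if (typeof c === "string" && typeof u === "string" && ISO_DATE.test(c) && ISO_DATE.test(u) && u < c) {
    errors.push("updatedAt is earlier than createdAt");
  }
  if (fm.topics !== undefined) {
    if (!Array.isArray(fm.topics)) {
      errors.push("topics must be a list");
    } else {
      for (const t of fm.topics) {
        if (typeof t !== "string" || t.trim().split(/\s+/).length > 3) {
          errors.push(`topic "${String(t)}" exceeds 3 words`);
        }
      }
    }
  }
  if (fm.subjects !== undefined && !Array.isArray(fm.subjects)) errors.push("subjects must be a list");
  return errors;
}
```

An empty array means the frontmatter passes these checks. Any messages go back into the "fix and re-check" loop.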

## Step 5b: Audit (quality verification)

Mark the "Audit" task as `in_progress`. Mark it `completed` when audit findings are reviewed.

**Skip this step for:** Path B (direct answers) and simple 1-2 dimension reports, unless the user explicitly requests an audit. The overhead isn't justified for lightweight outputs.

Spawn a nested Claude Code instance (via the `/nest-claude` subprocess pattern) to run the `/audit` skill on the completed report. The audit agent reads the report cold and verifies both logical consistency and factual accuracy.

**Invocation:**

Before doing anything, load /audit skill.

Audit the artifact at [REPORT_PATH].
Evidence directory: [EVIDENCE_PATH].
Write findings to [REPORT_DIR]/meta/audit-findings.md.

If the report (or the codebase it examines) is scoped to a monorepo subtree, append subtree context to the invocation prompt. `UserPromptSubmit` and `PreToolUse` hooks do not fire for nested instances, so the child has no automatic awareness of the subtree boundary. State the subtree path, the cwd-discipline rule, and which `AGENTS.md` the auditor must read first. The same applies when fanning out to `/research --headless` sub-instances at Step 6.3.
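This sketch assembles the invocation prompt and appends subtree context only when the work is scoped to a monorepo subtree. `buildAuditPrompt`, its input shape and the wording of the cwd-discipline lines are hypothetical. The first five lines reproduce the invocation text given earlier in this step.

```typescript
interface AuditInvocation {
  reportPath: string;
  evidencePath: string;
  reportDir: string;
  // Present only when the research is scoped to a monorepo subtree.
  subtree?: { path: string; agentsMd: string };
}

function buildAuditPrompt(inv: AuditInvocation): string {
  const lines: string[] = [
    "Before doing anything, load /audit skill.",
    "",
    `Audit the artifact at ${inv.reportPath}.`,
    `Evidence directory: ${inv.evidencePath}.`,
    `Write findings to ${inv.reportDir}/meta/audit-findings.md.`,
  ];
  if (inv.subtree) {
    // Hooks don't fire in nested instances, so the boundary must be stated in the prompt itself.
    lines.push(
      "",
      `Scope: this work is confined to the monorepo subtree ${inv.subtree.path}.`,
      `Run commands from ${inv.subtree.path} and do not read or modify files outside it.`,
      `Read ${inv.subtree.agentsMd} first.`,
    );
  }
  return lines.join("\n");
}
```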

After the audit completes, read `meta/audit-findings.md`.

**[LOAD REQUIRED]** Use the Skill tool to invoke `/assess-findings` for the resolution methodology. Do not resolve any finding until the skill is loaded. Also read `references/audit-resolution.md` for batch orchestration guidance (triage order, severity prioritization).

* **High severity findings:** Return to Step 4 (Write REPORT.md) to address using the resolution taxonomy (Sharpen, Add conditions, Recalibrate, Acknowledge ambiguity, or Re-research). These are errors that would mislead the reader.
* **Medium severity findings:** Fix if straightforward (surgical edits to REPORT.md). Note in `meta/_changelog.md`.
* **Low severity findings:** Note for awareness. Fix only if trivial.
* **No findings:** Proceed to Step 6.
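The severity rules above amount to a sort followed by a lookup. This sketch orders findings high to low and attaches the prescribed resolution. The type names and resolution labels are hypothetical. It does not replace the `/assess-findings` methodology for resolving each finding.

```typescript
type Severity = "high" | "medium" | "low";

interface AuditFinding {
  id: string;
  severity: Severity;
  summary: string;
}

type Resolution = "return-to-step-4" | "fix-if-straightforward" | "note-only";

const RESOLUTION: Record<Severity, Resolution> = {
  high: "return-to-step-4", // misleading errors: rework with the resolution taxonomy
  medium: "fix-if-straightforward", // surgical edit, logged in meta/_changelog.md
  low: "note-only", // fix only if trivial
};

const RANK: Record<Severity, number> = { high: 0, medium: 1, low: 2 };

function triageFindings(findings: AuditFinding[]): Array<{ finding: AuditFinding; resolution: Resolution }> {
  return findings
    .slice() // copy so the caller's array keeps its order
    .sort((a, b) => RANK[a.severity] - RANK[b.severity])
    .map((finding) => ({ finding, resolution: RESOLUTION[finding.severity] }));
}
```

An empty plan corresponds to "No findings: Proceed to Step 6."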

## Step 6: Research Recap & Follow-up

Mark the "Recap + follow-up" task as `in_progress`. Mark it `completed` after presenting the recap and follow-up options.

After delivering findings (any path), present a concise recap and naturally surface opportunities to go deeper. The goal is a collaborative research conversation, not a one-shot dump.

### 6.1 Recap (always do this)

Summarize what was covered in a compact block:

**What we investigated:** [1–3 sentence summary of scope and approach]

**Key findings:** [3–5 bullet points: the headline answers]

**Confidence gaps:** [1–3 bullets: what remains UNCERTAIN or NOT FOUND, if any]

Keep this short. The user already has the full report/answer — the recap is a conversation pivot, not a repeat.

### 6.2 Surface follow-up directions (always do this)

Based on what emerged during research, offer 2–4 natural follow-up options that would enrich or extend this report with additional depth or dimensions. These should feel like a knowledgeable colleague saying "here's what would make this research more complete" — not a generic menu or a pivot to a different task.

The purpose of follow-ups is to deepen this report, not to escape depth during research. Follow-ups exist for AFTER thorough initial coverage — they are not a substitute for going deep on the rubric's P0 dimensions. If a dimension was scoped as "Deep" in the rubric, it must be covered deeply in the initial pass. "I'll save that for follow-ups" is a failure mode, not a strategy. Follow-ups surface what you couldn't have known to cover until after the research was done, or what would benefit from even more depth than the rubric prescribed.

Follow-ups must be standalone research topics. Each follow-up must be investigable via external sources (web, OSS repos, public documentation, academic papers) without requiring access to the user's proprietary codebase, internal systems, or specific assets. "Evaluate our X against Y" or "apply these findings to our system" are not follow-ups — they are downstream actions that belong outside the research skill. Follow-ups should produce findings that are independently valuable and could enrich the report for any reader.

Where follow-ups come from (ranked by value for report enrichment):

1. **Deeper dives into dimensions covered at moderate depth:** dimensions the rubric scoped as "Moderate," or findings where evidence was thin, where going deeper would reveal additional insight that strengthens the report's overall value. (e.g., "We confirmed the auth model at a high level; a deeper dive into token lifecycle and revocation semantics would sharpen the security assessment")
2. **Adjacent dimensions that surfaced organically during research:** topics the rubric didn't anticipate but that emerged as potentially relevant to the user's original intent. These are the "you don't know what you don't know" angles that only become visible after research begins. (e.g., "The codebase revealed an undocumented plugin system; investigating its stability and API surface could change the extensibility assessment")
3. **Gaps and open questions from the research:** specific claims that couldn't be confirmed, negative searches that warrant a different search strategy, or findings labeled UNCERTAIN that could be resolved with targeted investigation.
4. **Cross-cutting perspectives that would stress-test or reframe existing findings:** looking at the same topic from a different stakeholder lens, time horizon, or scale assumption. (e.g., "We assessed features from the builder's perspective; investigating from the operator/SRE perspective might reveal different trade-offs")
5. **Re-audit:** if the report was updated significantly after the Step 5b audit (e.g., follow-up research added new dimensions or revised findings), suggest re-running `/audit` to verify the changes didn't introduce new issues. Only suggest this when substantive changes were made post-audit, not as a default follow-up on every report.

**Stay in the research lane.** Follow-ups must be further research: additional angles, deeper investigation, adjacent domains, unexplored dimensions. Never suggest:

* Derivative deliverables (checklists, templates, scorecards, playbooks) that belong to other skills
* Actions on the user's own assets ("evaluate your X against Y", "run this tool on your system", "apply these findings to your codebase")
* Topics that only make sense in relation to the user's particular system rather than as standalone research

If a research finding naturally implies a downstream action, name the implication in the report's findings — but the follow-up option should be "investigate X further" (standalone, external-source research), not "let me apply X to your system."

When the user asks "anything else to research?" / "what else should we look at?" / "is there more?":

This is an invitation to exercise deep judgment about what would make this report more valuable — not to recite the gap list or pivot to a different topic. Re-ground in the user's original intent before responding:

1. Re-read the rubric's Research Purpose ("reader cares most about") and the report's current coverage.
2. Ask yourself: given what the user was trying to learn, what angles would most enrich this report? Think beyond the existing rubric. What dimensions would a domain expert add that the original scoping missed? What would make the report more complete, more nuanced, or more useful as a standalone reference?
3. Evaluate candidate angles against the original intent. Rank them by how much they would improve the report's value to its stated audience: not just "we have a gap here" but "this gap matters because filling it would [specific enrichment tied to the user's goal]."
4. Present 2-4 options ranked by value, each tied to the report's purpose and investigable via external sources. If nothing constructive remains within scope, say so honestly rather than inventing busywork.

The goal is the quality of thinking a senior colleague would bring — "given what you're trying to learn, here's what would make this research more complete and why" — not a mechanical scan of unchecked boxes or a pivot to action items.

How to present them:

Where we could go from here:

[Descriptive direction] — [why it matters, tied to what we found]. [Depth: quick check / moderate / deep dive]

[Descriptive direction] — [why it matters]. [Depth: ...]

[Descriptive direction] — [why it matters]. [Depth: ...]

Want to dive deeper into any of these, or is there another angle that came to mind?
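This sketch renders the presentation format above and enforces the 2–4 option range from step 6.2. `renderFollowUps` and its types are hypothetical. The `why` text is assumed to carry no trailing punctuation. The "nothing left to chase" case from the calibration guidance is a different message, so it is rejected here rather than rendered.

```typescript
type Depth = "quick check" | "moderate" | "deep dive";

interface FollowUp {
  direction: string; // descriptive direction
  why: string; // why it matters, tied to what was found
  depth: Depth;
}

function renderFollowUps(options: FollowUp[]): string {
  if (options.length < 2 || options.length > 4) {
    throw new Error(`expected 2-4 follow-up options, got ${options.length}`);
  }
  const items = options.map((o, i) => `${i + 1}. ${o.direction}: ${o.why}. [Depth: ${o.depth}]`);
  return [
    "Where we could go from here:",
    "",
    ...items,
    "",
    "Want to dive deeper into any of these, or is there another angle that came to mind?",
  ].join("\n");
}
```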

**Calibration guidance:**

* Match the follow-up tone to the research tone. If the user asked a quick question (Path B), offer lightweight follow-ups. If they commissioned a deep report (Path A), offer substantial next dimensions.
* If research was comprehensive and no meaningful gaps remain, say so: "This covers the scope well. I don't see obvious gaps to chase, but let me know if anything else comes up as you work with this."
* If the user explicitly said "just give me the report" or signals they're done, skip 6.2 and just deliver the recap.

### 6.3 Iterate (if the user engages)

If the user picks a follow-up direction:

* **For additive dimensions:** Treat it as a Path C update (load `references/updating-existing-reports.md`) if a report exists, or continue the conversation if it was a direct answer.
* **For deeper dives (single direction):** Narrow scope to the specific facet and re-enter Step 3 with a focused mini-rubric.
* **For multiple deep dives (2+ directions):** Load `references/nested-fanout.md`. This spawns parallel `/research --headless` instances and consolidates their findings back into the parent report. Sub-reports are preserved in `fanout/` for auditability.
* **For action-oriented follow-ups:** Transition out of the research skill naturally, e.g., "That moves us from research into implementation. Want me to [specific next action]?"

Each iteration gets its own recap + follow-up cycle. The conversation continues until the user signals they have what they need.

**Routing heuristic (for choosing between subagents and nested fanout):**

Assess each selected direction on two axes before proceeding:

* **Facet count:** How many independent sub-questions? (1-2 = light, 3+ = heavy)
* **Source diversity:** Same codebase/domain, or multiple external repos/ecosystems?

| Assessment | 1 direction | 2+ directions |
|---|---|---|
| Light (1-2 facets, single source) | Answer inline or subagent | Subagents (Step 3.2) |
| Heavy (3+ facets, multiple sources) | Path C update or single nested instance | Nested fanout |

When uncertain, bias toward fanout. The cost of fanning out unnecessarily (a thin sub-report, wasted tokens) is lower than the cost of not fanning out when it was needed (shallow coverage of a deep topic).
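This sketch encodes the routing table. The table defines only the two pure cases. Here, mixed cases (many facets from one source, or few facets across several sources) count as heavy, following "when uncertain, bias toward fanout". That is an interpretation, not a stated rule. `chooseRoute` and the route labels are hypothetical. The sketch routes the whole batch at once. Splitting it (heavy directions to fanout, light ones to subagents) would be equally consistent with the text.

```typescript
// Per-direction assessment on the two axes; field names are illustrative.
interface DirectionAssessment {
  facets: number; // independent sub-questions
  multipleSources: boolean; // spans several external repos/ecosystems
}

type Route = "inline-or-subagent" | "path-c-or-single-nested" | "subagents" | "nested-fanout";

// Heavy if either axis says so (the bias-toward-fanout interpretation).
function isHeavy(d: DirectionAssessment): boolean {
  return d.facets >= 3 || d.multipleSources;
}

function chooseRoute(directions: DirectionAssessment[]): Route {
  if (directions.length === 0) throw new Error("no follow-up direction selected");
  const heavy = directions.some(isHeavy);
  if (directions.length === 1) {
    return heavy ? "path-c-or-single-nested" : "inline-or-subagent";
  }
  return heavy ? "nested-fanout" : "subagents";
}
```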

State what you're doing and why before proceeding. Not as a question — as a transparent assessment the user can redirect if they disagree.

⚠️ **Avoid:** Offering follow-ups that restate what was already covered. Follow-ups should open new ground, not rehash findings.

## Confidence Labels

Use consistently throughout:

* **CONFIRMED** - Direct evidence in `evidence/` files
* **INFERRED** - Logical conclusion from patterns
* **UNCERTAIN** - Partial evidence, needs validation
* **NOT FOUND** - Explicitly searched, not present
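This sketch ties each label to a prose register, in line with Step 4's confidence-prose alignment rule: declarative for CONFIRMED, hedged for INFERRED. The phrasings for UNCERTAIN and NOT FOUND, and the function name, are illustrative assumptions.

```typescript
type Confidence = "CONFIRMED" | "INFERRED" | "UNCERTAIN" | "NOT FOUND";

function phraseClaim(label: Confidence, claim: string): string {
  switch (label) {
    case "CONFIRMED": // declarative: "X does Y."
      return `${claim.charAt(0).toUpperCase()}${claim.slice(1)}.`;
    case "INFERRED": // hedged
      return `Evidence suggests ${claim}.`;
    case "UNCERTAIN": // illustrative wording
      return `Partial evidence indicates ${claim}; this needs validation.`;
    case "NOT FOUND": // illustrative wording
      return `An explicit search found no evidence that ${claim}.`;
    default: {
      // Exhaustiveness guard: adding a label without a phrasing fails to compile.
      const unreachable: never = label;
      return unreachable;
    }
  }
}
```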

## Anti-Patterns

Top anti-patterns are inlined as ⚠️ warnings in relevant steps. These remain here for reference:

* Proposing dimensions without justification
* Skipping non-goals
* Evidence files that only restate subagent summaries (capture primary-source snippets instead)
* Letting a subagent's narrowed scope become the final judgment (the orchestrator owns judgment calls)
* Defaulting to conclusions without asking (must honor stance)
* **Over-updating:** rewriting a report wholesale when the user asked for a delta (use Path C)
* **Generic follow-ups:** offering "want to learn more?" without tying options to specific findings or gaps from the research
* **Deliverable or user-asset follow-ups instead of report-enriching research:** suggesting checklists, templates, scorecards, style guides, or other productized outputs, or suggesting actions on the user's own system ("evaluate your X against Y", "apply this to your codebase", "run this tool on your project"), as "where we could go from here." Follow-ups must be standalone research directions (deeper dives, adjacent dimensions, unexplored angles), investigable via external sources, that would enrich the current report. They must not be downstream actions or topics that only make sense in relation to the user's particular system.
* **Skipping the recap:** dumping a report and going silent. Always close with a recap + natural follow-up options unless the user explicitly signals they're done.
* **Ignoring existing reports:** starting new research without scanning the resolved reports directory first. The user may not know what prior research exists, and duplicate work wastes time.
* **Skipping worldmodel + routing and scoping to jump straight to research:** The most common failure mode. When invoked, the agent immediately starts web searching and delivers informal findings without running worldmodel or getting rubric confirmation. This bypasses the entire protocol. Worldmodel + routing and scoping confirmation are hard gates, not optional steps (though in headless mode, scoping auto-confirms after proposing the rubric). If you catch yourself about to run a web search before worldmodel has run, STOP.
* **Defaulting to Path B without explicit user signal:** Treating conversational phrasing ("I need to understand...", "what does X do?") as a signal for direct answer delivery. These are normal research requests. Path A is the default unless the user explicitly says "just tell me", "no report needed", or similar.

## Additional Resources

* **Scoping protocol (rubric-first):** `references/scoping-protocol.md`
* **Updating existing reports (delta/additive/corrective):** `references/updating-existing-reports.md`
* **Dimension frameworks by report type:** `references/dimension-frameworks.md`
* **Section templates:** `references/section-templates.md`
* **ASCII diagram patterns:** `references/diagram-patterns.md`
* **Evidence citation formats:** `references/citation-formats.md`
* **Web search guidance (all research):** `references/web-search-guidance.md`
* **Source code research (OSS + local repos):** `references/source-code-research.md`
* **Subagent orchestration (deep research):** `references/subagent-orchestration.md`
* **Analytical synthesis patterns:** `references/report-synthesis-patterns.md`
* **Audit skill (quality verification):** `/audit`, unified coherence + factual verification, invoked at Step 5b
* **Audit batch orchestration:** `references/audit-resolution.md`, batch sizing and severity triage for processing many audit findings (use the Skill tool to invoke `/assess-findings` for resolution methodology)
* **Report directory conventions (naming, structure, frontmatter):** `references/report-directory-conventions.md`
* **Catalogue generator:** `scripts/generate-catalogue.ts`; run with `bun run generate-catalogue.ts` to regenerate `<reports-dir>/CATALOGUE.md` from report frontmatter

Repository: inkeep/team-skills

First Seen: Mar 6, 2026

Security Audits: Gen Agent Trust Hub (Fail), Socket (Pass), Snyk (Warn)