codebase-researcher

Installation

SKILL.md

Codebase Researcher

Systematic deep research skill that analyzes codebases from the top level down, generating comprehensive knowledge about project structure, architecture, and implementation patterns.

Purpose

Conduct systematic research of codebases to:

Build comprehensive understanding of project architecture
Document key components and their relationships
Identify patterns, conventions, and best practices
Generate knowledge bases (SKILL.md, CLAUDE.md, references/)
Support onboarding and knowledge transfer

When to Use

Use this skill when:

Starting work on a new codebase
Documenting existing architecture
Creating CLAUDE.md or SKILL.md files
Performing comprehensive code audits
Building team knowledge bases

Research Methodology

Standard Deep Research Algorithm

The research process follows a hierarchical approach:

Identify Root-Level Directories - List all significant folders, excluding common ignore patterns
Spawn Specialized Subagents - Create one subagent per important directory
Parallel Analysis - Each subagent analyzes its assigned directory deeply
Aggregate Findings - Combine insights into cohesive documentation
Generate Artifacts - Produce SKILL.md, CLAUDE.md, and references/

Directory Filtering

Automatically exclude from research:

node_modules/, venv/, .venv/, env/
dist/, build/, out/, .next/, target/
.git/, .svn/, .hg/
Hidden directories (starting with .) unless specifically relevant
__pycache__, *.egg-info

Include in research:

Source code directories (src/, lib/, app/)
Configuration directories (.roo/, .claude/, config/)
Documentation (docs/, specs/, architecture/)
Scripts and tooling (scripts/, tools/, cli/)
Tests (tests/, __tests__/, spec/)

Using the Slash Command

Execute deep research via the provided slash command:

/deep-research

This command will:

Scan root-level directories
Determine which directories are important
Generate and execute a claude command with --agents flag
Spawn one subagent per important directory
Collect and synthesize findings

See scripts/deep_research.sh for implementation details.

Script Integration

Deep Research Script

The scripts/deep_research.sh script automates the research process:

Key Features:

Automatically filters ignored directories via .gitignore patterns
Dynamically generates subagents JSON for claude --agents
Spawns parallel analysis tasks
Aggregates results into structured output

Usage:

# From skill directory
./scripts/deep_research.sh [output_dir]

# From project root
./.roo/skills/codebase-researcher/scripts/deep_research.sh ./output

Subagent Generation

The script uses claude --agents to programmatically create specialized subagents:

{
  "cli-analyzer": {
    "description": "Analyze CLI directory structure and command patterns",
    "prompt": "You are an expert at analyzing CLI tooling. Examine the directory structure, identify command patterns, and document the architecture.",
    "tools": ["Read", "Glob", "Grep", "Bash"]
  },
  "skills-analyzer": {
    "description": "Analyze skills directory and documentation",
    "prompt": "You are an expert at analyzing Agent Skills. Examine skill structure, identify patterns, and document the skill ecosystem.",
    "tools": ["Read", "Glob", "Grep"]
  }
}

Output Structure

Research produces:

Primary Output: SKILL.md or CLAUDE.md

Contains:

Project overview and purpose
Architecture summary
Key directories and their roles
Important patterns and conventions
Getting started guidance

Supporting Output: references/

Detailed documentation organized by concern:

references/architecture.md - System architecture
references/directory_structure.md - Directory layout and purpose
references/patterns.md - Code patterns and conventions
references/dependencies.md - Dependency graph and relationships

Research Workflow

Phase 1: Discovery

# List root directories
ls -la

# Identify significant folders
find . -maxdepth 1 -type d ! -name ".*" ! -name "node_modules"

# Check for .gitignore patterns
cat .gitignore

Phase 2: Subagent Orchestration

For each important directory, generate a specialized subagent with:

Name: Derived from directory (e.g., "cli-analyzer", "docs-analyzer")
Description: Directory-specific research scope
Prompt: Tailored analysis instructions
Tools: Read, Glob, Grep, Bash (as needed)

Phase 3: Deep Analysis

Each subagent performs:

Structure Mapping - Document file organization
Code Analysis - Identify key functions, classes, modules
Pattern Recognition - Find conventions and idioms
Dependency Tracking - Map imports and relationships
Documentation Review - Extract existing docs

Phase 4: Synthesis

Combine subagent findings into:

Unified architecture documentation
Cross-cutting concerns and patterns
Integration points and workflows
Knowledge base for future reference

Best Practices

Effective Research

Start Broad, Go Deep - Begin with high-level overview, then drill into specifics
Follow the Code - Trace execution paths and data flows
Document Decisions - Capture why things are structured as they are
Identify Gaps - Note missing documentation or unclear patterns
Preserve Context - Link related components and explain relationships

Quality Criteria

Research output should:

Be accurate - Reflect actual code structure and behavior
Be comprehensive - Cover all significant components
Be actionable - Enable others to work with the codebase
Be maintainable - Update easily as code evolves
Be discoverable - Organized for easy navigation

Common Pitfalls

Avoid:

Overwhelming detail - Focus on significant patterns, not every line
Outdated documentation - Verify findings against current code
Isolated analysis - Show how components relate
Assumption-based docs - Confirm behavior through code inspection

Example Research Session

# 1. Execute deep research
/deep-research

# 2. Review generated subagent plan
#    The script will show which subagents will be created

# 3. Confirm execution
#    Claude runs research with specialized subagents

# 4. Review outputs
#    - CLAUDE.md or SKILL.md in project root
#    - references/ directory with detailed documentation

Integration with Existing Skills

This skill complements:

skill-creator - Use research findings to create specialized skills
spec-writer - Research informs specification documents
architecture - Deep analysis feeds architecture documentation

Technical Notes

Subagent Spawning

Uses Claude Code's --agents flag for programmatic subagent definition:

claude --agents '{
  "subagent-name": {
    "description": "What this subagent does",
    "prompt": "Detailed instructions",
    "tools": ["Read", "Grep", "Glob"],
    "model": "sonnet"
  }
}'

Headless Mode Integration

Can run non-interactively for automation:

claude -p "Perform deep research on this codebase" \
  --permission-mode plan \
  --output-format json

References

More from zpankz/mcp-skillset

Installs

Repository

zpankz/mcp-skillset

GitHub Stars

First Seen

Jan 26, 2026

Security Audits

Gen Agent Trust HubFail

SocketPass

SnykPass