# Codebase Overview Skill
Systematic 4-phase codebase exploration that produces an evidence-backed onboarding report. Phases run in strict order — DETECT, EXPLORE, MAP, SUMMARIZE — because later phases depend on context established by earlier ones. This skill accelerates reading the codebase but does not replace it.
## Reference Loading Table
| Signal | Load These Files | Why |
|---|---|---|
| Example-driven tasks, error handling | examples-and-errors.md | Worked examples and error-handling procedures |
| Language- or framework-specific discovery | exploration-strategies.md | Per-language discovery commands and indicator tables |
| Report generation and formatting | report-template.md | Standard report structure with all required sections |
## Instructions
Execute all phases autonomously. Verify each gate before advancing. Consult references/exploration-strategies.md for language-specific discovery commands.
Before starting any exploration, read and follow any .claude/CLAUDE.md or CLAUDE.md in the repository root because project-specific instructions override default behavior.
This is a read-only skill — keep all project files unmodified because the goal is observation, not mutation. Likewise, leave application execution and test running to other skills because those are execution concerns outside this skill's scope. For deep domain analysis, route to a specialized agent instead.
See references/examples-and-errors.md for worked examples and error handling procedures.
## Sensitive-Files Guardrail
Check every file path against this list BEFORE reading because secrets leaked into exploration output are hard to retract and easy to miss. Skip silently without logging the file contents or path.
```text
# Secrets and credentials
.env, .env.*, *.pem, *.key, credentials.json, secrets.*, *secret*, *credential*, *password*

# Authentication tokens
token.json, .npmrc, .pypirc

# Cloud provider credentials
.aws/credentials, .gcloud/, service-account*.json
```
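A minimal bash sketch of this check, assuming exploration runs file-by-file in a shell loop; the `is_sensitive` helper is hypothetical:

```bash
# Hypothetical helper: returns 0 (skip) when a path matches the guardrail patterns above.
is_sensitive() {
  case "$(basename "$1")" in
    .env|.env.*|*.pem|*.key|credentials.json|secrets.*|*secret*|*credential*|*password*|token.json|.npmrc|.pypirc)
      return 0 ;;
  esac
  case "$1" in
    */.aws/credentials|*/.gcloud/*|*service-account*.json) return 0 ;;
  esac
  return 1
}

# Skip silently: no logging of the path or contents.
find . -type f | while read -r f; do
  is_sensitive "$f" && continue
  # ...safe to read "$f"...
done
```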
## Phase 1: DETECT

Goal: Determine project type, language, framework, and tech stack.

### Step 1: Examine root directory

Start from the current working directory because that is the project the user is asking about.

```bash
ls -la
```
Identify configuration files that indicate project type:
- `package.json` -> Node.js/JavaScript/TypeScript
- `go.mod` -> Go
- `pyproject.toml`, `requirements.txt`, `setup.py` -> Python
- `pom.xml`, `build.gradle` -> Java
- `Cargo.toml` -> Rust
- See references/exploration-strategies.md for the complete indicator table
Always detect project type before reading source files because framework context changes how you interpret code (e.g., a models/ directory means something different in Django vs. Express).
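A quick sketch of this probe, checking for the indicator files listed above:

```bash
# Probe for each indicator file; presence determines the primary stack.
for f in package.json go.mod pyproject.toml requirements.txt setup.py pom.xml build.gradle Cargo.toml; do
  [ -f "$f" ] && echo "indicator found: $f"
done
```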
### Step 2: Read primary configuration

Based on detected type, read the main config file. Preference order:

- Python: `pyproject.toml` > `setup.py` > `requirements.txt`
- Node.js: `package.json`
- Go: `go.mod`
Extract: project name, dependencies, language version, build system, scripts/commands.
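For the Node.js case, a sketch of that extraction, assuming `jq` is available on the machine:

```bash
# Pull name, version, scripts, and dependency names from package.json in one pass.
jq '{name, version, scripts, dependencies: (.dependencies // {} | keys)}' package.json
```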
### Step 3: Identify frameworks and tooling

```bash
ls -la manage.py next.config.js nuxt.config.js angular.json 2>/dev/null
ls -la Makefile Dockerfile docker-compose.yml 2>/dev/null
ls -la .github/workflows/ 2>/dev/null
```
### Step 4: Check for CLAUDE.md
Read any .claude/CLAUDE.md or CLAUDE.md in the repository root. Follow its instructions throughout remaining phases.
### Step 5: Document findings
Use the DETECT Results template from references/examples-and-errors.md.
Gate: Project type identified (language + framework). Tech stack documented. Build/run commands known. Proceed ONLY when gate passes — skipping this gate leads to wrong architectural assumptions downstream.
## Phase 2: EXPLORE
Goal: Discover entry points, core modules, data models, API surfaces, configuration, and tests.
Explore only what is needed for the overview because speculative deep-dives waste tokens without proportional value. Limit to 20 files per category because representative samples are more useful than exhaustive coverage. If a category has more than 20 files, note the total count and state that you examined a representative sample.
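One way to honor the cap while keeping the total honest — a sketch in which the `*.controller.ts` glob is purely illustrative:

```bash
# Count the whole category, but read at most 20 representatives.
total=$(find ./src -name '*.controller.ts' 2>/dev/null | wc -l | tr -d ' ')
find ./src -name '*.controller.ts' 2>/dev/null | head -20
echo "Examined $(( total < 20 ? total : 20 )) of $total files in this category (representative sample)."
```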
On explicit user request, deep-dive into specific subsystems, generate architecture diagrams, include full file contents, export findings to a separate file, or analyze dependency vulnerability status. These are off by default because the standard overview does not require them.
### Step 1: Find entry points
Use language-specific patterns from references/exploration-strategies.md. Read each entry point file to understand application bootstrapping.
For any language, look for:

- `main` functions or `__main__` modules
- Server/app initialization files
- CLI entry points declared in config
Config files alone are not enough to understand a project because they show dependencies, not architecture — always read entry points and core modules too.
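A generic sketch of that search; the patterns below cover common idioms and are not exhaustive:

```bash
# Language-agnostic entry-point probes; extend with the detected stack's conventions.
grep -rn --include='*.go' 'func main(' . 2>/dev/null | head -5
grep -rn --include='*.py' 'if __name__ == .__main__.' . 2>/dev/null | head -5
jq -r '.bin // empty' package.json 2>/dev/null   # CLI entry points declared in config
```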
### Step 2: Map directory structure

```bash
find . -type d \
  -not -path '*/\.*' \
  -not -path '*/node_modules/*' \
  -not -path '*/venv/*' \
  -not -path '*/vendor/*' \
  -not -path '*/dist/*' \
  -not -path '*/build/*' \
  | head -50
```
Exclude noise directories (node_modules/, venv/, vendor/, dist/, build/, __pycache__/) because they contain generated or third-party code that obscures the project's own structure.
Categorize directories by layer — see the Directory Layer Categorization table in references/examples-and-errors.md.
### Step 3: Examine data layer
Search for model, schema, and entity files. Read 3-5 representative files. Use the Data Layer Findings template from references/examples-and-errors.md.
Document: entity relationships, primary data structures and their fields, database technology, migration strategy.
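A sketch of the search, using common data-layer naming conventions (names vary by framework):

```bash
# Candidate model/schema/entity files, excluding noise directories.
find . -type f \( -name '*model*' -o -name '*schema*' -o -name '*entit*' -o -path '*/migrations/*' \) \
  -not -path '*/node_modules/*' -not -path '*/venv/*' -not -path '*/.git/*' 2>/dev/null | head -20
```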
### Step 4: Discover API surface
Search for route, handler, and controller files. Read 3-5 key API files. Use the API Surface Findings template from references/examples-and-errors.md.
Document: endpoint structure and URL patterns, HTTP methods and request/response formats, authentication and authorization patterns, API versioning strategy.
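A matching sketch for the API layer; adjust the name patterns to the framework detected in Phase 1:

```bash
# Candidate route/handler/controller files, excluding noise directories.
find . -type f \( -name '*route*' -o -name '*handler*' -o -name '*controller*' \) \
  -not -path '*/node_modules/*' -not -path '*/venv/*' -not -path '*/.git/*' 2>/dev/null | head -20
```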
### Step 5: Identify configuration

```bash
ls -la .env .env.example config.yaml config.json settings.py 2>/dev/null
ls -la config/*.yaml config/*.json config/*.toml 2>/dev/null
```
Document: required environment variables and their purpose, external service dependencies (databases, APIs, caches, queues), feature flags or runtime options.
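Beyond listing config files, a sketch for surfacing the variables the code itself reads — the regexes cover common Node.js and Python idioms only:

```bash
# Environment variables referenced in source (names only, never values).
grep -rhoE 'process\.env\.[A-Z_]+' --include='*.js' --include='*.ts' . 2>/dev/null | sort -u
grep -rhoE 'os\.environ\[.[A-Z_]+.\]|os\.getenv\(.[A-Z_]+' --include='*.py' . 2>/dev/null | sort -u
```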
### Step 6: Examine test structure

```bash
find . -name "*_test.*" -o -name "*.test.*" -o -name "*Test.*" -o -path "*/tests/*" \
  2>/dev/null | head -20
```
Document: testing framework, test organization (co-located vs separate directory), common patterns (fixtures, factories, mocks), coverage tooling.
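A sketch for inferring the framework from dependency declarations (Node.js and Python shown; adapt per stack):

```bash
# Test frameworks usually appear as dev dependencies.
jq -r '(.devDependencies // {}) + (.dependencies // {}) | keys[]' package.json 2>/dev/null \
  | grep -iE '^(jest|mocha|vitest|ava|cypress|playwright)' || true
grep -iE '^(pytest|nose2?|tox)' requirements.txt 2>/dev/null || true
```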
Gate: Entry points identified. Core modules mapped. Data layer understood. API surface discovered. Configuration examined. Test structure documented. Proceed ONLY when gate passes.
## Phase 3: MAP
Goal: Synthesize findings into architectural understanding.
### Step 1: Identify design patterns
Based on examined files, identify and document with evidence. Every architectural claim must cite an examined file and path because uncited claims cannot be verified and mislead readers. Use the Design Patterns template from references/examples-and-errors.md.
Verify architectural claims against source files because READMEs may be outdated or incomplete.
### Step 2: Map key abstractions
Identify the 5-10 most important types, classes, or modules. Use the Key Abstractions template from references/examples-and-errors.md.
Document: core domain concepts, primary interfaces/abstractions, component communication (direct calls, events, queues).
### Step 3: Document data flow
Trace a typical request from entry point through the full stack. Use the Request Flow template from references/examples-and-errors.md. All file paths in output must be absolute because relative paths are ambiguous when the report is read outside the project directory.
### Step 4: Analyze recent activity

```bash
git log --oneline --no-decorate -10
```
Include recent commit themes (last 10 commits). Categorize: Feature development, Bug fixes, Refactoring, Infrastructure.
If not a git repository, note this limitation and skip this step.
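A rough sketch of the categorization; the keyword lists are heuristics, not a taxonomy:

```bash
# Count recent commits per theme using conventional-commit-ish keywords.
for theme in 'feat|add' 'fix|bug' 'refactor|clean' 'ci|build|deps|infra'; do
  count=$(git log --oneline -10 | grep -icE "$theme" || true)
  echo "$theme: $count"
done
```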
Gate: Design patterns identified with file evidence. Key abstractions mapped (5-10 concepts). Data flow documented with absolute paths. Recent activity analyzed. Proceed ONLY when gate passes.
## Phase 4: SUMMARIZE
Goal: Generate structured overview report.
### Step 1: Generate report
Use the template in references/report-template.md. Fill every section with evidence from examined files. Requirements:
- All file paths MUST be absolute
- All architectural claims MUST cite source files
- All commands MUST come from actual config files (package.json, Makefile, etc.)
- Empty sections MUST note why information is unavailable
Report facts without self-congratulation — show evidence, not descriptions of how thorough the exploration was. Every claim must have file-backed evidence because "report looks complete" is not the same as "report is complete."
### Step 2: Quality check
Before outputting, verify:
- All 13 template sections addressed
- No placeholder text remains
- Every claim backed by file evidence
- Paths are absolute, not relative
- Commands are real, not guessed
Adjust the 20-files-per-category limit if a specific area needs deeper sampling — some projects concentrate complexity in one layer. Note any such adjustments in the report.
### Step 3: Generate "Where to Add New Code" section
Append a prescriptive section to the report. For each major code category discovered during exploration, provide the directory, a concrete example file to use as a template, and any naming conventions.
```markdown
## Where to Add New Code

| I want to add... | Put it in... | Follow the pattern in... |
|-------------------|-------------|-------------------------|
| [category from exploration] | [directory path] | [concrete example file path] |
```
Every entry MUST reference a real file that already exists. If a category has no clear home, note that explicitly rather than guessing.
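A sketch of that existence check, assuming the draft report sits in a file (`report.md` is a hypothetical name; the skill normally writes to stdout):

```bash
# Extract path-like table cells and flag any that do not exist on disk.
grep -oE '/[^|[:space:]]+\.[A-Za-z]+' report.md | sort -u | while read -r path; do
  [ -f "$path" ] || echo "MISSING example file: $path"
done
```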
### Step 4: Post-exploration secret scan

Before presenting results, scan all output for accidentally captured secrets:

```bash
grep -iE '(password|secret|token|api[_-]?key|auth|credential)\s*[:=]' <output_file> || true
grep -E '(AIza|sk-|ghp_|gho_|AKIA|-----BEGIN)' <output_file> || true
```
If any matches are found: redact the matched lines (replace values with [REDACTED]), flag the finding, and note which file to review manually.
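A sketch of the redaction step, assuming GNU sed (the `I` case-insensitivity flag is a GNU extension):

```bash
# Replace the value portion of any matched assignment with [REDACTED], in place.
sed -E -i 's/((password|secret|token|api[_-]?key|auth|credential)[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1[REDACTED]/Ig' <output_file>
```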
### Step 5: Output report
Display the complete markdown report to stdout by default because most users need inline context, not a separate file. If export is explicitly requested, also write it to a file.
Remove any temporary files created during exploration.
Gate: Report has all sections filled. All paths are absolute. All claims cite evidence. "Where to Add New Code" section populated with real file references. Secret scan passed (no unredacted secrets in output). Report is actionable for onboarding. Quality check passes. Total files examined count is accurate.
## Parallel Domain-Specific Mapping (Deep Dive Mode)
When the user requests a full architectural analysis (e.g., "give me the full picture", "I'm new to this codebase", "we're considering a major refactor"), use parallel domain-specific agents instead of single-threaded sequential exploration.
Use parallel mapping when the exploration goal is broad and open-ended — full onboarding, major refactor preparation, or comprehensive architectural review. Use the standard 4-phase flow for targeted questions about a single subsystem.
Launch 4 parallel agents using Task, each focused on a specific domain. Each agent follows the sensitive-files guardrail and writes a structured document.
See references/examples-and-errors.md for the agent domain table, orchestration rules, and the agent instructions template.
Post-Parallel Gate: At least 3 of 4 domain agents completed. All output files exist. Secret scan passed across all output files. Each file contains file-backed evidence (not generic descriptions).
## References

- `${CLAUDE_SKILL_DIR}/references/report-template.md`: Standard markdown report template with all sections
- `${CLAUDE_SKILL_DIR}/references/exploration-strategies.md`: Language-specific discovery commands and patterns
- `${CLAUDE_SKILL_DIR}/references/examples-and-errors.md`: Worked examples, error handling, parallel agent template and domain table