codebase-guide
Codebase Guide
Produce a comprehensive, beginner-friendly Markdown guide that explains a code repository to someone who has never seen it. The guide covers purpose, tech stack, architecture, data flow, key files, and how to run the project. Every technical term is defined on first use.
Workflow
Three phases, in order. Do not start writing until Phase 3.
- Discovery — gather raw data about the repository
- Analysis — build a mental model from the data
- Writing — produce the output document
Phase 1: Discovery
Systematically explore the repository. Gather information in each category below. Adapt to the project — not every category applies.
Project Identity
- Package manifests:
package.json,Cargo.toml,pyproject.toml,setup.py,go.mod,composer.json,Gemfile,*.csproj,pom.xml,build.gradle - Documentation:
README.md,README.rst,CHANGELOG,ARCHITECTURE.md,docs/ - CI/CD:
.github/workflows/,.gitlab-ci.yml,Jenkinsfile - Containerisation:
Dockerfile,docker-compose.yml
Project Structure
- Top-level directory listing (1–2 levels deep)
- Identify: source directories, test directories, config, build output, docs
- Detect monorepo patterns: workspaces,
packages/,apps/,libs/
Entry Points and Configuration
- Look for main/index/app/server files in the source root
- Build and run scripts from the manifest (npm scripts, Makefile targets, etc.)
- Environment config:
.env.example,config/, settings files
Dependencies and Stack
- Framework detection from manifests and import statements
- Database/storage: docker-compose services, ORM config, migration files,
schema files (
.prisma,*.sql,models.*) - External service integrations: API clients, SDK imports
Tests and API Surface
- Test directory structure and framework
- Public API: route definitions, exported modules, CLI commands, OpenAPI specs
- Estimate: are there tests? (presence, not coverage percentage)
Discovery Scope
Scale discovery effort to project size:
| Size | Files | Approach |
|---|---|---|
| Small | <50 | Scan everything |
| Medium | 50–500 | Focus on src/, config, entry points |
| Large | 500+ | Sample representative modules, focus on architecture boundaries |
After discovery, read the most important files: main entry points, core modules, config, and at least one test file.
Do not read: generated files, lock file contents, vendor directories, minified bundles.
Phase 2: Analysis
Before writing, answer these questions internally. Do not produce output yet.
Purpose
- What does this project do in one sentence?
- Who is the intended user?
- What problem does it solve?
Architecture
- What is the high-level structure? (monolith, microservices, library, CLI, full-stack app, pipeline, serverless)
- What are the main boundaries or layers?
- How does data flow through the system? (request lifecycle, pipeline stages, event flow)
Key Abstractions
- What are the 3–7 most important types, classes, or modules?
- What patterns are used? (MVC, pub/sub, middleware, hooks, plugins)
- What are the important interfaces or contracts between modules?
Developer Experience
- How does someone set up and run this locally?
- How are tests run?
- What is the build/deploy pipeline?
- Are there non-obvious prerequisites?
Rough Sizing
- Tiny (<500 LOC), small, medium, or large (>50k LOC)?
- This determines output depth (see writing rules).
Phase 3: Writing
Produce one Markdown file following the section structure in
references/output-template.md. Apply the guidelines from
references/writing-rules.md.
Read both reference files before starting this phase.
Output Location
- Default path:
docs/CODEBASE_GUIDE.md. Create thedocs/directory if it does not exist. - If the user specifies a path, use that instead. Create parent directories as needed.
- Always confirm the output path with the user before writing.
- If a file already exists at the path, ask before overwriting.
Required Sections
Follow the template order. Omit sections that do not apply:
- Document Header
- What Is This
- Tech Stack
- Project Map
- Architecture
- Data Flow
- Key Files Explained
- Key Concepts
- How to Run
- Testing
- Where to Go Next
- Glossary (only if the project has domain-specific terminology)
Target Length
- Small project: 100–200 lines
- Medium project: 200–400 lines
- Large project: 300–500 lines
Output Defaults
- No emoji in headings
- Use Mermaid syntax for diagrams (renders on GitHub, GitLab, most doc viewers)
- Use annotated ASCII trees for directory structure
- Copy-pasteable commands in code blocks with language annotations
- Define every technical term on first use
Reference Loading
Load only the reference you need for the current phase:
- Output section specs →
references/output-template.md - Tone and writing guidelines →
references/writing-rules.md
Read these before starting Phase 3. Do not load them during discovery or analysis.
Scope Limits
This skill produces one static explanatory document. It does not:
- Generate API documentation (use dedicated tools like typedoc, pydoc, swagger)
- Create step-by-step tutorials or how-to guides
- Maintain or auto-update the document over time
- Cover deployment runbooks or operational procedures
For monorepos: explain the top-level structure and one representative package in depth. List other packages with one-sentence descriptions.