architecture-md-builder
Architecture.md Builder
Create production-quality ARCHITECTURE.md files that serve as definitive maps of any codebase, following matklad's canonical guidelines with modern AI-agent documentation patterns.
When to Use This Skill
- Creating architecture documentation for a new or existing repository
- Auditing a codebase to understand its structure
- Onboarding documentation for developers and AI agents
- User asks to "document the architecture", "create architecture.md", or "map this codebase"
Core Principles (matklad's Guidelines)
The canonical ARCHITECTURE.md follows these principles:
- Bird's eye overview - Problem being solved, high-level approach
- Coarse-grained codemap - Modules and relationships (country-level, not state-level)
- Named entities - Important files, types, modules by name (no links, use symbol search)
- Architectural invariants - Constraints, what is NOT done, absence patterns
- Layer boundaries - Transitions between systems
- Cross-cutting concerns - Issues spanning multiple modules
See references/matklad-guidelines.md for detailed explanations.
Workflow
Phase 1: Research Best Practices (Optional)
If unfamiliar with architecture documentation patterns, use Exa search:
# Search for exceptional architecture.md examples
python3 ~/.claude/skills/exa-search/scripts/exa_search.py \
"architecture.md documentation best practices" \
--category github -n 10
# Find matklad's original guidelines
python3 ~/.claude/skills/exa-search/scripts/exa_research.py \
"matklad ARCHITECTURE.md guidelines rust-analyzer"
Phase 2: Codebase Exploration
Launch 2-4 parallel exploration agents to map the codebase thoroughly:
Use the Task tool with subagent_type=Explore for each major system area:
1. Core/Engine - Entry points, main abstractions, data structures
2. Transport/API - HTTP, WebSocket, message handling
3. Database/Persistence - Schema, migrations, queries
4. Frontend/UI - Components, state management, routing
Agent prompts should ask:
- What are the key abstractions and types?
- How does data flow through this system?
- What are the main files and their line counts?
- What patterns are used consistently?
- What invariants does the code enforce?
Target output: ~10-15k words of analysis per agent covering the full system.
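One way to keep the agent prompts consistent is to generate them from the area list and the question list above. The sketch below is illustrative only: `AREAS`, `QUESTIONS`, and `build_prompt` are hypothetical names, and the resulting strings would still be passed to the Task tool by hand.

```python
# Hypothetical helper: builds one exploration prompt per system area.
# The area names and questions are copied from the lists above.
AREAS = {
    "Core/Engine": "entry points, main abstractions, data structures",
    "Transport/API": "HTTP, WebSocket, message handling",
    "Database/Persistence": "schema, migrations, queries",
    "Frontend/UI": "components, state management, routing",
}

QUESTIONS = [
    "What are the key abstractions and types?",
    "How does data flow through this system?",
    "What are the main files and their line counts?",
    "What patterns are used consistently?",
    "What invariants does the code enforce?",
]


def build_prompt(area: str, focus: str) -> str:
    """Return a prompt asking every question about a single area."""
    lines = [f"Explore the {area} area of this repository ({focus}).", "Answer:"]
    lines += [f"- {question}" for question in QUESTIONS]
    return "\n".join(lines)


# One prompt per area, keyed by area name.
prompts = {area: build_prompt(area, focus) for area, focus in AREAS.items()}
```

Drop any area that does not exist in the repository (for example, Frontend/UI in a CLI tool) before launching agents.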
Phase 3: Draft ARCHITECTURE.md
Create the document following this structure:
# Architecture
Brief intro: what this document is for, who it's for.
## Bird's Eye View
- What problem does this solve?
- What is the core paradigm/approach?
- Key design principles (3-5 bullets)
[ASCII diagram showing major components]
## High-Level Data Flow
[Mermaid flowchart showing data flow]
## Codemap
### System 1 (`path/`)
Description, key files with line counts, key abstractions table.
### System 2 (`path/`)
...
## Architectural Invariants
Rules that are ALWAYS true. Code patterns that are NEVER violated.
## Cross-Cutting Concerns
Issues that span multiple modules (auth, logging, error handling).
## Layer Boundaries
Diagram showing layers and their interfaces.
## Key Files Reference
| File | Lines | Purpose |
|------|-------|---------|
| ... | ... | ... |
## Common Questions
FAQ format: "Where do I find X?" → Answer
See references/document-structure.md for detailed section guidance.
See assets/architecture-template.md for a starting template.
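The Key Files Reference table in the structure above can be rendered from collected data rather than typed by hand. This is a minimal sketch, assuming the exploration phase produced `(path, line_count, purpose)` tuples; `render_key_files` is a hypothetical helper, not part of this skill's scripts.

```python
def render_key_files(rows: list[tuple[str, int, str]]) -> str:
    """Render (path, line_count, purpose) tuples as a Markdown table.

    Rows are sorted largest file first so the most complex files lead.
    """
    out = ["| File | Lines | Purpose |", "|------|-------|---------|"]
    for path, line_count, purpose in sorted(rows, key=lambda r: -r[1]):
        out.append(f"| `{path}` | ~{line_count} | {purpose} |")
    return "\n".join(out)
```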
Phase 4: Verification
Launch 2-3 review agents to verify accuracy:
Use the Task tool with subagent_type=Explore to verify:
1. General accuracy - Do descriptions match actual code?
2. Line counts - Are they roughly accurate?
3. File references - Do all referenced files exist?
Verification checklist:
- All referenced files exist
- Line count estimates within 20% of actual
- ASCII/Mermaid diagrams render correctly
- Document answers "where's the thing that does X?"
- No stale information from previous versions
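The first two checklist items are mechanical and can be scripted. The sketch below assumes the documented files and line counts have already been extracted into a dictionary; `verify_entries` and `count_lines` are hypothetical names, and the 20% tolerance comes from the checklist above.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines in a file (close to `wc -l`, but also counts a final
    line that lacks a trailing newline)."""
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def verify_entries(root: Path, entries: dict[str, int],
                   tolerance: float = 0.20) -> list[str]:
    """Check documented files against the repository.

    entries maps a path relative to root to its documented line count.
    Returns a list of human-readable problems; an empty list means all
    files exist and every count is within the tolerance.
    """
    problems = []
    for rel_path, documented in entries.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"{rel_path}: missing")
            continue
        actual = count_lines(path)
        if abs(documented - actual) > tolerance * actual:
            problems.append(f"{rel_path}: documented {documented}, actual {actual}")
    return problems
```

The remaining checklist items (diagram rendering, answering "where's the thing that does X?", stale information) still need a human or review agent.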
Phase 5: Apply Corrections
Update the document based on review findings:
- Correct line counts
- Add missing files to structures
- Fix any inaccurate descriptions
- Update counts (e.g., "11 modules" → "13 modules")
Quality Guidelines
Diagrams
ASCII diagrams for component relationships:
┌─────────────┐ ┌─────────────┐
│ Frontend │────▶│ Backend │
└─────────────┘ └─────────────┘
Mermaid diagrams for data flows:
flowchart TB
A[Input] --> B[Process]
B --> C[Output]
Line Counts
Include approximate line counts for key files:
- Helps readers gauge complexity
- Use `wc -l` to verify
- Round to nearest 10 or 50
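Rounding can be applied uniformly with a small helper. This sketch assumes a cutoff of 200 lines between rounding to 10 and rounding to 50; that threshold is an illustrative choice, not something this skill prescribes, and `round_line_count` is a hypothetical name.

```python
def round_line_count(n: int, threshold: int = 200) -> int:
    """Round to the nearest 10 below the threshold, nearest 50 at or above it.

    Uses integer arithmetic (round half up) to avoid the banker's rounding
    of Python's built-in round(). Non-empty files never round down to 0.
    """
    if n <= 0:
        return 0
    step = 10 if n < threshold else 50
    rounded = (n + step // 2) // step * step
    return max(step, rounded)
```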
Named Entities
Reference files, types, and modules by name without links:
- Good: "See `WorkingMemory.ts` for the immutable memory implementation"
- Bad: "See WorkingMemory" written as a hyperlink to the file
Why: Symbol search (Cmd+T, osgrep) is more reliable than links that rot.
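A draft can be scanned for links that violate this guideline. The sketch below is one possible check, assuming the document uses inline Markdown links; `find_file_links` is a hypothetical helper, and it deliberately ignores external URLs, in-page anchors, and images.

```python
import re

# Matches inline Markdown links such as [WorkingMemory](src/WorkingMemory.ts).
# The negative lookbehind skips images, which start with "![".
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")

# Link targets with these prefixes are not repository paths.
NON_FILE_PREFIXES = ("http://", "https://", "#", "mailto:")


def find_file_links(markdown: str) -> list[tuple[int, str, str]]:
    """Return (line_number, link_text, target) for links to repository paths."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for text, target in LINK_RE.findall(line):
            if not target.startswith(NON_FILE_PREFIXES):
                hits.append((lineno, text, target))
    return hits
```

Each hit is a candidate for replacement with the plain file or symbol name.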
Invariants
Document what the code NEVER does and what it ALWAYS does:
- "WorkingMemory never mutates in place"
- "API keys never reach the browser"
- "All database queries use prepared statements"
Target Length
- Small projects: 200-400 lines
- Medium projects: 400-700 lines
- Large projects: 700-1000 lines
- Maximum: ~1200 lines (split into linked docs if larger)
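These ranges can be checked automatically once the draft exists. The following is a minimal sketch using the numbers above; the project-size labels, the return values, and the name `check_length` are assumptions made for illustration.

```python
# Target ranges in lines, taken from the list above.
TARGETS = {"small": (200, 400), "medium": (400, 700), "large": (700, 1000)}
HARD_MAX = 1200  # beyond this, split into linked documents


def check_length(line_count: int, project_size: str) -> str:
    """Classify a draft's length as 'ok', 'short', 'long', or 'split'."""
    low, high = TARGETS[project_size]
    if line_count > HARD_MAX:
        return "split"
    if line_count < low:
        return "short"
    if line_count > high:
        return "long"
    return "ok"
```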
Output
Single file: ARCHITECTURE.md in project root
Optionally update CLAUDE.md or README.md with a reference to the new architecture document.
Example Usage
User: "Create an architecture.md for this repo"
1. Launch 3 exploration agents targeting core, transport, and frontend
2. Synthesize findings into ARCHITECTURE.md following the template
3. Launch 2 review agents to verify accuracy
4. Apply corrections
5. Commit and optionally update CLAUDE.md
Resources
- references/matklad-guidelines.md - Canonical guidelines with rationale
- references/document-structure.md - Detailed section guidance
- assets/architecture-template.md - Starting template