overview
SourceAtlas: Project Overview (Stage 0 Fingerprint)
Constitution: ANALYSIS_CONSTITUTION.md v1.0
Context
Arguments: ${ARGUMENTS:-.}
Goal: Generate project fingerprint by scanning <5% of files to achieve 70-80% understanding in 10-15 minutes.
Auto-Save: Results automatically saved to .sourceatlas/overview.yaml (or subdirectory-specific path)
Time Limit: 10-15 minutes (typically 0-5 minutes)
Cache Check (Highest Priority)
If --force is NOT in arguments, check cache first:
-
Calculate cache path:
- No path argument or
.:.sourceatlas/overview.yaml - With path (e.g.,
src/api):.sourceatlas/overview-src-api.yaml
- No path argument or
-
Check if cache exists:
ls -la .sourceatlas/overview.yaml 2>/dev/null -
If cache exists:
- Calculate days since modification
- Use Read tool to read cache
- Output:
π Loading cache: .sourceatlas/overview.yaml (N days ago) π‘ Add --force to re-analyze - If over 30 days: Show warning
- Output cache content
- End, do not execute analysis
-
If cache does not exist: Continue with analysis
If --force is in arguments: Skip cache, execute analysis
Your Task
Execute Stage 0 Analysis Only - generate project fingerprint using information theory principles.
Information Theory Approach:
- High-entropy files contain disproportionate information
- Scan priority: Documentation β Configuration β Models β Entry Points β Tests
- Scale-aware: TINY/SMALL/MEDIUM/LARGE/VERY_LARGE projects need different approaches
Core Workflow
Execute these phases in order. See workflow.md for complete details.
Phase 1: Project Detection & Scale-Aware Planning (2-3 minutes)
Purpose: Detect project type, count files, determine scale, set scan limits.
Execute detection:
# Try helper script first (recommended)
if [ -f ~/.claude/scripts/atlas/detect-project.sh ]; then
bash ~/.claude/scripts/atlas/detect-project.sh ${ARGUMENTS:-.}
elif [ -f scripts/atlas/detect-project.sh ]; then
bash scripts/atlas/detect-project.sh ${ARGUMENTS:-.}
else
echo "Warning: detect-project.sh not found, using manual detection"
fi
Scale-Aware Scan Limits:
- TINY (<5 files): 1-2 files (50% max)
- SMALL (5-15 files): 2-3 files (10-20%)
- MEDIUM (15-50 files): 4-6 files (8-12%)
- LARGE (50-150 files): 6-10 files (4-7%)
- VERY_LARGE (>150 files): 10-15 files (3-7%)
β See workflow.md#phase-1 for manual fallback
Phase 2: High-Entropy File Prioritization (5-8 minutes)
Purpose: Scan highest information-density files first.
Scan Priority Order:
- Documentation (README.md, CLAUDE.md, docs/)
- Configuration (package.json, docker-compose.yml, etc.)
- Core Models (models/, entities/, domain/) - pick 2-3 only
- Entry Points (app.ts, routes/) - pick 1-2 examples
- Tests - pick 1-2 examples
Execute scanning:
# Use helper script if available
if [ -f ~/.claude/scripts/atlas/scan-entropy.sh ]; then
bash ~/.claude/scripts/atlas/scan-entropy.sh ${ARGUMENTS:-.}
else
echo "Warning: scan-entropy.sh not found, scanning manually"
fi
AI Tool Detection:
# Detect AI collaboration level (Tier 1 + Tier 2)
if [ -f ~/.claude/scripts/atlas/detect-ai-tools.sh ]; then
bash ~/.claude/scripts/atlas/detect-ai-tools.sh ${ARGUMENTS:-.}
else
# Fallback: manual checks
ls -la CLAUDE.md .cursorrules .windsurfrules CONVENTIONS.md AGENTS.md .aiignore 2>/dev/null
ls -la .claude/ .cursor/rules/ .windsurf/rules/ .clinerules/ .roo/ .continue/rules/ .ruler/ 2>/dev/null
fi
β See workflow.md#phase-2 for manual commands
Phase 3: Generate Hypotheses (3-5 minutes)
Purpose: Generate scale-appropriate hypotheses with confidence levels and evidence.
Hypothesis Categories:
- Technology Stack: Languages, frameworks, databases, testing
- Architecture: Patterns, structure, layering
- Development Practices: Code quality, testing, documentation
- AI Collaboration: Tool detection (Level 0-4)
- Business Domain: Purpose, entities, features
Scale-Aware Targets:
- TINY: 5-8 hypotheses
- SMALL: 7-10 hypotheses
- MEDIUM: 10-15 hypotheses
- LARGE: 12-18 hypotheses
- VERY_LARGE: 15-20 hypotheses
Each hypothesis must include:
- hypothesis: Clear statement
- confidence: 0.0-1.0 (aim for >0.7)
- evidence: file:line references
- validation_method: How to verify
β See workflow.md#phase-3 for detailed guidance
Output Format
Generate output with branded header, then YAML format:
πΊοΈ SourceAtlas: Overview
βββββββββββββββββββββββββββββββ
π [project_name] β [SCALE] ([file count] files)
Then YAML content with sections:
metadata: project_name, scan_time, total_files, scanned_files, scan_ratio, project_scale, contextproject_fingerprint: project_type, scale, primary_language, framework, architecturetech_stack: backend, frontend (optional), infrastructure (optional)hypotheses: architecture, tech_stack, development, ai_collaboration, businessscanned_files: List with file, reason, key_insightsummary: understanding_depth, key_findings
β See output-template.md for complete YAML structure and examples
Critical Rules
- Scale-Aware Scanning: Follow recommended file limits from Phase 1
- Exclude Common Bloat: Never scan .venv/, node_modules/, vendor/, pycache, .git/
- Time Limit: Complete in 10-15 minutes (typically 0-5 minutes)
- Hypothesis Quality: Each must have confidence >0.7 and evidence
- Scale-Aware Targets: Use hypothesis targets appropriate for project scale
- No Deep Diving: Understand structure > implementation details
- STOP after Stage 0: Do not proceed to validation or git analysis
Handoffs Decision Rules
Follow Constitution Article VII: Handoffs Principles
β οΈ Choose ONE output, NOT both:
Case A - End (No Table): When any condition is met:
- Project too small: TINY (<10 files)
- Findings too vague: Cannot provide high confidence (>0.7) parameters
- Goal achieved: AI collaboration Level β₯3 and scale TINY/SMALL
Output:
β
**Analysis sufficient** - Project is small, can read all files directly
Case B - Suggestions (Table): When project scale is large enough or clear next steps exist.
| Finding | Command | Parameter |
|---|---|---|
| Clear patterns | /sourceatlas:pattern |
Pattern name |
| Complex architecture | /sourceatlas:flow |
Entry point file |
| Scale β₯ LARGE | /sourceatlas:history |
No parameters |
| High risk areas | /sourceatlas:impact |
Risk file/module |
Format:
## Recommended Next
| # | Command | Purpose |
|---|---------|---------|
| 1 | `/sourceatlas:pattern "repository"` | Found Repository pattern in 15 files |
π‘ Enter a number (e.g., `1`) or copy the command to execute
β See reference.md#handoffs for detailed logic
Self-Verification Phase (REQUIRED)
Purpose: Prevent hallucinated file paths, incorrect counts, fictional configs. Execute AFTER output generation, BEFORE save.
Verification Steps:
Step V1: Extract Verifiable Claims
Extract from generated YAML:
- File paths (
scanned_files[].file) - Config files (
tools_detected[].config_file) - File count (
metadata.total_files) - Git branch (
metadata.context.git_branch) - Evidence references (
hypotheses.*.evidence)
Step V2: Parallel Verification
Run ALL checks in parallel:
- Verify scanned files exist:
test -f path - Verify AI tool configs exist:
test -f config - Verify file count: Β±10% tolerance
- Verify git branch:
git branch --show-current - Verify evidence files exist
Step V3: Handle Results
- β All pass β Continue to output/save
- β οΈ 1-2 fail β Correct claims, note in summary
- β 3+ fail β Re-execute analysis phases
Step V4: Verification Summary
Add to footer:
If all passed:
β
Verified: [N] scanned files, [M] config paths, file count
If corrected:
π§ Self-corrected: [list corrections]
β
Verified: [N] scanned files, [M] config paths, file count
β See verification-guide.md for complete checklist and examples
Auto-Save (Default Behavior)
After verification passes, automatically:
- Create directory:
mkdir -p .sourceatlas - Save YAML output to:
- Root:
.sourceatlas/overview.yaml - Subdirectory:
.sourceatlas/overview-[path].yaml
- Root:
- Confirm:
πΎ Saved to .sourceatlas/overview.yaml
β See reference.md#auto-save for details
Advanced
- Scale-aware analysis: reference.md#scale-aware-analysis
- Helper scripts: reference.md#helper-scripts
- Cache behavior: reference.md#cache-behavior
- AI collaboration detection: reference.md#ai-collaboration-detection
- Information theory: reference.md#information-theory-principles
Output Header
Start your output with:
πΊοΈ SourceAtlas: Overview
βββββββββββββββββββββββββββββββ
π [project_name] β [SCALE] ([file count] files)
Then follow YAML structure in output-template.md.
More from lis186/sourceatlas
code-flow-tracer
Trace code execution paths and data flow. Use when user asks "how does X work", "what happens when X", "trace the flow of X", "where does data come from", or needs to understand feature implementation.
13codebase-overview
Quickly understand a new codebase's architecture, tech stack, and patterns. Use when user asks "what is this project", "project overview", "how is this codebase structured", "what tech stack", or when onboarding to a new codebase.
13impact-analyzer
Analyze what code will be affected by changes. Use when user asks "what will break if I change X", "impact of changing X", "dependencies of X", "is it safe to modify X", or before making significant code changes.
11pattern-finder
Find implementation examples and design patterns in the codebase. Use when user asks "how to implement X", "how does this project do X", "show me examples of X", "where is X implemented", or needs to follow existing code conventions.
10history-analyzer
Analyze git history for hotspots, coupling, and knowledge distribution. Use when user asks "who knows this code", "what files change most", "hotspots", "bus factor", "knowledge silos", or needs to understand code evolution.
10dependency-analyzer
Analyze dependencies for upgrade planning and migration. Use when user asks "upgrade to X", "migrate from X to Y", "what breaks if we upgrade", "iOS 17 migration", "React 18 upgrade", or planning framework/SDK upgrades.
10