Project Genome
Generate and maintain a comprehensive (<5k tokens) YAML bootstrap file that gives AI agents instant codebase understanding, including AI-analyzed documentation mapping.
CRITICAL: Pre-Read Protocol
Before reading PROJECT-GENOME.yaml, ALWAYS execute this skill first.
This ensures:
- Genome is fresh (skills_map current, file_structure accurate)
- Documentation map is current (new docs discovered, stale docs flagged)
- CLAUDE.md properly references genome at top
- New skills are discovered and added to skills_map
Pre-Read Checklist (Execute Every Time)
# 1. Update genome with latest changes
python3 .claude/skills/project-genome/scripts/update_genome.py
# 2. Validate genome is under token budget
python3 .claude/skills/project-genome/scripts/update_genome.py --validate
Self-Verification: CLAUDE.md Integration
After updating, verify CLAUDE.md contains:
- Line 3: `> **Bootstrap**: Read [PROJECT-GENOME.yaml]...` reference
- Key Rules section: Rule about refreshing genome before reading
- Skills table: `project-genome` skill listed with trigger
If any are missing, auto-fix by reading CLAUDE.md and adding the required sections.
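The three checks above could be sketched roughly as follows. This is only an illustration: `missing_claude_md_items` is a hypothetical helper, not part of `update_genome.py`, and the substring heuristics (matching "refresh" and "genome" on one line, or a table row containing `project-genome`) are assumptions about how CLAUDE.md is worded.

```python
from typing import List


def missing_claude_md_items(text: str) -> List[str]:
    """Return the required CLAUDE.md items that appear to be missing."""
    lines = text.splitlines()
    missing = []

    # Check 1: line 3 should carry the Bootstrap reference to the genome.
    line3 = lines[2] if len(lines) >= 3 else ""
    if "**Bootstrap**" not in line3 or "PROJECT-GENOME.yaml" not in line3:
        missing.append("bootstrap reference on line 3")

    # Check 2 (assumed wording): some rule mentions refreshing the genome.
    if not any("refresh" in l.lower() and "genome" in l.lower() for l in lines):
        missing.append("Key Rules entry about refreshing genome")

    # Check 3 (assumed layout): a markdown table row lists the skill.
    if not any("project-genome" in l and "|" in l for l in lines):
        missing.append("project-genome row in skills table")

    return missing
```

An empty list means nothing needs fixing; otherwise each entry names a section to add.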
Core Concept
PROJECT-GENOME.yaml is a seed file, not a full system. It provides:
- Instant project orientation (purpose, stack, structure)
- Semantic navigation (modules, key functions, dependencies)
- Documentation map with AI-scored importance (authoritative vs ephemeral)
- Agent-specific hints for efficient exploration
- Links to deeper resources (not duplicated content)
When to Use
| Action | Trigger |
|---|---|
| Generate | New project setup, /init, "create genome" |
| Update | Major refactor, new modules, architecture changes |
| Read | Start of any coding session (automatic) |
| Validate | Before commits affecting structure |
| Review Docs | --review-docs to classify discovered documentation |
Documentation Map Feature
The documentation_map section tracks all markdown documentation in the repo, distinguishing between authoritative (user-confirmed important) and ephemeral (temporary plans, working notes).
Why This Matters
AI agents frequently generate temporary documentation:
- Implementation plans (`*_PLAN.md`)
- Debugging notes (`debugging-*.md`)
- Session-specific scratch files
These should NOT be treated as authoritative project documentation. The documentation map:
- Auto-discovers all markdown files
- AI-analyzes each for importance signals
- Auto-skips low-quality/ephemeral docs
- Preserves user-confirmed authoritative docs across updates
Documentation Map Structure
documentation_map:
  # User-confirmed authoritative docs (PRESERVED across updates)
  authoritative:
    system_architecture:
      - path: "docs/ARCHITECTURE.md"
        purpose: "High-level system design and component interactions"
        last_verified: "2026-01-22"
    api_reference:
      - path: "backend/API_ENDPOINTS.md"
        purpose: "REST API documentation with schemas"
    component_guides:
      - path: "backend/CLAUDE.md"
        purpose: "Backend development patterns"
  # Auto-discovered docs (REFRESHED on each update)
  discovered:
    recent_plans:
      - path: "docs/QA_PLAN_20260122.md"
        importance_score: 0.45
        category: "implementation_plan"
    archived:
      directory: "docs/archive/"
      count: 12
  # Docs needing user review (cleared after --review-docs)
  pending_review:
    - path: "docs/NEW_FEATURE_SPEC.md"
      importance_score: 0.78
      suggested_category: "system_architecture"
      ai_reasoning: "Well-structured spec with diagrams. Covers new subsystem."
  # Validation state
  _meta:
    last_scan: "2026-01-22T14:30:00Z"
    total_docs_scanned: 47
    auto_skipped: 23
    missing_authoritative: []
AI Documentation Analysis
When this skill runs, the agent analyzes discovered markdown files to determine importance.
Analysis Process
For each discovered .md file (read first 3000 chars):
1. Evaluate Quality Signals (30% weight)
   - Clear H1/H2 structure
   - Contains code blocks, diagrams, or tables
   - References specific files/functions in codebase
   - Professional/authoritative tone
2. Evaluate Freshness Signals (25% weight)
   - Modified within last 30 days
   - References files that still exist
   - No "TODO", "DRAFT", "WIP" markers in title
   - Current tech stack mentioned
3. Evaluate Scope Signals (25% weight)
   - Covers entire system/module vs single task
   - Located in structured docs directory
   - Has "Architecture", "Guide", "Reference" in name
4. Evaluate Deprecation Signals (20% weight)
   - Located in `/archive/` directory
   - Contains "deprecated", "outdated", "old" language
   - References removed features/files
   - Date in filename older than 30 days (e.g., `plan-20251201.md`)
Importance Score Calculation
importance_score = (quality * 0.30) + (freshness * 0.25) + (scope * 0.25) + ((1 - deprecation) * 0.20)
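As a minimal sketch, the formula above translates directly into Python. The function name and the range check are illustrative additions, assuming each signal is a value between 0.0 and 1.0; the weights are the ones stated above.

```python
def importance_score(quality: float, freshness: float,
                     scope: float, deprecation: float) -> float:
    """Combine the four signal values (each 0.0-1.0) with the stated weights."""
    signals = {
        "quality": quality,
        "freshness": freshness,
        "scope": scope,
        "deprecation": deprecation,
    }
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    # Deprecation is inverted: strong deprecation signals lower the score.
    return (
        quality * 0.30
        + freshness * 0.25
        + scope * 0.25
        + (1.0 - deprecation) * 0.20
    )


score = importance_score(quality=0.95, freshness=0.90, scope=0.90, deprecation=0.05)
print(f"{score:.3f}")
```

Because the weights sum to 1.0, the result always stays within 0.0-1.0.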
Auto-Skip Criteria (importance_score < 0.35)
Automatically skip (don't prompt user) for docs matching:
- Located in `/archive/`, `/old/`, `/deprecated/` directories
- Filename contains date older than 60 days
- Title contains "DRAFT", "WIP", "TODO", "SCRATCH", "NOTES" (informal)
- Less than 500 bytes (stub files)
- Filename pattern: `*-debug-*.md`, `*-test-*.md`, `debugging-*.md`
- Content starts with "# Notes" or "# Scratch"
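One way these criteria might be checked in code is sketched below. `skip_reason` is a hypothetical helper, and several details are assumptions rather than documented behavior: filename dates are taken to be `YYYYMMDD`, title markers are matched case-insensitively as whole words, and the caller supplies the title, content, and file size.

```python
import re
from datetime import date, timedelta
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Optional

SKIP_DIRS = {"archive", "old", "deprecated"}
SKIP_TITLE_WORDS = re.compile(r"\b(DRAFT|WIP|TODO|SCRATCH|NOTES)\b", re.IGNORECASE)
SKIP_NAME_PATTERNS = ("*-debug-*.md", "*-test-*.md", "debugging-*.md")
# Assumption: dates in filenames look like YYYYMMDD, as in plan-20251201.md.
DATE_IN_NAME = re.compile(r"(20\d{2})(\d{2})(\d{2})")


def skip_reason(path: str, title: str, content: str,
                size_bytes: int, today: date) -> Optional[str]:
    """Return why a doc would be auto-skipped, or None if no criterion matches."""
    p = PurePosixPath(path)
    name = p.name.lower()
    if SKIP_DIRS & {part.lower() for part in p.parts[:-1]}:
        return "located in an archive/old/deprecated directory"
    match = DATE_IN_NAME.search(name)
    if match:
        try:
            stamped = date(*(int(g) for g in match.groups()))
        except ValueError:
            stamped = None  # digits that do not form a real calendar date
        if stamped is not None and today - stamped > timedelta(days=60):
            return "filename date older than 60 days"
    if SKIP_TITLE_WORDS.search(title):
        return "title marks the doc as draft or scratch"
    if size_bytes < 500:
        return "stub file under 500 bytes"
    if any(fnmatchcase(name, pattern) for pattern in SKIP_NAME_PATTERNS):
        return "filename matches a debug/test pattern"
    if content.lstrip().startswith(("# Notes", "# Scratch")):
        return "content starts with a notes or scratch heading"
    return None
```

Returning the reason rather than a bare boolean makes it easy to fill a `skip_reason` field in the analysis output.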
Category Assignment
| Score Range | Suggested Category |
|---|---|
| >= 0.85 | system_architecture or api_reference (based on content) |
| 0.70 - 0.84 | component_guide or testing |
| 0.50 - 0.69 | implementation_plan |
| 0.35 - 0.49 | working_notes (ephemeral, not authoritative) |
| < 0.35 | Auto-skip (don't include in pending_review) |
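The table above can be read as a simple threshold lookup. The sketch below assumes the ranges are meant to be contiguous (so a score of 0.845 falls into the 0.70 band); the function name is illustrative, and choosing between the two candidates in a band is left to content analysis.

```python
def suggested_category(score: float) -> str:
    """Map an importance score to the category band from the table."""
    if score >= 0.85:
        return "system_architecture or api_reference"  # decided by content
    if score >= 0.70:
        return "component_guide or testing"
    if score >= 0.50:
        return "implementation_plan"
    if score >= 0.35:
        return "working_notes"  # ephemeral, not authoritative
    return "auto_skip"  # not included in pending_review
```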
Execution Modes
Mode 1: Standard Update (Default)
python3 .claude/skills/project-genome/scripts/update_genome.py
What happens:
- Script discovers all markdown files
- Script outputs `docs_pending_analysis.json`
- Agent reads each pending doc (first 3000 chars)
- Agent calculates importance_score for each
- Agent updates genome with documentation_map
Agent instructions for this mode:
After running the script, if docs_pending_analysis.json exists:
1. Read docs_pending_analysis.json
2. For each doc with needs_analysis=true:
   a. Read the file (first 3000 chars)
   b. Evaluate: quality, freshness, scope, deprecation signals
   c. Calculate importance_score (0.0-1.0)
   d. Determine suggested_category
   e. Write 1-2 sentence reasoning
3. Update PROJECT-GENOME.yaml:
   - Preserve existing authoritative section
   - Update discovered section with scored docs
   - Add high-score docs (>=0.50) to pending_review
   - Auto-skip low-score docs (<0.35)
4. Delete docs_pending_analysis.json
5. Report summary to user
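Step 3 amounts to sorting scored docs into buckets. The sketch below shows one possible reading, with loud assumptions: each scored doc is a dict with `path` and `importance_score` keys (the real `docs_pending_analysis.json` schema is not shown here), docs scoring 0.50 or higher go only to `pending_review`, and docs between 0.35 and 0.49 stay in `discovered`. `partition_scored_docs` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, List, Set

PENDING_THRESHOLD = 0.50  # ">=0.50" goes to pending_review
SKIP_THRESHOLD = 0.35     # "<0.35" is auto-skipped


def partition_scored_docs(
    scored_docs: Iterable[Dict[str, Any]],
    authoritative_paths: Set[str],
) -> Dict[str, List[Dict[str, Any]]]:
    """Sort scored docs into buckets without touching authoritative entries."""
    buckets: Dict[str, List[Dict[str, Any]]] = {
        "pending_review": [],
        "discovered": [],
        "auto_skipped": [],
    }
    for doc in scored_docs:
        if doc["path"] in authoritative_paths:
            continue  # user-confirmed docs are preserved as they are
        score = doc["importance_score"]
        if score < SKIP_THRESHOLD:
            buckets["auto_skipped"].append(doc)
        elif score >= PENDING_THRESHOLD:
            buckets["pending_review"].append(doc)
        else:
            buckets["discovered"].append(doc)
    return buckets
```

The bucket sizes can feed the summary in step 5, for example the `auto_skipped` count in `_meta`.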
Mode 2: Documentation Review
python3 .claude/skills/project-genome/scripts/update_genome.py --review-docs
What happens:
- Script reads existing genome
- Script outputs docs in `pending_review` for user confirmation
- Agent presents each doc to user with AI analysis
- User confirms or skips each doc
- Agent moves confirmed docs to `authoritative` section
Agent instructions for this mode:
Present each pending doc to user:
For docs with importance_score >= 0.85 (RECOMMENDED):
"⭐ RECOMMENDED: {path}
AI Score: {score} | Suggested: {category}
{ai_reasoning}
Promote to authoritative? [Y/n]: "
(Default YES - just press Enter to confirm)
For docs with score 0.50-0.84:
"{path}
AI Score: {score} | Suggested: {category}
{ai_reasoning}
Promote to authoritative? [y/n/skip]: "
For docs with score 0.35-0.49:
"(Low score - likely ephemeral)
{path} - Score: {score}
{ai_reasoning}
[Auto-skipping - press Enter to continue, or 'p' to promote anyway]: "
When user confirms a doc:
"Purpose (1 line) [{suggested_purpose}]: "
(User can press Enter to accept suggestion or type custom)
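The three prompt tiers and their defaults could be modeled as below. This is a sketch only: `review_prompt` and `should_promote` are hypothetical names, and treating "yes" as equivalent to "y" is an assumption. The prompt wording and default behavior follow the instructions above.

```python
def review_prompt(path: str, score: float, category: str, reasoning: str) -> str:
    """Build the prompt text for one pending doc, based on its score tier."""
    if score >= 0.85:
        return (
            f"⭐ RECOMMENDED: {path}\n"
            f"AI Score: {score:.2f} | Suggested: {category}\n"
            f"{reasoning}\n"
            "Promote to authoritative? [Y/n]: "
        )
    if score >= 0.50:
        return (
            f"{path}\n"
            f"AI Score: {score:.2f} | Suggested: {category}\n"
            f"{reasoning}\n"
            "Promote to authoritative? [y/n/skip]: "
        )
    return (
        "(Low score - likely ephemeral)\n"
        f"{path} - Score: {score:.2f}\n"
        f"{reasoning}\n"
        "[Auto-skipping - press Enter to continue, or 'p' to promote anyway]: "
    )


def should_promote(score: float, answer: str) -> bool:
    """Interpret the user's reply; only the top tier defaults to yes on Enter."""
    reply = answer.strip().lower()
    if score >= 0.85:
        return reply in ("", "y", "yes")
    if score >= 0.50:
        return reply in ("y", "yes")
    return reply == "p"  # low-score docs need an explicit override
```

Keeping the default in one place makes the asymmetry explicit: pressing Enter promotes a recommended doc but skips everything else.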
Mode 3: Bootstrap (No Existing Genome)
When PROJECT-GENOME.yaml doesn't exist:
- Run full discovery
- ALL docs go to `pending_review` (nothing is authoritative yet)
- Inform user: "No existing genome. Run `--review-docs` to classify documentation."
Genome Structure (Complete YAML)
project_name: "Project Name"
last_updated: "2026-01-22T06:30:00Z"
purpose:
  summary: "Brief: Business goal, key features, users. <100 words."
tech_stack: ["React", "Node.js", "PostgreSQL"]
repo_info:
  branches: {main: "Production", dev: "Development"}
file_structure:
  tree: |
    project-root/
    ├── src/ # Core logic
    ├── docs/ # Documentation
    └── tests/ # Test suites
  total_files: 42
architecture:
  overview: "High-level C4 context summary"
  patterns: ["MVC", "Event-driven"]
  diagram: |
    graph TD
    A[User] --> B[App]
    B --> C[API]
semantic_map:
  modules:
    auth: {path: "src/auth", files: 5}
    payments: {path: "src/payments", files: 3}
  flows: {}
navigation_hints:
  - "Payment logic: src/services/payments"
  - "DB schema: docs/schema.sql"
  - "Skills: .claude/skills/"
skills_map:
  skill-name:
    description: "What this skill does..."
    trigger: "/skill-name"
# NEW: Documentation map with AI analysis
documentation_map:
  authoritative:
    system_architecture: []
    api_reference: []
    component_guides: []
    testing: []
  discovered:
    recent_plans: []
    archived: {directory: "", count: 0}
  pending_review: []
  _meta:
    last_scan: ""
    total_docs_scanned: 0
    auto_skipped: 0
    missing_authoritative: []
recent_changes: "Auto-generated from last 5 git commits"
Token Budget Guidelines
| Section | Target | Notes |
|---|---|---|
| purpose | 100-200 | Detailed summary with key features |
| file_structure | 300-600 | Top 3 levels, include key subdirectories |
| architecture | 200-400 | C4 context + key patterns, include diagram |
| semantic_map | 400-800 | Major modules, key functions |
| navigation_hints | 100-200 | 5-10 actionable prompts with file paths |
| skills_map | 200-400 | All skills with descriptions |
| documentation_map | 400-600 | Authoritative docs with purposes |
| Total | <5000 | Leave headroom for YAML syntax |
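A rough budget check along these lines is sketched below. It is not how `--validate` is known to work: the 4-characters-per-token ratio is a common heuristic used here as an assumption, the per-section limits are the upper targets from the table, and `estimate_tokens` and `check_budget` are hypothetical names. Sections are passed in as already-serialized text.

```python
from typing import Dict, List

# Upper end of each target range from the table above.
SECTION_BUDGETS = {
    "purpose": 200,
    "file_structure": 600,
    "architecture": 400,
    "semantic_map": 800,
    "navigation_hints": 200,
    "skills_map": 400,
    "documentation_map": 600,
}
TOTAL_BUDGET = 5000
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count by character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def check_budget(sections: Dict[str, str]) -> List[str]:
    """Return warnings for sections, or a total, that exceed the targets."""
    warnings = []
    total = 0
    for name, text in sections.items():
        tokens = estimate_tokens(text)
        total += tokens
        limit = SECTION_BUDGETS.get(name)
        if limit is not None and tokens > limit:
            warnings.append(f"{name}: ~{tokens} tokens exceeds target {limit}")
    if total >= TOTAL_BUDGET:
        warnings.append(f"total: ~{total} tokens is not under {TOTAL_BUDGET}")
    return warnings
```

A real tokenizer would give tighter numbers; the heuristic is only meant to flag sections that have clearly outgrown their target.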
Anti-Patterns
- Duplicating README - Genome is seed, not docs
- Full code snippets - Use function names, not implementations
- Listing all files - Top-level structure only
- ADR content - Link to docs/, don't inline
- Updating every commit - Major changes only
- Including ephemeral docs in authoritative - Only user-confirmed docs
- Keeping stale pending_review - Clear after each review session
Example AI Analysis Output
When analyzing monorepo-docs/system-docs/MESSAGE_HANDLING_ARCHITECTURE.md:
path: "monorepo-docs/system-docs/MESSAGE_HANDLING_ARCHITECTURE.md"
importance_score: 0.92
suggested_category: "system_architecture"
ai_reasoning: |
  High-quality architecture doc. Clear H1/H2 structure with Mermaid diagrams.
  Covers critical realtime messaging subsystem. Updated 2026-01-21.
  References active code: realtime-sync.ts, [threadId].tsx.
  Located in structured system-docs directory. No deprecation signals.
signals:
  quality: 0.95
  freshness: 0.90
  scope: 0.90
  deprecation: 0.05
When analyzing monorepo-docs/debugging-carpet-issue.md:
path: "monorepo-docs/debugging-carpet-issue.md"
importance_score: 0.22
suggested_category: "auto_skip"
ai_reasoning: |
  Debugging notes from a specific session. Informal structure.
  Contains "debugging" in filename. Likely ephemeral working doc.
  Not suitable for authoritative documentation.
signals:
  quality: 0.30
  freshness: 0.40
  scope: 0.10
  deprecation: 0.20
auto_skip: true
skip_reason: "Filename pattern matches debugging-*.md"
Integration with CLAUDE.md
After running this skill, CLAUDE.md should reference key authoritative docs:
# Project Name
> **Bootstrap**: Read [PROJECT-GENOME.yaml](PROJECT-GENOME.yaml) first.
## Key Documentation
| Category | Authoritative Docs |
|----------|-------------------|
| Architecture | `system-docs/OVERVIEW.md`, `MESSAGE_HANDLING.md` |
| API | `BACKEND_API_COMPLETE.md` |
| Components | `backend/CLAUDE.md`, `mobile-app/CLAUDE.md` |
See `documentation_map` in PROJECT-GENOME.yaml for full list.