# Project Onboarding

Help developers quickly build mental models of unfamiliar projects through systematic analysis and staged delivery.

## Interactive Workflow

This skill uses a phased approach with user checkpoints to manage context efficiently and focus analysis where needed.

### Phase 0: Context Collection

Before starting analysis, gather essential context. If the user provides only a project path, auto-detect the rest:

```markdown
I'll help you understand this project. First, confirm a few details:

1. **Project path**: [confirm or ask]
2. **Your goal**:
   - Quick onboarding (run locally within 1 hour)
   - Fix specific issue
   - Add new feature
   - Understand for maintenance
   - Other: _____
3. **Time budget**: 1 hour / 1 day / 1 week / custom
4. **Known tech stack** (optional, auto-detect if empty): _____

If you just want to start quickly, I'll auto-detect everything from the code.
```

Skip any question whose answer is already clear from context.

### Phase 1: Project Health Check (5-10 minutes)

Execute a rapid diagnostic scan to identify the project type, quality signals, and immediate concerns.

#### Automated Detection Steps

1. **Identify project type**

   ```bash
   # Check for ecosystem markers
   ls -la | grep -E "package.json|pom.xml|go.mod|Cargo.toml|requirements.txt|Gemfile|composer.json"
   ```

2. **Health indicators**

   ```bash
   # Check documentation
   ls README* 2>/dev/null && echo "✅ README exists" || echo "❌ No README"

   # Check git history activity
   git log --oneline --since="6 months ago" | wc -l

   # Find configuration examples
   ls .env.example .env.sample 2>/dev/null || echo "⚠️ No config examples"

   # Test coverage indicators (parentheses needed: -type d must apply to every -name)
   find . -type d \( -name "__tests__" -o -name "test" -o -name "tests" \) 2>/dev/null
   ```

3. **Code quality signals** (especially for poorly maintained code)

   ```bash
   # Large files (complexity indicator)
   find . -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" -o -name "*.java" \) \
     -exec wc -l {} \; | sort -rn | head -10

   # TODO/FIXME density
   find . -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) \
     -exec grep -Hn "TODO\|FIXME\|HACK\|XXX\|BUG" {} \; | head -20

   # Commented-out code (technical debt smell)
   find . -name "*.js" | xargs grep -c "^[[:space:]]*//" | \
     awk -F: '$2>50 {print $1 ": " $2 " lines"}' | head -5

   # Dead code indicators
   find . \( -name "*backup*" -o -name "*old*" -o -name "*temp*" -o -name "*.bak" \) 2>/dev/null
   ```

4. **Find entry points**

   ```bash
   # Server/service entry points
   grep -r "listen\|createServer\|app.run\|main()" --include="*.ts" --include="*.js" \
     --include="*.py" --include="*.go" | head -5

   # Build/start scripts
   grep -A 5 '"scripts"' package.json 2>/dev/null
   ```
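The marker-file check in step 1 can be wrapped in a small helper that returns a single ecosystem label. This is a minimal sketch: the function name and labels are illustrative, and the marker list mirrors the grep above rather than being exhaustive.

```bash
# Map well-known marker files to an ecosystem label (first match wins).
detect_ecosystem() {
  local dir="${1:-.}"
  [ -f "$dir/package.json" ]     && { echo "node"; return; }
  [ -f "$dir/pom.xml" ]          && { echo "java (maven)"; return; }
  [ -f "$dir/go.mod" ]           && { echo "go"; return; }
  [ -f "$dir/Cargo.toml" ]       && { echo "rust"; return; }
  [ -f "$dir/requirements.txt" ] && { echo "python"; return; }
  [ -f "$dir/Gemfile" ]          && { echo "ruby"; return; }
  [ -f "$dir/composer.json" ]    && { echo "php"; return; }
  echo "unknown"
}
```

Monorepos may contain several markers; checking subdirectories separately gives a per-package answer.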

#### Output Format: Health Report

```markdown
# Project Health Check: [PROJECT_NAME]

## Basic Info
- **Type**: [monolith/microservice/library/CLI/frontend app/monorepo]
- **Primary language**: [Language] ([version if detectable])
- **Package manager**: [npm/yarn/pnpm/pip/cargo/...]
- **Build tool**: [webpack/vite/gradle/...]

## Health Score: [X/10]

### ✅ Strengths
- [List positive findings]

### ⚠️ Concerns
- [List medium-priority issues]

### ❌ Critical Issues
- [List serious problems]

### 🔍 Code Quality Indicators
- **TODO/FIXME count**: X occurrences ([top 5 locations with file:line])
- **Large files (>1000 lines)**: [count] files ([top 3: filename:lines])
- **Commented code density**: [high/medium/low]
- **Dead code artifacts**: [list temp/backup files if found]
- **Naming inconsistencies**: [examples if detected]

## One-Line Summary
[Inferred purpose from README and code structure]

## Core Value (3 points)
1. [What problem it solves]
2. [Key user/stakeholder]
3. [Primary capability]

## Entry Points
- **Main file**: `[path:line]`
- **Start command**: `[command]` (from `[source]`)
- **Routes/API**: `[path]` (if applicable)

---
**Next steps**: Choose what to explore:
1. 📄 Quick start guide (run locally in 15 min)
2. 🏗️ Architecture & code map (deep dive)
3. 📊 Specific analysis (pick dimensions)
```

[WAIT FOR USER CHOICE]
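The health score can be seeded from the countable signals gathered above. A minimal sketch follows; the weights and the signals checked are illustrative assumptions, not part of the skill, and the final score should still reflect judgment.

```bash
# Naive health score (0-10): start at 10 and subtract for missing signals.
health_score() {
  local dir="$1" score=10
  # -2 if no README
  [ -e "$dir/README.md" ] || [ -e "$dir/README.rst" ] || score=$((score - 2))
  # -1 if no configuration example
  [ -e "$dir/.env.example" ] || [ -e "$dir/.env.sample" ] || score=$((score - 1))
  # -2 if no test directory anywhere in the tree
  find "$dir" -type d \( -name tests -o -name test -o -name __tests__ \) \
    | grep -q . || score=$((score - 2))
  echo "$score"
}
```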

### Phase 2: Quick Start Guide

Provide this when the user wants to run the project immediately or chooses option 1.

````markdown
# Quick Start Guide: [PROJECT_NAME]

## Prerequisites
```bash
# Version check commands
[language] --version  # Need: [version requirement]
[package_manager] --version

# Installation if missing:
[specific installation commands or links]
```

## Startup Sequence (Target: 15 minutes)

### Step 1: Install dependencies
```bash
cd [project_path]
[install_command]  # e.g., npm install, pip install -r requirements.txt

# Expected output:
# [sample successful output]

# Common failures:
# ❌ "EACCES permission denied" → Solution: [specific fix]
# ❌ "Cannot find module X" → Solution: [specific fix]
```

Evidence:
- Dependency file: [path]
- Lock file: [path] (checksum: [hash])

### Step 2: Configure environment
```bash
# If .env.example exists:
cp .env.example .env

# Otherwise create manually:
cat > .env << 'EOF'
[required variables with defaults]
EOF
```

Evidence:
- Variables loaded at: [file:line]
- Required vars: [list from code analysis]
- Optional vars: [list with defaults]

### Step 3: Initialize data (if needed)
```bash
[migration/seed commands if applicable]

# Verify:
[verification command]
```

### Step 4: Start service
```bash
[start command]

# Success indicators:
# [specific log messages to look for]
# Server ready at: http://localhost:[port]

# Verification:
curl [health_check_endpoint] || [alternative verification]
# Expected: [response]
```

Evidence:
- Port config: [file:line]
- Startup logs: [file:line where logging configured]

## Test Execution
```bash
[test command]

# Expected:
# [sample passing output]
```

Common failures:

1. **Database not ready**
   Error: `[specific error text]`
   Fix: `[specific command]` ([source: file:line])

2. **Port conflict**
   Error: `EADDRINUSE`
   Fix: `lsof -i :[port] | grep LISTEN | awk '{print $2}' | xargs kill`

## Verification Checklist
- [ ] Service starts without errors
- [ ] Health endpoint responds (or equivalent test)
- [ ] Basic test suite passes
- [ ] Logs show expected initialization
- [ ] Can access UI/API (if applicable)

## Command Reference

| Action   | Command   | Source                            |
|----------|-----------|-----------------------------------|
| Dev mode | [command] | [package.json:line or equivalent] |
| Build    | [command] | [source]                          |
| Test     | [command] | [source]                          |
| Lint     | [command] | [source]                          |

## What's next?
- Understand architecture → Ask for "architecture analysis"
- Trace specific request → Tell me the endpoint/feature
- Start coding → Ask for "safe contribution points"
````

[WAIT FOR USER CHOICE]
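The startup verification in the guide above can be automated with a small polling helper instead of a single `curl`. This is a sketch under stated assumptions: `curl` is available, and the URL and timeout are illustrative placeholders.

```bash
# Poll a health endpoint until it responds or the timeout elapses.
wait_for_health() {
  local url="$1" timeout="${2:-30}" start
  start=$(date +%s)
  until curl -fsS "$url" >/dev/null 2>&1; do
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "service not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
  echo "service ready"
}

# Usage:
# [start command] &
# wait_for_health "http://localhost:[port]/health" 30
```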


### Phase 3: Deep Analysis (On Demand)

Provide this only when the user explicitly requests it. Offer a menu of analysis dimensions:

```markdown
Select analysis areas (can choose multiple):

1. 🏗️ **Architecture view** - Modules, dependencies, data flow
2. 🗺️ **Code map** - Directory structure, key files, coupling
3. 📊 **Business logic** - End-to-end flow tracing
4. 💾 **Data model** - Database, schemas, relationships
5. ⚙️ **Configuration** - Environment vars, secrets management
6. 📈 **Observability** - Logging, debugging, monitoring
7. 🧪 **Testing** - Test structure, coverage, strategies
8. 🚀 **CI/CD** - Build pipeline, deployment process
9. 🔒 **Security** - Auth, permissions, vulnerabilities

Input: [numbers] or "describe your need"
```

#### Architecture View Template

When the user selects architecture analysis:

````markdown
## Architecture Analysis

### System Topology
```mermaid
graph TB
    Client[Client/Browser]
    API[API Server<br/>[main entry file]]
    DB[(Database)]
    Cache[(Cache)]

    Client -->|HTTP| API
    API -->|Query| DB
    API -->|Get/Set| Cache
```

Evidence:
- API entry: [file:line]
- DB connection: [file:line]
- Cache client: [file:line]

### Module Dependencies
```mermaid
graph LR
    Routes[routes/]
    Services[services/]
    Data[data/]
    Utils[utils/]

    Routes --> Services
    Services --> Data
    Services --> Utils
```

Evidence: Import analysis from [grep command used]

### Key Modules

| Module | Purpose          | Key Files          | Risks              |
|--------|------------------|--------------------|--------------------|
| [path] | [responsibility] | [file] ([N] lines) | [⚠️ issues if any] |

### External Dependencies

| Type     | Service | Config Location          | Local Dev Option        |
|----------|---------|--------------------------|-------------------------|
| Database | [name]  | [env var] at [file:line] | Docker Compose / SQLite |
| Cache    | [name]  | [env var] at [file:line] | Redis local / in-memory |

### Critical Data Flow: [Example Use Case]

```mermaid
sequenceDiagram
    participant C as Client
    participant A as API
    participant S as Service
    participant D as Database

    C->>A: [Request]
    A->>S: [Method call]
    S->>D: [Query]
    D-->>S: [Data]
    S-->>A: [Response]
    A-->>C: [JSON]
```

Evidence:
- Route handler: [file:line-range]
- Service method: [file:line-range]
- Database query: [file:line-range]

### 🚨 Architecture Risks

1. **[Risk title]**
   - Location: [file:line]
   - Impact: [description]
   - Evidence: [code snippet or pattern]
   - Mitigation: [suggestion]

[Continue with other risk areas]
````


### Phase 4: Action Plans

Provide when user has completed orientation and wants concrete next steps:

#### Action Plan: [TIME_BUDGET]

**1-Hour Plan: Run & Understand**

**Goal**: Service running + core flow traced

Tasks:

- [ ] Complete environment setup
- [ ] Start service and verify
- [ ] Execute one complete request: [specific example]

  ```bash
  [curl or API call]
  ```

- [ ] Review logs for this request: [log location or command]

Verification: [specific verification steps with expected output]

**1-Day Plan: Navigate & Modify**

**Goal**: Locate modules + make safe change

Morning: code navigation drill

1. Pick an API endpoint: [example endpoint]
2. Trace from route → service → data
   - Route: [file:line]
   - Service: [file:line]
   - Query: [file:line]
3. Practice change: add a field to a response
   - Location: [file:function]
   - Change: [specific modification]
   - Test: [verification]

Afternoon: fix a simple bug. Priority targets (low risk):

1. Documentation typos
2. Log message improvements
3. Missing input validation on [specific field]

Find bugs:

```bash
grep -rn "TODO\|FIXME" [source_dir]/
git log --grep="bug\|fix" --oneline -20
```
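Beyond individual TODOs, the same git history can reveal fix-prone "hotspot" files worth reading before touching. A sketch: the `rank_hotspots` helper is hypothetical, and its counting stage is a plain pipeline so it works on any newline-separated file list.

```bash
# Count occurrences of each file name on stdin; most frequent first.
rank_hotspots() {
  sort | uniq -c | sort -rn | head -10
}

# Usage (in a real repo): files touched by fix-related commits.
# git log --grep="bug\|fix" --format= --name-only | grep -v '^$' | rank_hotspots
```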

**1-Week Plan: Own Feature**

**Goal**: Complete a small feature independently

Knowledge gaps to fill:

1. [Domain concept] - Read [doc/file]
2. [Tech stack detail] - Review [framework docs section]
3. [Test patterns] - Study [test_file]

Reading order:

1. [entry_file] (15 min)
2. [routes_file] (30 min - map all endpoints)
3. [services_dir] (1-2 hours - core logic)
4. [data_dir] (30 min - understand persistence)
5. [Choose one module to deep dive] (2 hours)

Recommended first issues:

- 🟢 Add tests for [uncovered module]
- 🟢 Improve error messages in [file]
- 🟡 Implement [simple CRUD endpoint]
- 🟡 Add caching to [specific query]

#### Contribution Safety Guide

**Code Style (Inferred)**

Analysis:

```bash
# File naming
find [src] -name "*.ts" | head -20
# Result: [camelCase / kebab-case / mixed]

# Function naming
grep -rh "^export function\|^export const" [src]/ | head -10
# Pattern: [observed pattern]
```

Conventions (with evidence):

- Files: [naming style] (see [example files])
- Functions: [naming style] (see [file:line examples])
- Classes: [naming style]

**Directory Organization**

```
[project_root]/
├── [dir]/ - [inferred purpose]
├── [dir]/ - [inferred purpose]
└── [dir]/ - [inferred purpose]
```

Where to add new code:

- New API: [directory/file pattern]
- New logic: [directory/file pattern]
- New data model: [directory/file pattern]

**Commit Strategy**

Git history analysis:

```bash
git log --format="%s" -50 | head -10
```

Pattern detected: [conventional commits / freeform / other]

Recommendation:

```
[type]([scope]): [subject]
```

Examples:

- feat(api): add user profile endpoint
- fix(auth): handle expired tokens correctly
**Common Pitfalls**

| Issue     | Symptom         | Cause                     | Solution      |
|-----------|-----------------|---------------------------|---------------|
| [problem] | [error message] | [root cause at file:line] | [fix command] |

**Safe Contribution Points (3 examples)**

1. **Add logging** (lowest risk)

   Location: [file:function]
   Change:

   ```js
   // Add at line [N]:
   console.log('[Module] Action:', [data]);
   ```

   Verify: [how to see log]
   Rollback: delete the line

2. **Adjust config** (low risk)

   Location: [config_file:line]
   Change: [parameter] from [old] to [new]
   Verify: [test command]
   Rollback: revert the value

3. **Add validation** (medium risk)

   Location: [file:function]
   Change:

   ```js
   if (![validation]) {
     throw new Error('[message]');
   }
   ```

   Verify: [test with invalid input]
   Rollback: remove the check


## Follow-Up Questions (If Needed)

Ask ONLY if the answer cannot be inferred from the codebase (max 6):

  1. Database migrations: Found [migration_dir] but unclear if manual SQL or ORM-driven. How are migrations run?

  2. Test data: How is test database populated? (no seed scripts found)

  3. Deployment: Is production containerized? (no Dockerfile found, but [evidence of X])

  4. Branch policy: PR review required? (no .github/CODEOWNERS or branch protection visible)

  5. Secrets management: Production secrets via [vault/AWS/other]? (.env not in .gitignore - risky)

  6. Code ownership: Any modules with designated owners?


## Tool Usage Guidelines

### Efficient Scanning

```bash
# Project type identification
ls -la | grep -E "package.json|pom.xml|go.mod|Cargo.toml|requirements.txt"

# Entry point discovery  
grep -r "listen\|createServer\|app.run\|if __name__" \
  --include="*.ts" --include="*.js" --include="*.py"

# Configuration files
find . -maxdepth 2 \( -name "*.config.*" -o -name ".*rc" -o -name ".env*" \)

# Code quality signals
find . -name "*.ts" -exec wc -l {} \; | sort -rn | head -10  # Large files
grep -rn "console.log" --include="*.ts" | wc -l  # Debug statements
grep -rn "http://\|https://" --include="*.ts" | grep -vE ":[0-9]+:[[:space:]]*//"  # Hardcoded URLs (skip comment lines)
```

### File Reading Priority

Must read (Phase 1):

  1. README.md
  2. package.json (or equivalent manifest)
  3. .env.example (if exists)
  4. Main entry file

Important (Phase 2):

  1. Route definitions
  2. Core service/controller files
  3. Config initialization

On-demand (Phase 3):

  1. Specific modules as needed
  2. Test files for patterns
  3. Migration scripts

### Handling Large Files

For files >500 lines:

```
# Read strategically
view(path="large_file.ts", view_range=[1, 50])    # Start
view(path="large_file.ts", view_range=[-50, -1])  # End
# Then middle sections as needed
```
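When only shell tools are available, the same ranged-view strategy can be approximated with `sed` and `tail`. A minimal sketch; the `preview_file` name and the 50-line default are illustrative.

```bash
# Print the first and last N lines of a file, separated by an ellipsis.
preview_file() {
  local f="$1" n="${2:-50}"
  sed -n "1,${n}p" "$f"   # start of file
  echo "…"
  tail -n "$n" "$f"       # end of file
}

# Usage:
# preview_file large_file.ts 50
```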

### Evidence Citation Format

Every conclusion must cite its source:

  • File content: path/to/file.ts:45-67 or path:line
  • Configuration: package.json → scripts.start → "node server.js"
  • Command output: Result of [bash command]
  • Inference: "Based on [pattern/structure/naming], likely [conclusion]"

Flag uncertainty: [UNCERTAIN] [reason] - Verify via [command/file]

## Output Principles

  1. Progressive disclosure: Deliver in phases, wait for user choice
  2. Evidence-based: All claims cite file:line or command output
  3. Action-oriented: Every section ends with "What's next?"
  4. Risk-aware: Highlight quality issues without judgment
  5. Beginner-friendly: Assume smart but unfamiliar reader

## References

See references/ for:

  • analysis-patterns.md - Templates for different project types
  • quality-signals.md - Code smell detection patterns
  • recovery-strategies.md - How to handle missing docs/tests