Legacy Code Summarizer

Analyze and summarize legacy codebases to quickly understand their structure, quality, and improvement opportunities.

Core Capabilities

This skill helps you understand legacy code by:

Mapping architecture - Identify key components, layers, and relationships
Analyzing dependencies - Understand module coupling and import patterns
Detecting quality issues - Find code smells, technical debt, and outdated patterns
Assessing test coverage - Identify testing gaps and untested code
Generating documentation - Create actionable summaries for teams

Code Analysis Workflow

Step 1: Survey the Codebase

Get an overview of the project structure and size.

Initial Questions:

What programming language(s)?
What is the project structure?
How large is the codebase?
What frameworks/libraries are used?
Is there existing documentation?

Commands to Run:

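# Count lines of code (concatenate first so the total stays correct
# when xargs splits a long file list or names contain spaces)
find . -name "*.py" -print0 | xargs -0 cat | wc -l  # Python
find . -name "*.java" -print0 | xargs -0 cat | wc -l  # Java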

# Count files
find . -name "*.py" | wc -l
find . -name "*.java" | wc -l

# Directory structure
tree -L 3 -I '__pycache__|node_modules|target|build'

# Or without tree command
find . -type d -not -path '*/\.*' | head -20
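
The same survey can be sketched in Python when `tree` or `xargs` is not available. This is a minimal sketch rather than part of the skill: the `survey` function, its extension list, and the skip list are illustrative assumptions chosen to mirror the commands above.

```python
from collections import Counter
from pathlib import Path

# Directories to skip; mirrors the ignore list passed to `tree` above (assumption).
SKIP_DIRS = {"__pycache__", "node_modules", "target", "build", ".git"}


def survey(root, extensions=(".py", ".java", ".js", ".ts")):
    """Return (file_counts, line_counts), two Counters keyed by file extension."""
    files, lines = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # Ignore anything that lives under a skipped directory.
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        files[path.suffix] += 1
        # Read as bytes so files with unknown encodings do not raise.
        with path.open("rb") as handle:
            lines[path.suffix] += sum(1 for _ in handle)
    return files, lines
```

Calling `survey(".")` from the project root gives per-language file and line counts in one pass, which is usually enough to answer the "how large is the codebase?" question.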

Identify Project Type:

Web application (frontend/backend)
CLI tool
Library/framework
Microservice
Monolith
Desktop application

Step 2: Identify Entry Points

Find where execution starts and trace the main workflows from there.

Common Entry Points:

Python:

# Find main entry points (matches both single- and double-quoted guards)
grep -rnE "if __name__ == ['\"]__main__['\"]" --include="*.py" .

# Find Flask/Django apps
grep -r "app = Flask\|application = " --include="*.py"
grep -r "INSTALLED_APPS\|MIDDLEWARE" --include="*.py"

# Find CLI entry points (setup.py, pyproject.toml)
grep -A 10 "entry_points\|console_scripts\|\[project.scripts\]" setup.py pyproject.toml 2>/dev/null

Java:

# Find main methods
grep -r "public static void main" --include="*.java"

# Find Spring Boot applications
grep -r "@SpringBootApplication" --include="*.java"

# Find servlets
grep -r "extends HttpServlet\|@WebServlet" --include="*.java"

JavaScript/TypeScript:

# Check package.json for entry points
cat package.json | grep -A 5 "main\|scripts"

# Find Express apps
grep -r "app = express()\|express()" --include="*.js" --include="*.ts"

# Find React entry points (skip installed packages)
find . \( -name "index.js" -o -name "index.tsx" -o -name "App.js" \) -not -path "*/node_modules/*"
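
The grep patterns above can also be gathered into one classifier that works on the text of a single file. The sketch below is an assumption-laden illustration: `ENTRY_POINT_PATTERNS` and `classify_entry_points` are hypothetical names, and the regular expressions only approximate the greps in this step.

```python
import re

# Hypothetical pattern table; each regex approximates one grep from this step.
ENTRY_POINT_PATTERNS = {
    "python main guard": re.compile(r"""if\s+__name__\s*==\s*['"]__main__['"]"""),
    "flask app": re.compile(r"\bFlask\s*\("),
    "java main": re.compile(r"public\s+static\s+void\s+main\s*\("),
    "spring boot": re.compile(r"@SpringBootApplication\b"),
    "servlet": re.compile(r"extends\s+HttpServlet\b|@WebServlet\b"),
    "express app": re.compile(r"\bexpress\s*\(\s*\)"),
}


def classify_entry_points(source):
    """Return the labels of every entry-point pattern found in the source text."""
    return [
        label
        for label, pattern in ENTRY_POINT_PATTERNS.items()
        if pattern.search(source)
    ]
```

Matches are textual only, so a hit inside a comment or string still counts; treat the result as a list of files to open, not as proof of an entry point.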

Step 3: Map Architecture and Components

Understand the high-level structure and key modules.

Analyze Directory Structure:

# List top-level directories
ls -d */ | head -20

# Common patterns to look for:
# - src/ or lib/ (source code)
# - tests/ or test/ (test files)
# - config/ (configuration)
# - docs/ (documentation)
# - scripts/ (utility scripts)
# - models/ or entities/ (data models)
# - views/ or templates/ (UI)
# - controllers/ or handlers/ (business logic)
# - services/ or api/ (external services)
# - utils/ or helpers/ (utilities)
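
As a rough aid, the naming conventions listed above can be turned into a lookup that labels each top-level directory. This is only a sketch: `LAYER_HINTS` and `classify_directories` are hypothetical names, and real projects often deviate from these conventions.

```python
# Conventional directory names and the role they usually signal (from the list above).
LAYER_HINTS = {
    "source code": {"src", "lib"},
    "test files": {"tests", "test"},
    "configuration": {"config"},
    "documentation": {"docs"},
    "utility scripts": {"scripts"},
    "data models": {"models", "entities"},
    "UI": {"views", "templates"},
    "business logic": {"controllers", "handlers"},
    "external services": {"services", "api"},
    "utilities": {"utils", "helpers"},
}


def classify_directories(names):
    """Map each directory name to its conventional role, or 'unknown'."""
    roles = {}
    for name in names:
        key = name.strip("/").lower()
        roles[name] = next(
            (role for role, hints in LAYER_HINTS.items() if key in hints),
            "unknown",
        )
    return roles
```

Directories labeled `unknown` are the ones worth opening first, since their purpose cannot be guessed from the name.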

Identify Architecture Pattern:

Common patterns in legacy code:

MVC (Model-View-Controller): Django, Rails, Spring MVC
Layered: Presentation → Business → Data layers
Microservices: Multiple small services
Monolith: Single large application
Plugin-based: Core + extensions

See references/architecture_patterns.md for detailed pattern identification.

Create Architecture Diagram:

Example Web Application Architecture:

┌─────────────────────────────────────────┐
│          Frontend (React)               │
│  - components/                          │
│  - pages/                               │
│  - hooks/                               │
└───────────────┬─────────────────────────┘
                │ API Calls
                ↓
┌─────────────────────────────────────────┐
│       API Layer (Flask/Express)         │
│  - routes/                              │
│  - middleware/                          │
└───────────────┬─────────────────────────┘
                │
                ↓
┌─────────────────────────────────────────┐
│       Business Logic                    │
│  - services/                            │
│  - controllers/                         │
└───────────────┬─────────────────────────┘
                │
                ↓
┌─────────────────────────────────────────┐
│       Data Layer                        │
│  - models/                              │
│  - repositories/                        │
└───────────────┬─────────────────────────┘
                │
                ↓
┌─────────────────────────────────────────┐
│       Database (PostgreSQL/MongoDB)      │
└─────────────────────────────────────────┘

Step 4: Analyze Dependencies

Map module relationships and identify coupling issues.

Find Direct Dependencies:

Python:

# Find imports in all Python files
grep -rh "^import \|^from " --include="*.py" | sort | uniq

# Analyze requirements
cat requirements.txt

# Or from setup.py
grep -A 20 "install_requires" setup.py

Java:

# Analyze Maven dependencies
cat pom.xml | grep -A 3 "<dependency>"

# Or Gradle
cat build.gradle | grep -A 3 "implementation\|compile"

# Find imports in code
grep -rh "^import " --include="*.java" | sort | uniq | head -50

JavaScript:

# Analyze package.json
cat package.json | grep -A 50 "dependencies"

# Find imports
grep -rh "^import \|require(" --include="*.js" --include="*.ts" | head -50

Create Dependency Map:

Key Internal Dependencies:

auth module
  ├─ depends on: user_model, database, config
  └─ used by: api_routes, admin_panel

user_model
  ├─ depends on: database, validators
  └─ used by: auth, profile, admin

payment module
  ├─ depends on: user_model, external_api, logger
  └─ used by: checkout, subscription

Circular dependencies detected:
  ⚠️  module_a → module_b → module_c → module_a

See references/dependency_analysis.md for tools and techniques.
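
For Python projects, a dependency map like the one above can be approximated with the standard `ast` module. The sketch below makes several assumptions: the function names are hypothetical, each top-level package or module under the root counts as one node, relative imports are ignored, and files that do not parse as Python 3 are skipped.

```python
import ast
from pathlib import Path


def imported_modules(source):
    """Return the top-level names of absolute imports in Python source text."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return set()  # e.g. Python 2 files; skipped in this sketch
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def build_graph(root):
    """Map each top-level package or module under root to the internal ones it imports."""
    sources = {}
    for path in Path(root).rglob("*.py"):
        top = path.relative_to(root).parts[0]
        if top.endswith(".py"):
            top = top[:-3]
        text = path.read_text(encoding="utf-8", errors="replace")
        sources.setdefault(top, []).append(text)
    graph = {}
    for name, texts in sources.items():
        deps = set()
        for text in texts:
            deps |= imported_modules(text)
        # Keep only dependencies that are themselves part of the project.
        graph[name] = {dep for dep in deps if dep in sources and dep != name}
    return graph


def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None if there is none."""
    visiting, done, stack = set(), set(), []

    def visit(node):
        visiting.add(node)
        stack.append(node)
        for dep in sorted(graph.get(node, ())):
            if dep in visiting:
                return stack[stack.index(dep):] + [dep]
            if dep not in done:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for node in sorted(graph):
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

`find_cycle(build_graph("src"))` reports only the first cycle it meets; after breaking that cycle, run it again to find the next one.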

Step 5: Identify Code Quality Issues

Detect technical debt, code smells, and improvement opportunities.

Common Quality Issues to Look For:

1. Large Files (God Objects)

# Find files over 500 lines
find . -name "*.py" -exec wc -l {} \; | awk '$1 > 500' | sort -rn

# Find files over 1000 lines (serious issue)
find . -name "*.java" -exec wc -l {} \; | awk '$1 > 1000' | sort -rn

2. Dead Code

# Find unused imports (Python - requires tools)
# Install: pip install autoflake
find . -name "*.py" -exec autoflake --check {} \;

# Find TODO/FIXME comments
grep -rn "TODO\|FIXME\|HACK\|XXX" --include="*.py" --include="*.java"

3. Code Duplication

# Find duplicate code (requires tool)
# Install: pip install pylint
pylint --disable=all --enable=duplicate-code src/

# Or use PMD for Java
# pmd cpd --minimum-tokens 100 --files src/

4. Complex Functions

# Find long functions (crude check - look for large blocks)
# Python: Look for functions with many lines between def and next def
# Java: Look for methods with many lines between { and }

# Use complexity tools for accurate analysis:
# Python: radon cc src/ -a
# Java: Use PMD or Checkstyle
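
For Python, the "lines between def and next def" check can be made exact with the standard `ast` module, which records where each function starts and ends. This is a minimal sketch: `long_functions` is a hypothetical helper, and the 100-line default is an assumption, not a threshold set by this skill. It measures length only; use a tool such as radon for cyclomatic complexity.

```python
import ast


def long_functions(source, max_lines=100):
    """Return (name, start_line, length) for functions longer than max_lines."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                results.append((node.name, node.lineno, length))
    # Longest first, so the worst offenders appear at the top.
    return sorted(results, key=lambda item: item[2], reverse=True)
```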

5. Missing Documentation

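# Functions without docstrings (Python, crude check): docstring lines are
# filtered out, so a def with no line printed after it is documented
grep -rn -A 1 "^ *def " --include="*.py" . | grep -v '"""' | grep -v "'''"
# More reliable: pylint --disable=all --enable=missing-function-docstring src/

# Classes without documentation (Java, crude check): a preceding line
# ending in */ usually closes a Javadoc block
grep -rn -B 1 "^public class\|^class " --include="*.java" .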
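
Line-based greps cannot reliably tell whether a definition has a docstring. For Python, a more dependable check is to parse the file and ask `ast.get_docstring`. The helper below is a sketch under stated assumptions: `undocumented` is a hypothetical name, and names starting with an underscore are treated as private and skipped.

```python
import ast


def undocumented(source):
    """Return (kind, name, line) for public functions and classes lacking a docstring."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        if node.name.startswith("_"):
            continue  # assumption: private names do not need docstrings
        if ast.get_docstring(node) is None:
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            missing.append((kind, node.name, node.lineno))
    return sorted(missing, key=lambda item: item[2])
```

Dividing the number of results by the total number of definitions gives the "percent of functions lacking docstrings" figure used in the summary report.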

6. Outdated Patterns

Look for:

Python 2 syntax (e.g., print "hello", raw_input())
Java pre-8 patterns (no lambdas, no Optional)
Deprecated libraries
Security vulnerabilities (SQL injection, XSS)

See references/code_quality_checklist.md for comprehensive quality checks.
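
A quick way to spot Python 2 code is to try parsing it with a Python 3 interpreter and then look for builtins that Python 3 removed. The sketch below is illustrative only: `python2_signals` and the `LEGACY_NAMES` set are assumptions, and a flagged name may be a project-defined function rather than the old builtin.

```python
import ast

# Builtins that existed in Python 2 but not in Python 3 (illustrative subset).
LEGACY_NAMES = {"raw_input", "unicode", "xrange", "basestring"}


def python2_signals(source):
    """Return human-readable hints that the source was written for Python 2."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Statements such as `print "hello"` fail here.
        return [f"does not parse as Python 3 (line {exc.lineno})"]
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id in LEGACY_NAMES
    }
    return [f"uses Python 2 builtin: {name}" for name in sorted(used)]
```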

Step 6: Assess Test Coverage

Identify testing gaps and quality of existing tests.

Find Tests:

# Python tests
find . -name "test_*.py" -o -name "*_test.py"
ls tests/ test/

# Java tests
find . -name "*Test.java" -o -name "*Tests.java"
ls src/test/

# JavaScript tests
find . -name "*.test.js" -o -name "*.spec.js" -o -name "*.test.ts"
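
Once the test files are listed, a first pass at testing gaps is to match them against source modules by name. This sketch assumes the `test_<name>.py` and `<name>_test.py` conventions used in the `find` commands above; `modules_without_tests` is a hypothetical helper, and a name match says nothing about how thorough the test is.

```python
from pathlib import PurePosixPath


def modules_without_tests(source_files, test_files):
    """Return source files with no test_<name>.py or <name>_test.py counterpart."""
    tested = set()
    for test in test_files:
        stem = PurePosixPath(test).stem
        if stem.startswith("test_"):
            tested.add(stem[len("test_"):])
        elif stem.endswith("_test"):
            tested.add(stem[: -len("_test")])
    untested = []
    for src in source_files:
        stem = PurePosixPath(src).stem
        # Package markers rarely need their own test file.
        if stem != "__init__" and stem not in tested:
            untested.append(src)
    return sorted(untested)
```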

Calculate Test Coverage:

Python:

# Install coverage tool
pip install pytest-cov

# Run tests with coverage
pytest --cov=src --cov-report=term-missing

# Generate HTML report
pytest --cov=src --cov-report=html
open htmlcov/index.html  # macOS; use xdg-open on Linux (also for the reports below)

Java:

# Maven with JaCoCo
mvn clean test jacoco:report

# View report
open target/site/jacoco/index.html

JavaScript:

# Jest with coverage
npm test -- --coverage

# View report
open coverage/lcov-report/index.html

Assess Test Quality:

Quality Checklist:
- [ ] Unit tests exist for core business logic
- [ ] Integration tests cover key workflows
- [ ] Tests are readable and maintainable
- [ ] Tests run quickly (< 10 seconds for unit tests)
- [ ] Mocking is used appropriately
- [ ] Edge cases are tested
- [ ] Tests don't depend on external services (or are mocked)
- [ ] Coverage > 70% for critical modules
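
The coverage item in the checklist can be checked mechanically from a JSON coverage report. The sketch below assumes the layout of coverage.py's JSON report as I understand it (a top-level `files` mapping whose entries carry `summary.percent_covered`); verify that against your installed version. `below_threshold` is a hypothetical helper and the 70% default comes from the checklist above.

```python
def below_threshold(report, threshold=70.0):
    """Return (path, percent) pairs for files whose coverage is under the threshold.

    `report` is the already-parsed JSON report (a dict), for example the result
    of json.load() on the report file.
    """
    results = []
    for path, data in report.get("files", {}).items():
        percent = data["summary"]["percent_covered"]
        if percent < threshold:
            results.append((path, round(percent, 1)))
    # Least-covered files first.
    return sorted(results, key=lambda item: item[1])
```

Filtering the result to the critical modules (for example paths under `payment/` or `auth/`) gives the "Untested Critical Code" list for the summary report.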

Step 7: Generate Summary Report

Create actionable documentation for the team.

Summary Template:

# Legacy Codebase Summary: [Project Name]

## Executive Summary

[2-3 sentence overview of what the codebase does]

**Key Metrics:**
- Lines of Code: [X]
- Number of Files: [Y]
- Primary Language: [Language]
- Test Coverage: [Z%]
- Last Major Update: [Date]

## Architecture Overview

### High-Level Structure

[Include architecture diagram from Step 3]

### Key Components

1. **[Component Name]** (`path/to/component/`)
   - **Purpose:** [What it does]
   - **Entry Point:** [Main file/class]
   - **Dependencies:** [Key dependencies]
   - **Lines of Code:** [X]

2. **[Component Name]** (`path/to/component/`)
   - **Purpose:** [What it does]
   - **Entry Point:** [Main file/class]
   - **Dependencies:** [Key dependencies]
   - **Lines of Code:** [X]

[Repeat for 5-10 key components]

### Technology Stack

**Core Technologies:**
- [Language] [Version]
- [Framework] [Version]
- [Database] [Version]

**Key Dependencies:**
- [Library 1] - [Purpose]
- [Library 2] - [Purpose]
- [Library 3] - [Purpose]

## Entry Points and Workflows

### Main Entry Points

1. **[Entry Point Name]** - `path/to/file.py:function()`
   - **Purpose:** [What it does]
   - **Triggered by:** [User action, cron, API call, etc.]

2. **[Entry Point Name]** - `path/to/file.java:main()`
   - **Purpose:** [What it does]
   - **Triggered by:** [How it's invoked]

### Critical Workflows

**Workflow 1: [Name]** (e.g., User Registration)

User submits form → routes/auth.py:register()
Validates input → validators/user_validator.py
Creates user → models/user.py:create()
Sends email → services/email_service.py
Returns response


**Workflow 2: [Name]** (e.g., Payment Processing)

[Step-by-step flow]


## Dependency Analysis

### External Dependencies

**Total Dependencies:** [X]

**Outdated Dependencies (require updates):**
- [Library Name] [Current Version] → [Latest Version]
- [Library Name] [Current Version] → [Latest Version]

**Deprecated Dependencies (require replacement):**
- [Library Name] - Deprecated since [Date]
  - **Suggested Replacement:** [New Library]

### Internal Dependencies

**Highly Coupled Modules (>5 dependencies):**
- `module_a` - depends on [X] modules
- `module_b` - depends on [Y] modules

**Circular Dependencies:**
- ⚠️ `auth` → `user` → `auth`
- ⚠️ `order` → `payment` → `order`

## Code Quality Assessment

### Metrics Summary

- **Average File Size:** [X] lines
- **Largest File:** `path/to/file.py` ([X] lines) ⚠️
- **TODO/FIXME Comments:** [X] occurrences
- **Code Duplication:** [Low/Medium/High]

### Quality Issues

**Critical Issues (Fix Immediately):**
1. **Security Vulnerability:** SQL injection in `path/to/file.py:45`
2. **Large File:** `god_class.java` (2,500 lines) - violates SRP
3. **Circular Dependency:** [Details]

**High Priority (Address Soon):**
1. **No Error Handling:** Missing try/catch in payment module
2. **Hardcoded Credentials:** Found in `config/settings.py`
3. **Deprecated API:** Using old authentication library

**Medium Priority (Technical Debt):**
1. **Code Duplication:** Copy-pasted validation logic in 5 files
2. **Missing Documentation:** 60% of functions lack docstrings
3. **Long Methods:** 15 methods exceed 100 lines

**Low Priority (Improvements):**
1. **Inconsistent Naming:** Variable names follow mixed conventions
2. **Missing Type Information:** No type hints (Python) or raw types instead of generics (Java)
3. **Verbose Code:** Could be simplified with modern patterns

### Code Smells Detected

- **God Objects:** [List large classes/modules]
- **Feature Envy:** [Methods accessing other objects' data frequently]
- **Dead Code:** [Unused functions/classes]
- **Magic Numbers:** [Hardcoded values without constants]

## Test Coverage Analysis

### Coverage Summary

- **Overall Coverage:** [X%]
- **Critical Modules Coverage:**
  - auth module: [Y%]
  - payment module: [Z%]
  - user management: [W%]

### Testing Gaps

**Untested Critical Code:**
1. `payment/processor.py` - 0% coverage ⚠️
2. `auth/security.py` - 30% coverage
3. `api/routes.py` - 45% coverage

**Missing Test Types:**
- [ ] No integration tests for payment flow
- [ ] No end-to-end tests for user journey
- [ ] No performance/load tests

### Test Quality Issues

- **Slow Tests:** 20 tests take >5 seconds each
- **Flaky Tests:** `test_async_operation` fails intermittently
- **Coupled Tests:** Tests depend on database state

## Recommendations

### Immediate Actions (This Sprint)

1. **Fix Security Issues**
   - Patch SQL injection vulnerability in `auth/login.py`
   - Remove hardcoded credentials, use environment variables

2. **Add Critical Tests**
   - Write integration tests for payment processor
   - Add unit tests for authentication logic

3. **Break Circular Dependencies**
   - Refactor `auth` ↔ `user` circular dependency
   - Extract shared code to new `common` module

### Short-Term Improvements (This Quarter)

1. **Reduce Technical Debt**
   - Refactor `god_class.java` into 3-4 focused classes
   - Eliminate code duplication in validation logic
   - Update deprecated dependencies

2. **Improve Documentation**
   - Add docstrings to all public functions
   - Create architecture diagram
   - Document deployment process

3. **Enhance Test Coverage**
   - Achieve 70% coverage for core modules
   - Add integration tests for critical workflows
   - Set up CI/CD with automated testing

### Long-Term Improvements (This Year)

1. **Architectural Refactoring**
   - Extract microservices for payment and notification
   - Implement proper layering (separate business logic from data access)
   - Introduce dependency injection for better testability

2. **Modernization**
   - Upgrade to [Language] [Latest Version]
   - Adopt modern patterns (async/await, type hints, etc.)
   - Migrate from [Old Framework] to [New Framework]

3. **Quality Infrastructure**
   - Set up automated code quality checks (linting, complexity analysis)
   - Implement pre-commit hooks
   - Add performance monitoring

## Quick Reference

### Key Files to Understand First

1. `path/to/main.py` - Application entry point
2. `path/to/config.py` - Configuration
3. `path/to/models/user.py` - Core data model
4. `path/to/api/routes.py` - API endpoints
5. `path/to/services/auth_service.py` - Authentication logic

### Common Commands

```bash
# Start application
[command]

# Run tests
[command]

# Build for production
[command]

# Deploy
[command]
```

### Key Contacts

- Original Authors: [Names/emails if available]
- Current Maintainers: [Names/emails]
- Documentation: [Links]
- Issue Tracker: [URL]

## Appendix

### Glossary

- [Term]: [Definition]
- [Term]: [Definition]

### External Resources

- [Link to original documentation]
- [Link to related projects]
- [Link to framework docs]


## Summary Output Examples

### Example 1: Small Python Flask App

```markdown
# Legacy Codebase Summary: Internal Dashboard

## Executive Summary

Internal dashboard for monitoring application metrics, built with Flask.
Provides real-time data visualization and alerting for operations team.

**Key Metrics:**
- Lines of Code: 3,500
- Number of Files: 42
- Primary Language: Python 3.7
- Test Coverage: 45%
- Last Major Update: 18 months ago

## Architecture Overview

Simple Flask application with SQLAlchemy ORM and PostgreSQL database.

┌─────────────────┐
│  Flask Routes   │
│  (app/routes/)  │
└────────┬────────┘
         │
         ↓
┌─────────────────┐
│    Services     │
│  (app/services/)│
└────────┬────────┘
         │
         ↓
┌─────────────────┐
│     Models      │
│  (app/models/)  │
└────────┬────────┘
         │
         ↓
┌─────────────────┐
│  PostgreSQL DB  │
└─────────────────┘


### Key Components

1. **Metrics Dashboard** (`app/routes/dashboard.py`)
   - Purpose: Display real-time metrics
   - Entry Point: `dashboard_view()`
   - Dependencies: metrics_service, chart_generator
   - Lines of Code: 250

2. **Data Collection** (`app/services/collector.py`)
   - Purpose: Fetch metrics from external APIs
   - Entry Point: `collect_metrics()` (cron job)
   - Dependencies: requests, database models
   - Lines of Code: 180

3. **Alert System** (`app/services/alerts.py`)
   - Purpose: Send notifications when thresholds exceeded
   - Entry Point: `check_alerts()` (background task)
   - Dependencies: email_service, metrics_service
   - Lines of Code: 150

## Recommendations

### Immediate Actions
1. Update Flask to latest version (security patches)
2. Add tests for alert system (currently 0% coverage)
3. Fix hardcoded database credentials

### Short-Term
1. Increase test coverage to 70%
2. Add API documentation
3. Refactor large dashboard route (250 lines in `app/routes/dashboard.py`)
```

### Example 2: Large Java Spring Application

```markdown
# Legacy Codebase Summary: E-Commerce Platform

## Executive Summary

Full-featured e-commerce platform handling product catalog, orders, payments,
and customer management. Serves 100K+ daily active users.

**Key Metrics:**
- Lines of Code: 185,000
- Number of Files: 1,240
- Primary Language: Java 8
- Test Coverage: 62%
- Last Major Update: 6 months ago

## Architecture Overview

Layered Spring Boot application with microservice patterns emerging.

[Detailed architecture diagram showing layers]

### Critical Issues Identified

**High Priority:**
1. **Memory Leak:** Order processing service shows increasing heap usage
2. **N+1 Query Problem:** Product listing generates 500+ DB queries
3. **No Monitoring:** Missing APM tools for production

**Modernization Opportunities:**
1. Migrate to Java 17 (LTS)
2. Extract payment service as microservice
3. Implement caching layer (Redis)

## Recommendations

[Detailed phased approach to refactoring]
```

Best Practices

Start broad, then narrow - Overview first, details second
Focus on actionable insights - Prioritize what can be improved
Use visual aids - Diagrams clarify complex relationships
Prioritize by risk - Security and stability issues first
Be specific - Point to exact files and line numbers
Estimate effort - Help teams plan refactoring work
Document assumptions - Note what analysis couldn't determine
Update regularly - Re-analyze as code evolves

Resources

references/architecture_patterns.md - Common architectural patterns in legacy systems and how to identify them
references/dependency_analysis.md - Tools and techniques for analyzing module dependencies and coupling
references/code_quality_checklist.md - Comprehensive checklist for assessing code quality and technical debt

Quick Reference

| Task | Command/Approach |
|------|------------------|
| Count LOC | `find . -name "*.py" \| xargs wc -l` |
| Find entry points | `grep -r "if __name__ == '__main__'"` |
| Analyze imports | `grep -rh "^import \|^from " \| sort \| uniq` |
| Find large files | `find . -name "*.py" -exec wc -l {} \; \| sort -rn` |
| Test coverage | `pytest --cov=src --cov-report=term` |
| Find TODOs | `grep -rn "TODO\|FIXME"` |
