Database Schema Analysis Expert

⚠️ MANDATORY COMPLIANCE ⚠️

CRITICAL: The 5-step workflow outlined in this document MUST be followed in exact order for EVERY database schema analysis. Skipping steps or deviating from the procedure will result in incomplete and unreliable analysis. This is non-negotiable.

File Structure

SKILL.md (this file): Main instructions and MANDATORY workflow
examples.md: Analysis scenarios with before/after examples
Context: Schema analysis patterns loaded via contextProvider.getDomainIndex("schema"). See ContextProvider Interface.
Memory: Project-specific memory accessed via memoryStore.getSkillMemory("database-schema-analysis", "{project-name}"). See MemoryStore Interface.
templates/: db_schema_report.md, er_diagram.md, index_analysis.md
scripts/: Helper utilities for database introspection

Interface References

Context: Loaded via ContextProvider Interface
Memory: Accessed via MemoryStore Interface
Schemas: Validated against context_metadata.schema.json and memory_entry.schema.json

Analysis Focus Areas

Database schema analysis evaluates 8 critical dimensions:

Structure Analysis: Tables/collections, columns/fields, data types
Relationships: Foreign keys, references, joins, graph edges
Constraints: Primary keys, unique constraints, checks, not null
Indexes: Single-column, composite, full-text, spatial, performance impact
Normalization: 1NF, 2NF, 3NF, BCNF compliance, denormalization patterns
Performance: Query patterns, index coverage, partitioning, sharding
Evolution: Version tracking, migration history, compatibility
Security: PII detection, encryption requirements, access patterns

MANDATORY WORKFLOW (MUST FOLLOW EXACTLY)

⚠️ STEP 1: Identify Target Database (REQUIRED)

YOU MUST:

Ask the user about the database system:
- Database type (PostgreSQL, MySQL, MongoDB, Cassandra, etc.)
- Connection details or schema dump files
- Specific schemas/databases to analyze
- Access method (live connection, SQL dump, DDL scripts, ORM models)
Verify access to schema information:
- For live databases: Test connection credentials
- For dumps: Locate and verify SQL/DDL files
- For ORM models: Find model definition files (Python models, Entity Framework, etc.)
Identify scope:
- Specific tables/collections or entire database?
- Include system tables/metadata?
- Focus areas (performance, normalization, security)?

DO NOT PROCEED WITHOUT IDENTIFYING TARGET

⚠️ STEP 2: Load Project Memory & Context (REQUIRED)

YOU MUST:

CHECK PROJECT MEMORY FIRST:
- Identify the project name from the repository root or ask the user
- Use memoryStore.getSkillMemory("database-schema-analysis", "{project-name}") to load existing project memory. See MemoryStore Interface.
- If memory exists, review previously analyzed schemas, patterns, and project-specific context
- If no memory exists, you will create it later in this process
USE CONTEXT INDEXES FOR EFFICIENT LOADING:
- Use contextProvider.getDomainIndex("schema") to discover available schema context files. See ContextProvider Interface.
- Use contextProvider.getAlwaysLoadFiles("schema") to load foundational concepts (common_patterns.md)
- Use contextProvider.getConditionalContext("schema", detection) to load database-specific patterns
- If analyzing security aspects, use contextProvider.getCrossDomainContext("schema", {"security": true}) for security context
Ask clarifying questions in Socratic format:
- What is the purpose of this database analysis?
- Planning a migration to a different database system?
- Performance optimization goals?
- Documentation for new team members?
- Compliance or security audit?
- Known pain points or issues with current schema?

DO NOT PROCEED WITHOUT COMPLETING THIS STEP

⚠️ STEP 3: Extract Schema Metadata (REQUIRED)

YOU MUST:

Extract schema information using appropriate method:

For SQL Databases (PostgreSQL, MySQL, SQL Server):
- Query information_schema tables or system catalogs
- Extract table definitions (DDL)
- Retrieve indexes, constraints, foreign keys
- Get column types, defaults, nullable status
- Identify views, stored procedures, triggers
For NoSQL Databases:
- MongoDB: Sample documents, infer schema, check indexes
- Cassandra: Extract keyspace, table definitions, clustering keys
- Neo4j: Identify node labels, relationship types, properties
- Redis: Analyze key patterns, data structures
Organize extracted metadata:
- Group by schema/database/keyspace
- Categorize by entity type (tables, views, collections)
- Map relationships and dependencies
- Identify data volumes if available
Document extraction method:
- SQL queries used
- Tools employed (pg_dump, mysqldump, mongodump)
- Sampling strategy for NoSQL
- Timestamp of extraction

DO NOT PROCEED WITHOUT EXTRACTING METADATA

⚠️ STEP 4: Analyze Schema Structure (REQUIRED)

YOU MUST perform deep analysis covering ALL these aspects:

4.1 Entity Analysis

Identify all entities: Tables, collections, node types
Catalog fields/columns: Name, type, constraints, defaults
Analyze data types: Precision, size, appropriateness
Check naming conventions: Consistency, clarity, standards compliance

4.2 Relationship Analysis

Map foreign keys: Source and target tables, referential actions
Identify implicit relationships: Naming patterns, join patterns
Analyze cardinality: One-to-one, one-to-many, many-to-many
Document junction tables: Many-to-many relationships
Check referential integrity: Orphaned records, circular dependencies

4.3 Constraint Analysis

Primary keys: Single vs composite, type, uniqueness
Unique constraints: Business keys, natural keys
Check constraints: Validation rules, business logic
Not null constraints: Required vs optional fields
Default values: Appropriateness, consistency

4.4 Index Analysis

Catalog all indexes: Type, columns, uniqueness
Evaluate coverage: Query patterns vs indexes
Identify redundant indexes: Duplicate or overlapping
Check missing indexes: Foreign keys, WHERE clause columns
Analyze index types: B-tree, hash, GiST, GIN, full-text
Assess performance impact: Size, selectivity, usage statistics

4.5 Normalization Assessment

Check normal forms: 1NF, 2NF, 3NF, BCNF
Identify violations: Repeating groups, partial dependencies, transitive dependencies
Evaluate denormalization: Justified vs premature
Recommend improvements: Normalization or strategic denormalization

4.6 Performance Analysis

Query patterns: Common joins, aggregations, filters
Partitioning strategy: Range, list, hash partitioning
Sharding approach: Distribution key, replication
Connection pooling: Configuration recommendations
Table sizes: Large tables, growth projections
Slow query patterns: N+1 queries, missing indexes

4.7 Security and Compliance

PII field detection: Identify sensitive data (emails, SSNs, addresses)
Encryption requirements: At-rest and in-transit
Access control: Role-based access, row-level security
Audit capabilities: Logging, temporal tables, event sourcing

4.8 Quality Assessment

Anti-patterns: EAV, polymorphic associations, metadata tribbles
Best practices: Proper use of constraints, indexes, types
Code smells: Generic names, missing documentation, inconsistent patterns
Technical debt: Legacy patterns, workarounds, deprecated features

USE THE TEMPLATES in templates/ directory to structure your analysis

DO NOT PROCEED WITHOUT COMPREHENSIVE ANALYSIS

⚠️ STEP 5: Generate Analysis Report & Update Memory (REQUIRED)

YOU MUST:

Generate comprehensive analysis report using the template from templates/db_schema_report.md:
- Executive summary
- Database inventory
- Entity catalog (all tables/collections with fields)
- Relationship diagram (ERD)
- Index analysis and recommendations
- Normalization assessment
- Performance optimization opportunities
- Security and compliance notes
- Quality assessment
Create ER diagram using templates/er_diagram.md:
- Visual representation of entities and relationships
- Cardinality indicators
- Key constraints
Generate index analysis using templates/index_analysis.md:
- Index inventory
- Coverage analysis
- Recommendations for additions/removals
- Performance impact assessment
UPDATE PROJECT MEMORY:
- Use memoryStore.update(layer="skill-specific", skill="database-schema-analysis", project="{project-name}", ...) to store:
- Database type and version
- Naming conventions and patterns
- Performance characteristics
- Migration history if available
- Common query patterns
- Timestamps and staleness tracking are handled automatically by MemoryStore. See MemoryStore Interface.
Provide actionable recommendations:
- Prioritized list of improvements
- Migration strategies if changing platforms
- Index optimization plan
- Normalization/denormalization guidance
- Security hardening steps

MEMORY UPDATE IS MANDATORY - DO NOT SKIP

Output Requirements

Analysis Report Must Include:

Database Overview
- Type and version
- Size and complexity metrics
- Number of entities, relationships, indexes
Entity Catalog
- Complete list of tables/collections
- All fields with types, constraints, defaults
- Primary and foreign keys
- Indexes
Visual Representations
- Entity-relationship diagram (ERD)
- Dependency graph
- Normalization level diagram
Index Analysis
- All indexes with types and columns
- Usage statistics (if available)
- Recommendations for optimization
Relationship Mapping
- Foreign key relationships
- Cardinality
- Referential actions
Quality Report
- Normalization violations
- Anti-patterns
- Missing indexes
- Best practices compliance
Performance Assessment
- Partitioning strategy
- Query pattern analysis
- Optimization recommendations
Security Analysis
- PII fields
- Encryption status
- Access control notes

Socratic Prompting Guidelines

When interacting with users, ask clarifying questions such as:

Understanding Intent:

"What decisions are you trying to make with this database analysis?"
"Are you planning a migration, optimization, or documentation?"
"Who is the audience for this analysis (developers, DBAs, auditors)?"

Scope Definition:

"Should I analyze all schemas or focus on specific ones?"
"Do you want performance analysis or just structural documentation?"
"Should I include stored procedures, triggers, and views?"

Context Understanding:

"What is your current database load (queries/sec, data volume)?"
"Are there known performance issues or slow queries?"
"Any compliance requirements (GDPR, HIPAA, SOC2)?"
"What is your migration strategy or timeline?"

Technical Details:

"Do you have access to query logs for usage analysis?"
"Can I connect to a live database or work from dumps?"
"Are there ORM models I should cross-reference?"

Quality Standards

Your analysis MUST:

✅ Be 100% accurate to the actual database schema
✅ Catalog all tables/collections and fields
✅ Map all relationships and constraints
✅ Identify all indexes with recommendations
✅ Assess normalization level correctly
✅ Provide specific, actionable recommendations
✅ Use templates for consistent output
✅ Update project memory for future reference

Your analysis MUST NOT:

❌ Hallucinate tables or columns not in the database
❌ Miss foreign key relationships
❌ Overlook indexes or constraints
❌ Provide generic recommendations without basis
❌ Ignore security or compliance considerations

Integration with Other Skills

Combine with:

file-schema-analysis: For API schemas that mirror database structure
python-code-review: When analyzing SQLAlchemy or Django models
dotnet-code-review: When analyzing Entity Framework models
generate-python-unit-tests: To create database migration tests

Supported Database Systems

SQL Databases

✅ PostgreSQL (all versions)
✅ MySQL / MariaDB
✅ Microsoft SQL Server
✅ SQLite
✅ Oracle Database
✅ Amazon Aurora

NoSQL Databases

✅ MongoDB (document store)
✅ Cassandra (column-family)
✅ Neo4j (graph)
✅ Redis (key-value)
✅ DynamoDB
✅ Couchbase

Version History

v1.1.0 (2026-02-10): Phase 4 Migration
- Migrated to interface-based patterns (ContextProvider + MemoryStore)
- Removed hardcoded filesystem paths
- Added interface references section
v1.0.0 (2025-02-06): Initial release
- Support for major SQL and NoSQL databases
- Comprehensive analysis workflow
- Template-based reporting
- Project memory integration
- Index and performance analysis

Last Updated: 2025-02-06 Maintained by: The Forge

database-schema-analysis

Database Schema Analysis Expert

⚠️ MANDATORY COMPLIANCE ⚠️

File Structure

Interface References

Analysis Focus Areas

MANDATORY WORKFLOW (MUST FOLLOW EXACTLY)

⚠️ STEP 1: Identify Target Database (REQUIRED)

⚠️ STEP 2: Load Project Memory & Context (REQUIRED)

⚠️ STEP 3: Extract Schema Metadata (REQUIRED)

⚠️ STEP 4: Analyze Schema Structure (REQUIRED)

4.1 Entity Analysis

4.2 Relationship Analysis

4.3 Constraint Analysis

4.4 Index Analysis

4.5 Normalization Assessment

4.6 Performance Analysis

4.7 Security and Compliance

4.8 Quality Assessment

⚠️ STEP 5: Generate Analysis Report & Update Memory (REQUIRED)

Output Requirements

Analysis Report Must Include:

Socratic Prompting Guidelines

Quality Standards

Your analysis MUST:

Your analysis MUST NOT:

Integration with Other Skills

Supported Database Systems

SQL Databases

NoSQL Databases

Version History