Database Schema Designer
Purpose & When-To-Use
Use this skill when you need to:
- Design a new database schema from business requirements
- Refactor an existing schema for better normalization or performance
- Plan a migration strategy between schema versions
- Optimize database structure for specific access patterns
- Choose appropriate data types, indexes, and constraints
Skip this skill if you only need query optimization (use database-optimization-analyzer) or simple CRUD operations.
Pre-Checks
- Verify time context: Compute
NOW_ET= 2025-10-26T12:00:00-04:00 (accessed via NIST/time.gov semantics) - Validate inputs:
- Business requirements describe entities, relationships, and cardinality
- Database type is specified (default: PostgreSQL)
- Scale requirements include estimated row counts and query frequency
- Check database documentation is current (accessed NOW_ET):
- PostgreSQL 16+ docs (https://www.postgresql.org/docs/current/, accessed 2025-10-26)
- MySQL 8.0+ docs (https://dev.mysql.com/doc/, accessed 2025-10-26)
- MongoDB 7.0+ docs (https://docs.mongodb.com/, accessed 2025-10-26)
Procedure
T1: Fast Path (≤2k tokens) - Simple Schema Design
For straightforward domains with 3-8 entities and clear relationships:
-
Entity extraction (100 tokens):
- Identify nouns from requirements → entities
- List attributes with data types
- Note primary keys (natural vs surrogate)
-
Relationship modeling (200 tokens):
- Identify entity relationships (1:1, 1:N, N:M)
- Resolve N:M with junction tables
- Add foreign key constraints
-
Quick normalization (150 tokens):
- Check 1NF: Atomic values, no repeating groups
- Check 2NF: No partial dependencies
- Check 3NF: No transitive dependencies
-
Basic indexes (100 tokens):
- Primary key indexes (automatic)
- Foreign key indexes (recommended)
- Common query column indexes
-
Output (≤1.5k tokens):
- Mermaid ERD
- DDL CREATE TABLE statements
- Basic index creation statements
T2: Standard Path (≤6k tokens) - Production-Ready Schema
For complex domains requiring optimization and migration planning:
-
All T1 steps (550 tokens)
-
Advanced normalization (300 tokens):
- Evaluate BCNF for complex functional dependencies
- Consider controlled denormalization for read performance
- Document normalization decisions and trade-offs
-
Constraint design (400 tokens):
- Check constraints for data validation
- Unique constraints for business rules
- Triggers for complex integrity rules
- Cascade rules for foreign keys (ON DELETE, ON UPDATE)
-
Index optimization (500 tokens):
- Composite indexes for multi-column queries
- Covering indexes for SELECT performance
- Partial indexes for filtered queries
- Full-text search indexes if applicable
-
Migration planning (800 tokens):
- Version N → N+1 migration steps
- Data backfill scripts for new columns
- Rollback procedures for each step
- Zero-downtime migration strategy (if required)
-
Documentation (≤3.5k tokens):
- ERD with cardinality notation
- Complete DDL with comments
- Migration plan with ordering
- Index justification and query patterns
T3: Deep Dive (≤12k tokens) - Enterprise Schema with Partitioning
For large-scale systems requiring partitioning, sharding, or cross-database design:
-
All T2 steps (≤6k tokens)
-
Scalability design (1.5k tokens):
- Table partitioning strategy (range, list, hash)
- Sharding key selection and distribution
- Archive table design for historical data
- Read replicas and query routing
-
Performance analysis (1k tokens):
- Query pattern analysis and index coverage
- EXPLAIN plan review for common queries
- Cardinality estimation and statistics
- Partition pruning verification
-
Data lifecycle (800 tokens):
- Retention policies and TTL implementation
- Archive and purge procedures
- GDPR/compliance considerations (anonymization, deletion)
-
Comprehensive documentation (≤2.7k tokens):
- Full ERD with physical and logical views
- DDL with partitioning and sharding
- Complete migration plan with testing steps
- Performance baseline and monitoring queries
Decision Rules
When to escalate complexity tier:
- T1 → T2: More than 10 entities, OR migration from existing schema, OR explicit performance requirements
- T2 → T3: More than 50 tables, OR partitioning needed, OR multi-region deployment, OR >100M rows expected
When to recommend denormalization:
- Read:write ratio > 100:1 AND query joins >3 tables
- Real-time analytics dashboards requiring <100ms response
- Document explicitly: "Controlled denormalization for performance: [justification]"
When to abort:
- Requirements lack entity definitions or relationships
- Database type unsupported (emit TODO: "Add support for [database]")
- Conflicting constraints detected (e.g., circular foreign keys)
Output Contract
Required fields:
erd_diagram: string # Mermaid ER diagram syntax
ddl_scripts: string # Complete DDL (CREATE TABLE, indexes, constraints)
migration_plan: array # Ordered steps with up/down scripts
optimization_notes: string # Index strategy and query patterns
Optional fields:
normalization_analysis: string # 1NF-3NF evaluation
partitioning_strategy: string # If T3 used
test_data_generator: string # Sample INSERT statements
Format: JSON or YAML document, optionally with embedded SQL code blocks
Examples
-- E-commerce schema: User, Product, Order, OrderItem (T1 example)
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
stock INTEGER DEFAULT 0 CHECK (stock >= 0)
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(user_id) ON DELETE RESTRICT,
total DECIMAL(10,2) NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE order_items (
order_id INTEGER REFERENCES orders(order_id) ON DELETE CASCADE,
product_id INTEGER REFERENCES products(product_id) ON DELETE RESTRICT,
quantity INTEGER NOT NULL CHECK (quantity > 0),
price_snapshot DECIMAL(10,2) NOT NULL,
PRIMARY KEY (order_id, product_id)
);
CREATE INDEX idx_orders_user ON orders(user_id, created_at DESC);
CREATE INDEX idx_products_price ON products(price) WHERE stock > 0;
Quality Gates
Token budgets:
- T1: ≤2k tokens (simple 3-8 entity schemas)
- T2: ≤6k tokens (production schemas with migrations)
- T3: ≤12k tokens (enterprise with partitioning/sharding)
Safety checks:
- No hardcoded credentials or sensitive data in examples
- All foreign keys have ON DELETE/ON UPDATE clauses specified
- Indexes justified by query patterns (not speculative)
- Migration steps are reversible (rollback provided)
Validation:
- DDL is syntactically valid for target database
- ERD entities match DDL tables 1:1
- All 3NF violations documented with justification
- Examples use sample/synthetic data only
Determinism:
- Use SERIAL/BIGSERIAL for PostgreSQL auto-increment
- Use AUTO_INCREMENT for MySQL
- Document any database-specific features used
Resources
Official Documentation (accessed 2025-10-26):
- PostgreSQL DDL: https://www.postgresql.org/docs/current/ddl.html
- PostgreSQL Indexes: https://www.postgresql.org/docs/current/indexes.html
- MySQL Data Types: https://dev.mysql.com/doc/refman/8.0/en/data-types.html
- MongoDB Data Modeling: https://docs.mongodb.com/manual/data-modeling/
Best Practices (accessed 2025-10-26):
- Database Normalization Guide: https://www.sqlshack.com/database-normalization-process/
- Index Design Patterns: https://use-the-index-luke.com/
Tools:
- ERD visualization: Mermaid (https://mermaid.js.org/syntax/entityRelationshipDiagram.html)
- Schema diff tools: migra, sqldiff, liquibase
- Migration frameworks: Flyway, Alembic, Liquibase
Related Skills:
database-migration-generator- Generate migration scripts from schema changesdatabase-optimization-analyzer- Analyze and optimize existing schemasdata-pipeline-designer- Design ETL pipelines for data movement
More from williamzujkowski/cognitive-toolworks
ux wireframe designer
Design user experience wireframes, user flows, and interactive mockups for web and mobile applications using industry-standard notation
2rust safety & performance analyzer
Review Rust code for memory safety, concurrency patterns, performance optimization, and ecosystem tooling (cargo, clippy, rustfmt).
1gemini cli delegation & large context routing
Fast delegation of large-context tasks to Gemini CLI via MCP when file size exceeds 100KB or task requires analysis/review.
1multi-cloud cost optimizer
Optimize costs across AWS, GCP, Azure with cross-cloud waste detection, workload placement, commitment balancing, and unified FinOps.
1cryptographic security validator
Validate cryptographic implementations using NIST standards with TLS configuration, cipher suite analysis, and certificate lifecycle checks.
1observability stack configurator
Configure comprehensive observability with metrics, logging, tracing, and alerting using Prometheus, OpenTelemetry, CloudWatch, and Grafana.
1