database-schema-design
Database Schema Design
When to use this skill
Lists specific situations where this skill should be triggered:
- New Project: Database schema design for a new application
- Schema Refactoring: Redesigning an existing schema for performance or scalability
- Relationship Definition: Implementing 1:1, 1:N, N:M relationships between tables
- Migration: Safely applying schema changes
- Performance Issues: Index and schema optimization to resolve slow queries
Input Format
The required and optional input information to collect from the user:
Required Information
- Database Type: PostgreSQL, MySQL, MongoDB, SQLite, etc.
- Domain Description: What data will be stored (e.g., e-commerce, blog, social media)
- Key Entities: Core data objects (e.g., User, Product, Order)
Optional Information
- Expected Data Volume: Small (<10K rows), Medium (10K-1M), Large (>1M) (default: Medium)
- Read/Write Ratio: Read-heavy, Write-heavy, Balanced (default: Balanced)
- Transaction Requirements: Whether ACID is required (default: true)
- Sharding/Partitioning: Whether large data distribution is needed (default: false)
Input Example
Design a database for an e-commerce platform:
- DB: PostgreSQL
- Entities: User, Product, Order, Review
- Relationships:
- A User can have multiple Orders
- An Order contains multiple Products (N:M)
- A Review is linked to a User and a Product
- Expected data: 100,000 users, 10,000 products
- Read-heavy (frequent product lookups)
Instructions
Specifies the step-by-step task sequence to follow precisely.
Step 1: Define Entities and Attributes
Identify core data objects and their attributes.
Tasks:
- Extract nouns from business requirements → entities
- List each entity's attributes (columns)
- Determine data types (VARCHAR, INTEGER, TIMESTAMP, JSON, etc.)
- Designate Primary Keys (UUID vs Auto-increment ID)
Example (E-commerce):
Users
- id: UUID PRIMARY KEY
- email: VARCHAR(255) UNIQUE NOT NULL
- username: VARCHAR(50) UNIQUE NOT NULL
- password_hash: VARCHAR(255) NOT NULL
- created_at: TIMESTAMP DEFAULT NOW()
- updated_at: TIMESTAMP DEFAULT NOW()
Products
- id: UUID PRIMARY KEY
- name: VARCHAR(255) NOT NULL
- description: TEXT
- price: DECIMAL(10, 2) NOT NULL
- stock: INTEGER DEFAULT 0
- category_id: UUID REFERENCES Categories(id)
- created_at: TIMESTAMP DEFAULT NOW()
Orders
- id: UUID PRIMARY KEY
- user_id: UUID REFERENCES Users(id)
- total_amount: DECIMAL(10, 2) NOT NULL
- status: VARCHAR(20) DEFAULT 'pending'
- created_at: TIMESTAMP DEFAULT NOW()
OrderItems (Junction table)
- id: UUID PRIMARY KEY
- order_id: UUID REFERENCES Orders(id) ON DELETE CASCADE
- product_id: UUID REFERENCES Products(id)
- quantity: INTEGER NOT NULL
- price: DECIMAL(10, 2) NOT NULL
Step 2: Design Relationships and Normalization
Define relationships between tables and apply normalization.
Tasks:
- 1:1 relationship: Foreign Key + UNIQUE constraint
- 1:N relationship: Foreign Key
- N:M relationship: Create junction table
- Determine normalization level (1NF ~ 3NF)
Decision Criteria:
- OLTP systems → normalize to 3NF (data integrity)
- OLAP/analytics systems → denormalization allowed (query performance)
- Read-heavy → minimize JOINs with partial denormalization
- Write-heavy → full normalization to eliminate redundancy
Example (ERD Mermaid):
erDiagram
Users ||--o{ Orders : places
Orders ||--|{ OrderItems : contains
Products ||--o{ OrderItems : "ordered in"
Categories ||--o{ Products : categorizes
Users ||--o{ Reviews : writes
Products ||--o{ Reviews : "reviewed by"
Users {
uuid id PK
string email UK
string username UK
string password_hash
timestamp created_at
}
Products {
uuid id PK
string name
decimal price
int stock
uuid category_id FK
}
Orders {
uuid id PK
uuid user_id FK
decimal total_amount
string status
timestamp created_at
}
OrderItems {
uuid id PK
uuid order_id FK
uuid product_id FK
int quantity
decimal price
}
Step 3: Establish Indexing Strategy
Design indexes for query performance.
Tasks:
- Primary Keys automatically create indexes
- Columns frequently used in WHERE clauses → add indexes
- Foreign Keys used in JOINs → indexes
- Consider composite indexes (WHERE col1 = ? AND col2 = ?)
- UNIQUE indexes (email, username, etc.)
Checklist:
- Indexes on frequently queried columns
- Indexes on Foreign Key columns
- Composite index order optimized (high selectivity columns first)
- Avoid excessive indexes (degrades INSERT/UPDATE performance)
Example (PostgreSQL):
-- Primary Keys (auto-indexed)
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL, -- UNIQUE = auto-indexed
username VARCHAR(50) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Foreign Keys + explicit indexes
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
total_amount DECIMAL(10, 2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Composite index (status and created_at frequently queried together)
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- Products table
CREATE TABLE products (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
stock INTEGER DEFAULT 0 CHECK (stock >= 0),
category_id UUID REFERENCES categories(id),
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_products_price ON products(price); -- price range search
CREATE INDEX idx_products_name ON products(name); -- product name search
-- Full-text search (PostgreSQL)
CREATE INDEX idx_products_name_fts ON products USING GIN(to_tsvector('english', name));
CREATE INDEX idx_products_description_fts ON products USING GIN(to_tsvector('english', description));
Step 4: Set Up Constraints and Triggers
Add constraints to ensure data integrity.
Tasks:
- NOT NULL: required columns
- UNIQUE: columns that must be unique
- CHECK: value range constraints (e.g., price >= 0)
- Foreign Key + CASCADE option
- Set default values
Example:
CREATE TABLE products (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
stock INTEGER DEFAULT 0 CHECK (stock >= 0),
discount_percent INTEGER CHECK (discount_percent >= 0 AND discount_percent <= 100),
category_id UUID REFERENCES categories(id) ON DELETE SET NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Trigger: auto-update updated_at
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER update_products_updated_at
BEFORE UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();
Step 5: Write Migration Scripts
Write migrations that safely apply schema changes.
Tasks:
- UP migration: apply changes
- DOWN migration: rollback
- Wrap in transactions
- Prevent data loss (use ALTER TABLE carefully)
Example (SQL migration):
-- migrations/001_create_initial_schema.up.sql
BEGIN;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE categories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(100) UNIQUE NOT NULL,
parent_id UUID REFERENCES categories(id)
);
CREATE TABLE products (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
stock INTEGER DEFAULT 0 CHECK (stock >= 0),
category_id UUID REFERENCES categories(id),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_products_price ON products(price);
COMMIT;
-- migrations/001_create_initial_schema.down.sql
BEGIN;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS users CASCADE;
COMMIT;
Output format
Defines the exact format that deliverables should follow.
Basic Structure
project/
├── database/
│ ├── schema.sql # full schema
│ ├── migrations/
│ │ ├── 001_create_users.up.sql
│ │ ├── 001_create_users.down.sql
│ │ ├── 002_create_products.up.sql
│ │ └── 002_create_products.down.sql
│ ├── seeds/
│ │ └── sample_data.sql # test data
│ └── docs/
│ ├── ERD.md # Mermaid ERD diagram
│ └── SCHEMA.md # schema documentation
└── README.md
ERD Diagram (Mermaid Format)
# Database Schema
## Entity Relationship Diagram
\`\`\`mermaid
erDiagram
Users ||--o{ Orders : places
Orders ||--|{ OrderItems : contains
Products ||--o{ OrderItems : "ordered in"
Users {
uuid id PK
string email UK
string username UK
}
Products {
uuid id PK
string name
decimal price
}
\`\`\`
## Table Descriptions
### users
- **Purpose**: Store user account information
- **Indexes**: email, username
- **Estimated rows**: 100,000
### products
- **Purpose**: Product catalog
- **Indexes**: category_id, price, name
- **Estimated rows**: 10,000
Constraints
Specifies mandatory rules and prohibited actions.
Mandatory Rules (MUST)
-
Primary Key Required: Define a Primary Key on every table
- Unique record identification
- Ensures referential integrity
-
Explicit Foreign Keys: Tables with relationships must define Foreign Keys
- Specify ON DELETE CASCADE/SET NULL options
- Prevent orphan records
-
Use NOT NULL Appropriately: Required columns must be NOT NULL
- Clearly specify nullable vs. non-nullable
- Providing defaults is recommended
Prohibited Actions (MUST NOT)
-
Avoid EAV Pattern Abuse: Use the Entity-Attribute-Value pattern only in special cases
- Query complexity increases dramatically
- Performance degradation
-
Excessive Denormalization: Be careful when denormalizing for performance
- Data consistency issues
- Risk of update anomalies
-
No Plaintext Storage of Sensitive Data: Never store passwords, card numbers, etc. in plaintext
- Hashing/encryption is mandatory
- Legal liability issues
Security Rules
- Principle of Least Privilege: Grant only the necessary permissions to application DB accounts
- SQL Injection Prevention: Use Prepared Statements / Parameterized Queries
- Encrypt Sensitive Columns: Consider encrypting personally identifiable information at rest
Examples
Demonstrates how to apply the skill through real-world use cases.
Example 1: Blog Platform Schema
Situation: Database design for a Medium-style blog platform
User Request:
Design a PostgreSQL schema for a blog platform:
- Users can write multiple posts
- Posts can have multiple tags (N:M)
- Users can like and bookmark posts
- Comment feature (with nested replies)
Final Result:
-- Users
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL,
bio TEXT,
avatar_url VARCHAR(500),
created_at TIMESTAMP DEFAULT NOW()
);
-- Posts
CREATE TABLE posts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
title VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE NOT NULL,
content TEXT NOT NULL,
published_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_posts_author ON posts(author_id);
CREATE INDEX idx_posts_published ON posts(published_at);
CREATE INDEX idx_posts_slug ON posts(slug);
-- Tags
CREATE TABLE tags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(50) UNIQUE NOT NULL,
slug VARCHAR(50) UNIQUE NOT NULL
);
-- Post-Tag relationship (N:M)
CREATE TABLE post_tags (
post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
tag_id UUID REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (post_id, tag_id)
);
CREATE INDEX idx_post_tags_post ON post_tags(post_id);
CREATE INDEX idx_post_tags_tag ON post_tags(tag_id);
-- Likes
CREATE TABLE post_likes (
user_id UUID REFERENCES users(id) ON DELETE CASCADE,
post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (user_id, post_id)
);
-- Bookmarks
CREATE TABLE post_bookmarks (
user_id UUID REFERENCES users(id) ON DELETE CASCADE,
post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (user_id, post_id)
);
-- Comments (self-referencing for nested comments)
CREATE TABLE comments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
post_id UUID NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
parent_comment_id UUID REFERENCES comments(id) ON DELETE CASCADE,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_comments_post ON comments(post_id);
CREATE INDEX idx_comments_author ON comments(author_id);
CREATE INDEX idx_comments_parent ON comments(parent_comment_id);
Example 2: MongoDB Schema (NoSQL)
Situation: MongoDB schema for a real-time chat app
User Request:
Design a MongoDB schema for a real-time chat app.
Reads are very frequent, and message history needs to be retrieved quickly.
Final Result:
// users collection
{
_id: ObjectId,
username: String, // indexed, unique
email: String, // indexed, unique
avatar_url: String,
status: String, // 'online', 'offline', 'away'
last_seen: Date,
created_at: Date
}
// conversations collection (denormalized - read-optimized)
{
_id: ObjectId,
participants: [ // indexed
{
user_id: ObjectId,
username: String,
avatar_url: String
}
],
last_message: { // denormalized for fast recent-message retrieval
content: String,
sender_id: ObjectId,
sent_at: Date
},
unread_counts: { // unread message count per participant
"user_id_1": 5,
"user_id_2": 0
},
created_at: Date,
updated_at: Date
}
// messages collection
{
_id: ObjectId,
conversation_id: ObjectId, // indexed
sender_id: ObjectId,
content: String,
attachments: [
{
type: String, // 'image', 'file', 'video'
url: String,
filename: String
}
],
read_by: [ObjectId], // array of user IDs who have read the message
sent_at: Date, // indexed
edited_at: Date
}
// Indexes
db.users.createIndex({ username: 1 }, { unique: true });
db.users.createIndex({ email: 1 }, { unique: true });
db.conversations.createIndex({ "participants.user_id": 1 });
db.conversations.createIndex({ updated_at: -1 });
db.messages.createIndex({ conversation_id: 1, sent_at: -1 });
db.messages.createIndex({ sender_id: 1 });
Design Highlights:
- Denormalization for read optimization (embedding last_message)
- Indexes on frequently accessed fields
- Using array fields (participants, read_by)
Best practices
Quality Improvement
-
Naming Convention Consistency: Use snake_case for table/column names
- users, post_tags, created_at
- Be consistent with plurals/singulars (tables plural, columns singular, etc.)
-
Consider Soft Delete: Use logical deletion instead of physical deletion for important data
- deleted_at TIMESTAMP (NULL = active, NOT NULL = deleted)
- Allows recovery of accidentally deleted data
- Audit trail
-
Timestamps Required: Include created_at and updated_at in most tables
- Data tracking and debugging
- Time-series analysis
Efficiency Improvements
- Partial Indexes: Minimize index size with conditional indexes
CREATE INDEX idx_posts_published ON posts(published_at) WHERE published_at IS NOT NULL; - Materialized Views: Cache complex aggregate queries as Materialized Views
- Partitioning: Partition large tables by date/range
Common Issues
Issue 1: N+1 Query Problem
Symptom: Multiple DB calls when a single query would suffice
Cause: Individual lookups in a loop without JOINs
Solution:
-- ❌ Bad example: N+1 queries
SELECT * FROM posts; -- 1 time
-- for each post
SELECT * FROM users WHERE id = ?; -- N times
-- ✅ Good example: 1 query
SELECT posts.*, users.username, users.avatar_url
FROM posts
JOIN users ON posts.author_id = users.id;
Issue 2: Slow JOINs Due to Unindexed Foreign Keys
Symptom: JOIN queries are very slow
Cause: Missing index on Foreign Key column
Solution:
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
Issue 3: UUID vs Auto-increment Performance
Symptom: Insert performance degradation when using UUID Primary Keys
Cause: UUIDs are random, causing index fragmentation
Solution:
- PostgreSQL: Use
uuid_generate_v7()(time-ordered UUID) - MySQL: Use
UUID_TO_BIN(UUID(), 1) - Or consider using Auto-increment BIGINT
References
Official Documentation
Tools
- dbdiagram.io - ERD diagram creation
- PgModeler - PostgreSQL modeling tool
- Prisma - ORM + migrations
Learning Resources
- Database Design Course (freecodecamp)
- Use The Index, Luke - SQL indexing guide
Metadata
Version
- Current Version: 1.0.0
- Last Updated: 2025-01-01
- Compatible Platforms: Claude, ChatGPT, Gemini
Related Skills
- api-design: Schema design alongside API design
- performance-optimization: Query performance optimization
Tags
#database #schema #PostgreSQL #MySQL #MongoDB #SQL #NoSQL #migration #ERD