Database Migration

Overview

Write safe, reversible database migrations that won't take down production. Every migration gets a rollback plan, data backup commands, lock impact analysis, verification queries, and a deployment runbook. The core principle: if you can't reverse it safely, don't ship it.

When to use

The task changes an existing database schema, constraints, indexes, or stored data through a migration.
The user needs forward and rollback steps, safety analysis, verification queries, or deployment guidance.
The migration must be safe for shared, staging, or production environments.
The request involves ALTER TABLE, backfills tied to schema evolution, or zero-downtime rollout concerns.

Do NOT use when:

The task is designing a schema from scratch with no migration path.
The request is ad hoc application SQL rather than a migration artifact.
The work is a pure data script or ETL job with no schema-change lifecycle.

Response format

Always structure the final response with these top-level sections, in this order:

Summary — state the task, scope, and main conclusion in 1-3 sentences.
Decision / Approach — state the key classification, assumptions, or chosen path.
Artifacts — provide the primary deliverable(s) for this skill. Use clear subheadings for multiple files, commands, JSON payloads, queries, or documents.
Validation — state checks performed, important risks, caveats, or unresolved questions.
Next steps — list concrete follow-up actions, or write None if nothing remains.

Rules:

Do not omit a section; write None when a section does not apply.
If files are produced, list each file path under Artifacts before its contents.
If commands, JSON, SQL, YAML, or code are produced, put each artifact in fenced code blocks with the correct language tag when possible.
Keep section names exactly as written above so output stays predictable across skills.

Workflow

1. Analyze the Schema Change

Before writing any SQL, understand the full impact:

What tables are affected? Check row counts — a 100-row config table and a 100M-row events table need completely different strategies.
What columns are involved? Nullable vs NOT NULL, defaults, data types.
Who reads/writes this table? Identify application code, background jobs, and reporting queries that touch the affected columns.
What database engine? PostgreSQL, MySQL, and SQLite have different locking behaviors. Ask if not obvious.

Output: A brief impact summary stating table(s), estimated row count, and affected consumers.

2. Assess Lock and Downtime Impact

Every ALTER TABLE acquires some form of lock. Classify the impact:

Operation (PostgreSQL)	Lock Level	Downtime Risk
ADD COLUMN (nullable, no default)	ACCESS EXCLUSIVE	Instant
ADD COLUMN (with volatile default)	ACCESS EXCLUSIVE	Blocks all
ADD COLUMN (with constant default, 11+)	ACCESS EXCLUSIVE	Instant
DROP COLUMN	ACCESS EXCLUSIVE	Instant
RENAME COLUMN	ACCESS EXCLUSIVE	Instant
ALTER COLUMN TYPE	ACCESS EXCLUSIVE	Rewrites table
CREATE INDEX	SHARE lock	Blocks writes
CREATE INDEX CONCURRENTLY	SHARE UPDATE EXCL.	Non-blocking
ADD NOT NULL constraint	ACCESS EXCLUSIVE	Scans table
ADD CHECK constraint (NOT VALID)	ACCESS EXCLUSIVE	Instant
VALIDATE CONSTRAINT	SHARE UPDATE EXCL.	Non-blocking

For MySQL, note that many ALTERs require a full table copy. Use pt-online-schema-change or gh-ost for large tables.

Output: Lock classification and estimated duration for the table size.

3. Write the Up Migration

Write the forward migration SQL. Apply these safety rules:

Set a lock timeout to avoid blocking the entire application:
```
SET lock_timeout = '5s';
```
Use transactions for DDL that supports it (PostgreSQL does, MySQL doesn't for most DDL).
Split large operations into safe steps. Never combine a schema change with a large data backfill in the same transaction.
Add columns as nullable first, then backfill, then add the NOT NULL constraint separately.
Create indexes concurrently on large tables (PostgreSQL: CONCURRENTLY, MySQL: use online DDL or gh-ost).

Output: migration.sql — the up migration file.

4. Write the Down Migration (Rollback)

Every up migration MUST have a corresponding down migration. This is non-negotiable.

The rollback must undo the up migration completely.
If the up migration added a column, the down drops it.
If the up migration renamed a column, the down renames it back.
If the up migration created an index, the down drops it.
Data loss warning: If the rollback would lose data (e.g., dropping a column that was populated), document this explicitly.

Output: rollback.sql — the down migration file.

5. Add Data Backup Commands

Before any destructive or risky migration, provide backup commands:

-- Backup: full table snapshot
CREATE TABLE orders_backup_20260306 AS SELECT * FROM orders;

-- Backup: affected column only (for large tables)
CREATE TABLE orders_email_backup_20260306 AS
  SELECT id, email FROM orders;

-- Verify backup row count matches
SELECT
  (SELECT count(*) FROM orders) AS original,
  (SELECT count(*) FROM orders_backup_20260306) AS backup;

For very large tables, suggest pg_dump with table filtering instead of CREATE TABLE AS.

Output: Backup commands included in the runbook.

6. Write Verification Queries

After the migration runs, how do you know it worked? Write queries that verify:

Schema is correct: Column exists, has right type, right default, right constraints.
Data is intact: Row counts match, no unexpected NULLs, values in expected range.
Application works: Key queries still return expected results.

-- Verify column exists with correct type
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'orders' AND column_name = 'email';

-- Verify row count unchanged
SELECT count(*) FROM orders;

-- Verify no unexpected NULLs (after backfill)
SELECT count(*) FROM orders WHERE email IS NULL;

Output: Verification queries included in the runbook.

7. Create Deployment Runbook

Compile everything into a runbook that an on-call engineer can follow:

## Migration Runbook: [description]

### Pre-flight

- [ ] Backup commands executed and verified
- [ ] Lock timeout set
- [ ] Off-peak deployment window confirmed
- [ ] Application code deployed (if dual-write needed)

### Execute

1. Run migration.sql
2. Verify with verification queries
3. Monitor error rates for 15 minutes

### Rollback (if needed)

1. Run rollback.sql
2. Verify rollback with verification queries
3. Restore from backup if data was lost

### Post-migration

- [ ] Remove old column/compatibility code (after soak period)
- [ ] Update documentation
- [ ] Close migration ticket

Output: runbook.md — the deployment runbook.

Safety Checklist

Before considering any migration complete, verify:

Up migration file written and tested
Down migration (rollback) file written and tested
Rollback reverses the up migration completely
Data backup commands included
Lock impact analyzed and documented
Zero-downtime strategy confirmed (or downtime window documented)
Verification queries written for post-migration checks
Deployment runbook created with pre-flight, execute, rollback sections
Large table operations use CONCURRENTLY or equivalent

Common Migration Patterns

Adding a non-nullable column with default (large table)

Unsafe — blocks reads and writes while rewriting the entire table:

ALTER TABLE orders ADD COLUMN status varchar(50) NOT NULL DEFAULT 'pending';

Safe — three-step approach:

-- Step 1: Add nullable column (instant, no rewrite)
ALTER TABLE orders ADD COLUMN status varchar(50);

-- Step 2: Backfill in batches (no lock held between batches)
UPDATE orders SET status = 'pending' WHERE status IS NULL AND id BETWEEN 1 AND 100000;
UPDATE orders SET status = 'pending' WHERE status IS NULL AND id BETWEEN 100001 AND 200000;
-- ... continue in batches

-- Step 3: Add NOT NULL constraint (after backfill complete)
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending';

Renaming a column

Unsafe — breaks all application code instantly:

ALTER TABLE users RENAME COLUMN email TO email_address;

Safe — dual-write strategy:

-- Step 1: Add new column
ALTER TABLE users ADD COLUMN email_address varchar(255);

-- Step 2: Backfill
UPDATE users SET email_address = email WHERE email_address IS NULL;

-- Step 3: Deploy app code that writes to BOTH columns
-- Step 4: Switch reads to new column
-- Step 5: Stop writing to old column
-- Step 6: Drop old column (in a later migration)
ALTER TABLE users DROP COLUMN email;

Adding an index on a large table

Unsafe — blocks all writes for the duration:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Safe — non-blocking:

CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

Note: CONCURRENTLY cannot run inside a transaction. It takes longer but doesn't block writes. For a 100M row table, expect 10-30 minutes depending on hardware.

Common Mistakes

Mistake	Fix
No rollback migration	Every up migration MUST have a down migration. No exceptions.
No data backup commands	Include CREATE TABLE AS or pg_dump commands before destructive changes
Ignoring lock impact on large tables	Always check row count and classify lock level before writing SQL
Using CREATE INDEX without CONCURRENTLY	On tables > 1M rows, always use CONCURRENTLY (PostgreSQL) or online DDL
Adding NOT NULL column with default in one step	Split into: add nullable → backfill in batches → add constraint
Bare RENAME COLUMN in production	Use dual-write strategy: add new → backfill → migrate reads → drop old
No verification queries	Write queries that confirm schema correctness and data integrity post-migration
Missing deployment runbook	Create a step-by-step runbook with pre-flight, execute, rollback, and post-migration sections
Backfilling millions of rows in one UPDATE	Batch updates (10k-100k rows per batch) to avoid long-running transactions
Running migrations during peak traffic	Schedule for off-peak hours; set lock_timeout to fail fast if locks can't be acquired

Key Principles

Every migration must be reversible — If you can't write a rollback that undoes the change, the migration is too risky. Restructure it into smaller, reversible steps. The only exception is dropping a column that was already emptied — and even then, document the data loss.
Always include verification queries — "It ran without errors" is not verification. Write queries that prove the schema is correct, data is intact, and the application still works. Run these after every migration AND after every rollback.
Respect table size — A migration that's instant on a 1,000-row dev table can lock production for 30 minutes on a 100M-row table. Always ask about row counts and adjust strategy accordingly. When in doubt, assume the table is large.
Separate schema changes from data changes — Never combine ALTER TABLE with a large UPDATE in the same transaction. Schema changes should be instant; data backfills should be batched. This prevents long-held locks and makes rollback simpler.
Fail fast, don't block — Set lock_timeout so migrations fail quickly if they can't acquire locks, rather than blocking all application queries while waiting. A failed migration you can retry is better than a blocked application.

database-migration