SQL Guide

Applies to: PostgreSQL 15+, MySQL 8+, Database Migrations, Query Optimization

Core Principles

Parameterized Queries Always: Never concatenate user input into SQL strings -- use bind parameters ($1, ?, :name) without exception
Explicit Over Implicit: Name all constraints, specify column lists in INSERT, avoid SELECT * in production code
Migrations Are Immutable: Once applied to a shared environment, never modify a migration -- create a new one
Indexes Are Not Free: Every index speeds reads but slows writes; justify each index with a query plan
Transactions Are Boundaries: Keep transactions short, choose the correct isolation level, and always handle rollback

Guardrails

Naming Conventions

Tables: snake_case, plural (users, order_items, audit_logs)
Columns: snake_case, singular (email, created_at, is_active)
Primary keys: id (integer or UUID depending on project convention)
Foreign keys: <singular_table>_id (user_id, order_id)
Indexes: idx_<table>_<columns> (idx_users_email, idx_orders_user_id_created_at)
Unique constraints: uq_<table>_<columns> (uq_users_email)
Check constraints: ck_<table>_<description> (ck_orders_positive_total)
Boolean columns: is_ or has_ prefix (is_active, has_verified_email)
Timestamps: created_at, updated_at, deleted_at (always TIMESTAMPTZ)

Query Safety

ALWAYS use parameterized queries -- no string interpolation of user input
ALWAYS specify column lists in INSERT statements
NEVER use SELECT * in application code (acceptable in ad-hoc queries only)
ALWAYS add LIMIT to queries that return lists (prevent unbounded result sets)
NEVER use TRUNCATE or DROP in application code without explicit safeguards
ALWAYS use EXISTS instead of COUNT(*) > 0 for existence checks
ALWAYS qualify column names with table aliases in JOINs

-- GOOD: parameterized, explicit columns, bounded
SELECT u.id, u.email, u.created_at
FROM users u
WHERE u.email = $1
  AND u.is_active = true
LIMIT 1;

-- BAD: string concatenation, SELECT *, no LIMIT
SELECT * FROM users WHERE email = '" + userInput + "';

Indexing

Create indexes to support WHERE, JOIN, and ORDER BY clauses
Composite index column order: equality columns first, then range, then sort
Use UNIQUE indexes to enforce business rules at the database level
Prefer partial indexes when filtering on a known subset (WHERE is_active = true)
Use CONCURRENTLY for index creation on live tables (PostgreSQL)
Review index usage periodically -- drop unused indexes

-- Composite index: equality (status) first, then range (created_at)
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC);

-- Partial index: only index active users
CREATE INDEX idx_users_email_active
ON users (email) WHERE is_active = true;

Transactions

Keep transactions as short as possible (no I/O or external calls inside)
Use READ COMMITTED for most operations (PostgreSQL default)
Use REPEATABLE READ when a transaction reads the same data multiple times
Use SERIALIZABLE for financial or inventory operations requiring strict consistency
Always explicitly COMMIT or ROLLBACK -- never leave transactions hanging
Use SAVEPOINT for partial rollback within complex transactions

BEGIN;
  SAVEPOINT before_update;

  UPDATE accounts SET balance = balance - 100.00 WHERE id = $1;
  UPDATE accounts SET balance = balance + 100.00 WHERE id = $2;

  -- If second update fails, roll back to savepoint
  -- ROLLBACK TO SAVEPOINT before_update;
COMMIT;

Migrations

Every migration MUST have both up and down functions
Migrations MUST be idempotent (IF NOT EXISTS, IF EXISTS guards)
Never rename or drop columns in a single step -- use a multi-step process
Add columns as NULL first, backfill, then add NOT NULL constraint
Never modify a migration that has been applied to a shared environment
Migration filenames: YYYYMMDDHHMMSS_description.sql or sequential numbering
Test both up and down in development before committing

-- UP: 20250115120000_add_users_phone.sql
ALTER TABLE users ADD COLUMN IF NOT EXISTS phone VARCHAR(20);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_phone
ON users (phone) WHERE phone IS NOT NULL;

-- DOWN: 20250115120000_add_users_phone.sql
DROP INDEX CONCURRENTLY IF EXISTS idx_users_phone;
ALTER TABLE users DROP COLUMN IF EXISTS phone;

Schema Design

Standard Table Template

CREATE TABLE IF NOT EXISTS orders (
    id            BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id       BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    status        VARCHAR(20) NOT NULL DEFAULT 'pending'
                    CONSTRAINT ck_orders_valid_status
                    CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    total_cents   INTEGER NOT NULL
                    CONSTRAINT ck_orders_positive_total CHECK (total_cents >= 0),
    notes         TEXT,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Auto-update updated_at (PostgreSQL)
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();

Design Rules

Use BIGINT GENERATED ALWAYS AS IDENTITY for auto-increment PKs (PostgreSQL 10+)
Use UUID PKs when IDs are exposed externally or distributed systems require it
Use TIMESTAMPTZ (not TIMESTAMP) for all time columns -- always store UTC
Add NOT NULL constraints by default; allow NULL only when semantically meaningful
Use CHECK constraints to enforce domain rules at the database level
Use ENUM types sparingly -- prefer VARCHAR with CHECK for easier migration
Add ON DELETE CASCADE or ON DELETE SET NULL explicitly to every foreign key
Use soft deletes (deleted_at TIMESTAMPTZ) when audit trail is required

Key Patterns

Common Table Expressions (CTEs)

Prefer CTEs over subqueries for readability and maintainability.

-- Readable: each step has a name
WITH active_orders AS (
    SELECT user_id, COUNT(*) AS order_count, SUM(total_cents) AS total_spent
    FROM orders
    WHERE status != 'cancelled'
      AND created_at >= now() - INTERVAL '30 days'
    GROUP BY user_id
),
high_value_users AS (
    SELECT user_id
    FROM active_orders
    WHERE total_spent > 50000  -- over $500
)
SELECT u.id, u.email, ao.order_count, ao.total_spent
FROM users u
JOIN active_orders ao ON ao.user_id = u.id
WHERE u.id IN (SELECT user_id FROM high_value_users)
ORDER BY ao.total_spent DESC;

Window Functions

Use window functions for ranking, running totals, and row comparisons without self-joins.

-- Rank users by spending within each region
SELECT
    u.id,
    u.region,
    SUM(o.total_cents) AS total_spent,
    RANK() OVER (PARTITION BY u.region ORDER BY SUM(o.total_cents) DESC) AS rank
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.region;

-- Running total of daily revenue
SELECT
    date_trunc('day', created_at) AS day,
    SUM(total_cents) AS daily_revenue,
    SUM(SUM(total_cents)) OVER (ORDER BY date_trunc('day', created_at)) AS running_total
FROM orders
WHERE status = 'delivered'
GROUP BY date_trunc('day', created_at)
ORDER BY day;

UPSERT (INSERT ... ON CONFLICT)

-- PostgreSQL: insert or update on conflict
INSERT INTO user_preferences (user_id, theme, language)
VALUES ($1, $2, $3)
ON CONFLICT (user_id)
DO UPDATE SET
    theme = EXCLUDED.theme,
    language = EXCLUDED.language,
    updated_at = now();

Batch Operations

-- Batch insert with unnest (PostgreSQL)
INSERT INTO tags (name, category)
SELECT unnest($1::text[]), unnest($2::text[])
ON CONFLICT (name) DO NOTHING;

-- Batch update with VALUES list
UPDATE products AS p
SET price_cents = v.new_price
FROM (VALUES
    (1, 2999),
    (2, 4999),
    (3, 1499)
) AS v(id, new_price)
WHERE p.id = v.id;

Parameterized Queries (Application Code)

# Python (psycopg2/asyncpg) -- ALWAYS use parameterized queries
cursor.execute(
    "SELECT id, email FROM users WHERE email = %s AND is_active = %s",
    (email, True),
)

# Node.js (pg)
const result = await pool.query(
    'SELECT id, email FROM users WHERE email = $1 AND is_active = $2',
    [email, true]
);

# Go (database/sql)
row := db.QueryRowContext(ctx,
    "SELECT id, email FROM users WHERE email = $1 AND is_active = $2",
    email, true,
)

Performance

EXPLAIN ANALYZE

Always use EXPLAIN ANALYZE to validate query plans before deploying.

-- Check execution plan and actual timing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at >= '2025-01-01'
GROUP BY u.id, u.email
ORDER BY order_count DESC
LIMIT 20;

What to look for:

Seq Scan on large tables -- usually needs an index
Nested Loop with large outer sets -- consider Hash Join via index or rewrite
Sort operations with high cost -- add index matching ORDER BY
Rows Removed by Filter much larger than Actual Rows -- index the filter column
Buffers: shared hit vs shared read -- high reads indicate cold cache

Index Strategy

Access Pattern	Index Type	Example
Equality lookup	B-tree (default)	`WHERE email = $1`
Range scan	B-tree	`WHERE created_at > $1`
Full-text search	GIN + tsvector	`WHERE search_vector @@ to_tsquery($1)`
JSONB field access	GIN	`WHERE metadata @> '{"key": "val"}'`
Array containment	GIN	`WHERE tags @> ARRAY['sql']`
Geometric/spatial	GiST	`WHERE location <-> point($1,$2) < $3`
Pattern matching (`LIKE 'abc%'`)	B-tree with `text_pattern_ops`	`WHERE name LIKE 'abc%'`

N+1 Prevention

-- BAD: N+1 (one query per user to fetch orders)
-- Application loops: for each user, SELECT orders WHERE user_id = ?

-- GOOD: single query with JOIN
SELECT u.id, u.email, o.id AS order_id, o.total_cents
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.is_active = true
ORDER BY u.id, o.created_at DESC;

-- GOOD: batch with ANY (when you have a list of IDs)
SELECT id, email, created_at
FROM users
WHERE id = ANY($1::bigint[]);

Pagination

-- Cursor-based pagination (preferred for large datasets)
SELECT id, email, created_at
FROM users
WHERE created_at < $1  -- cursor: last seen created_at
  AND is_active = true
ORDER BY created_at DESC
LIMIT 25;

-- Offset-based pagination (simpler but slower for deep pages)
SELECT id, email, created_at
FROM users
WHERE is_active = true
ORDER BY created_at DESC
LIMIT 25 OFFSET $1;

Tooling

Essential Commands

# PostgreSQL CLI
psql -h localhost -U myuser -d mydb       # Connect
psql -f migration.sql mydb                # Run migration file
pg_dump -Fc mydb > backup.dump            # Backup (custom format)
pg_restore -d mydb backup.dump            # Restore

# pgcli (enhanced CLI with autocomplete)
pgcli -h localhost -U myuser -d mydb

# Schema inspection
\dt                   # List tables
\d+ table_name        # Describe table with details
\di                   # List indexes
\df                   # List functions

Migration Tools

# golang-migrate
migrate create -ext sql -dir db/migrations -seq add_users_phone
migrate -path db/migrations -database "$DATABASE_URL" up
migrate -path db/migrations -database "$DATABASE_URL" down 1

# Alembic (Python/SQLAlchemy)
alembic revision -m "add users phone"
alembic upgrade head
alembic downgrade -1

# Prisma (Node.js)
npx prisma migrate dev --name add_users_phone
npx prisma migrate deploy

# dbmate
dbmate new add_users_phone
dbmate up
dbmate rollback

Useful Diagnostic Queries (PostgreSQL)

-- Find slow queries (requires pg_stat_statements extension)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Active connections and queries
SELECT pid, state, query, query_start, now() - query_start AS duration
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

References

For detailed patterns and examples, see:

references/patterns.md -- Window functions, CTE patterns, indexing strategies, advanced query techniques

External References

PostgreSQL Documentation
Use The Index, Luke -- SQL indexing and tuning
Modern SQL -- Modern SQL features across databases
pgMustard EXPLAIN Guide -- Reading query plans
SQL Style Guide (Holywell) -- Formatting conventions
Postgres Wiki: Don't Do This -- Common PostgreSQL anti-patterns

sql-guide