# SQL Guide

**Applies to:** PostgreSQL 15+, MySQL 8+, database migrations, query optimization
## Core Principles

- **Parameterized queries always**: never concatenate user input into SQL strings -- use bind parameters (`$1`, `?`, `:name`) without exception
- **Explicit over implicit**: name all constraints, specify column lists in `INSERT`, avoid `SELECT *` in production code
- **Migrations are immutable**: once applied to a shared environment, never modify a migration -- create a new one
- **Indexes are not free**: every index speeds reads but slows writes; justify each index with a query plan
- **Transactions are boundaries**: keep transactions short, choose the correct isolation level, and always handle rollback
## Guardrails

### Naming Conventions

- Tables: `snake_case`, plural (`users`, `order_items`, `audit_logs`)
- Columns: `snake_case`, singular (`email`, `created_at`, `is_active`)
- Primary keys: `id` (integer or UUID depending on project convention)
- Foreign keys: `<singular_table>_id` (`user_id`, `order_id`)
- Indexes: `idx_<table>_<columns>` (`idx_users_email`, `idx_orders_user_id_created_at`)
- Unique constraints: `uq_<table>_<columns>` (`uq_users_email`)
- Check constraints: `ck_<table>_<description>` (`ck_orders_positive_total`)
- Boolean columns: `is_` or `has_` prefix (`is_active`, `has_verified_email`)
- Timestamps: `created_at`, `updated_at`, `deleted_at` (always `TIMESTAMPTZ`)
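Conventions like these can be checked mechanically in a schema-lint step. The sketch below is illustrative only, not part of this guide's tooling; the regexes are one possible encoding of the patterns above.

```python
import re

# Hypothetical checker encoding the naming conventions above.
# These patterns are assumptions, not an official lint rule set.
CONVENTIONS = {
    "index": re.compile(r"idx_[a-z0-9_]+"),
    "unique": re.compile(r"uq_[a-z0-9_]+"),
    "check": re.compile(r"ck_[a-z0-9_]+"),
    "boolean": re.compile(r"(is|has)_[a-z0-9_]+"),
    "column": re.compile(r"[a-z][a-z0-9_]*"),  # snake_case
}

def matches(kind: str, name: str) -> bool:
    """Return True if `name` follows the convention for `kind`."""
    return bool(CONVENTIONS[kind].fullmatch(name))

print(matches("index", "idx_users_email"))  # True
print(matches("boolean", "active"))         # False: missing is_/has_ prefix
```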
### Query Safety

- ALWAYS use parameterized queries -- no string interpolation of user input
- ALWAYS specify column lists in `INSERT` statements
- NEVER use `SELECT *` in application code (acceptable in ad-hoc queries only)
- ALWAYS add `LIMIT` to queries that return lists (prevents unbounded result sets)
- NEVER use `TRUNCATE` or `DROP` in application code without explicit safeguards
- ALWAYS use `EXISTS` instead of `COUNT(*) > 0` for existence checks
- ALWAYS qualify column names with table aliases in JOINs

```sql
-- GOOD: parameterized, explicit columns, bounded
SELECT u.id, u.email, u.created_at
FROM users u
WHERE u.email = $1
  AND u.is_active = true
LIMIT 1;

-- BAD: string concatenation, SELECT *, no LIMIT
SELECT * FROM users WHERE email = '" + userInput + "';
```
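The `EXISTS`-over-`COUNT(*)` rule can be seen from application code: `EXISTS` can stop at the first matching row, while `COUNT(*)` must examine every match. A minimal, self-contained demo using Python's `sqlite3` (hypothetical table and data; `?` is SQLite's placeholder style):

```python
import sqlite3

# Demo table and data are hypothetical; sqlite3 keeps this self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 1)")

# EXISTS short-circuits at the first matching row.
exists = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM users WHERE email = ? AND is_active = 1)",
    ("a@example.com",),
).fetchone()[0]
print(bool(exists))  # True
```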
### Indexing

- Create indexes to support `WHERE`, `JOIN`, and `ORDER BY` clauses
- Composite index column order: equality columns first, then range, then sort
- Use `UNIQUE` indexes to enforce business rules at the database level
- Prefer partial indexes when filtering on a known subset (`WHERE is_active = true`)
- Use `CONCURRENTLY` for index creation on live tables (PostgreSQL; note it cannot run inside a transaction block)
- Review index usage periodically -- drop unused indexes

```sql
-- Composite index: equality (status) first, then range (created_at)
CREATE INDEX CONCURRENTLY idx_orders_status_created
    ON orders (status, created_at DESC);

-- Partial index: only index active users
CREATE INDEX idx_users_email_active
    ON users (email) WHERE is_active = true;
```
### Transactions

- Keep transactions as short as possible (no I/O or external calls inside)
- Use `READ COMMITTED` for most operations (PostgreSQL default)
- Use `REPEATABLE READ` when a transaction reads the same data multiple times
- Use `SERIALIZABLE` for financial or inventory operations requiring strict consistency
- Always explicitly `COMMIT` or `ROLLBACK` -- never leave transactions hanging
- Use `SAVEPOINT` for partial rollback within complex transactions

```sql
BEGIN;

SAVEPOINT before_update;

UPDATE accounts SET balance = balance - 100.00 WHERE id = $1;
UPDATE accounts SET balance = balance + 100.00 WHERE id = $2;

-- If the second update fails, roll back to the savepoint:
-- ROLLBACK TO SAVEPOINT before_update;

COMMIT;
```
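From application code, "always handle rollback" usually means a try/except that commits on success and rolls back on any error. A sketch using Python's `sqlite3` (the `accounts` table, amounts, and `transfer` helper are hypothetical; a CHECK constraint stands in for a failed update):

```python
import sqlite3

# Self-contained demo: the schema and transfer() helper are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()
    except sqlite3.Error:
        conn.rollback()  # never leave the transaction hanging
        raise

# Overdrawing violates the CHECK constraint; rollback leaves balances intact.
try:
    transfer(conn, 1, 2, 500)
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())  # [(100,), (0,)]
```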
### Migrations

- Every migration MUST have both `up` and `down` functions
- Migrations MUST be idempotent (`IF NOT EXISTS`, `IF EXISTS` guards)
- Never rename or drop columns in a single step -- use a multi-step process
- Add columns as `NULL` first, backfill, then add the `NOT NULL` constraint
- Never modify a migration that has been applied to a shared environment
- Migration filenames: `YYYYMMDDHHMMSS_description.sql` or sequential numbering
- Test both `up` and `down` in development before committing

```sql
-- UP: 20250115120000_add_users_phone.sql
ALTER TABLE users ADD COLUMN IF NOT EXISTS phone VARCHAR(20);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_phone
    ON users (phone) WHERE phone IS NOT NULL;

-- DOWN: 20250115120000_add_users_phone.sql
DROP INDEX CONCURRENTLY IF EXISTS idx_users_phone;
ALTER TABLE users DROP COLUMN IF EXISTS phone;
```
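The "add as `NULL`, backfill, then constrain" rule can be walked through end to end. The sketch below uses Python's `sqlite3` for a self-contained demo with a hypothetical table; the final `SET NOT NULL` step is PostgreSQL syntax and is shown only as a comment, since SQLite does not support it.

```python
import sqlite3

# Illustrative expand -> backfill -> constrain sequence (table is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Step 1 (migration N): add the column as nullable
conn.execute("ALTER TABLE users ADD COLUMN status TEXT")

# Step 2 (backfill -- on a large table, do this in batches outside the migration)
conn.execute("UPDATE users SET status = 'active' WHERE status IS NULL")

# Step 3 (migration N+1, PostgreSQL syntax):
# ALTER TABLE users ALTER COLUMN status SET NOT NULL;

print(conn.execute("SELECT email, status FROM users").fetchall())
# [('a@example.com', 'active')]
```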
## Schema Design

### Standard Table Template

```sql
CREATE TABLE IF NOT EXISTS orders (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id     BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    status      VARCHAR(20) NOT NULL DEFAULT 'pending'
        CONSTRAINT ck_orders_valid_status
        CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    total_cents INTEGER NOT NULL
        CONSTRAINT ck_orders_positive_total CHECK (total_cents >= 0),
    notes       TEXT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Auto-update updated_at (PostgreSQL)
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
```
### Design Rules

- Use `BIGINT GENERATED ALWAYS AS IDENTITY` for auto-increment PKs (PostgreSQL 10+)
- Use `UUID` PKs when IDs are exposed externally or distributed systems require it
- Use `TIMESTAMPTZ` (not `TIMESTAMP`) for all time columns -- always store UTC
- Add `NOT NULL` constraints by default; allow `NULL` only when semantically meaningful
- Use `CHECK` constraints to enforce domain rules at the database level
- Use `ENUM` types sparingly -- prefer `VARCHAR` with `CHECK` for easier migration
- Add `ON DELETE CASCADE` or `ON DELETE SET NULL` explicitly to every foreign key
- Use soft deletes (`deleted_at TIMESTAMPTZ`) when an audit trail is required
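The soft-delete rule changes both writes and reads: deletes become an `UPDATE` of `deleted_at`, and every live query must filter on `deleted_at IS NULL`. A minimal sketch using Python's `sqlite3` with a hypothetical table:

```python
import sqlite3

# Self-contained soft-delete demo; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com'), ('b@example.com')")

# "Delete" by stamping deleted_at instead of issuing DELETE
conn.execute(
    "UPDATE users SET deleted_at = datetime('now') WHERE email = ?",
    ("b@example.com",),
)

# Live queries must always exclude soft-deleted rows
live = conn.execute("SELECT email FROM users WHERE deleted_at IS NULL").fetchall()
print(live)  # [('a@example.com',)]
```

A partial index such as `CREATE INDEX ... ON users (email) WHERE deleted_at IS NULL` (PostgreSQL) keeps these live-row queries fast.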
## Key Patterns

### Common Table Expressions (CTEs)

Prefer CTEs over subqueries for readability and maintainability.

```sql
-- Readable: each step has a name
WITH active_orders AS (
    SELECT user_id, COUNT(*) AS order_count, SUM(total_cents) AS total_spent
    FROM orders
    WHERE status != 'cancelled'
      AND created_at >= now() - INTERVAL '30 days'
    GROUP BY user_id
),
high_value_users AS (
    SELECT user_id
    FROM active_orders
    WHERE total_spent > 50000  -- over $500
)
SELECT u.id, u.email, ao.order_count, ao.total_spent
FROM users u
JOIN active_orders ao ON ao.user_id = u.id
WHERE u.id IN (SELECT user_id FROM high_value_users)
ORDER BY ao.total_spent DESC;
```
### Window Functions

Use window functions for ranking, running totals, and row comparisons without self-joins.

```sql
-- Rank users by spending within each region
SELECT
    u.id,
    u.region,
    SUM(o.total_cents) AS total_spent,
    RANK() OVER (PARTITION BY u.region ORDER BY SUM(o.total_cents) DESC) AS rank
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.region;

-- Running total of daily revenue
SELECT
    date_trunc('day', created_at) AS day,
    SUM(total_cents) AS daily_revenue,
    SUM(SUM(total_cents)) OVER (ORDER BY date_trunc('day', created_at)) AS running_total
FROM orders
WHERE status = 'delivered'
GROUP BY date_trunc('day', created_at)
ORDER BY day;
```
### UPSERT (INSERT ... ON CONFLICT)

```sql
-- PostgreSQL: insert or update on conflict
INSERT INTO user_preferences (user_id, theme, language)
VALUES ($1, $2, $3)
ON CONFLICT (user_id)
DO UPDATE SET
    theme = EXCLUDED.theme,
    language = EXCLUDED.language,
    updated_at = now();
```
### Batch Operations

```sql
-- Batch insert with unnest (PostgreSQL)
INSERT INTO tags (name, category)
SELECT unnest($1::text[]), unnest($2::text[])
ON CONFLICT (name) DO NOTHING;

-- Batch update with a VALUES list
UPDATE products AS p
SET price_cents = v.new_price
FROM (VALUES
    (1, 2999),
    (2, 4999),
    (3, 1499)
) AS v(id, new_price)
WHERE p.id = v.id;
```
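From application code, the usual batching tool is the driver's `executemany`: one round-trip of prepared statements instead of a per-row `INSERT` loop. A self-contained sketch with Python's `sqlite3` (hypothetical table; `?` placeholders and SQLite's upsert syntax, which needs SQLite 3.24+):

```python
import sqlite3

# Demo of application-side batch insert; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT PRIMARY KEY, category TEXT)")

rows = [("sql", "lang"), ("postgres", "db"), ("sql", "lang")]  # note the duplicate
conn.executemany(
    "INSERT INTO tags (name, category) VALUES (?, ?) ON CONFLICT (name) DO NOTHING",
    rows,
)
print(conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0])  # 2
```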
### Parameterized Queries (Application Code)

```python
# Python (psycopg2) -- ALWAYS use parameterized queries
# (asyncpg uses $1-style placeholders instead of %s)
cursor.execute(
    "SELECT id, email FROM users WHERE email = %s AND is_active = %s",
    (email, True),
)
```

```javascript
// Node.js (pg)
const result = await pool.query(
  'SELECT id, email FROM users WHERE email = $1 AND is_active = $2',
  [email, true]
);
```

```go
// Go (database/sql)
row := db.QueryRowContext(ctx,
    "SELECT id, email FROM users WHERE email = $1 AND is_active = $2",
    email, true,
)
```
## Performance

### EXPLAIN ANALYZE

Always use `EXPLAIN ANALYZE` to validate query plans before deploying.

```sql
-- Check execution plan and actual timing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at >= '2025-01-01'
GROUP BY u.id, u.email
ORDER BY order_count DESC
LIMIT 20;
```

What to look for:

- `Seq Scan` on large tables -- usually needs an index
- `Nested Loop` with large outer sets -- consider a `Hash Join` via an index or a rewrite
- `Sort` operations with high cost -- add an index matching the `ORDER BY`
- `Rows Removed by Filter` much larger than actual rows -- index the filter column
- `Buffers: shared hit` vs `shared read` -- high reads indicate a cold cache
### Index Strategy

| Access Pattern | Index Type | Example |
|---|---|---|
| Equality lookup | B-tree (default) | `WHERE email = $1` |
| Range scan | B-tree | `WHERE created_at > $1` |
| Full-text search | GIN + `tsvector` | `WHERE search_vector @@ to_tsquery($1)` |
| JSONB field access | GIN | `WHERE metadata @> '{"key": "val"}'` |
| Array containment | GIN | `WHERE tags @> ARRAY['sql']` |
| Geometric/spatial | GiST | `WHERE location <-> point($1,$2) < $3` |
| Pattern matching (`LIKE 'abc%'`) | B-tree with `text_pattern_ops` | `WHERE name LIKE 'abc%'` |
### N+1 Prevention

```sql
-- BAD: N+1 (one query per user to fetch orders)
-- Application loops: for each user, SELECT orders WHERE user_id = ?

-- GOOD: single query with JOIN
SELECT u.id, u.email, o.id AS order_id, o.total_cents
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.is_active = true
ORDER BY u.id, o.created_at DESC;

-- GOOD: batch with ANY (when you have a list of IDs)
SELECT id, email, created_at
FROM users
WHERE id = ANY($1::bigint[]);
```
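Drivers without PostgreSQL's array binding can approximate the `ANY` batch fetch by generating one placeholder per id in an `IN` list. A self-contained sketch with Python's `sqlite3` (hypothetical table and data; only the placeholder *count* is interpolated into the SQL, never user input):

```python
import sqlite3

# Batch fetch by id list; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "a@x"), (2, "b@x"), (3, "c@x")])

ids = [1, 3]
placeholders = ", ".join("?" for _ in ids)  # "?, ?" -- count only, no user data
rows = conn.execute(
    f"SELECT id, email FROM users WHERE id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()
print(rows)  # [(1, 'a@x'), (3, 'c@x')]
```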
### Pagination

```sql
-- Cursor-based pagination (preferred for large datasets)
SELECT id, email, created_at
FROM users
WHERE created_at < $1  -- cursor: last seen created_at
  AND is_active = true
ORDER BY created_at DESC
LIMIT 25;

-- Offset-based pagination (simpler but slower for deep pages)
SELECT id, email, created_at
FROM users
WHERE is_active = true
ORDER BY created_at DESC
LIMIT 25 OFFSET $1;
```
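One caveat worth knowing: a cursor on `created_at` alone can skip or repeat rows when timestamps collide, so keyset pagination usually uses a composite `(created_at, id)` cursor with a row-value comparison. A self-contained sketch with Python's `sqlite3` (hypothetical table; the `page` helper is illustrative, and row-value comparison needs SQLite 3.15+):

```python
import sqlite3

# Keyset pagination with a composite (created_at, id) cursor; data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "2025-01-01"), (2, "2025-01-01"), (3, "2025-01-02")])

def page(conn, cursor, limit=2):
    """cursor is the (created_at, id) of the last row seen, or None for page one."""
    if cursor is None:
        sql = ("SELECT created_at, id FROM users "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        return conn.execute(sql, (limit,)).fetchall()
    sql = ("SELECT created_at, id FROM users "
           "WHERE (created_at, id) < (?, ?) "       # row-value comparison
           "ORDER BY created_at DESC, id DESC LIMIT ?")
    return conn.execute(sql, (*cursor, limit)).fetchall()

first = page(conn, None)        # [('2025-01-02', 3), ('2025-01-01', 2)]
second = page(conn, first[-1])  # [('2025-01-01', 1)] -- no skip despite the tie
print(first, second)
```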
## Tooling

### Essential Commands

```shell
# PostgreSQL CLI
psql -h localhost -U myuser -d mydb   # Connect
psql -f migration.sql mydb            # Run migration file
pg_dump -Fc mydb > backup.dump        # Backup (custom format)
pg_restore -d mydb backup.dump        # Restore

# pgcli (enhanced CLI with autocomplete)
pgcli -h localhost -U myuser -d mydb
```

```
# Schema inspection (inside psql)
\dt             # List tables
\d+ table_name  # Describe table with details
\di             # List indexes
\df             # List functions
```
### Migration Tools

```shell
# golang-migrate
migrate create -ext sql -dir db/migrations -seq add_users_phone
migrate -path db/migrations -database "$DATABASE_URL" up
migrate -path db/migrations -database "$DATABASE_URL" down 1

# Alembic (Python/SQLAlchemy)
alembic revision -m "add users phone"
alembic upgrade head
alembic downgrade -1

# Prisma (Node.js)
npx prisma migrate dev --name add_users_phone
npx prisma migrate deploy

# dbmate
dbmate new add_users_phone
dbmate up
dbmate rollback
```
### Useful Diagnostic Queries (PostgreSQL)

```sql
-- Find slow queries (requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Active connections and queries
SELECT pid, state, query, query_start, now() - query_start AS duration
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
```
## References

For detailed patterns and examples, see:

- `references/patterns.md` -- window functions, CTE patterns, indexing strategies, advanced query techniques

### External References

- PostgreSQL Documentation
- Use The Index, Luke -- SQL indexing and tuning
- Modern SQL -- modern SQL features across databases
- pgMustard EXPLAIN Guide -- reading query plans
- SQL Style Guide (Holywell) -- formatting conventions
- Postgres Wiki: Don't Do This -- common PostgreSQL anti-patterns