sql-optimizer
SQL Optimizer
Overview
Analyze SQL queries for performance problems and produce optimized versions with appropriate indexes. Covers query rewriting, index recommendations, execution plan analysis, and common anti-patterns.
Instructions
When a user asks you to optimize a SQL query or fix slow database performance, follow these steps:
Step 1: Get the query and context
Determine:
- The full SQL query to optimize
- Database engine (PostgreSQL, MySQL, SQLite)
- Table sizes (approximate row counts)
- Existing indexes
- Current execution time
If you have access to the database, gather this yourself:
-- PostgreSQL: Check table sizes
SELECT relname AS table_name, reltuples::bigint AS row_count
FROM pg_class
WHERE relkind = 'r' AND relnamespace = (
SELECT oid FROM pg_namespace WHERE nspname = 'public'
)
ORDER BY reltuples DESC;
-- PostgreSQL: Check existing indexes
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'target_table';
Step 2: Analyze the execution plan
-- PostgreSQL
EXPLAIN ANALYZE
SELECT ...your query here...;
-- MySQL
EXPLAIN FORMAT=JSON
SELECT ...your query here...;
Look for these red flags in the plan:
- Seq Scan on large tables (missing index)
- Nested Loop with large row counts (N+1 pattern)
- Sort with high cost (missing index for ORDER BY)
- Hash Join when a smaller join would suffice
- Rows removed by filter much larger than rows returned (bad selectivity)
Step 3: Apply optimizations
Anti-pattern: SELECT * when you need specific columns
-- Before (fetches all columns, prevents covering index use)
SELECT * FROM orders WHERE status = 'pending';
-- After (fetch only needed columns)
SELECT id, customer_id, total, created_at
FROM orders WHERE status = 'pending';
Anti-pattern: Missing index on WHERE/JOIN columns
-- If this query is slow:
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
-- Add a composite index:
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
Anti-pattern: N+1 queries in application code
-- Before: 1 query + N queries
SELECT id FROM orders WHERE date > '2024-01-01';
-- Then for each order:
SELECT * FROM order_items WHERE order_id = ?;
-- After: Single query with JOIN
SELECT o.id, o.total, oi.product_name, oi.quantity, oi.price
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.date > '2024-01-01';
Anti-pattern: Functions on indexed columns
-- Before (cannot use index on created_at)
SELECT * FROM users WHERE YEAR(created_at) = 2024;
-- After (uses index on created_at)
SELECT * FROM users
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
Anti-pattern: Correlated subquery that can be a JOIN
-- Before (executes subquery for every row)
SELECT name, (
SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.id
) AS order_count
FROM customers;
-- After (single pass with JOIN)
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Anti-pattern: OFFSET for deep pagination
-- Before (scans and discards 10000 rows)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;
-- After (keyset pagination, uses index)
SELECT * FROM products WHERE id > 10000 ORDER BY id LIMIT 20;
Step 4: Recommend indexes
Follow these rules for index design:
- Index columns used in WHERE, JOIN ON, and ORDER BY clauses
- Put equality conditions first, then range conditions in composite indexes
- Consider covering indexes that include SELECT columns to avoid table lookups
- Do not create indexes on low-cardinality columns (e.g., boolean flags) alone
- Remove unused or duplicate indexes
-- Composite index for: WHERE status = 'active' AND created_at > '2024-01-01' ORDER BY name
CREATE INDEX idx_users_status_created_name
ON users (status, created_at, name);
Step 5: Report the results
Present:
- The original query with identified problems
- The optimized query
- Index recommendations with CREATE INDEX statements
- Expected improvement (estimated from plan changes)
Examples
Example 1: Optimize a slow reporting query
User request: "This query takes 45 seconds on 2M rows, can you speed it up?"
Original query:
SELECT c.name, c.email,
COUNT(*) AS order_count,
SUM(o.total) AS total_spent
FROM customers c, orders o
WHERE c.id = o.customer_id
AND YEAR(o.created_at) = 2024
AND o.status IN ('completed', 'shipped')
GROUP BY c.name, c.email
HAVING SUM(o.total) > 500
ORDER BY total_spent DESC;
Problems identified:
- Implicit join syntax (old-style comma join)
YEAR()function prevents index use oncreated_at- No index on
(customer_id, status, created_at) - GROUP BY on non-indexed columns
Optimized query:
SELECT c.name, c.email,
COUNT(*) AS order_count,
SUM(o.total) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.created_at >= '2024-01-01'
AND o.created_at < '2025-01-01'
AND o.status IN ('completed', 'shipped')
GROUP BY c.id, c.name, c.email
HAVING SUM(o.total) > 500
ORDER BY total_spent DESC;
Index recommendations:
CREATE INDEX idx_orders_customer_status_date
ON orders (customer_id, status, created_at);
Expected result: ~45s down to ~0.5s with the index.
Example 2: Fix N+1 in an ORM
User request: "My Django view is slow, the debug toolbar shows 200+ queries"
Problem: ORM fetching related objects lazily in a loop.
Fix:
# Before: N+1 queries
orders = Order.objects.filter(status="pending")
for order in orders:
print(order.customer.name) # Each access = 1 query
# After: 2 queries total
orders = Order.objects.filter(status="pending").select_related("customer")
for order in orders:
print(order.customer.name) # Already loaded
Guidelines
- Always check the execution plan before and after optimization. Do not guess at performance.
- Index recommendations must consider write overhead. Tables with heavy INSERT/UPDATE may not benefit from many indexes.
- Composite index column order matters. Put equality columns first, then range columns, then sort columns.
- For PostgreSQL, use
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)for the most useful plan output. - Do not blindly add indexes to every column in a WHERE clause. Analyze query patterns first.
- When rewriting queries, verify that the optimized version returns the same results as the original.
- For MySQL, be aware of the optimizer's single-index-per-table limitation in older versions. Use composite indexes.
- If a query cannot be optimized further, suggest materialized views or caching as alternatives.
- Always present the original and optimized queries side by side for easy comparison.