SQL Optimization

Provide tactical guidance for optimizing SQL query performance across PostgreSQL, MySQL, and SQL Server through execution plan analysis, strategic indexing, and query rewriting.

When to Use This Skill

Trigger this skill when encountering:

Slow query performance or database timeouts
Analyzing EXPLAIN plans or execution plans
Determining index requirements
Rewriting inefficient queries
Identifying query anti-patterns (N+1, SELECT *, correlated subqueries)
Database-specific optimization needs (PostgreSQL, MySQL, SQL Server)

Core Optimization Workflow

Step 1: Analyze Query Performance

Run execution plan analysis to identify bottlenecks:

PostgreSQL:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

MySQL:

EXPLAIN FORMAT=JSON SELECT * FROM products WHERE category_id = 5;

SQL Server: Use SQL Server Management Studio: Display Estimated Execution Plan (Ctrl+L)

Key Metrics to Monitor:

Cost: Estimated resource consumption
Rows: Number of rows processed (estimated vs actual)
Scan Type: Sequential scan vs index scan
Execution Time: Actual time spent on operation

For detailed execution plan interpretation, see references/explain-guide.md.

Step 2: Identify Optimization Opportunities

Common Red Flags:

Indicator	Problem	Solution
Seq Scan / Table Scan	Full table scan on large table	Add index on filter columns
High row count	Processing excessive rows	Add WHERE filter or index
Nested Loop with large outer table	Inefficient join algorithm	Index join columns
Correlated subquery	Subquery executes per row	Rewrite as JOIN or EXISTS
Sort operation on large result set	Expensive sorting	Add index matching ORDER BY

For scan type interpretation, see references/scan-types.md.

Step 3: Apply Indexing Strategies

Index Decision Framework:

Is column used in WHERE, JOIN, ORDER BY, or GROUP BY?
├─ YES → Is column selective (many unique values)?
│  ├─ YES → Is table frequently queried?
│  │  ├─ YES → ADD INDEX
│  │  └─ NO → Consider based on query frequency
│  └─ NO (low selectivity) → Skip index
└─ NO → Skip index

Index Types by Use Case:

PostgreSQL:

B-tree (default): General-purpose, supports <, ≤, =, ≥, >, BETWEEN, IN
Hash: Equality comparisons only (=)
GIN: Full-text search, JSONB, arrays
GiST: Spatial data, geometric types
BRIN: Very large tables with naturally ordered data

MySQL:

B-tree (default): General-purpose index
Full-text: Text search on VARCHAR/TEXT columns
Spatial: Spatial data types

SQL Server:

Clustered: Table data sorted by index (one per table)
Non-clustered: Separate index structure (multiple allowed)

For comprehensive indexing guidance, see references/indexing-decisions.md and references/index-types.md.

Step 4: Design Composite Indexes

For queries filtering on multiple columns, use composite indexes:

Column Order Matters:

Equality filters first (most selective)
Additional equality filters (by selectivity)
Range filters or ORDER BY (last)

Example:

-- Query pattern
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 10;

-- Optimal composite index
CREATE INDEX idx_orders_customer_status_created
ON orders (customer_id, status, created_at DESC);

For composite index design patterns, see references/composite-indexes.md.

Step 5: Rewrite Inefficient Queries

Common Anti-Patterns to Avoid:

1. SELECT * (Over-fetching)

-- ❌ Bad: Fetches all columns
SELECT * FROM users WHERE id = 1;

-- ✅ Good: Fetch only needed columns
SELECT id, name, email FROM users WHERE id = 1;

2. N+1 Queries

-- ❌ Bad: 1 + N queries
SELECT * FROM users LIMIT 100;
-- Then in loop: SELECT * FROM posts WHERE user_id = ?;

-- ✅ Good: Single JOIN
SELECT users.*, posts.id AS post_id, posts.title
FROM users
LEFT JOIN posts ON users.id = posts.user_id;

3. Non-Sargable Queries (functions on indexed columns)

-- ❌ Bad: Function prevents index usage
SELECT * FROM orders WHERE YEAR(created_at) = 2025;

-- ✅ Good: Sargable range condition
SELECT * FROM orders
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';

4. Correlated Subqueries

-- ❌ Bad: Subquery executes per row
SELECT name,
  (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id)
FROM users;

-- ✅ Good: JOIN with GROUP BY
SELECT users.name, COUNT(orders.id) AS order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;

For complete anti-pattern reference, see references/anti-patterns.md. For efficient query patterns, see references/efficient-patterns.md.

Quick Reference Tables

Index Selection Guide

Query Pattern	Index Type	Example
`WHERE column = value`	Single-column B-tree	`CREATE INDEX ON table (column)`
`WHERE col1 = ? AND col2 = ?`	Composite B-tree	`CREATE INDEX ON table (col1, col2)`
`WHERE text_col LIKE '%word%'`	Full-text (GIN/Full-text)	`CREATE INDEX ON table USING GIN (to_tsvector('english', text_col))`
`WHERE geom && box`	Spatial (GiST)	`CREATE INDEX ON table USING GIST (geom)`
`WHERE json_col @> '{"key":"value"}'`	JSONB (GIN)	`CREATE INDEX ON table USING GIN (json_col)`

Join Optimization Checklist

Index foreign key columns on both sides of JOIN
Order joins starting with table returning fewest rows
Use INNER JOIN when possible (more efficient than OUTER JOIN)
Avoid joining more than 5 tables (break into CTEs or subqueries)
Consider denormalization for frequently joined tables in read-heavy systems

Execution Plan Performance Targets

Scan Type	Performance	When Acceptable
Index-Only Scan	Best	Always preferred
Index Scan	Excellent	Small-medium result sets
Bitmap Heap Scan	Good	Medium result sets (PostgreSQL)
Sequential Scan	Poor	Only for small tables (<1000 rows) or full table queries
Table Scan	Poor	Only for small tables or unavoidable full scans

Database-Specific Optimizations

PostgreSQL-Specific Features

Partial Indexes (index subset of rows):

CREATE INDEX idx_active_users_login
ON users (last_login)
WHERE status = 'active';

Expression Indexes (index computed values):

CREATE INDEX idx_users_email_lower
ON users (LOWER(email));

Covering Indexes (avoid heap access):

CREATE INDEX idx_users_email_covering
ON users (email) INCLUDE (id, name);

For comprehensive PostgreSQL optimization, see references/postgresql.md.

MySQL-Specific Features

Index Hints (override optimizer):

SELECT * FROM orders USE INDEX (idx_orders_customer)
WHERE customer_id = 123;

Storage Engine Selection:

InnoDB (default): Transactional, row-level locks, clustered primary key
MyISAM: Faster reads, no transactions, table-level locks

For comprehensive MySQL optimization, see references/mysql.md.

SQL Server-Specific Features

Query Store (track query performance over time):

ALTER DATABASE YourDatabase SET QUERY_STORE = ON;

Execution Plan Warnings:

Look for yellow exclamation marks in graphical execution plans
Thick arrows indicate high row counts

For comprehensive SQL Server optimization, see references/sqlserver.md.

Advanced Optimization Techniques

Common Table Expressions (CTEs)

Break complex queries into readable, maintainable parts:

WITH active_customers AS (
  SELECT id, name FROM customers WHERE status = 'active'
),
recent_orders AS (
  SELECT customer_id, COUNT(*) as order_count
  FROM orders
  WHERE created_at > NOW() - INTERVAL '30 days'
  GROUP BY customer_id
)
SELECT ac.name, COALESCE(ro.order_count, 0) as orders
FROM active_customers ac
LEFT JOIN recent_orders ro ON ac.id = ro.customer_id;

EXISTS vs IN for Subqueries

Use EXISTS for better performance with large datasets:

-- ✅ Good: EXISTS stops at first match
SELECT * FROM users
WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id);

-- ❌ Less efficient: IN builds full list
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);

Denormalization Decision Framework

Consider denormalization when:

Query joins 3+ tables frequently
Data is relatively static (infrequent updates)
Read performance is critical
Write overhead is acceptable

Denormalization Strategies:

Duplicate columns: Copy foreign key data into main table
Summary tables: Pre-aggregate data
Materialized views: Database-maintained denormalized views
Application caching: Redis/Memcached for frequently accessed data

Optimization Workflow Example

Scenario: API endpoint taking 2 seconds to load

Step 1: Identify Slow Query

Use APM/observability tools to identify database query causing delay

Step 2: Run EXPLAIN ANALYZE

EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;

Step 3: Analyze Output

Seq Scan on orders (cost=0.00..2500.00 rows=10)
  Filter: (customer_id = 123)
  Rows Removed by Filter: 99990

Problem: Sequential scan filtering 99,990 rows

Step 4: Add Composite Index

CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);

Step 5: Verify Improvement

EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;

Index Scan using idx_orders_customer_created (cost=0.42..12.44 rows=10)
  Index Cond: (customer_id = 123)

Result: 200x faster (2000ms → 10ms)

Monitoring and Maintenance

Regular Optimization Tasks:

Review slow query logs weekly
Update database statistics regularly (ANALYZE in PostgreSQL, UPDATE STATISTICS in SQL Server)
Monitor index usage (drop unused indexes)
Archive old data to keep tables manageable
Review execution plans for critical queries quarterly

PostgreSQL Statistics Update:

ANALYZE table_name;

MySQL Statistics Update:

ANALYZE TABLE table_name;

SQL Server Statistics Update:

UPDATE STATISTICS table_name;

Related Skills

databases-relational: Schema design and database fundamentals
observability: Performance monitoring and slow query detection
api-patterns: API-level optimization (pagination, caching)
performance-engineering: Application performance profiling

Additional Resources

For comprehensive documentation, reference these files:

references/explain-guide.md - Detailed EXPLAIN plan interpretation
references/scan-types.md - Scan type meanings and performance implications
references/indexing-decisions.md - When and how to add indexes
references/index-types.md - Database-specific index types
references/composite-indexes.md - Multi-column index design
references/anti-patterns.md - Common anti-patterns with solutions
references/efficient-patterns.md - Efficient query patterns
references/postgresql.md - PostgreSQL-specific optimizations
references/mysql.md - MySQL-specific optimizations
references/sqlserver.md - SQL Server-specific optimizations

For working SQL examples, see examples/ directory.

optimizing-sql