optimizing-sql

SKILL.md

SQL Optimization

Provide tactical guidance for optimizing SQL query performance across PostgreSQL, MySQL, and SQL Server through execution plan analysis, strategic indexing, and query rewriting.

When to Use This Skill

Trigger this skill when encountering:

  • Slow query performance or database timeouts
  • Analyzing EXPLAIN plans or execution plans
  • Determining index requirements
  • Rewriting inefficient queries
  • Identifying query anti-patterns (N+1, SELECT *, correlated subqueries)
  • Database-specific optimization needs (PostgreSQL, MySQL, SQL Server)

Core Optimization Workflow

Step 1: Analyze Query Performance

Run execution plan analysis to identify bottlenecks:

PostgreSQL:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

MySQL:

EXPLAIN FORMAT=JSON SELECT * FROM products WHERE category_id = 5;

SQL Server: Use SQL Server Management Studio: Display Estimated Execution Plan (Ctrl+L)

Key Metrics to Monitor:

  • Cost: Estimated resource consumption
  • Rows: Number of rows processed (estimated vs actual)
  • Scan Type: Sequential scan vs index scan
  • Execution Time: Actual time spent on operation

For detailed execution plan interpretation, see references/explain-guide.md.

Step 2: Identify Optimization Opportunities

Common Red Flags:

Indicator Problem Solution
Seq Scan / Table Scan Full table scan on large table Add index on filter columns
High row count Processing excessive rows Add WHERE filter or index
Nested Loop with large outer table Inefficient join algorithm Index join columns
Correlated subquery Subquery executes per row Rewrite as JOIN or EXISTS
Sort operation on large result set Expensive sorting Add index matching ORDER BY

For scan type interpretation, see references/scan-types.md.

Step 3: Apply Indexing Strategies

Index Decision Framework:

Is column used in WHERE, JOIN, ORDER BY, or GROUP BY?
├─ YES → Is column selective (many unique values)?
│  ├─ YES → Is table frequently queried?
│  │  ├─ YES → ADD INDEX
│  │  └─ NO → Consider based on query frequency
│  └─ NO (low selectivity) → Skip index
└─ NO → Skip index

Index Types by Use Case:

PostgreSQL:

  • B-tree (default): General-purpose, supports <, ≤, =, ≥, >, BETWEEN, IN
  • Hash: Equality comparisons only (=)
  • GIN: Full-text search, JSONB, arrays
  • GiST: Spatial data, geometric types
  • BRIN: Very large tables with naturally ordered data

MySQL:

  • B-tree (default): General-purpose index
  • Full-text: Text search on VARCHAR/TEXT columns
  • Spatial: Spatial data types

SQL Server:

  • Clustered: Table data sorted by index (one per table)
  • Non-clustered: Separate index structure (multiple allowed)

For comprehensive indexing guidance, see references/indexing-decisions.md and references/index-types.md.

Step 4: Design Composite Indexes

For queries filtering on multiple columns, use composite indexes:

Column Order Matters:

  1. Equality filters first (most selective)
  2. Additional equality filters (by selectivity)
  3. Range filters or ORDER BY (last)

Example:

-- Query pattern
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 10;

-- Optimal composite index
CREATE INDEX idx_orders_customer_status_created
ON orders (customer_id, status, created_at DESC);

For composite index design patterns, see references/composite-indexes.md.

Step 5: Rewrite Inefficient Queries

Common Anti-Patterns to Avoid:

1. SELECT * (Over-fetching)

-- ❌ Bad: Fetches all columns
SELECT * FROM users WHERE id = 1;

-- ✅ Good: Fetch only needed columns
SELECT id, name, email FROM users WHERE id = 1;

2. N+1 Queries

-- ❌ Bad: 1 + N queries
SELECT * FROM users LIMIT 100;
-- Then in loop: SELECT * FROM posts WHERE user_id = ?;

-- ✅ Good: Single JOIN
SELECT users.*, posts.id AS post_id, posts.title
FROM users
LEFT JOIN posts ON users.id = posts.user_id;

3. Non-Sargable Queries (functions on indexed columns)

-- ❌ Bad: Function prevents index usage
SELECT * FROM orders WHERE YEAR(created_at) = 2025;

-- ✅ Good: Sargable range condition
SELECT * FROM orders
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';

4. Correlated Subqueries

-- ❌ Bad: Subquery executes per row
SELECT name,
  (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id)
FROM users;

-- ✅ Good: JOIN with GROUP BY
SELECT users.name, COUNT(orders.id) AS order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;

For complete anti-pattern reference, see references/anti-patterns.md. For efficient query patterns, see references/efficient-patterns.md.

Quick Reference Tables

Index Selection Guide

Query Pattern Index Type Example
WHERE column = value Single-column B-tree CREATE INDEX ON table (column)
WHERE col1 = ? AND col2 = ? Composite B-tree CREATE INDEX ON table (col1, col2)
WHERE text_col LIKE '%word%' Full-text (GIN/Full-text) CREATE INDEX ON table USING GIN (to_tsvector('english', text_col))
WHERE geom && box Spatial (GiST) CREATE INDEX ON table USING GIST (geom)
WHERE json_col @> '{"key":"value"}' JSONB (GIN) CREATE INDEX ON table USING GIN (json_col)

Join Optimization Checklist

  • Index foreign key columns on both sides of JOIN
  • Order joins starting with table returning fewest rows
  • Use INNER JOIN when possible (more efficient than OUTER JOIN)
  • Avoid joining more than 5 tables (break into CTEs or subqueries)
  • Consider denormalization for frequently joined tables in read-heavy systems

Execution Plan Performance Targets

Scan Type Performance When Acceptable
Index-Only Scan Best Always preferred
Index Scan Excellent Small-medium result sets
Bitmap Heap Scan Good Medium result sets (PostgreSQL)
Sequential Scan Poor Only for small tables (<1000 rows) or full table queries
Table Scan Poor Only for small tables or unavoidable full scans

Database-Specific Optimizations

PostgreSQL-Specific Features

Partial Indexes (index subset of rows):

CREATE INDEX idx_active_users_login
ON users (last_login)
WHERE status = 'active';

Expression Indexes (index computed values):

CREATE INDEX idx_users_email_lower
ON users (LOWER(email));

Covering Indexes (avoid heap access):

CREATE INDEX idx_users_email_covering
ON users (email) INCLUDE (id, name);

For comprehensive PostgreSQL optimization, see references/postgresql.md.

MySQL-Specific Features

Index Hints (override optimizer):

SELECT * FROM orders USE INDEX (idx_orders_customer)
WHERE customer_id = 123;

Storage Engine Selection:

  • InnoDB (default): Transactional, row-level locks, clustered primary key
  • MyISAM: Faster reads, no transactions, table-level locks

For comprehensive MySQL optimization, see references/mysql.md.

SQL Server-Specific Features

Query Store (track query performance over time):

ALTER DATABASE YourDatabase SET QUERY_STORE = ON;

Execution Plan Warnings:

  • Look for yellow exclamation marks in graphical execution plans
  • Thick arrows indicate high row counts

For comprehensive SQL Server optimization, see references/sqlserver.md.

Advanced Optimization Techniques

Common Table Expressions (CTEs)

Break complex queries into readable, maintainable parts:

WITH active_customers AS (
  SELECT id, name FROM customers WHERE status = 'active'
),
recent_orders AS (
  SELECT customer_id, COUNT(*) as order_count
  FROM orders
  WHERE created_at > NOW() - INTERVAL '30 days'
  GROUP BY customer_id
)
SELECT ac.name, COALESCE(ro.order_count, 0) as orders
FROM active_customers ac
LEFT JOIN recent_orders ro ON ac.id = ro.customer_id;

EXISTS vs IN for Subqueries

Use EXISTS for better performance with large datasets:

-- ✅ Good: EXISTS stops at first match
SELECT * FROM users
WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id);

-- ❌ Less efficient: IN builds full list
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);

Denormalization Decision Framework

Consider denormalization when:

  • Query joins 3+ tables frequently
  • Data is relatively static (infrequent updates)
  • Read performance is critical
  • Write overhead is acceptable

Denormalization Strategies:

  1. Duplicate columns: Copy foreign key data into main table
  2. Summary tables: Pre-aggregate data
  3. Materialized views: Database-maintained denormalized views
  4. Application caching: Redis/Memcached for frequently accessed data

Optimization Workflow Example

Scenario: API endpoint taking 2 seconds to load

Step 1: Identify Slow Query

Use APM/observability tools to identify database query causing delay

Step 2: Run EXPLAIN ANALYZE

EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;

Step 3: Analyze Output

Seq Scan on orders (cost=0.00..2500.00 rows=10)
  Filter: (customer_id = 123)
  Rows Removed by Filter: 99990

Problem: Sequential scan filtering 99,990 rows

Step 4: Add Composite Index

CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);

Step 5: Verify Improvement

EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;
Index Scan using idx_orders_customer_created (cost=0.42..12.44 rows=10)
  Index Cond: (customer_id = 123)

Result: 200x faster (2000ms → 10ms)

Monitoring and Maintenance

Regular Optimization Tasks:

  • Review slow query logs weekly
  • Update database statistics regularly (ANALYZE in PostgreSQL, UPDATE STATISTICS in SQL Server)
  • Monitor index usage (drop unused indexes)
  • Archive old data to keep tables manageable
  • Review execution plans for critical queries quarterly

PostgreSQL Statistics Update:

ANALYZE table_name;

MySQL Statistics Update:

ANALYZE TABLE table_name;

SQL Server Statistics Update:

UPDATE STATISTICS table_name;

Related Skills

  • databases-relational: Schema design and database fundamentals
  • observability: Performance monitoring and slow query detection
  • api-patterns: API-level optimization (pagination, caching)
  • performance-engineering: Application performance profiling

Additional Resources

For comprehensive documentation, reference these files:

  • references/explain-guide.md - Detailed EXPLAIN plan interpretation
  • references/scan-types.md - Scan type meanings and performance implications
  • references/indexing-decisions.md - When and how to add indexes
  • references/index-types.md - Database-specific index types
  • references/composite-indexes.md - Multi-column index design
  • references/anti-patterns.md - Common anti-patterns with solutions
  • references/efficient-patterns.md - Efficient query patterns
  • references/postgresql.md - PostgreSQL-specific optimizations
  • references/mysql.md - MySQL-specific optimizations
  • references/sqlserver.md - SQL Server-specific optimizations

For working SQL examples, see examples/ directory.

Weekly Installs
15
GitHub Stars
310
First Seen
Jan 25, 2026
Installed on
opencode14
gemini-cli14
github-copilot13
cursor13
codex12
claude-code10