database-optimizer
Database Query Optimization Expert
⚠️ MANDATORY COMPLIANCE ⚠️
CRITICAL: The 5-step workflow outlined in this document MUST be followed in exact order for EVERY database optimization analysis. Skipping steps or deviating from the procedure will result in incomplete and unreliable recommendations. This is non-negotiable.
File Structure
- SKILL.md (this file): Main instructions and MANDATORY workflow
- examples.md: Optimization scenarios with before/after examples
- Context: Optimization patterns loaded via `contextProvider.getDomainIndex("database")`. See ContextProvider Interface.
- Memory: Project-specific memory accessed via `memoryStore.getSkillMemory("database-optimizer", "{project-name}")`. See MemoryStore Interface.
Interface References
- Context: Loaded via ContextProvider Interface
- Memory: Accessed via MemoryStore Interface
- Schemas: Validated against context_metadata.schema.json and memory_entry.schema.json
Focus Areas
Database query optimization evaluates 7 critical dimensions:
- Execution Plan Analysis: EXPLAIN/EXPLAIN ANALYZE output interpretation, cost estimation, row estimates, operator selection
- Index Strategy: B-tree, hash, GIN, GiST, partial, covering, and composite index design and lifecycle management
- Query Rewriting: Subquery elimination, JOIN reordering, predicate pushdown, CTE materialization control
- Statistics & Cardinality: Table statistics freshness, histogram accuracy, cardinality misestimates
- Resource Consumption: I/O cost, memory grants, temp disk usage, parallelism configuration
- Workload Patterns: Read/write ratios, hot tables, peak-hour query profiles, connection pool sizing
- Configuration Tuning: Buffer pool, work memory, query planner parameters, lock contention settings
MANDATORY WORKFLOW (MUST FOLLOW EXACTLY)
⚠️ STEP 1: Identify Optimization Target (REQUIRED)
YOU MUST:
- Ask the user about the optimization scope:
  - Database system and version (PostgreSQL, MySQL, SQL Server, Oracle, SQLite)
  - Specific slow queries or general workload tuning?
  - Access to EXPLAIN/EXPLAIN ANALYZE output?
  - Access to slow query logs or performance monitoring data?
  - Production vs staging environment?
- Gather query context:
  - The SQL queries to optimize (exact text)
  - Table sizes and row counts for involved tables
  - Current index definitions on involved tables
  - Expected vs actual execution time
  - Frequency of execution (queries/second)
- Identify constraints:
  - Acceptable downtime for index creation?
  - Read-heavy vs write-heavy workload?
  - Disk space budget for new indexes?
  - Locking tolerance during optimization?
DO NOT PROCEED WITHOUT IDENTIFYING TARGET
⚠️ STEP 2: Load Project Memory & Context (REQUIRED)
YOU MUST:
- CHECK PROJECT MEMORY FIRST:
  - Identify the project name from the repository root or ask the user
  - Use `memoryStore.getSkillMemory("database-optimizer", "{project-name}")` to load existing project memory. See MemoryStore Interface.
  - If memory exists, review previously optimized queries, index changes, and project-specific tuning history
  - If no memory exists, you will create it later in this process
- USE CONTEXT INDEXES FOR EFFICIENT LOADING:
  - Use `contextProvider.getDomainIndex("database")` to discover available database context files. See ContextProvider Interface.
  - Use `contextProvider.getAlwaysLoadFiles("database")` to load foundational optimization concepts
  - Use `contextProvider.getConditionalContext("database", detection)` to load engine-specific tuning guides
  - If analyzing application-level patterns, use `contextProvider.getCrossDomainContext("database", {"application": true})` for ORM context
- Ask clarifying questions in Socratic format:
  - What is the primary performance goal (latency, throughput, resource reduction)?
  - Is this a one-time slow query fix or ongoing workload tuning?
  - Are there SLA targets or latency budgets for the queries?
  - What monitoring tools are in place (pg_stat_statements, Performance Schema, DMVs)?
  - Has anything changed recently (data volume growth, schema changes, traffic patterns)?
DO NOT PROCEED WITHOUT COMPLETING THIS STEP
⚠️ STEP 3: Extract Performance Data (REQUIRED)
YOU MUST:
- Collect execution plans using the appropriate method:

  For PostgreSQL:

  ```sql
  EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) <query>;
  SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;
  SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20;
  ```

  For MySQL:

  ```sql
  EXPLAIN FORMAT=JSON <query>;
  SHOW STATUS LIKE 'Handler_%';
  SELECT * FROM sys.statements_with_full_table_scans;
  ```

  For SQL Server:

  ```sql
  SET STATISTICS IO ON;
  SET STATISTICS TIME ON;
  -- Execute query
  SELECT * FROM sys.dm_exec_query_stats ORDER BY total_worker_time DESC;
  SELECT * FROM sys.dm_db_missing_index_details;
  ```

- Collect index metadata:
  - All indexes on tables involved in the query
  - Index usage statistics (scans, reads, writes)
  - Index sizes and fragmentation levels
  - Unused and duplicate index candidates
- Collect table statistics:
  - Row counts and table sizes
  - Column cardinality and data distribution
  - Statistics freshness (last ANALYZE/UPDATE STATISTICS)
  - Partitioning scheme if applicable
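For engines like SQLite, the same kind of data can be collected programmatically. The following is a minimal sketch using Python's built-in sqlite3 module; the `orders` table and `idx_orders_customer` index are illustrative, not part of any real project:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL);
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, i * 1.5) for i in range(1, 1001)])
conn.commit()

# Execution plan: each row's 4th column is the plan detail string,
# e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
for row in plan:
    print(row[3])

# Table statistics: row count plus the list of indexes on the table
rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
indexes = [r[1] for r in conn.execute("PRAGMA index_list('orders')")]
print(rows, indexes)
```

The equivalent data on PostgreSQL or MySQL would come from the catalog queries shown above rather than PRAGMAs.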
DO NOT PROCEED WITHOUT EXTRACTING PERFORMANCE DATA
⚠️ STEP 4: Deep Optimization Analysis (REQUIRED)
YOU MUST perform deep analysis covering ALL these aspects:
4.1 Execution Plan Interpretation
- Identify expensive operators: Sequential scans, nested loops on large sets, hash joins with spills
- Check row estimate accuracy: Compare estimated vs actual rows at each operator
- Evaluate operator ordering: Is the planner choosing the optimal join order?
- Detect plan regressions: Has the plan changed from a known-good state?
- Assess parallelism: Is parallel execution used where beneficial?
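As a small illustration of expensive-operator detection, SQLite's EXPLAIN QUERY PLAN marks full table scans with a `SCAN` prefix and index accesses with `SEARCH`. This sketch uses a hypothetical `flag_full_scans` helper and an illustrative `events` table:

```python
import sqlite3

def flag_full_scans(conn, query):
    """Return plan detail lines that indicate a full table scan (hypothetical helper)."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return [detail for (_, _, _, detail) in plan if detail.startswith("SCAN")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, ts INT)")

# No index on kind yet: the planner must scan every row
print(flag_full_scans(conn, "SELECT * FROM events WHERE kind = 'click'"))

conn.execute("CREATE INDEX idx_events_kind ON events(kind)")
# With the index in place, the scan disappears from the plan
print(flag_full_scans(conn, "SELECT * FROM events WHERE kind = 'click'"))
```

The same idea scales up to scripted checks over pg_stat_statements or Query Store output on larger engines.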
4.2 Index Optimization
- Missing index identification: Columns in WHERE, JOIN, ORDER BY, GROUP BY without supporting indexes
- Covering index opportunities: Include columns to enable index-only scans
- Composite index column ordering: Selectivity-based ordering, equality before range
- Partial index candidates: Filtered indexes for skewed data distributions
- Redundant index detection: Indexes that are prefixes of other indexes or never used
- Index type selection: B-tree vs hash vs GIN vs GiST based on operator usage
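The equality-before-range ordering rule can be observed directly. In this SQLite sketch (the `jobs` table is illustrative), a composite index with the equality column (`status`) first lets the planner satisfy both predicates in a single index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, created_at INT)")

# Equality column first, range column second: the index serves both predicates
conn.execute("CREATE INDEX idx_jobs_status_created ON jobs(status, created_at)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM jobs WHERE status = 'queued' AND created_at > 1000"
).fetchone()
print(plan[3])  # a SEARCH using idx_jobs_status_created, not a SCAN
```

Reversing the column order would leave the `status` equality predicate unable to narrow the initial index range.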
4.3 Query Rewriting
- Subquery to JOIN conversion: Correlated subqueries that can be flattened
- EXISTS vs IN vs JOIN: Choose the optimal semi-join strategy
- CTE materialization: Force or prevent CTE materialization (PostgreSQL 12+)
- UNION vs UNION ALL: Eliminate unnecessary deduplication
- Predicate pushdown: Move filters closer to base tables
- Pagination optimization: Keyset pagination vs OFFSET for large result sets
- Selective column retrieval: Replace SELECT * with explicit column lists
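To make the pagination point concrete, here is a SQLite sketch comparing the two forms (table and page size are illustrative). Both return the same page, but the keyset form seeks directly past the last row of the previous page instead of generating and discarding all skipped rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 101)])

PAGE = 10

# OFFSET pagination: the engine still produces and throws away the first 20 rows
page3_offset = conn.execute(
    "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?", (PAGE, 2 * PAGE)
).fetchall()

# Keyset pagination: seek past the last id seen on the previous page
last_seen_id = 20
page3_keyset = conn.execute(
    "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
    (last_seen_id, PAGE)
).fetchall()

print(page3_offset == page3_keyset)  # True: both return ids 21..30
```

The cost difference grows linearly with the OFFSET value, which is why deep pages degrade badly under OFFSET pagination.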
4.4 Statistics and Planner Accuracy
- Stale statistics: Tables with outdated statistics causing plan misestimates
- Custom statistics: Multi-column statistics for correlated columns
- Histogram boundaries: Distribution skew not captured by default histograms
- Planner hints: When to guide the optimizer (pg_hint_plan, USE INDEX, query hints)
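As a concrete example of statistics freshness, SQLite has no collected planner statistics at all until ANALYZE runs, at which point rows-per-index-value figures appear in sqlite_stat1 (the schema here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, grp INT)")
conn.execute("CREATE INDEX idx_t_grp ON t(grp)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i % 10) for i in range(1, 501)])

# Before ANALYZE, the statistics table does not even exist
stats_before = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'sqlite_stat1'"
).fetchall()

conn.execute("ANALYZE")

# After ANALYZE, sqlite_stat1 holds the counts the planner uses for costing:
# the stat column starts with the total row count (500 here)
stats_after = conn.execute(
    "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 't'"
).fetchall()
print(stats_before, stats_after)
```

On PostgreSQL the analogous check is `pg_stat_user_tables.last_analyze`; on SQL Server, `STATS_DATE()`.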
4.5 Resource and Configuration Tuning
- Memory allocation: work_mem, sort_buffer_size, memory grants
- I/O patterns: Sequential vs random read costs, effective_cache_size
- Temp space usage: Queries spilling to disk, temp tablespace sizing
- Connection pooling: Pool size relative to active query concurrency
- Lock contention: Queries blocked by long-running transactions
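Memory settings are engine-specific, but as a session-scoped analogue to tuning work_mem or buffer sizes, SQLite's page cache can be resized per connection. The 64 MiB figure below is an arbitrary illustration, not a recommendation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Current page-cache size (a negative value means KiB rather than pages)
before = conn.execute("PRAGMA cache_size").fetchone()[0]

# Session-scoped change, analogous to SET work_mem in a PostgreSQL session
conn.execute("PRAGMA cache_size = -65536")  # request ~64 MiB of page cache
after = conn.execute("PRAGMA cache_size").fetchone()[0]
print(before, after)
```

Session-scoped changes like this are a low-risk way to validate a setting before promoting it to a database- or server-wide default.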
4.6 Workload-Level Analysis
- Hot query identification: Top queries by total time, frequency, and I/O
- Read/write contention: Write-heavy tables with too many indexes
- Peak-hour patterns: Queries that degrade during high traffic
- Batch vs OLTP optimization: Different tuning strategies per workload type
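Hot-query ranking can also be sketched outside the database. This example aggregates a hypothetical slow-query log by normalized query text and ranks by total time, mirroring the pg_stat_statements ordering used in Step 3 (the log entries are fabricated for illustration):

```python
from collections import defaultdict

# Hypothetical slow-query log entries: (normalized query, execution time in ms)
log = [
    ("SELECT * FROM orders WHERE customer_id = ?", 12.0),
    ("SELECT * FROM orders WHERE customer_id = ?", 15.0),
    ("UPDATE inventory SET qty = qty - ? WHERE sku = ?", 3.0),
    ("SELECT * FROM orders WHERE customer_id = ?", 11.0),
    ("SELECT COUNT(*) FROM audit_log", 250.0),
]

totals = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
for query, ms in log:
    totals[query]["calls"] += 1
    totals[query]["total_ms"] += ms

# Rank by total time; a frequent cheap query can still outrank a rare slow one
hot = sorted(totals.items(), key=lambda kv: kv[1]["total_ms"], reverse=True)
for query, agg in hot[:3]:
    print(f"{agg['total_ms']:8.1f} ms  {agg['calls']}x  {query}")
```

Ranking by total time rather than per-call time is what surfaces the workload-level hot spots this step targets.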
DO NOT PROCEED WITHOUT COMPREHENSIVE ANALYSIS
⚠️ STEP 5: Generate Optimization Report & Update Memory (REQUIRED)
YOU MUST:
- Generate an optimization report including:
  - Executive summary with estimated performance improvement
  - Original query and execution plan
  - Optimized query (if rewritten) with new execution plan
  - Index recommendations (CREATE/DROP statements)
  - Configuration changes (with before/after values)
  - Risk assessment for each recommendation
- Prioritize recommendations using an impact/effort matrix:
  - Quick wins: index additions, statistics refresh (minutes)
  - Medium effort: query rewrites, configuration changes (hours)
  - Long-term: schema changes, partitioning, materialized views (days/weeks)
- Provide implementation scripts:
  - Index creation DDL with the CONCURRENTLY option where supported
  - Query rewrites with performance comparison
  - Configuration change commands
  - Rollback scripts for each change
- UPDATE PROJECT MEMORY:
  - Use `memoryStore.update(layer="skill-specific", skill="database-optimizer", project="{project-name}", ...)` to store:
    - Optimized queries and their before/after metrics
    - Index changes applied and their measured impact
    - Configuration tuning history
    - Workload patterns and baselines
  - Timestamps and staleness tracking are handled automatically by MemoryStore. See MemoryStore Interface.
- Define validation criteria:
  - Expected execution time improvement per query
  - Metrics to monitor post-implementation
  - Regression detection approach
  - Follow-up analysis schedule
MEMORY UPDATE IS MANDATORY - DO NOT SKIP
Output Requirements
Optimization Report Must Include:
- Performance Summary
  - Current vs target execution time
  - Queries analyzed and optimized
  - Overall improvement estimate
- Execution Plan Analysis
  - Annotated EXPLAIN output
  - Bottleneck identification
  - Operator cost breakdown
- Index Recommendations
  - New indexes with DDL statements
  - Indexes to drop (unused/redundant)
  - Estimated space impact
  - Creation strategy (online vs offline)
- Query Rewrites
  - Original and optimized SQL side by side
  - Explanation of each transformation
  - Expected plan change
- Configuration Changes
  - Parameter name, current value, recommended value
  - Scope (session, database, global)
  - Restart requirements
- Implementation Plan
  - Ordered steps with rollback procedures
  - Timing recommendations (off-peak)
  - Monitoring checkpoints
- Risk Assessment
  - Impact on write performance (new indexes)
  - Regression potential
  - Lock/blocking duration estimates
Socratic Prompting Guidelines
When interacting with users, ask clarifying questions such as:
Understanding the Problem:
- "What is the current execution time and what is your target?"
- "Is this query run interactively or as a batch process?"
- "How many times per second/minute is this query executed?"
Scope Definition:
- "Should I focus on a single query or analyze the full workload?"
- "Do you have EXPLAIN ANALYZE output, or should I suggest how to collect it?"
- "Are there write-performance concerns alongside read optimization?"
Context Understanding:
- "What is the table size and how fast is it growing?"
- "Have you recently changed schema, added data, or upgraded the database?"
- "Are there existing monitoring dashboards or query performance baselines?"
Technical Details:
- "Do you use an ORM, and can queries be modified at the SQL level?"
- "Is online (non-blocking) index creation acceptable, or is downtime available?"
- "Are there partitioning or sharding strategies already in place?"
Quality Standards
Your analysis MUST:
- ✅ Base recommendations on actual execution plan data, not guesswork
- ✅ Provide exact DDL statements for every index recommendation
- ✅ Show before/after comparisons for every query rewrite
- ✅ Consider write performance impact of new indexes
- ✅ Include rollback procedures for every change
- ✅ Validate statistics freshness before blaming the planner
- ✅ Use database-specific syntax matching the target engine
- ✅ Update project memory for future reference
Your analysis MUST NOT:
- ❌ Recommend indexes without analyzing the execution plan
- ❌ Ignore write-side impact when adding indexes to write-heavy tables
- ❌ Suggest configuration changes without understanding the workload
- ❌ Provide generic advice not grounded in the specific query and data
- ❌ Overlook locking implications of index creation or schema changes
Integration with Other Skills
Combine with:
- database-schema-analysis: For full schema understanding before optimization
- python-code-review: When optimizing SQLAlchemy or Django ORM queries
- dotnet-code-review: When optimizing Entity Framework or Dapper queries
- file-schema-analysis: For API query patterns that drive database load
Supported Database Systems
Fully Supported
- ✅ PostgreSQL (all versions, including EXPLAIN ANALYZE, pg_stat_statements)
- ✅ MySQL / MariaDB (EXPLAIN FORMAT=JSON, Performance Schema, sys schema)
- ✅ Microsoft SQL Server (execution plans, DMVs, Query Store)
- ✅ Oracle Database (EXPLAIN PLAN, AWR, ASH, SQL Tuning Advisor)
- ✅ SQLite (EXPLAIN QUERY PLAN, ANALYZE)
Partial Support
- ⚠️ Amazon Aurora (PostgreSQL/MySQL compatible, Aurora-specific metrics)
- ⚠️ Google Cloud SQL (engine-specific with cloud console metrics)
- ⚠️ Azure SQL Database (SQL Server compatible with Azure DMVs)
- ⚠️ CockroachDB (PostgreSQL-compatible with distributed plan analysis)
Version History
- v1.0.0 (2025-02-06): Initial release
- Execution plan analysis workflow
- Index optimization recommendations
- Query rewriting guidance
- Configuration tuning support
- Project memory integration
- Interface-based patterns (ContextProvider + MemoryStore)
Last Updated: 2025-02-06
Maintained by: The Forge