tidb-query-tuning
Installation
SKILL.md
TiDB Query Tuning
Use this skill to diagnose and resolve TiDB query performance issues. It follows a rigorous workflow from symptom identification to verified solution.
Workflow
-
Capture the current plan & clues:
- Run
EXPLAIN ANALYZE <query>to get actual execution stats. - Compare
estRowsvsactRows— large divergence means stale or missing statistics. - Note the most expensive operators (wall time, memory, rows processed).
- Use PLAN REPLAYER: Run
PLAN REPLAYER DUMP EXPLAIN [ANALYZE] <query>;to export comprehensive on-site information (version, config, stats, plan) to a ZIP file. - Use Debug API: Use TiDB HTTP API (port 10080) to pull runtime info:
/debug/pprof,/stats/dump/{db}/{table}, or/schema/{db}/{table}. - Check Bindings: Run
SHOW GLOBAL BINDINGS;to see if existing plan baselines are affecting the query.
- Run
-
Check statistics health:
SHOW STATS_HEALTHY WHERE Db_name = '<db>' AND Table_name = '<table>';- If health < 80 or
actRowsdiverges significantly fromestRows, runANALYZE TABLE <table>;(or withALL COLUMNSif@@tidb_analyze_column_optionsis notALL) and re-check the plan. - For specific indexes:
ANALYZE TABLE <table> INDEX <index_name>;
-
Identify the bottleneck pattern:
- Bad join order or strategy → see
references/join-strategies.md - Subquery not handled well → see
references/subquery-optimization.md - Wrong or missing index → see
references/index-selection.md - Optimizer choosing a suboptimal plan despite good stats → see
references/optimizer-hints.mdandreferences/session-variables.md - Stats are stale or auto analyze cannot keep up → see
references/stats-health-and-auto-analyze.md - Plans change after restart or sync stats loading times out → see
references/stats-loading-and-startup.md - Need to tune analyze version, column coverage, or memory-heavy stats collection → see
references/stats-version-and-analyze-configuration.md - Need a matching field incident, workaround, or fixed-version precedent → search
references/optimizer-oncall-experiences-redacted/ - Need recent customer issue precedents with linked PRs and merge timestamps → search
references/tidb-customer-planner-issues/
- Bad join order or strategy → see
-
Reproduce & Investigate locally:
- Local Reproduction: Use
tiup playgroundandPLAN REPLAYER LOADto reproduce the issue locally. Seereferences/plan-replayer-testing.md. - Test Versions: Verify the fix in newer stable versions or
nightly. - Deep Code Analysis: If the cause is still unclear, clone
pingcap/tidband analyze the source code. Read AGENTS.md and .agents/skills in the TiDB repo first.
- Local Reproduction: Use
-
Apply the fix:
- Prefer the least invasive change: refresh stats → add index → SQL Binding → hint → session variable.
- SQL Binding: Fix plans without code changes:
CREATE GLOBAL BINDING FOR <stmt> USING <hinted_stmt>;. - Hints & Variables: Use hints when the fix is query-specific; use session variables when it applies to a workload pattern.
- Bug Report: If it's a confirmed bug, follow the workflow in
references/bug-report.md. Anonymize all sensitive data before reporting.
-
Verify the improvement:
- Re-run
EXPLAIN ANALYZEwith the fix applied. - Confirm
actRowsand execution time improved. - If the fix is a hint, document it in a SQL comment so future readers understand why.
- Re-run
High-signal rules
- Always check stats first. Most bad plans in TiDB come from stale or missing statistics, not optimizer bugs.
- Treat stats maintenance as capacity planning. If
AUTO ANALYZEcannot keep up with stats decay, plan quality will drift even when SQL does not change. EXPLAIN ANALYZEis the ground truth.EXPLAINalone shows estimates;ANALYZEshows what actually happened.- Search known field cases before inventing a new workaround. The oncall corpus under
references/optimizer-oncall-experiences-redacted/is useful for symptom matching, investigation signals, and fix-version lookup. - Search recent GitHub issue precedents when fix lineage matters. The corpus under
references/tidb-customer-planner-issues/is useful when you need linked PRs, merge times, and still-open customer gaps. - Correlated subqueries: TiDB decorrelates by default. When the subquery is well-indexed and the outer query is selective,
NO_DECORRELATE()often wins. Seereferences/subquery-optimization.md. - Join strategies matter: TiDB supports hash join, index join, merge join, and shuffle joins. The right choice depends on table sizes, index availability, and data distribution. See
references/join-strategies.md. - Hints are per-query; variables are per-session/global. Use hints for surgical fixes, variables for workload-wide tuning.
- TiFlash acceleration: For analytical queries on large tables, push computation to TiFlash replicas using
READ_FROM_STORAGE(TIFLASH[<table>]). Seereferences/session-variables.md. - Anonymize sensitive info. Before reporting bugs, ensure table names, columns, and data are anonymized.
- Reproduce before suggesting upgrades. Use TiUP playground to verify if a newer version actually fixes the issue.
References
references/clues.md— Detailed SQLs and metrics for clue collection.references/reproduction.md— Baseline investigation and known version issues.references/plan-replayer-testing.md— Local reproduction using TiUP and PLAN REPLAYER.references/bug-report.md— Standard bug report template and anonymization workflow.references/optimizer-hints.md— Optimizer hints: syntax, catalog, and when to use each.references/session-variables.md— Session/global variables that affect plan choice.references/join-strategies.md— Join algorithms, when TiDB picks each, and how to override.references/subquery-optimization.md— Decorrelation, semi-join, EXISTS/IN patterns and NO_DECORRELATE.references/index-selection.md— Index hints, invisible indexes, index advisor, composite index guidance.references/explain-patterns.md— Reading EXPLAIN ANALYZE output to identify bottlenecks.references/stats-health-and-auto-analyze.md— Statistics health, auto analyze backlog diagnosis, and safe concurrency tuning.references/stats-loading-and-startup.md— Init stats, sync load, restart-time plan instability, and version-based mitigation.references/stats-version-and-analyze-configuration.md— Stats versioning, analyze coverage, and memory-safe stats collection settings.references/optimizer-oncall-experiences-redacted/— Redacted optimizer oncall case corpus with user symptoms, investigation signals, workarounds, and fixed versions.references/tidb-customer-planner-issues/README.md— Generated GitHub issue corpus with one file per customer-driven planner issue, including linked PRs and merge timestamps.
Scripts
scripts/collect_diag_info.sql— SQL script to collect baseline tuning metadata.