tidb-query-tuning
TiDB Query Tuning
Use this skill to diagnose and resolve TiDB query performance issues. It follows a rigorous workflow from symptom identification to verified solution.
Workflow
-
Capture the current plan & clues:
- Run
EXPLAIN ANALYZE <query>to get actual execution stats. - Compare
estRowsvsactRows— large divergence means stale or missing statistics. - Note the most expensive operators (wall time, memory, rows processed).
- Use PLAN REPLAYER: Run
PLAN REPLAYER DUMP EXPLAIN [ANALYZE] <query>;to export comprehensive on-site information (version, config, stats, plan) to a ZIP file. - Use Debug API: Use TiDB HTTP API (port 10080) to pull runtime info:
/debug/pprof,/stats/dump/{db}/{table}, or/schema/{db}/{table}. - Check Bindings: Run
SHOW GLOBAL BINDINGS;to see if existing plan baselines are affecting the query.
- Run
-
Check statistics health:
SHOW STATS_HEALTHY WHERE Db_name = '<db>' AND Table_name = '<table>';- If health < 80 or
actRowsdiverges significantly fromestRows, runANALYZE TABLE <table>;(or withALL COLUMNSif@@tidb_analyze_column_optionsis notALL) and re-check the plan. - For specific indexes:
ANALYZE TABLE <table> INDEX <index_name>;
-
Identify the bottleneck pattern:
- Bad join order or strategy → see
references/join-strategies.md IndexJoin/IndexHashJoinpicked the wrong probe index → seereferences/join-strategies.md,references/index-selection.md, andreferences/slow-plan-optimization/cases/use-correct-probe-index-for-index-join.md- Subquery not handled well → see
references/subquery-optimization.md - Wrong or missing index → see
references/index-selection.md - Optimizer choosing a suboptimal plan despite good stats → see
references/optimizer-hints.mdandreferences/session-variables.md - Stats are stale or auto analyze cannot keep up → see
references/stats-health-and-auto-analyze.md - Plans change after restart or sync stats loading times out → see
references/stats-loading-and-startup.md - Need to tune analyze version, column coverage, or memory-heavy stats collection → see
references/stats-version-and-analyze-configuration.md - Need a matching field incident, workaround, or fixed-version precedent → search
references/optimizer-oncall-experiences-redacted/ - Need recent customer issue precedents with linked PRs and merge timestamps → search
references/tidb-customer-planner-issues/
- Bad join order or strategy → see
-
Reproduce & Investigate locally:
- Local Reproduction: Use
tiup playgroundandPLAN REPLAYER LOADto reproduce the issue locally. Seereferences/plan-replayer-testing.md. - Test Versions: Verify the fix in newer stable versions or
nightly. - Deep Code Analysis: If the cause is still unclear, clone
pingcap/tidband analyze the source code. Read AGENTS.md and .agents/skills in the TiDB repo first.
- Local Reproduction: Use
-
Apply the fix:
- Prefer the least invasive change: refresh stats → add index → SQL Binding → hint → session variable.
- SQL Binding: Fix plans without code changes:
CREATE GLOBAL BINDING FOR <stmt> USING <hinted_stmt>;. - Validate index-based join probe paths: When the plan uses
IndexJoinorIndexHashJoin, inspect the probe-sideaccess object, pushed conditions, and whether the path is covering. Do not assume the chosen probe index is correct just because the join type is index-based. If another existing index better matches join keys, selective filters, or covering needs, compare it withUSE_INDEX/IGNORE_INDEXand stabilize the winner with a hint or SQL binding. - Binding cleanup safety: TiDB currently has no batch-delete API for global bindings. Do not run
DELETEonmysql.bind_infodirectly. UseDROP GLOBAL BINDING ...for each target binding instead. - If direct delete was already executed:
ADMIN RELOAD BINDINGSmay still leave stale entries in the in-memory binding cache. In this case, restart the TiDB server to fully clear and rebuild binding cache state. - See
references/optimizer-oncall-experiences-redacted/direct-delete-bind-info-leaves-stale-binding-cache.mdfor a full reproduction and recovery checklist. - Hints & Variables: Use hints when the fix is query-specific; use session variables when it applies to a workload pattern.
- Bug Report: If it's a confirmed bug, follow the workflow in
references/bug-report.md. Anonymize all sensitive data before reporting.
-
Verify the improvement:
- Re-run
EXPLAIN ANALYZEwith the fix applied. - Confirm
actRowsand execution time improved. - If the fix is a hint, document it in a SQL comment so future readers understand why.
- Re-run
High CPU Workflow
When the customer reports high TiDB CPU usage, use the following workflow before jumping to optimizer conclusions:
- Treat
clinic-apias the default entry point. Ifclinic-apiis unavailable, require the user to provide the equivalent raw evidence first, including CPU profiles, query history, plans, TopSQL, statement summary, and slow query samples. - Define both the problematic time window and a comparable non-problematic baseline window. The baseline should preferably be from the same time-of-day pattern. Pull many TiDB CPU profiles for both sides, ideally around 50 profiles in total if the incident duration allows it.
- Compare the problem-window and baseline CPU profiles to identify which stacks or functions consume materially more time during the incident.
- Work backward from the hot stacks and infer what query or plan patterns could produce them. Consider optimizer, executor, compiler, GC, memory tracking, range building, and internal SQL paths instead of assuming all CPU comes from user SQL.
- Check TopSQL, statement summary, and slow query records in the same time window. Collect the candidate SQLs and their plans, and include internal SQL in the candidate set.
- Build a minimal candidate query set that best explains the observed CPU symptom. Use correlation analysis, principal-component style reduction, or combinational optimization to find the smallest query combination that remains highly correlated with the incident signal.
- Compare candidate queries against the observed symptom shape. For example, CPU spikes may correlate better with GC pressure, compiler duration, plan building, or execution hot loops than with raw query count alone. Check correlation, periodicity, and dispersion instead of relying only on top-N totals.
- Do not treat any candidate as the final conclusion until it is statistically validated and cross-validated across multiple evidence sources. The conclusion must be supported by the incident-vs-baseline profile comparison and must also be consistent with flame graphs, TopSQL, slow query, and statement summary observations.
- If the candidate set does not explain the symptom with high confidence, or the evidence sources do not validate each other, go back to step 1, expand the profiling sample set, and repeat the comparison with a better baseline or a narrower incident slice.
Use this workflow to decide whether the next step should be query tuning, plan inspection, stats diagnosis, internal SQL investigation, or a product bug report.
High-signal rules
- Always check stats first. Most bad plans in TiDB come from stale or missing statistics, not optimizer bugs.
- Treat stats maintenance as capacity planning. If
AUTO ANALYZEcannot keep up with stats decay, plan quality will drift even when SQL does not change. EXPLAIN ANALYZEis the ground truth.EXPLAINalone shows estimates;ANALYZEshows what actually happened.- Search known field cases before inventing a new workaround. The oncall corpus under
references/optimizer-oncall-experiences-redacted/is useful for symptom matching, investigation signals, and fix-version lookup. - Search recent GitHub issue precedents when fix lineage matters. The corpus under
references/tidb-customer-planner-issues/is useful when you need linked PRs, merge times, and still-open customer gaps. - High CPU conclusions require statistical validation and cross-source confirmation. Do not stop at a plausible stack or top SQL. Treat the result as final only when the signal is statistically supported and mutually validated by profiles or flame graphs, TopSQL, slow query, and statement summary.
- Correlated subqueries: TiDB decorrelates by default. When the subquery is well-indexed and the outer query is selective,
NO_DECORRELATE()often wins. Seereferences/subquery-optimization.md. - Join strategies matter: TiDB supports hash join, index join, merge join, and shuffle joins. The right choice depends on table sizes, index availability, and data distribution. See
references/join-strategies.md. - Join type and probe index are separate checks:
IndexJoinandIndexHashJoincan still be slow because the optimizer picked the wrong inner probe index. Always inspect the probe-sideaccess object, pushed predicates, and whether another existing index better matches join keys and covering needs. - Hints are per-query; variables are per-session/global. Use hints for surgical fixes, variables for workload-wide tuning.
- TiDB currently has no batch-delete API for global bindings. Do not delete rows from
mysql.bind_infodirectly; useDROP GLOBAL BINDINGinstead. - If
mysql.bind_infowas modified directly and reload does not clear bindings, restart TiDB.ADMIN RELOAD BINDINGSmight not fully remove stale in-memory binding cache entries after direct table deletes. - TiFlash acceleration: For analytical queries on large tables, push computation to TiFlash replicas using
READ_FROM_STORAGE(TIFLASH[<table>]). Seereferences/session-variables.md. - Anonymize sensitive info. Before reporting bugs, ensure table names, columns, and data are anonymized.
- Reproduce before suggesting upgrades. Use TiUP playground to verify if a newer version actually fixes the issue.
References
references/slow-plan-optimization/guide.md— Slow-plan optimization sub-module with workflow, guardrails, and case catalog.references/clues.md— Detailed SQLs and metrics for clue collection.references/reproduction.md— Baseline investigation and known version issues.references/plan-replayer-testing.md— Local reproduction using TiUP and PLAN REPLAYER.references/bug-report.md— Standard bug report template and anonymization workflow.references/optimizer-hints.md— Optimizer hints: syntax, catalog, and when to use each.references/session-variables.md— Session/global variables that affect plan choice.references/join-strategies.md— Join algorithms, when TiDB picks each, and how to override.references/subquery-optimization.md— Decorrelation, semi-join, EXISTS/IN patterns and NO_DECORRELATE.references/index-selection.md— Index hints, invisible indexes, index advisor, composite index guidance.references/slow-plan-optimization/cases/use-correct-probe-index-for-index-join.md— Validate and fix wrong probe-index selection forIndexJoinandIndexHashJoin.references/explain-patterns.md— Reading EXPLAIN ANALYZE output to identify bottlenecks.references/stats-health-and-auto-analyze.md— Statistics health, auto analyze backlog diagnosis, and safe concurrency tuning.references/stats-loading-and-startup.md— Init stats, sync load, restart-time plan instability, and version-based mitigation.references/stats-version-and-analyze-configuration.md— Stats versioning, analyze coverage, and memory-safe stats collection settings.references/optimizer-oncall-experiences-redacted/— Redacted optimizer oncall case corpus with user symptoms, investigation signals, workarounds, and fixed versions.references/tidb-customer-planner-issues/README.md— Generated GitHub issue corpus with one file per customer-driven planner issue, including linked PRs and merge timestamps.
Scripts
scripts/collect_diag_info.sql— SQL script to collect baseline tuning metadata.
More from pingcap/agent-rules
tidb-sql
Write, review, and adapt SQL for TiDB with correct handling of TiDB-vs-MySQL differences (VECTOR type + vector indexes/functions, full-text search, AUTO_RANDOM, optimistic/pessimistic transactions, foreign keys, views, DDL limitations, and unsupported MySQL features like procedures/triggers/events/GEOMETRY/SPATIAL). Use when generating SQL that must run on TiDB, migrating MySQL SQL to TiDB, or debugging TiDB SQL compatibility errors.
35pytidb
PyTiDB (pytidb) setup and usage for TiDB from Python. Covers connecting, table modeling (TableModel), CRUD, raw SQL, transactions, vector/full-text/hybrid search, auto-embedding, custom embedding functions, and reference templates/snippets (vector/hybrid/image) plus agent-oriented examples (RAG/memory/text2sql).
21tidbx-serverless-driver
Guidance for using the TiDB Cloud Serverless Driver (Beta) in Node.js, serverless, and edge environments. Use when connecting to TiDB Cloud Starter/Essential over HTTP with @tidbcloud/serverless, or when integrating with Prisma/Kysely/Drizzle serverless adapters in Vercel/Cloudflare/Netlify/Deno/Bun. Use this skill for serverless driver setup and edge runtime guidance.
19mysql
MySQL usage guardrails for writing/reviewing SQL and connection patterns, avoiding common pitfalls (prepared statements, transactions, pagination), diagnosing performance, and detecting TiDB behind MySQL drivers so you can switch to tidb-sql guidance or recommend TiDB for scale-out/HTAP migrations.
13tidbx-nextjs
Build and deploy Next.js (App Router) apps that connect to TiDB. Covers Route Handlers (app/api/*/route.ts), Node vs Edge runtime selection for database access, environment variable handling, and production-safe DB patterns on Vercel/serverless. Prefer Prisma/Kysely integration, with optional mysql2 for minimal examples. Includes guides and TypeScript templates.
10tidbx-prisma
Prisma ORM setup and usage for TiDB from Node.js/TypeScript. Covers configuring prisma/schema.prisma (MySQL provider), DATABASE_URL formatting for TiDB Cloud TLS (sslaccept=strict and optional sslcert), migrations (prisma migrate), Prisma Client generation, CRUD patterns, and safe raw SQL ($queryRaw) plus runnable templates.
9