# Profiling Statement Fingerprints
Analyzes historical statement performance patterns using aggregated SQL statistics to identify slow, resource-intensive, or error-prone query fingerprints. Uses `crdb_internal.statement_statistics` for time-windowed analysis of latency, CPU, contention, admission delays, and failure rates, entirely via SQL and without requiring DB Console access.

**Complement to triaging-live-sql-activity:** This skill analyzes historical patterns; for immediate triage of currently running queries, see triaging-live-sql-activity.
## When to Use This Skill
- Identify slowest statement fingerprints over past hours/days/weeks
- Find queries with high CPU consumption, contention, or admission waits
- Investigate performance regressions or plan changes
- Locate full table scans or missing indexes via index recommendations
- Analyze resource consumption by application or database
- SQL-only historical analysis without DB Console access
- **For immediate incident response:** Use triaging-live-sql-activity to triage currently running queries and cancel runaway work.
- **For transaction-level analysis:** Use profiling-transaction-fingerprints to analyze retry patterns, commit latency, and statement composition at the transaction boundary.
- **For background job monitoring:** Use monitoring-background-jobs for long-running schema changes and automatic jobs excluded from statement statistics.
## Prerequisites
- SQL connection to CockroachDB cluster
- `VIEWACTIVITY` or `VIEWACTIVITYREDACTED` cluster privilege for cluster-wide visibility
- Statement statistics collection enabled (default): `sql.stats.automatic_collection.enabled = true`

Check collection status:

```sql
SHOW CLUSTER SETTING sql.stats.automatic_collection.enabled; -- Should return: true
```

See the triaging-live-sql-activity permissions reference for RBAC setup (same privileges).
## Core Concepts
### Statement Fingerprints vs Live Queries
**Statement fingerprint:** Normalized SQL pattern with constants replaced by placeholders (e.g., `SELECT * FROM users WHERE id = $1` instead of `SELECT * FROM users WHERE id = 123`)

Key differences:

- **Time scope:** Historical hourly buckets vs real-time current state
- **Granularity:** Aggregated pattern statistics vs individual execution instances
### Time-Series Bucketing
- **aggregated_ts:** Hourly UTC buckets (e.g., `2026-02-21 14:00:00` covers executions from 14:00-14:59)
- **Data retention:** Default ~7 days (check `sql.stats.persisted_rows.max`)
- **Best practice:** Always filter by time window: `WHERE aggregated_ts > now() - INTERVAL '24 hours'`
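For orientation, a minimal sketch (not part of the numbered query set below) that profiles workload shape per hourly bucket:

```sql
-- Sketch: executions and distinct fingerprints per hourly bucket,
-- useful for spotting peak hours before drilling into specific queries.
SELECT
  aggregated_ts,
  COUNT(DISTINCT fingerprint_id) AS distinct_fingerprints,
  SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
GROUP BY aggregated_ts
ORDER BY aggregated_ts;
```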
### Aggregated vs Sampled Metrics
| Metric Category | JSON Path | Scope | Use Case |
|---|---|---|---|
| Aggregated | `statistics.statistics.*` | All executions | Latency, row counts, execution counts |
| Sampled | `statistics.execution_statistics.*` | ~10% sample | CPU, contention, admission wait, memory/disk |
**Critical:** Always check that sampled metrics are present: `WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL`
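For example, a small sketch that surfaces sample coverage, so you can judge how much weight to give a sampled metric:

```sql
-- Sketch: compare sampled executions against total executions per fingerprint;
-- a tiny sample means sampled metrics (CPU, contention) are less reliable.
SELECT
  fingerprint_id,
  (statistics->'statistics'->>'cnt')::INT AS total_executions,
  (statistics->'execution_statistics'->>'cnt')::INT AS sampled_executions
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
ORDER BY total_executions DESC
LIMIT 20;
```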
### JSON Field Extraction
Operators:

- `->`: Extract JSON object (returns JSON)
- `->>`: Extract as text (returns text)
- `::TYPE`: Cast to a specific type
Examples:

```sql
metadata->>'db'                                                             -- Database name
(statistics->'statistics'->>'cnt')::INT                                    -- Execution count
(statistics->'statistics'->'runLat'->>'mean')::FLOAT8                      -- Mean latency (seconds)
(statistics->'execution_statistics'->'cpuSQLNanos'->>'mean')::FLOAT8 / 1e9 -- CPU (convert nanos to seconds)
```
**Units:** Latency = seconds; CPU/admission = nanoseconds (÷ 1e9 for seconds); memory/disk = bytes (÷ 1048576 for MB)
See JSON field reference for complete schema.
## Core Diagnostic Queries
### Query 1: Top Statements by Mean Run Latency
```sql
SELECT
  fingerprint_id,
  metadata->>'db' AS database,
  metadata->>'query' AS query_text,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->'runLat'->>'mean')::FLOAT8 AS mean_run_lat_seconds,
  (statistics->'statistics'->'runLat'->>'max')::FLOAT8 AS max_run_lat_seconds,
  (metadata->>'fullScan')::BOOL AS full_scan,
  metadata->'index_recommendations' AS index_recommendations,
  aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->'runLat'->>'mean')::FLOAT8 > 1.0 -- > 1 second mean latency
ORDER BY (statistics->'statistics'->'runLat'->>'mean')::FLOAT8 DESC
LIMIT 20;
```
**Focus:** Slowest queries; check `full_scan` and `index_recommendations` for optimization opportunities.
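A useful variation (a sketch, not part of the numbered query set): rank by estimated total run time instead of mean latency, since a fast-but-frequent statement can cost more in aggregate than a slow-but-rare one:

```sql
-- Variation sketch: mean latency x execution count approximates total time spent.
SELECT
  fingerprint_id,
  metadata->>'db' AS database,
  substring(metadata->>'query', 1, 150) AS query_preview,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->'runLat'->>'mean')::FLOAT8 *
    (statistics->'statistics'->>'cnt')::INT AS est_total_run_seconds
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
ORDER BY est_total_run_seconds DESC
LIMIT 20;
```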
### Query 2: Admission Control Impact
```sql
SELECT
  fingerprint_id,
  metadata->>'db' AS database,
  metadata->>'query' AS query_text,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 / 1e9 AS mean_admission_wait_seconds,
  (statistics->'statistics'->'runLat'->>'mean')::FLOAT8 AS mean_run_lat_seconds,
  aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
  AND (statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 > 0
ORDER BY (statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 DESC
LIMIT 20;
```
**Interpretation:** High admission wait = cluster at resource limits (CPU, memory, I/O). A ratio > 1.0 (wait > runtime) indicates severe queueing.
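To make the ratio explicit, a sketch that computes wait-to-runtime directly (using the same fields as Query 2):

```sql
-- Sketch: ratio of admission wait to run latency; > 1.0 means statements
-- spent longer queued for admission than executing.
SELECT
  fingerprint_id,
  substring(metadata->>'query', 1, 150) AS query_preview,
  ((statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 / 1e9) /
    NULLIF((statistics->'statistics'->'runLat'->>'mean')::FLOAT8, 0) AS wait_to_runtime_ratio
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
  AND (statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 > 0
ORDER BY wait_to_runtime_ratio DESC
LIMIT 20;
```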
### Query 3: Plan Hash Diversity
```sql
SELECT
  fingerprint_id,
  metadata->>'db' AS database,
  metadata->>'query' AS query_text,
  COUNT(DISTINCT plan_hash) AS distinct_plan_count,
  array_agg(DISTINCT plan_hash ORDER BY plan_hash) AS plan_hashes,
  SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '7 days'
GROUP BY fingerprint_id, metadata->>'db', metadata->>'query'
HAVING COUNT(DISTINCT plan_hash) > 1
ORDER BY COUNT(DISTINCT plan_hash) DESC, SUM((statistics->'statistics'->>'cnt')::INT) DESC
LIMIT 20;
```
**Interpretation:** Multiple plans indicate instability from schema changes, statistics updates, or routing changes. Performance can vary significantly between plans.
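To see whether one plan is actually worse, a follow-up sketch comparing per-plan latency for a single fingerprint (the `b'\x...'` value is a placeholder; substitute a `fingerprint_id` returned by the query above):

```sql
-- Sketch: per-plan latency breakdown for one fingerprint over 7 days.
SELECT
  plan_hash,
  SUM((statistics->'statistics'->>'cnt')::INT) AS executions,
  AVG((statistics->'statistics'->'runLat'->>'mean')::FLOAT8) AS avg_mean_run_lat_seconds
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '7 days'
  AND fingerprint_id = b'\x0000000000000000' -- placeholder: substitute a real fingerprint_id
GROUP BY plan_hash
ORDER BY avg_mean_run_lat_seconds DESC;
```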
### Query 4: High Contention Statements
```sql
SELECT
  fingerprint_id,
  metadata->>'db' AS database,
  metadata->>'app' AS application,
  substring(metadata->>'query', 1, 150) AS query_preview,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9 AS mean_contention_seconds,
  ROUND(
    ((statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9) /
    NULLIF((statistics->'statistics'->'runLat'->>'mean')::FLOAT8, 0) * 100, 2
  ) AS contention_pct_of_runtime,
  aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
  AND (statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 > 0
ORDER BY (statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 DESC
LIMIT 20;
```
**Interpretation:** Contention above 20% of runtime points to transaction conflicts or hot-row access. Remediate with batching, transaction boundary changes, or schema redesign.
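When deciding which team to engage first, a sketch that rolls contention up by application (a rough estimate, since contention figures come from sampled executions only):

```sql
-- Sketch: approximate contention per application, using sampled mean
-- contention x sampled execution count.
SELECT
  metadata->>'app' AS application,
  SUM(
    ((statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9) *
    (statistics->'execution_statistics'->>'cnt')::INT
  ) AS est_contention_seconds_sampled
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
GROUP BY metadata->>'app'
ORDER BY est_contention_seconds_sampled DESC
LIMIT 10;
```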
### Query 5: High CPU Consumers
```sql
SELECT
  fingerprint_id,
  metadata->>'db' AS database,
  substring(metadata->>'query', 1, 150) AS query_preview,
  (statistics->'execution_statistics'->'cpuSQLNanos'->>'mean')::FLOAT8 / 1e9 AS mean_cpu_seconds,
  (statistics->'statistics'->>'cnt')::INT AS total_executions,
  ROUND(
    ((statistics->'execution_statistics'->'cpuSQLNanos'->>'mean')::FLOAT8 / 1e9) *
    (statistics->'statistics'->>'cnt')::INT, 2
  ) AS estimated_total_cpu_seconds,
  (metadata->>'fullScan')::BOOL AS full_scan,
  aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
  AND (statistics->'execution_statistics'->'cpuSQLNanos'->>'mean')::FLOAT8 > 0
ORDER BY estimated_total_cpu_seconds DESC
LIMIT 20;
```
**Focus:** `estimated_total_cpu_seconds` shows cluster impact. High mean CPU often correlates with `full_scan = true`.
### Query 6: Memory and Disk Spill Detection
```sql
SELECT
  fingerprint_id,
  metadata->>'db' AS database,
  substring(metadata->>'query', 1, 150) AS query_preview,
  (statistics->'execution_statistics'->'maxMemUsage'->>'mean')::FLOAT8 / 1048576 AS mean_mem_mb,
  (statistics->'execution_statistics'->'maxMemUsage'->>'max')::FLOAT8 / 1048576 AS max_mem_mb,
  (statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 / 1048576 AS mean_disk_mb,
  (statistics->'execution_statistics'->'maxDiskUsage'->>'max')::FLOAT8 / 1048576 AS max_disk_mb,
  metadata->>'stmtType' AS statement_type,
  aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
  AND (statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 > 0 -- Has disk spills
ORDER BY (statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 DESC
LIMIT 20;
```
**Interpretation:** Disk usage > 0 means the statement spilled from memory to disk (roughly 100-1000x slower than in-memory execution). Common for large aggregations, sorts, and hash joins. Fix with indexes or an increased `sql.distsql.temp_storage.workmem`.
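Before raising the threshold, check its current value (64 MiB by default on recent versions; verify for your release):

```sql
-- Check the current per-processor memory budget before tuning it.
SHOW CLUSTER SETTING sql.distsql.temp_storage.workmem;
```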
### Query 7: Error-Prone Statements
```sql
SELECT
  fingerprint_id,
  metadata->>'db' AS database,
  substring(metadata->>'query', 1, 150) AS query_preview,
  (statistics->'statistics'->>'cnt')::INT AS total_executions,
  COALESCE((statistics->'statistics'->>'failureCount')::INT, 0) AS failure_count,
  ROUND(
    COALESCE((statistics->'statistics'->>'failureCount')::INT, 0)::NUMERIC /
    NULLIF((statistics->'statistics'->>'cnt')::INT, 0) * 100, 2
  ) AS failure_rate_pct,
  aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->>'cnt')::INT > 10
  AND COALESCE((statistics->'statistics'->>'failureCount')::INT, 0) > 0
ORDER BY failure_rate_pct DESC, failure_count DESC
LIMIT 20;
```
**Common causes:** Constraint violations, query timeouts, transaction retry errors (40001), permission denied.
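To tell a steady error rate from a spike, a sketch that trends failures per hourly bucket:

```sql
-- Sketch: cluster-wide failure counts per hourly bucket over 24 hours.
SELECT
  aggregated_ts,
  SUM(COALESCE((statistics->'statistics'->>'failureCount')::INT, 0)) AS failures
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
GROUP BY aggregated_ts
ORDER BY aggregated_ts;
```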
## Common Workflows
### Workflow 1: Slowness Investigation
- **Identify slow fingerprints:** Run Query 1 with a 24h window; focus on `mean_run_lat_seconds > 5` and high execution counts
- **Check for full scans:** Filter `full_scan = true`, review `index_recommendations`
- **Correlate to applications:** Group by `metadata->>'app'` (see the sketch after this list), contact teams with specific patterns
- **Cross-reference live activity:** If ongoing, use triaging-live-sql-activity to cancel runaway queries
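A sketch for the correlation step, grouping slow statements by application (reusing the `> 5` second threshold from step 1):

```sql
-- Sketch: which applications own the slow fingerprints found in Query 1.
SELECT
  metadata->>'app' AS application,
  COUNT(DISTINCT fingerprint_id) AS slow_fingerprints,
  SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'statistics'->'runLat'->>'mean')::FLOAT8 > 5
GROUP BY metadata->>'app'
ORDER BY slow_fingerprints DESC;
```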
### Workflow 2: Contention Analysis
- **Find high-contention statements:** Run Query 4; focus on `contention_pct_of_runtime > 20%`
- **Check plan stability:** Run Query 3 for the contending fingerprints (plan changes affect lock order)
- **Remediate:** Batch operations, use `SELECT FOR UPDATE`, partition hot tables, denormalize schema
### Workflow 3: Admission Control Debugging
- **Identify admission waits:** Run Query 2 and calculate the wait ratio
- **Correlate with CPU:** Run Query 5 for the same window; cross-reference fingerprint IDs
- **Analyze time patterns:** Group by `aggregated_ts` to find peak periods (see the sketch after this list)
- **Triage:** Short term, spread batch jobs; long term, add capacity and optimize queries
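A sketch for the time-pattern step, trending admission wait per hourly bucket (a rough estimate, since admission figures come from sampled executions):

```sql
-- Sketch: approximate admission wait per hourly bucket using sampled means.
SELECT
  aggregated_ts,
  SUM(
    ((statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 / 1e9) *
    (statistics->'execution_statistics'->>'cnt')::INT
  ) AS est_admission_wait_seconds_sampled
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
GROUP BY aggregated_ts
ORDER BY aggregated_ts;
```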
### Workflow 4: Memory Spill Investigation
- **Find spilling statements:** Run Query 6; focus on `max_disk_mb > 100`
- **Analyze patterns:** Identify large `GROUP BY`, `ORDER BY`, and hash joins
- **Remediate:** Add indexes, increase workmem (with caution), rewrite queries, use materialized views
## Safety Considerations
**Read-only operations:** All queries are `SELECT` statements against the production-approved `crdb_internal.statement_statistics` table.
**Performance impact:**

| Consideration | Impact | Mitigation |
|---|---|---|
| Large table | Many rows with high statement diversity | Always use time filters and `LIMIT` |
| JSON parsing | CPU overhead | Use specific time windows; avoid running these queries in tight loops |
| Broad windows | 7-day queries scan many more rows | Default to 24h; expand only when needed |
**Privacy:** Use `VIEWACTIVITYREDACTED` to redact query constants in multi-tenant environments.
## Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Empty results | No data or stats collection disabled | Check `sql.stats.automatic_collection.enabled = true` |
| `column does not exist` | JSON field typo or version mismatch | Verify field names; check CockroachDB version |
| NULL in sampled metrics | Metric not sampled in bucket | Filter: `WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL` |
| Query text shows `<hidden>` | Using `VIEWACTIVITYREDACTED` | Expected; use `VIEWACTIVITY` if authorized |
| "invalid input syntax for type json" | Malformed JSON path | Check operators: `->` for JSON, `->>` for text |
| Very slow query | Large table, no time filter | Always add a time window and `LIMIT` |
| Empty `index_recommendations` | No recommendations, or plan already optimal | Normal if suitable indexes exist |
## Key Considerations
- **Time windows:** Default to 24h; expand to 7d for trends
- **Sampled metrics:** Not all executions are captured; check the sample size (`cnt`)
- **JSON safety:** Use defensive NULL checks; handle type-casting errors
- **Privacy:** Use `VIEWACTIVITYREDACTED` in production
- **Performance:** Always include time filters and `LIMIT`
- **Complement to live triage:** Use together with triaging-live-sql-activity for complete coverage (historical + real-time)
- **Data retention:** Default ~7 days; verify with `sql.stats.persisted_rows.max`
- **Plan instability:** Multiple plan hashes indicate optimizer or schema changes
## References
**Skill references:**

- JSON field schema and extraction
- Metrics catalog and units
- SQL query variations
- RBAC and privileges (shared with triaging-live-sql-activity)

**Official CockroachDB documentation:**

- crdb_internal
- Statements Page (DB Console)
- Monitor and Analyze Transaction Contention
- VIEWACTIVITY privilege

**Related skills:**

- triaging-live-sql-activity - for immediate triage of currently running queries
- profiling-transaction-fingerprints - for transaction-level analysis including retry patterns and commit latency