Profiling Transaction Fingerprints
Analyzes historical transaction performance patterns using aggregated SQL statistics to identify high-retry transactions, contention patterns, and commit latency issues. Uses crdb_internal.transaction_statistics for time-windowed analysis of retry behavior, commit latency, and statement composition - entirely via SQL without requiring DB Console access.
Complement to profiling-statement-fingerprints: This skill analyzes transaction-level patterns (groups of statements with retry behavior); for statement-level optimization, see profiling-statement-fingerprints.
Complement to triaging-live-sql-activity: This skill analyzes historical transaction patterns; for immediate triage of currently active transactions, see triaging-live-sql-activity.
When to Use This Skill
- Identify transactions with high retry counts
- Analyze commit latency trends for transaction fingerprints
- Find transactions with high contention at transaction boundary
- Understand statement composition of problematic transactions
- Investigate transaction retry storms or abort patterns
- SQL-only historical transaction analysis without DB Console access
For immediate incident response: Use triaging-live-sql-activity to triage currently active transactions and cancel runaway work. For statement-level optimization: Use profiling-statement-fingerprints to analyze individual query patterns.
Prerequisites
- SQL connection to CockroachDB cluster
- VIEWACTIVITY or VIEWACTIVITYREDACTED cluster privilege for cluster-wide visibility
- Same privilege requirements as profiling-statement-fingerprints
- Understanding of transaction performance concepts
- Transaction statistics collection enabled (default):
sql.stats.automatic_collection.enabled = true
Check transaction stats collection:
SHOW CLUSTER SETTING sql.stats.automatic_collection.enabled;
-- Should return: true
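If the check returns false and your role has the MODIFYCLUSTERSETTING privilege, collection can be re-enabled directly (a minimal sketch; coordinate with your cluster operator first):
-- Re-enable automatic SQL statistics collection
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = true;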
See triaging-live-sql-activity permissions reference for RBAC setup (same privileges).
Core Concepts
Transaction Fingerprints vs Live Transactions
Transaction fingerprint: A normalized transaction pattern that groups executions whose constituent statements match once constants are parameterized.
Key differences:
- Time scope: Historical hourly buckets vs real-time current state
- Granularity: Aggregated retry/commit stats vs individual transaction instances
- Relationship: Transaction = collection of statement fingerprints
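For intuition, a minimal sketch of the normalization described above (the accounts table and values are hypothetical):
-- Both executions below map to the SAME transaction fingerprint,
-- because literal constants are replaced with placeholders:
--   BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = 1; COMMIT;
--   BEGIN; UPDATE accounts SET balance = balance - 250 WHERE id = 42; COMMIT;
-- Normalized form recorded in the fingerprint:
--   UPDATE accounts SET balance = balance - _ WHERE id = _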
Time-Series Bucketing
aggregated_ts: Hourly UTC buckets (e.g., 2026-02-21 14:00:00 = 14:00-14:59 executions)
Data retention: Default ~7 days (check sql.stats.persisted_rows.max)
Best practice: Always filter by time window: WHERE aggregated_ts > now() - INTERVAL '24 hours'
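To see the retention cap configured on your cluster, check the setting directly:
SHOW CLUSTER SETTING sql.stats.persisted_rows.max;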
Aggregated vs Sampled Metrics
| Metric Category | JSON Path | Scope | Use Case |
|---|---|---|---|
| Aggregated | statistics.statistics.* | All executions | Retries, commit latency, execution counts |
| Sampled | statistics.execution_statistics.* | Probabilistic sample (~10%) | Contention, network, memory/disk |
Critical: Sampled metrics include a cnt field showing the sample size. Always check:
WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL
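A minimal sketch for estimating the effective sampling rate per fingerprint (the ~10% figure above is approximate; actual rates vary):
-- Compare total executions against sampled executions per fingerprint
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  (statistics->'statistics'->>'cnt')::INT AS total_executions,
  (statistics->'execution_statistics'->>'cnt')::INT AS sampled_executions,
  ROUND(
    (statistics->'execution_statistics'->>'cnt')::FLOAT8 /
    NULLIF((statistics->'statistics'->>'cnt')::FLOAT8, 0) * 100, 1
  ) AS sample_pct
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
ORDER BY total_executions DESC
LIMIT 10;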
JSON Field Extraction
CockroachDB stores transaction metadata and statistics as JSONB. Use these operators:
Operators:
- ->: Extract JSON object/value (returns JSON)
- ->>: Extract as text (returns text)
- ::TYPE: Cast to specific type
- encode(fingerprint_id, 'hex'): Convert binary fingerprint to hex string
Transaction-specific examples:
encode(fingerprint_id, 'hex') AS txn_fingerprint_id -- Hex encoding
(statistics->'statistics'->>'maxRetries')::INT -- Max retry count
(statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 -- Retry latency (seconds)
(statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 -- Commit latency (seconds)
(statistics->'statistics'->'svcLat'->>'mean')::FLOAT8 -- Service latency (seconds)
metadata->'stmtFingerprintIDs' AS stmt_fingerprint_ids_json -- Statement composition
Units:
- Latency fields: seconds (FLOAT8)
- CPU/contention: nanoseconds (divide by 1e9 for seconds)
- Memory/disk: bytes (consider / 1048576 for MB)
See JSON field reference for complete schema.
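Putting the units together, a small sketch that converts contention (nanoseconds) and memory (bytes) in one projection:
-- Convert nanoseconds -> seconds and bytes -> MB in a single pass
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  (statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9 AS mean_contention_seconds,
  (statistics->'execution_statistics'->'maxMemUsage'->>'mean')::FLOAT8 / 1048576 AS mean_mem_mb
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
  AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
LIMIT 10;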
Statement Composition
metadata.stmtFingerprintIDs: JSONB array mapping transaction to constituent statements
Use case: Understand which statements compose high-retry transactions
Cross-reference workflow: Join transaction_statistics with statement_statistics on fingerprint IDs
Example pattern (a minimal runnable form):
-- Expand a transaction's statement fingerprint IDs into one row per statement
SELECT
  encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
  stmt_fingerprint_id
FROM crdb_internal.transaction_statistics
CROSS JOIN LATERAL jsonb_array_elements_text(metadata->'stmtFingerprintIDs') AS stmt_fingerprint_id
WHERE aggregated_ts > now() - INTERVAL '24 hours'
LIMIT 10;
Core Diagnostic Queries
Query 1: Top Transactions by Retries and Contention
-- Identify transactions with high retry counts and contention
SELECT
encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
metadata->>'db' AS database,
metadata->>'app' AS application,
(statistics->'statistics'->>'cnt')::INT AS execution_count,
(statistics->'statistics'->>'maxRetries')::INT AS max_retries,
(statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 AS mean_retry_lat_seconds,
(statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9 AS mean_contention_seconds,
aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'statistics'->>'maxRetries')::INT > 0
ORDER BY (statistics->'statistics'->>'maxRetries')::INT DESC
LIMIT 20;
Key columns: max_retries shows maximum retry count; mean_retry_lat_seconds shows time spent in retries; mean_contention_seconds shows lock wait time.
Interpretation: High max_retries (>10) indicates transaction conflicts; correlate with contention to identify lock hotspots.
Query 2: Statement Composition Analysis
-- Extract statement fingerprints for high-retry transactions
SELECT
encode(t.fingerprint_id, 'hex') AS txn_fingerprint_id,
t.metadata->>'app' AS application,
(t.statistics->'statistics'->>'maxRetries')::INT AS max_retries,
jsonb_array_length(t.metadata->'stmtFingerprintIDs') AS num_statements,
t.metadata->'stmtFingerprintIDs' AS stmt_fingerprint_ids,
t.aggregated_ts
FROM crdb_internal.transaction_statistics t
WHERE t.aggregated_ts > now() - INTERVAL '24 hours'
AND (t.statistics->'statistics'->>'maxRetries')::INT > 10
AND t.metadata->'stmtFingerprintIDs' IS NOT NULL
ORDER BY max_retries DESC
LIMIT 20;
Key columns: num_statements shows transaction complexity; stmt_fingerprint_ids contains statement IDs for cross-reference with statement_statistics.
Use case: Understand which statement combinations cause retries; use Query 7 to drill down to specific statements.
Query 3: High Commit Latency Transactions
-- Find transactions with slow commit latency
SELECT
encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
metadata->>'db' AS database,
metadata->>'app' AS application,
(statistics->'statistics'->>'cnt')::INT AS execution_count,
(statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 AS mean_commit_lat_seconds,
(statistics->'statistics'->'svcLat'->>'mean')::FLOAT8 AS mean_service_lat_seconds,
ROUND(
((statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 /
NULLIF((statistics->'statistics'->'svcLat'->>'mean')::FLOAT8, 0)) * 100, 2
) AS commit_pct_of_service_lat,
aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 > 0.1 -- > 100ms commit latency
ORDER BY (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 DESC
LIMIT 20;
Key columns: mean_commit_lat_seconds shows 2PC commit time; commit_pct_of_service_lat shows what percentage of total latency is commit overhead.
Interpretation: High commit percentage (>20%) suggests distributed transaction overhead, replication delays, or cross-region writes.
Query 4: Retry Rate by Application
-- Analyze retry patterns by application
SELECT
metadata->>'app' AS application,
metadata->>'db' AS database,
COUNT(*) AS transaction_fingerprint_count,
SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions,
AVG((statistics->'statistics'->>'maxRetries')::INT) AS avg_max_retries,
MAX((statistics->'statistics'->>'maxRetries')::INT) AS overall_max_retries,
AVG((statistics->'statistics'->'retryLat'->>'mean')::FLOAT8) AS avg_retry_lat_seconds
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'statistics'->>'maxRetries')::INT > 0
GROUP BY metadata->>'app', metadata->>'db'
ORDER BY avg_max_retries DESC
LIMIT 20;
Use case: Application-level health scorecard; identify which applications have the most problematic transaction patterns.
Customization: Adjust time window to 7 days for trends; filter by specific database.
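For the trend variant, a sketch bucketing the same metrics by day over a 7-day window:
-- Daily retry trend per application over the past week
SELECT
  date_trunc('day', aggregated_ts) AS day,
  metadata->>'app' AS application,
  SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions,
  MAX((statistics->'statistics'->>'maxRetries')::INT) AS max_retries
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '7 days'
  AND (statistics->'statistics'->>'maxRetries')::INT > 0
GROUP BY 1, 2
ORDER BY day, max_retries DESC;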
Query 5: Transaction Resource Consumption
-- Find transactions with high resource usage
SELECT
encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
metadata->>'app' AS application,
(statistics->'statistics'->>'cnt')::INT AS execution_count,
(statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 / 1048576 AS mean_network_mb,
(statistics->'execution_statistics'->'maxMemUsage'->>'mean')::FLOAT8 / 1048576 AS mean_mem_mb,
(statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 / 1048576 AS mean_disk_mb,
ROUND(
((statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 / 1048576) *
(statistics->'statistics'->>'cnt')::INT, 2
) AS estimated_total_network_mb,
aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
AND (statistics->'execution_statistics'->'networkBytes'->>'mean')::FLOAT8 > 0
ORDER BY estimated_total_network_mb DESC
LIMIT 20;
Key columns: mean_network_mb shows distributed transaction overhead; mean_disk_mb > 0 indicates memory spill.
Interpretation: High network bytes suggest cross-region transactions or inefficient distribution; disk usage indicates memory pressure.
Query 6: Retry Latency Decomposition
-- Understand retry latency as percentage of service latency
SELECT
encode(fingerprint_id, 'hex') AS txn_fingerprint_id,
metadata->>'app' AS application,
(statistics->'statistics'->>'cnt')::INT AS execution_count,
(statistics->'statistics'->>'maxRetries')::INT AS max_retries,
(statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 AS mean_retry_lat_seconds,
(statistics->'statistics'->'svcLat'->>'mean')::FLOAT8 AS mean_service_lat_seconds,
ROUND(
((statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 /
NULLIF((statistics->'statistics'->'svcLat'->>'mean')::FLOAT8, 0)) * 100, 2
) AS retry_pct_of_service_lat,
aggregated_ts
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'statistics'->>'maxRetries')::INT > 0
AND (statistics->'statistics'->'retryLat'->>'mean')::FLOAT8 > 0
ORDER BY retry_pct_of_service_lat DESC
LIMIT 20;
Interpretation: High retry percentage (>30%) means most latency is spent retrying due to contention; optimize transaction boundaries or schema.
Query 7: Cross-Reference Transaction to Statements
-- Join transaction statistics with statement statistics to see constituent statements
SELECT
encode(t.fingerprint_id, 'hex') AS txn_fingerprint_id,
t.metadata->>'app' AS txn_application,
(t.statistics->'statistics'->>'maxRetries')::INT AS txn_max_retries,
stmt_fp_id AS stmt_fingerprint_id,
s.metadata->>'query' AS statement_query,
(s.statistics->'statistics'->'runLat'->>'mean')::FLOAT8 AS stmt_mean_run_lat_seconds,
t.aggregated_ts
FROM crdb_internal.transaction_statistics t
CROSS JOIN LATERAL jsonb_array_elements_text(t.metadata->'stmtFingerprintIDs') AS stmt_fp_id
LEFT JOIN crdb_internal.statement_statistics s
ON s.fingerprint_id = decode(stmt_fp_id, 'hex')
AND s.aggregated_ts = t.aggregated_ts
WHERE t.aggregated_ts > now() - INTERVAL '24 hours'
AND (t.statistics->'statistics'->>'maxRetries')::INT > 10
AND t.metadata->'stmtFingerprintIDs' IS NOT NULL
ORDER BY txn_max_retries DESC, stmt_mean_run_lat_seconds DESC
LIMIT 50;
Use case: Drill down from high-retry transactions to specific problematic statements; identify which statement in a transaction is causing retries.
Note: Uses decode(stmt_fp_id, 'hex') to convert hex string back to binary for join with statement_statistics.
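To summarize which statements recur across high-retry transactions (rather than listing every transaction/statement pair), a sketch that counts appearances per statement fingerprint:
-- Count how often each statement fingerprint appears in high-retry transactions
SELECT
  stmt_fp_id AS stmt_fingerprint_id,
  COUNT(*) AS appearances_in_high_retry_txns
FROM crdb_internal.transaction_statistics t
CROSS JOIN LATERAL jsonb_array_elements_text(t.metadata->'stmtFingerprintIDs') AS stmt_fp_id
WHERE t.aggregated_ts > now() - INTERVAL '24 hours'
  AND (t.statistics->'statistics'->>'maxRetries')::INT > 10
GROUP BY stmt_fp_id
ORDER BY appearances_in_high_retry_txns DESC
LIMIT 10;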
Common Workflows
Workflow 1: Retry Storm Investigation
1. Identify high-retry transactions: Run Query 1, focusing on max_retries > 20
2. Analyze retry patterns by application: Run Query 4 to identify problematic apps
3. Examine statement composition: Run Query 7 to see which statements are in high-retry transactions
4. Cross-reference live activity: If ongoing, use triaging-live-sql-activity to check current transaction state
5. Remediate: Adjust transaction boundaries, batch operations, and optimize the statements identified in step 3
Workflow 2: Commit Latency Analysis
1. Find slow commit transactions: Run Query 3, focusing on commit_pct_of_service_lat > 20%
2. Check for contention correlation: Run Query 1 for the same transaction fingerprints to see whether contention is related
3. Analyze time patterns: Group Query 3 by aggregated_ts to identify peak periods (see the sketch after this list)
4. Resource investigation: Run Query 5 to check whether network overhead correlates with commit latency
5. Remediate: Consider batching operations, partitioning tables, or investigating replication configuration
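A sketch for step 3, tracking one fingerprint's commit latency by hour (replace the placeholder hex ID with a value from Query 3):
-- Hourly commit-latency trend for a single transaction fingerprint
SELECT
  aggregated_ts,
  (statistics->'statistics'->>'cnt')::INT AS execution_count,
  (statistics->'statistics'->'commitLat'->>'mean')::FLOAT8 AS mean_commit_lat_seconds
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '7 days'
  AND encode(fingerprint_id, 'hex') = '<txn_fingerprint_id>'  -- placeholder
ORDER BY aggregated_ts;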
Workflow 3: Statement Composition Drill-Down
1. Identify problematic transactions: Run Query 1 or Query 3 to find high-retry or slow-commit transactions
2. Extract statement IDs: Run Query 2 to see stmtFingerprintIDs for target transactions
3. Join with statement_statistics: Run Query 7 to see full statement details
4. Optimize bottleneck statements: Use the profiling-statement-fingerprints skill to analyze and optimize the identified statements
5. Validate retry reduction: Re-run Query 1 after optimizations to confirm improved retry counts
Workflow 4: Application Health Scorecard
1. Generate retry metrics by app: Run Query 4 to get application-level retry statistics
2. Correlate with commit latency: Modify Query 3 to group by application (see the sketch after this list)
3. Resource attribution: Run Query 5 grouped by application to see resource impact
4. Trend analysis: Run queries with a 7-day window and compare hourly buckets
5. Contact application teams: Provide specific transaction fingerprints with high retries or latency for investigation
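A sketch for step 2, regrouping Query 3's commit-latency metrics by application:
-- Application-level commit latency scorecard
SELECT
  metadata->>'app' AS application,
  SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions,
  AVG((statistics->'statistics'->'commitLat'->>'mean')::FLOAT8) AS avg_commit_lat_seconds
FROM crdb_internal.transaction_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
GROUP BY metadata->>'app'
ORDER BY avg_commit_lat_seconds DESC
LIMIT 20;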
Safety Considerations
Read-only operations:
All queries in this skill are read-only SELECT statements against crdb_internal.transaction_statistics and are safe to run against production clusters for diagnostic use.
Performance impact:
| Consideration | Impact | Mitigation |
|---|---|---|
| Large table | High transaction diversity = many rows | Always use WHERE aggregated_ts > now() - INTERVAL '24 hours' and LIMIT |
| JSON parsing | CPU overhead for JSONB extraction | Avoid tight loops; use specific time windows |
| Broad windows | 7-day queries = more rows | Default to 24h; expand only when needed |
| Sampled metrics | NULL handling overhead | Use defensive WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL |
Privacy: Use VIEWACTIVITYREDACTED to redact query constants in multi-tenant environments (same as statement profiling).
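A minimal grant sketch (diag_readonly is a hypothetical role name):
-- Grant redacted cluster-wide activity visibility to a diagnostic role
GRANT SYSTEM VIEWACTIVITYREDACTED TO diag_readonly;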
Default time window: 24 hours balances recent data with manageable result sets.
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Empty results | No data in window, or stats collection disabled | Check sql.stats.automatic_collection.enabled = true |
| column does not exist | JSON field typo or version mismatch | Verify field names; check CockroachDB version |
| NULL in sampled metrics | Metric not sampled in bucket | Filter: WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL |
| fingerprint_id not hex | Default binary format | Use encode(fingerprint_id, 'hex') for readability |
| Statement join fails | Mismatched aggregated_ts or fingerprint format | Ensure same time bucket and proper type casting with decode() |
| Very slow query | Large table, no time filter | Always add time window and LIMIT |
| Empty stmtFingerprintIDs | Single-statement transactions or old version | Normal for simple transactions |
Key Considerations
- Time windows: Default to 24h; expand to 7d for trends
- Sampled metrics: Not all executions are captured; check the sample size (cnt)
- JSON field safety: Use defensive NULL checks; handle type-casting errors
- Privacy: Use VIEWACTIVITYREDACTED in production
- Performance: Always include time filters and LIMIT clauses
- Complement to statement profiling: Use together for complete coverage (transaction + statement)
- Complement to live triage: Historical patterns vs real-time (use both)
- Data retention: Default ~7 days; verify with sql.stats.persisted_rows.max
- Retry semantics: maxRetries is the maximum across all executions in the bucket, not an average
- Fingerprint encoding: Use encode(fingerprint_id, 'hex') for human-readable IDs
References
Skill references:
- JSON field schema and extraction
- Metrics catalog and units
- SQL query variations
- RBAC and privileges (shared with triaging-live-sql-activity)
Official CockroachDB Documentation:
- crdb_internal
- Transactions Page (DB Console)
- Monitor and Analyze Transaction Contention
- VIEWACTIVITY privilege
Related skills:
- profiling-statement-fingerprints - For statement-level optimization
- triaging-live-sql-activity - For immediate triage of active transactions