Query Tracing Skill

Use distributed tracing (OpenTelemetry / Tempo) to understand where time is spent during SQL statement execution in a local Materialize instance.

For background on Materialize's tracing infrastructure (span levels, filtering, #[tracing::instrument], distributed context propagation), see doc/developer/tracing.md.

Important: Use Optimized Builds

Always trace --optimized builds. Debug builds introduce significant artifacts — unoptimized code, extra debug checks, and inflated self-times that don't reflect production behavior. Tracing a debug build can be misleading because time may appear in places that are negligible in release builds.

bin/environmentd --optimized --monitoring

The --optimized build takes longer to compile but produces traces that accurately represent where time is actually spent.

Prerequisites

The local monitoring stack (Tempo + Grafana + Prometheus) and environmentd with tracing enabled must be running. All network calls require dangerouslyDisableSandbox.

Important: Start the monitoring stack before environmentd. If Tempo is not running when environmentd starts sending spans, those spans are lost — they will not be retroactively captured.

1. Start the monitoring stack

bin/mzcompose --find monitoring run default

Wait for Tempo to be ready:

# Poll until Tempo is accepting requests
for i in $(seq 1 30); do
    curl -s -o /dev/null -w "%{http_code}" http://localhost:3200/ready | grep -q 200 && break
    sleep 5
done

2. Start environmentd with tracing

bin/environmentd --optimized --monitoring

Or with the trace filter pre-set:

bin/environmentd --optimized --reset --monitoring -- --system-parameter-default='opentelemetry_filter=debug'

3. Verify both services are running

Before tracing, confirm the stack is healthy:

# Tempo accepting traces?
curl -s -o /dev/null -w "%{http_code}" http://localhost:3200/ready
# environmentd listening?
lsof -tiTCP:6875 -sTCP:LISTEN

4. Set the trace filter at runtime (if not set at startup)

Connect as mz_system and set the OpenTelemetry filter:

psql -U mz_system -h localhost -p 6877 materialize -c "ALTER SYSTEM SET opentelemetry_filter = 'debug';"

debug is a good starting filter. For even more detail, use trace (but expect much more data). You can also target specific modules:

ALTER SYSTEM SET opentelemetry_filter = 'mz_adapter::coord=trace,debug';
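
To confirm the new filter took effect, and to restore the default once you are done tracing (a noisy filter adds overhead), something like the following should work over the same mz_system connection; treat it as a sketch rather than part of the documented workflow:

psql -U mz_system -h localhost -p 6877 materialize -c "SHOW opentelemetry_filter;"
# Restore the default filter when finished tracing
psql -U mz_system -h localhost -p 6877 materialize -c "ALTER SYSTEM RESET opentelemetry_filter;"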

Workflow: Trace a Query

Step 1: Run the query and capture the trace ID

Use psql with emit_trace_id_notice enabled. Important: Use separate -c flags for SET and the query, since DDL statements (CREATE, DROP, ALTER) cannot run inside an implicit transaction block:

psql -U materialize -h localhost -p 6875 materialize \
  -c "SET emit_trace_id_notice = true;" \
  -c "<YOUR SQL STATEMENT HERE>;"

The output will contain NOTICE lines with trace IDs — one per statement. The SET statement gets its own trace ID; the second trace ID is the one you want:

SET
NOTICE:  trace id: fd2f69eb059e4823d8d48a87ccab8f6f
NOTICE:  trace id: 984bd8c69f99703243d350e14b02caea
CREATE TABLE

The trace ID for the query is the last one before the statement result.

To extract the trace ID programmatically:

OUTPUT=$(psql -U materialize -h localhost -p 6875 materialize \
  -c "SET emit_trace_id_notice = true;" \
  -c "CREATE TABLE foo (id INT);" 2>&1)
TRACE_ID=$(echo "$OUTPUT" | grep "trace id:" | tail -1 | sed 's/.*trace id: //')
echo "Trace ID: $TRACE_ID"

Step 2: Wait for spans to be flushed

Spans are batched before export (5s delay by default, set by --opentelemetry-sched-delay). Wait ~10 seconds after the query completes before fetching the trace.

Step 3: Fetch the trace from Tempo

Query the Tempo HTTP API on port 3200. Retry if needed — spans may still be in flight:

sleep 10
# Retry loop — first attempt often gets "trace not found"
for i in $(seq 1 5); do
    RESP=$(curl -s http://localhost:3200/api/traces/$TRACE_ID)
    if echo "$RESP" | grep -q "batches"; then
        echo "$RESP" > /tmp/claude-1000/trace.json
        echo "Trace fetched ($(echo "$RESP" | wc -c) bytes)"
        break
    fi
    echo "Trace not ready, retrying... ($i/5)"
    sleep 5
done

Step 4: Analyze the trace

The Tempo API returns traces in OTLP JSON format. Important: Span IDs and parent span IDs are base64-encoded, not hex. The analysis script handles this.
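
If you need to compare a span ID from the JSON against the hex form shown elsewhere (for example in Grafana), a one-liner along these lines converts base64 to hex; the span ID below is made up purely for illustration:

# Decode a base64 span ID from the OTLP JSON into the usual 16-character hex form
echo -n 'lbvvcqxkTXk=' | base64 -d | xxd -p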

Use the trace_tree.py script in this skill directory to analyze traces:

python3 .claude/skills/mz-query-tracing/trace_tree.py /tmp/claude-1000/trace.json "My Query"
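
For a quick look at the largest spans without the script, a jq one-liner along these lines can work; note that the exact field names (e.g. scopeSpans vs. instrumentationLibrarySpans) vary across Tempo versions, so adjust to whatever your trace.json actually contains:

# Top 10 spans by wall-clock duration in milliseconds (rough sketch only)
jq '[.batches[].scopeSpans[]?.spans[]?
     | {name, ms: (((.endTimeUnixNano | tonumber) - (.startTimeUnixNano | tonumber)) / 1e6)}]
     | sort_by(-.ms) | .[:10]' /tmp/claude-1000/trace.json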

Interpreting Results

  • Self-time is where actual work happens. A span with 100ms duration but 0ms self-time is just a wrapper — its children do all the work.
  • group_commit_apply::append_fut — Persist append operations, often dominant for DDL. Time here is spent writing to the durable log.
  • catalog::transact_inner — Catalog transaction processing. High self-time means catalog state manipulation.
  • oracle::write_ts / oracle::apply_write — Timestamp oracle calls to CockroachDB. Each is a round-trip.
  • consensus::compare_and_set / consensus::scan — Persist consensus operations (also CockroachDB round-trips).
  • coord::check_consistency — Post-DDL consistency check. Self-time here is pure computation.
  • Sequential vs. concurrent children — children with overlapping time ranges are concurrent; non-overlapping are sequential.
  • Code locations in [src/...] brackets let you jump directly to the source code.
  • For spans not listed above, read the source code at the location shown in the [src/...] bracket to understand what the span does and why it's taking time.

Tips

  • Use --optimized builds. Debug builds inflate self-times with unoptimized code paths and extra debug checks (e.g., check_consistency may appear much larger than in production). Always trace optimized builds for meaningful results.
  • Start monitoring before environmentd. Spans emitted before Tempo is ready are silently dropped. If you restart environmentd, make sure Tempo is still running.
  • If traces are empty or incomplete, the opentelemetry_filter may be too restrictive. Try debug or even trace.
  • The first fetch after a query often returns trace not found. The default batch delay is 5s, but ingestion adds latency. Always use a retry loop.
  • Tempo retains traces for only 15 minutes by default (configured in misc/monitoring/tempo.yml). Fetch traces promptly.
  • To search for recent traces: curl -s "http://localhost:3200/api/search?limit=10"
  • You can also view traces in Grafana at http://localhost:3000 by searching for the trace ID in the Tempo datasource explore view.
  • To compare before and after a code change, trace the same query on both builds and compare the self-time rankings side by side; a sketch of one way to do this follows.
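
One option, assuming you save each run's trace and tree output under different names (the file names here are arbitrary), is to diff the two reports:

# Before the change
python3 .claude/skills/mz-query-tracing/trace_tree.py /tmp/claude-1000/trace_before.json "Before" > /tmp/claude-1000/tree_before.txt
# After the change (same query, rebuilt environmentd)
python3 .claude/skills/mz-query-tracing/trace_tree.py /tmp/claude-1000/trace_after.json "After" > /tmp/claude-1000/tree_after.txt
diff -u /tmp/claude-1000/tree_before.txt /tmp/claude-1000/tree_after.txt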