Distributed Tracing

Trace requests across microservices with OpenTelemetry.

Quick Start — Jaeger

# Run Jaeger all-in-one
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/jaeger:latest

# UI at http://localhost:16686

OpenTelemetry (Node.js)

# Install
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-grpc

// tracing.js — load before app code
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4317' }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'my-service',
});
sdk.start();

# Run with tracing
node -r ./tracing.js app.js

OpenTelemetry (Python)

# Install
pip install opentelemetry-sdk opentelemetry-exporter-otlp opentelemetry-instrumentation

# Install instrumentation packages for the libraries in your environment
opentelemetry-bootstrap -a install

# Auto-instrument
opentelemetry-instrument --service_name my-service --exporter_otlp_endpoint http://localhost:4317 python app.py

Query Traces

# Jaeger API — find traces by service (duration shown is the first returned span's)
curl -s "http://localhost:16686/api/traces?service=my-service&limit=10" | jq '.data[] | {traceID, spans: (.spans | length), duration: .spans[0].duration}'

# Find slow traces (>1s) — minDuration takes a duration string with units (e.g. 1s, 500ms)
curl -s "http://localhost:16686/api/traces?service=my-service&minDuration=1s" | jq '.data | length'

# Traces with errors — tags is URL-encoded JSON: {"error":"true"}
curl -s "http://localhost:16686/api/traces?service=my-service&tags=%7B%22error%22%3A%22true%22%7D" | jq '.data | length'

Custom Spans

const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('my-module');

async function processOrder(orderId) {
  return tracer.startActiveSpan('process-order', async (span) => {
    span.setAttribute('order.id', orderId);
    try {
      const result = await db.getOrder(orderId);
      span.setAttribute('order.total', result.total);
      return result;
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

What to Look For

  • Long spans — bottleneck identification
  • Error spans — where requests fail
  • Fan-out — one request triggering many downstream calls (N+1 at service level)
  • Missing spans — services not instrumented
  • Trace gaps — context not propagated between services
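
Fan-out is easy to spot programmatically: count direct children per parent span and flag any parent above a threshold. A sketch under a simplified, hypothetical span shape ({ spanId, parentSpanId, operationName }); real Jaeger spans encode the parent via a CHILD_OF reference instead:

```javascript
// Flag fan-out: one span with many direct children (service-level N+1).
function fanOut(spans, threshold = 10) {
  const children = {};
  for (const s of spans) {
    if (s.parentSpanId) {
      children[s.parentSpanId] = (children[s.parentSpanId] || 0) + 1;
    }
  }
  return Object.entries(children)
    .filter(([, n]) => n >= threshold)
    .map(([spanId, n]) => ({ spanId, childCount: n }));
}

// 12 identical downstream calls under one parent → flagged
const spans = [{ spanId: 'a' }];
for (let i = 0; i < 12; i++) {
  spans.push({ spanId: `c${i}`, parentSpanId: 'a', operationName: 'db.query' });
}
console.log(fanOut(spans)); // [ { spanId: 'a', childCount: 12 } ]
```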

Notes

  • Auto-instrumentation covers HTTP, database, and message queue calls automatically.
  • Propagate trace context via headers (traceparent) between services.
  • Sample in production — tracing every request is expensive. Start with 1-10%.
  • Jaeger is good for development. Grafana Tempo is better for production (integrates with Grafana dashboards).
  • Traces + metrics + logs = full observability. Correlate them with trace IDs.
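
The traceparent header mentioned above follows the W3C Trace Context format: version-traceid-spanid-flags, all lowercase hex. A small sketch that parses and validates it, useful when debugging why context isn't propagating between services (the sample header value is from the W3C spec's examples):

```javascript
// Parse a W3C traceparent header: 2-hex version, 32-hex trace ID,
// 16-hex parent span ID, 2-hex flags (bit 0 = sampled).
function parseTraceparent(header) {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null; // malformed → context will not propagate
  const [, version, traceId, spanId, flags] = m;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(ctx.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(ctx.sampled); // true
```

In instrumented code you would not parse this by hand — `propagation.inject`/`propagation.extract` from @opentelemetry/api handle it — but checking the raw header is a quick way to confirm a service is forwarding context.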