Distributed Tracing with Logs
Implement distributed tracing using logs by propagating trace context, creating span logs, using correlation IDs, and integrating with OpenTelemetry standards to enable end-to-end request tracing across distributed systems.
When to use me
Use this skill when:
- Building or maintaining distributed systems (microservices, serverless functions)
- Tracing requests across multiple service boundaries
- Debugging issues that span multiple components or services
- Implementing observability for complex workflows
- Correlating logs from different services for a single user request
- Setting up OpenTelemetry or other tracing standards
- Analyzing latency and performance across service boundaries
- Implementing request context propagation
- Building audit trails for business transactions
What I do
1. Trace Context Propagation
- Generate trace and span IDs for request initiation
- Propagate context through HTTP headers across services
- Maintain context through async operations (queues, background jobs, callbacks)
- Handle context in batch processing and streaming systems
- Implement context extraction and injection middleware
- Manage sampling decisions for trace collection
2. Span Logging
- Create span start/end logs with timing information
- Log span attributes and events during execution
- Capture parent-child relationships between spans
- Record span status and errors for failed operations
- Include business context in span logs
- Use baggage to propagate custom key-value pairs alongside the trace context
3. Correlation & Context Management
- Generate correlation IDs for business transactions
- Link logs to traces through trace_id fields
- Maintain user/session context across service boundaries
- Propagate business identifiers (order_id, transaction_id, etc.)
- Handle context in distributed transactions
- Implement context storage and retrieval for long-running operations
4. OpenTelemetry Integration
- Implement OpenTelemetry SDKs for various languages
- Configure trace exporters (Jaeger, Zipkin, OpenTelemetry Collector, etc.)
- Set up automatic instrumentation for common frameworks
- Define custom spans and attributes for business logic
- Configure sampling strategies for production environments
- Integrate with existing logging infrastructure
5. Trace Analysis & Visualization
- Extract trace information from logs for analysis
- Calculate trace duration and latency across services
- Identify critical paths and bottlenecks
- Correlate traces with business metrics
- Create trace visualizations and dependency graphs
- Set up trace-based alerting for performance degradation
Trace Context Propagation
W3C Trace Context Standard
The W3C Trace Context specification defines standard HTTP headers for trace propagation:
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
Header format:
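traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}, where version is currently 00, trace-id is 32 lowercase hex characters, parent-id (the caller's span ID) is 16 hex characters, and trace-flags is 2 hex characters (01 means sampled)
tracestate: vendor-specific trace state, carried as comma-separated key=value pairs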
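The header layout above can be checked mechanically. The following is a minimal sketch of a version 00 parser, not a full implementation of the specification; `TraceParent`, `parse_traceparent`, and `format_traceparent` are hypothetical names chosen for illustration.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    trace_flags: str

    @property
    def sampled(self) -> bool:
        # The lowest bit of trace-flags is the "sampled" flag.
        return bool(int(self.trace_flags, 16) & 0x01)


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    # Version ff and all-zero IDs are treated as invalid.
    if parsed.version == "ff":
        return None
    if set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        return None
    return parsed


def format_traceparent(value: TraceParent) -> str:
    """Serialize the four fields back into header form."""
    return "-".join(value)
```

Returning `None` for a malformed header lets the caller fall back to starting a fresh trace instead of failing the request.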
Propagation Methods
HTTP Headers (Synchronous calls)
GET /api/users HTTP/1.1
Host: api.example.com
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
X-Correlation-Id: tx-123456
X-Request-Id: req-789012
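When a service makes its own downstream call, it typically keeps the trace ID, replaces the parent ID with the ID of its own span, and forwards the correlation ID unchanged. A minimal sketch of that step, assuming a hypothetical helper `outgoing_headers` and that the incoming `traceparent` has already been validated:

```python
import secrets


def outgoing_headers(incoming: dict) -> dict:
    """Build headers for a downstream call from the incoming request headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in incoming.items()}
    version, trace_id, _caller_span_id, flags = lowered["traceparent"].split("-")

    # This service's span becomes the parent of whatever the callee creates.
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    headers = {"traceparent": f"{version}-{trace_id}-{span_id}-{flags}"}

    # Assumption: the correlation ID identifies the whole business
    # transaction, so it is forwarded as-is.
    if "x-correlation-id" in lowered:
        headers["X-Correlation-Id"] = lowered["x-correlation-id"]
    return headers
```

The returned dictionary can be passed to whichever HTTP client the service already uses.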
Message Queues (Asynchronous)
{
  "headers": {
    "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
    "correlation_id": "tx-123456"
  },
  "body": {
    "order_id": "ord-789",
    "amount": 99.99
  }
}
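The envelope above can be produced and consumed with a pair of small helpers. This is a sketch under the assumption that messages are JSON strings; `wrap_message` and `unwrap_message` are hypothetical names, and a real broker client may offer native message headers that are a better home for the context.

```python
import json
from typing import Tuple


def wrap_message(body: dict, traceparent: str, correlation_id: str) -> str:
    """Producer side: attach trace context to the message envelope."""
    envelope = {
        "headers": {
            "traceparent": traceparent,
            "correlation_id": correlation_id,
        },
        "body": body,
    }
    return json.dumps(envelope)


def unwrap_message(raw: str) -> Tuple[dict, dict]:
    """Consumer side: return (trace context, body) from a raw message."""
    envelope = json.loads(raw)
    headers = envelope.get("headers", {})
    context = {
        # Missing keys become None so the consumer can start a new trace.
        "traceparent": headers.get("traceparent"),
        "correlation_id": headers.get("correlation_id"),
    }
    return context, envelope["body"]
```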
Database Operations
-- Include trace context in audit fields
INSERT INTO orders (id, amount, trace_id, span_id, created_at)
VALUES ('ord-789', 99.99, '0af7651916cd43dd8448eb211c80319c', 'b7ad6b7169203331', NOW());
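From application code, the same insert is safer with bound parameters than with string formatting. The sketch below uses the standard-library `sqlite3` module and an in-memory table purely for illustration; the `orders` schema and the `insert_order` helper are assumptions, not part of any real service.

```python
import sqlite3


def insert_order(conn: sqlite3.Connection, order_id: str, amount: float,
                 traceparent: str) -> None:
    """Store an order together with the trace context that created it."""
    _version, trace_id, span_id, _flags = traceparent.split("-")
    conn.execute(
        "INSERT INTO orders (id, amount, trace_id, span_id, created_at) "
        "VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)",
        (order_id, amount, trace_id, span_id),
    )


# Hypothetical schema, created in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT PRIMARY KEY, amount REAL, "
    "trace_id TEXT, span_id TEXT, created_at TEXT)"
)
insert_order(
    conn, "ord-789", 99.99,
    "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
)
row = conn.execute(
    "SELECT trace_id, span_id FROM orders WHERE id = ?", ("ord-789",)
).fetchone()
```

With the IDs stored in their own columns, an audit query can join a row back to the logs of the request that wrote it.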
Span Logging Patterns
Basic Span Logging
{
  "timestamp": "2026-02-26T18:00:00Z",
  "level": "INFO",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b7ad6b7169203331",
  "span_name": "process_payment",
  "span_kind": "SERVER",
  "event": "span_start",
  "duration_ms": 0,
  "attributes": {
    "order_id": "ord-789",
    "payment_method": "credit_card",
    "amount": 99.99
  }
}
{
  "timestamp": "2026-02-26T18:00:00.123Z",
  "level": "INFO",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b7ad6b7169203331",
  "span_name": "process_payment",
  "span_kind": "SERVER",
  "event": "span_end",
  "duration_ms": 123,
  "status": "OK",
  "attributes": {
    "order_id": "ord-789",
    "payment_id": "pay-456",
    "gateway_response": "success"
  }
}
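One way to produce a matching pair of `span_start` and `span_end` records is a context manager that measures elapsed time around a block of code. This is a minimal sketch; `logged_span` is a hypothetical helper, the span kind is hard-coded, and the records go to a plain text stream instead of a real logging pipeline.

```python
import json
import secrets
import sys
import time
from contextlib import contextmanager
from datetime import datetime, timezone


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@contextmanager
def logged_span(name, trace_id, attributes=None, stream=sys.stdout):
    """Emit span_start on entry and span_end (with duration) on exit."""
    base = {
        "trace_id": trace_id,
        "span_id": secrets.token_hex(8),
        "span_name": name,
        "span_kind": "INTERNAL",
    }
    started = time.monotonic()  # monotonic clock: immune to wall-clock jumps
    start_record = {"timestamp": _now(), "level": "INFO", **base,
                    "event": "span_start", "duration_ms": 0,
                    "attributes": dict(attributes or {})}
    stream.write(json.dumps(start_record) + "\n")

    end_attributes = {}  # the caller can add result fields here
    status = "OK"
    try:
        yield end_attributes
    except Exception:
        status = "ERROR"
        raise
    finally:
        end_record = {"timestamp": _now(),
                      "level": "INFO" if status == "OK" else "ERROR",
                      **base, "event": "span_end",
                      "duration_ms": round((time.monotonic() - started) * 1000),
                      "status": status, "attributes": end_attributes}
        stream.write(json.dumps(end_record) + "\n")
```

Usage follows the shape `with logged_span("process_payment", trace_id, {"order_id": "ord-789"}) as result: result["payment_id"] = "pay-456"`.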
Error Span Logging
{
  "timestamp": "2026-02-26T18:00:05.123Z",
  "level": "ERROR",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b7ad6b7169203331",
  "span_name": "process_payment",
  "span_kind": "SERVER",
  "event": "span_end",
  "duration_ms": 5123,
  "status": "ERROR",
  "error_code": "PAYMENT_GATEWAY_TIMEOUT",
  "error_message": "Payment gateway timeout after 5000ms",
  "stack_trace": "...",
  "attributes": {
    "order_id": "ord-789",
    "retry_count": 3,
    "gateway": "stripe"
  }
}
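The error fields in this record can be derived from the caught exception itself. The following is a sketch only: `PaymentGatewayTimeout` and `error_span_record` are hypothetical names, and the convention of reading an `error_code` attribute from the exception class is an assumption.

```python
import json
import traceback
from datetime import datetime, timezone


class PaymentGatewayTimeout(Exception):
    """Hypothetical domain exception carrying a stable error code."""
    error_code = "PAYMENT_GATEWAY_TIMEOUT"


def error_span_record(exc, *, trace_id, span_id, span_name, duration_ms,
                      attributes):
    """Build a span_end log record for a span that failed with exc."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "ERROR",
        "trace_id": trace_id,
        "span_id": span_id,
        "span_name": span_name,
        "event": "span_end",
        "duration_ms": duration_ms,
        "status": "ERROR",
        # Fall back to the class name when no explicit code is defined.
        "error_code": getattr(exc, "error_code", type(exc).__name__),
        "error_message": str(exc),
        "stack_trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "attributes": attributes,
    }
```

Keeping `error_code` stable and machine-readable makes it practical to group failures across services, while `error_message` and `stack_trace` stay free-form for debugging.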
Nested Span Logging
{
  "timestamp": "2026-02-26T18:00:00Z",
  "level": "INFO",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "c8be7c825a934b7d",
  "parent_span_id": "b7ad6b7169203331",
  "span_name": "charge_card",
  "span_kind": "INTERNAL",
  "event": "span_start",
  "duration_ms": 0,
  "attributes": {
    "order_id": "ord-789",
    "card_last4": "4242"
  }
}
OpenTelemetry Integration
Manual Instrumentation
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)


def process_payment(order_id, amount):
    with tracer.start_as_current_span("process_payment") as span:
        span.set_attribute("order_id", order_id)
        span.set_attribute("amount", amount)
        try:
            # Business logic (charge_credit_card is defined elsewhere in the service)
            result = charge_credit_card(order_id, amount)
            span.set_status(Status(StatusCode.OK))
            span.set_attribute("payment_id", result.payment_id)
            return result
        except Exception as e:
            span.record_exception(e)
            span.set_status(Status(StatusCode.ERROR, str(e)))
            raise
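To connect ordinary application logs to spans like this one, each log record can be stamped with the active trace and span IDs. The sketch below uses only the standard `logging` module; the `trace_context` variable is a hypothetical stand-in for the active span, and in an OpenTelemetry setup the IDs would instead be read from the current span's context.

```python
import contextvars
import json
import logging

# Hypothetical holder for the active IDs; a real service would read them
# from its tracing library instead of setting them by hand.
trace_context = contextvars.ContextVar("trace_context", default=None)


class TraceContextFilter(logging.Filter):
    """Attach trace_id and span_id to every record passing through."""

    def filter(self, record):
        ctx = trace_context.get() or {}
        record.trace_id = ctx.get("trace_id", "")
        record.span_id = ctx.get("span_id", "")
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
            "span_id": record.span_id,
        })


def build_logger(stream):
    """Return a logger that writes trace-aware JSON lines to stream."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("payment-service")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Putting the filter on the handler means existing `logger.info(...)` calls gain the fields without any change at the call sites.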
Automatic Instrumentation
Configuration for automatic instrumentation of common frameworks:
opentelemetry:
  instrumentations:
    - name: "opentelemetry-instrumentation-flask"
      enabled: true
    - name: "opentelemetry-instrumentation-sqlalchemy"
      enabled: true
    - name: "opentelemetry-instrumentation-requests"
      enabled: true
  sampling:
    type: "parentbased_traceidratio"
    ratio: 0.1  # Sample 10% of traces in production
  exporters:
    - type: "otlp"
      endpoint: "http://otel-collector:4317"
    - type: "logging"  # Also log spans for local debugging
  resource:
    attributes:
      service.name: "payment-service"
      service.version: "1.2.3"
      deployment.environment: "production"
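The `parentbased_traceidratio` setting combines two rules: follow the caller's sampling decision when there is one, otherwise decide deterministically from the trace ID. The sketch below is a simplified illustration of that idea, not the SDK's own code; `should_sample` is a hypothetical function.

```python
from typing import Optional

_LOWER_64_BITS = (1 << 64) - 1


def should_sample(trace_id: str, ratio: float,
                  parent_sampled: Optional[bool] = None) -> bool:
    """Decide whether to record a trace.

    parent_sampled is the caller's decision (the sampled bit of the incoming
    traceparent), or None when this service starts the trace.
    """
    if parent_sampled is not None:
        # Parent-based: never contradict the upstream decision, so a trace
        # is either recorded by every service or by none.
        return parent_sampled

    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")

    # Ratio-based: compare the low 64 bits of the trace ID to a threshold.
    # The same trace ID always yields the same answer.
    bound = round(ratio * (1 << 64))
    return (int(trace_id, 16) & _LOWER_64_BITS) < bound
```

Deriving the decision from the trace ID instead of a random number means independent services that see the same root trace reach the same verdict without coordinating.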
Examples
# Generate trace context for new request
npm run tracing:generate-context -- --service payment-service --output context.json
# Propagate trace context through HTTP call
npm run tracing:propagate -- --trace-id abc123 --span-id def456 --target http://api.example.com
# Analyze trace from logs
npm run tracing:analyze -- --trace-id abc123 --sources "app.log,api.log,db.log" --output trace.json
# Set up OpenTelemetry instrumentation
npm run tracing:setup-otel -- --language nodejs --exporter jaeger --sampling-ratio 0.1
# Extract trace timeline from logs
npm run tracing:timeline -- --trace-id abc123 --output timeline.html
Output format
Trace Context Configuration:
tracing:
  standard: "W3C TraceContext"
  headers:
    traceparent: "traceparent"
    tracestate: "tracestate"
    correlation_id: "X-Correlation-Id"
    request_id: "X-Request-Id"
  propagation:
    http: true
    messaging: true
    database: true
    rpc: true
  sampling:
    strategy: "probability"
    rate: 0.1  # 10% sampling in production
    decision_deferred: false
  span_logging:
    enabled: true
    format: "json"
    include_fields:
      - trace_id
      - span_id
      - parent_span_id
      - span_name
      - span_kind
      - event
      - duration_ms
      - status
    events:
      - span_start
      - span_end
      - span_event
      - span_error
  correlation:
    business_ids:
      - order_id
      - user_id
      - transaction_id
      - session_id
Trace Analysis Report:
Distributed Trace Analysis
─────────────────────────
Trace ID: 0af7651916cd43dd8448eb211c80319c
Start Time: 2026-02-26T18:00:00Z
Duration: 5.890s
Status: ERROR (partial failure)
Services Involved:
1. api-gateway (entry point)
2. auth-service (authentication)
3. payment-service (payment processing)
4. notification-service (notifications)
5. database (persistence)
Span Timeline:
0.000s - api-gateway: request_received (span_start)
0.123s - api-gateway: auth_check (span_start)
0.234s - auth-service: validate_token (span_start)
0.345s - auth-service: validate_token (span_end) [OK]
0.456s - api-gateway: auth_check (span_end) [OK]
0.567s - payment-service: process_payment (span_start)
1.234s - payment-service: charge_card (span_start)
5.678s - payment-service: charge_card (span_end) [ERROR: timeout]
5.789s - payment-service: process_payment (span_end) [ERROR]
5.890s - api-gateway: request_completed (span_end) [ERROR]
Critical Path Analysis:
- Total duration: 5.890s
- Payment processing: 5.222s (89% of total time)
- Card charging: 4.444s (within payment processing)
- Card charging timed out 5.678s into the trace
Error Analysis:
- Root cause: Payment gateway timeout
- Impact: Payment failed, user notified
- Recovery: Automatic retry scheduled
- Alternative flows: None configured
Performance Insights:
- Slowest service: payment-service (5.222s)
- Fastest service: auth-service (0.111s)
- Bottleneck: External payment gateway call
- Recommendation: Implement circuit breaker for payment gateway
Business Context:
- User ID: user-123
- Order ID: ord-789
- Amount: $99.99
- Payment method: credit_card
- Outcome: Failed (gateway timeout)
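The per-span durations in a report like this can be computed directly from `span_start` and `span_end` log lines. The following is a minimal sketch, assuming JSON-lines logs with the field names used in the span logging examples; `summarize_trace` is a hypothetical helper and ignores spans that never ended.

```python
import json
from datetime import datetime


def _parse_ts(value: str) -> datetime:
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_trace(log_lines, trace_id):
    """Return one entry per completed span of trace_id, slowest first."""
    starts = {}
    spans = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines mixed into the log
        if not isinstance(entry, dict) or entry.get("trace_id") != trace_id:
            continue

        span_id = entry.get("span_id")
        if entry.get("event") == "span_start":
            starts[span_id] = entry
        elif entry.get("event") == "span_end" and span_id in starts:
            start = starts.pop(span_id)
            elapsed = _parse_ts(entry["timestamp"]) - _parse_ts(start["timestamp"])
            spans.append({
                "span_name": entry.get("span_name"),
                "duration_ms": round(elapsed.total_seconds() * 1000),
                "status": entry.get("status"),
            })

    spans.sort(key=lambda item: item["duration_ms"], reverse=True)
    return spans
```

Sorting by duration puts the likeliest bottleneck first, which is the starting point for the critical path section of the report.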
Notes
- Trace context should be propagated consistently across all service boundaries
- Sampling is essential in production to manage volume and cost
- Span logs should include business context for meaningful analysis
- Trace visualization requires complete context from all services
- Consider trace storage and retention policies for compliance
- Monitor trace collection and processing for reliability
- Implement trace-based alerting for performance degradation detection
- Test trace propagation in all communication patterns (sync, async, batch)
- Document trace standards for development teams
- Regularly review trace sampling rates based on volume and importance
Repository: wojons/skills