Distributed Tracing
Originally from wshobson/agents
Implement distributed tracing with Jaeger and OpenTelemetry for request flow visibility.
Trace Structure
Trace (Request ID: abc123)
└→ Span (frontend) [100ms]
   └→ Span (api-gateway) [80ms]
      ├→ Span (auth-service) [10ms]
      └→ Span (user-service) [60ms]
         └→ Span (database) [40ms]
Key Components
| Component | Description |
|---|---|
| Trace | End-to-end request journey |
| Span | Single operation within a trace |
| Context | Metadata propagated between services |
| Tags | Key-value pairs for filtering |
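The relationships between these components can be sketched with a toy model (plain Python dataclasses for illustration only, not the OpenTelemetry API; in real systems the parent link is a span ID, not a name):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    name: str                 # a single operation, e.g. "auth-service"
    trace_id: str             # shared by every span in one trace
    parent_id: Optional[str]  # links spans into the request tree
    tags: dict = field(default_factory=dict)  # key-value pairs for filtering

# One trace: frontend -> api-gateway -> auth-service
root = Span("frontend", trace_id="abc123", parent_id=None)
gw = Span("api-gateway", trace_id="abc123", parent_id="frontend")
auth = Span("auth-service", trace_id="abc123", parent_id="api-gateway",
            tags={"http.status_code": 200})

# All spans sharing a trace_id form one end-to-end request journey
trace_spans = [s for s in (root, gw, auth) if s.trace_id == "abc123"]
print(len(trace_spans))  # → 3
```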
OpenTelemetry Setup (Python)
from flask import Flask

from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor

# Initialize the tracer provider with a Jaeger exporter
# (note: recent OpenTelemetry releases deprecate the Jaeger exporter
# in favor of OTLP; Jaeger ingests OTLP natively)
provider = TracerProvider()
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Instrument Flask (automatic spans for every request)
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

# Custom spans
@app.route('/api/users')
def get_users():
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("get_users") as span:
        span.set_attribute("user.count", 100)
        return fetch_users()
OpenTelemetry Setup (Node.js)
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');

const provider = new NodeTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(
  new JaegerExporter({ endpoint: 'http://jaeger:14268/api/traces' })
));
provider.register();
Context Propagation
# Inject trace context into outgoing HTTP headers
import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Adds the W3C traceparent header
response = requests.get('http://downstream/api', headers=headers)
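On the receiving side, the downstream service reads that header to continue the trace. The wire format is the W3C `traceparent` header, which can be sketched with a toy parser (stdlib only; real services would use OpenTelemetry's propagator, not hand-rolled parsing):

```python
# Toy parser for the W3C traceparent header that inject() writes.
# Format: version-traceid-spanid-flags
def parse_traceparent(header: str) -> dict:
    version, trace_id, span_id, flags = header.split("-")
    return {
        "version": version,
        "trace_id": trace_id,        # 32 hex chars, shared across services
        "parent_span_id": span_id,   # 16 hex chars, the caller's span
        "sampled": flags == "01",    # sampling decision travels with the request
    }

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx["trace_id"])  # → 4bf92f3577b34da6a3ce929d0e0e4736
print(ctx["sampled"])   # → True
```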
Sampling Strategies
# Probabilistic - sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01

# Rate limiting - at most 100 traces/sec
sampler:
  type: ratelimiting
  param: 100
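Probabilistic sampling is usually a deterministic decision on the trace ID, so every service keeps or drops the same traces without coordination. A simplified sketch of the idea (in the spirit of OpenTelemetry's ratio-based sampler, not the library code):

```python
import random

def should_sample(trace_id: int, ratio: float) -> bool:
    # Compare the low 64 bits of the trace ID against a fixed bound:
    # the same trace ID always yields the same decision everywhere.
    bound = int(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

# With ratio 0.01, roughly 1% of random trace IDs are kept
random.seed(42)
kept = sum(should_sample(random.getrandbits(128), 0.01) for _ in range(10_000))
print(kept)  # prints a count near 100
```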
Jaeger Queries
# Find slow requests
service=my-service duration > 1s
# Find errors
service=my-service error=true tags.http.status_code >= 500
Correlated Logging
import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id
    # Format as 32 hex chars so log lines can be joined to traces
    logger.info("Processing", extra={"trace_id": format(trace_id, '032x')})
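Rather than passing `extra` on every call, a logging.Filter can stamp the trace ID onto every record automatically. A stdlib-only sketch (the hard-coded ID stands in for `span.get_span_context().trace_id`):

```python
import io
import logging

class TraceIdFilter(logging.Filter):
    def __init__(self, trace_id: int):
        super().__init__()
        self.trace_id = trace_id

    def filter(self, record):
        # Format as 32 lowercase hex chars, matching OTel's trace ID width
        record.trace_id = format(self.trace_id, "032x")
        return True

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.addFilter(TraceIdFilter(0xABC123))  # stand-in for the real trace ID
logger.propagate = False
logger.warning("Processing")
print(stream.getvalue().strip())  # → 00000000000000000000000000abc123 Processing
```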
Best Practices
- Sample appropriately (1-10% in production)
- Add meaningful tags (user_id, request_id)
- Propagate context across all boundaries
- Log exceptions in spans
- Use consistent naming for operations
- Monitor tracing overhead (<1% CPU impact)
- Correlate with logs using trace IDs
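The "log exceptions in spans" practice can be sketched with a minimal context manager (a toy stand-in for `tracer.start_as_current_span`; real OpenTelemetry spans expose `record_exception()` and `set_status()` for this):

```python
from contextlib import contextmanager

@contextmanager
def span(name, spans):
    s = {"name": name, "tags": {}, "events": []}
    spans.append(s)
    try:
        yield s
    except Exception as exc:
        # Mirror span.record_exception(): keep the error with the span
        # so failed traces are searchable by error=true
        s["tags"]["error"] = True
        s["events"].append({"exception.type": type(exc).__name__,
                            "exception.message": str(exc)})
        raise

spans = []
try:
    with span("charge-card", spans):
        raise ValueError("card declined")
except ValueError:
    pass

print(spans[0]["tags"]["error"])                # → True
print(spans[0]["events"][0]["exception.type"])  # → ValueError
```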