Observability
Comparable to: Datadog, Grafana, Honeycomb, Jaeger
Key Concepts
Use the concepts below when they fit the task. Not every worker needs custom spans or metrics.
- Built-in OpenTelemetry support across all SDKs — every function invocation is automatically traced
- The engine exports traces, metrics, and logs via OTLP to any compatible collector
- Workers propagate W3C trace context automatically across function invocations
- Prometheus metrics are exposed on port 9464
- SDK init with `otel` config enables telemetry per worker
- Custom spans via `withSpan(name, opts, fn)` wrap async work with trace context
- Custom metrics via `getMeter()` create counters and histograms
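
W3C trace context travels in a `traceparent` header with four dash-separated hex fields: version, trace ID, parent span ID, and flags. The sketch below parses and validates a version `00` value to show what is being propagated. It is illustrative only and says nothing about how the iii SDK implements propagation internally; `parseTraceparent` is a hypothetical helper name.

```javascript
// Illustrative only: a hypothetical helper, not part of the iii SDK.
// Parses a W3C `traceparent` value of the form version-traceid-parentid-flags.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // The spec forbids version "ff" and all-zero trace or parent IDs.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return {
    version,
    traceId,
    parentId,
    // Bit 0 of the flags byte is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

const parsed = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(parsed.traceId, parsed.sampled);
```

Since workers propagate this context automatically, a helper like this is mainly useful when inspecting headers while debugging.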
Architecture
The worker SDK generates spans, metrics, and logs during function execution. These flow to the engine, which exports them via OTLP to a collector (Jaeger, Grafana, Datadog). The engine also exposes a Prometheus endpoint on port 9464 for scraping.
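
A scrape of the Prometheus endpoint returns plain text in the Prometheus exposition format. As a rough sketch of what a consumer sees, the snippet below parses a sample payload. The metric names are invented for illustration, and the `/metrics` path in the comment is the usual Prometheus convention, assumed here rather than confirmed for the iii engine.

```javascript
// Illustrative only: parses simple Prometheus text-format samples.
// Assumes no trailing timestamps and no escaped label values; special
// values such as +Inf and NaN are skipped.
function parsePrometheusText(text) {
  const samples = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip blanks, HELP and TYPE lines
    const lastSpace = line.lastIndexOf(' ');
    if (lastSpace === -1) continue;
    const series = line.slice(0, lastSpace).trim(); // metric name plus optional {labels}
    const value = Number(line.slice(lastSpace + 1));
    if (!Number.isNaN(value)) samples.push({ series, value });
  }
  return samples;
}

// Hypothetical payload, as might be returned by GET http://localhost:9464/metrics
const sample = [
  '# HELP orders_processed_total Orders processed (hypothetical metric)',
  '# TYPE orders_processed_total counter',
  'orders_processed_total{status="ok"} 42',
  'orders_processed_total{status="failed"} 3',
].join('\n');

console.log(parsePrometheusText(sample));
```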
iii Primitives Used
| Primitive | Purpose |
|---|---|
| `init(url, { otel })` | Connect worker with telemetry config |
| `withSpan(name, opts, fn)` | Create a custom trace span |
| `getTracer()` | Access OpenTelemetry Tracer directly |
| `getMeter()` | Access OpenTelemetry Meter for custom metrics |
| `currentTraceId()` | Get active trace ID for correlation |
| `injectTraceparent()` | Inject W3C trace context into outbound calls |
| `onLog(callback, { level })` | Subscribe to log events |
| `shutdown_otel()` | Graceful shutdown of telemetry pipeline |
Reference Implementation
See ../references/observability.js for the full working example — a worker with custom spans, metrics counters, trace propagation, and log subscriptions connected to an OTel collector.
Common Patterns
Code using this pattern commonly includes, when relevant:
- `init('ws://localhost:49134', { otel: { enabled: true, serviceName: 'my-svc' } })`: enable telemetry
- `withSpan('validate-order', {}, async (span) => { span.setAttribute('order.id', id); ... })`: custom span
- `getMeter().createCounter('orders.processed')`: custom counter metric
- `getMeter().createHistogram('request.duration')`: custom histogram metric
- `onLog((log) => { ... }, { level: 'warn' })`: subscribe to warnings and above
- `currentTraceId()`: get active trace ID for correlation with external systems
- `injectTraceparent()`: propagate trace context to outbound HTTP calls
- Disable telemetry: `init(url, { otel: { enabled: false } })` or `OTEL_ENABLED=false`
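
Several of these patterns might be combined as in the following sketch. Because this document does not name the SDK's import path, the SDK is passed in as an `sdk` parameter; `setupOrderTelemetry`, `processOrder`, and the `order` shape are hypothetical. The `add` and `record` calls follow the standard OpenTelemetry Meter API, which `getMeter()` is described as exposing, and the sketch assumes `withSpan` returns the callback's result.

```javascript
// A sketch only: `sdk` stands for the iii SDK module (import path not shown here).
// Function names and call shapes are taken from the patterns listed above.
function setupOrderTelemetry(sdk, url) {
  sdk.init(url, { otel: { enabled: true, serviceName: 'my-svc' } });

  const meter = sdk.getMeter();
  const processed = meter.createCounter('orders.processed');
  const duration = meter.createHistogram('request.duration');

  // Hypothetical handler: wraps the work in a custom span and records metrics.
  async function processOrder(order, work) {
    const started = Date.now();
    // Assumption: withSpan resolves to whatever the callback returns.
    return sdk.withSpan('validate-order', {}, async (span) => {
      span.setAttribute('order.id', order.id);
      try {
        const result = await work(order);
        processed.add(1); // count only successful orders
        return result;
      } finally {
        duration.record(Date.now() - started); // elapsed milliseconds, success or failure
      }
    });
  }

  return { processOrder };
}
```

With the real SDK in scope, usage would look like `const { processOrder } = setupOrderTelemetry(sdk, 'ws://localhost:49134')`. Injecting the SDK also makes the handler easy to exercise against a stub in unit tests.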
Adapting This Pattern
Use the adaptations below when they apply to the task.
- Enable `otel` in init config to start collecting traces automatically
- Add custom spans around expensive operations (DB queries, LLM calls, external APIs)
- Create domain-specific metrics (orders processed, payment failures, queue depth)
- Use `currentTraceId()` to correlate iii traces with external system logs
- Configure `OtelModule` in iii-config.yaml for engine-side exporter, sampling ratio, and alerts
- Point the OTLP endpoint at your collector (Jaeger, Grafana Tempo, Datadog Agent)
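
As one way to correlate iii traces with external system logs, the sketch below stamps each structured log line with the active trace ID. `makeCorrelatedLogger` is a hypothetical helper. It takes `currentTraceId` as an argument rather than importing it, since the SDK's import path is not given here, and the `trace_id` field name is an arbitrary choice.

```javascript
// Illustrative only: builds a logger that adds the active trace ID to each JSON line.
function makeCorrelatedLogger(currentTraceId, write = console.log) {
  return function log(level, message, fields = {}) {
    // Assumption: returns a string inside a trace, or a falsy value outside one.
    const traceId = currentTraceId();
    const entry = { ts: new Date().toISOString(), level, message, ...fields };
    if (traceId) entry.trace_id = traceId;
    write(JSON.stringify(entry));
  };
}

// Usage with a stand-in for the SDK's currentTraceId():
const log = makeCorrelatedLogger(() => '4bf92f3577b34da6a3ce929d0e0e4736');
log('warn', 'payment retry', { orderId: 'o-1' });
```

Searching the external log store for the same `trace_id` value then links a log line back to its trace in the collector.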
Engine Configuration
OtelModule must be enabled in iii-config.yaml for engine-side traces, metrics, and logs. See ../references/iii-config.yaml for the full annotated config reference.
Pattern Boundaries
- For engine-side OtelModule YAML configuration, prefer `engine-config`.
- For SDK init options and function registration, prefer `functions-and-triggers`.
- Stay with `observability` when the primary problem is SDK-level telemetry: spans, metrics, logs, and trace propagation.
When to Use
- Use this skill when the task is primarily about `observability` in the iii engine.
- Triggers when the request directly asks for this pattern or an equivalent implementation.
Boundaries
- Never use this skill as a generic fallback for unrelated tasks.
- You must not apply this skill when a more specific iii skill is a better fit.
- Always verify environment and safety constraints before applying examples from this skill.