addon-observability-telemetry
Add-on: Observability Telemetry
Use this skill when a project needs to be diagnosable in production with structured telemetry instead of ad hoc console logging. Prefer trusted, ecosystem-standard observability packages over custom logging utilities. Use this as the shared observability policy layer, then pair it with the runtime-specific implementation add-on when available.
Compatibility
- Works with all
architect-*skills. - Especially useful for
architect-python-uv-fastapi-sqlalchemy,architect-nextjs-bun-app, and any queue or worker style runtime. - Can be combined with
addon-deterministic-eval-suiteto make telemetry contracts testable. - Prefer
addon-observability-python-structlogfor Python runtimes. - Prefer
addon-observability-nextjs-pinofor Next.js or TypeScript server runtimes.
Inputs
Collect:
TELEMETRY_MODE:logs-only|logs+metrics|logs+metrics+traces.TRACE_EXPORTER:otlp|stdout|none.LOG_FORMAT:json|logfmt.ERROR_TRACKING:yes|no(defaultyes).PII_REDACTION:strict|standard.LOG_BACKEND: default to the runtime-standard package unless the user explicitly requests another option.
Integration Workflow
- Add telemetry artifacts:
docs/OBSERVABILITY.md
src/{{MODULE_NAME}}/observability/logging.*
src/{{MODULE_NAME}}/observability/metrics.*
src/{{MODULE_NAME}}/observability/tracing.*
src/{{MODULE_NAME}}/api/routes/health.*
scripts/ops/check_health.*
- Copy and adapt this skill's bundled health-check script:
scripts/check_health.sh- Place the adapted result in the target project at
scripts/ops/check_health.sh(or the runtime-equivalent script path). - Treat
observability/logging.*as package configuration and adapters only; do not implement a custom logger there.
- Standardize event shape:
- use structured logs with request or run correlation IDs
- classify levels consistently (
info,warn,error) - emit explicit error codes for expected failures
- Add runtime diagnostics:
- health and readiness checks for critical dependencies
- latency, error-rate, and throughput metrics
- trace spans around network, database, and LLM calls when tracing is enabled
- Protect sensitive data:
- redact auth headers, secrets, and raw personal data
- document what fields may be logged
- keep sampling and retention decisions explicit
- Select runtime-standard dependencies:
- Next.js or TypeScript server runtimes: prefer
addon-observability-nextjs-pino. If that add-on is unavailable, still usepino; do not rely on ad hocconsole.login production request paths.logging.tsshould configure serializers, redaction, correlation IDs, and child loggers only. - Python services and workers: prefer
addon-observability-python-structlog. If that add-on is unavailable, still usestructlogfor APIs and long-lived services.loguruis acceptable for smaller services or CLI-heavy workers when a simpler sink model is sufficient. Do not build a custom wrapper aroundprintor bare stdlib logging as the primary production path. - If metrics or traces are enabled, prefer official OpenTelemetry SDK or instrumentation packages instead of custom metric counters or tracing helpers.
Required Template
docs/OBSERVABILITY.md
# Observability Contract
## Telemetry Mode
- logs+metrics
## Required Fields
- timestamp
- level
- message
- correlation_id
## Health Checks
- app process
- database
- external provider dependencies
## Redaction Policy
- Never log secrets, auth headers, or raw PII.
Guardrails
-
Documentation contract for generated code:
- Python: write module docstrings and docstrings for public classes, methods, and functions.
- Next.js/TypeScript: write JSDoc for exported components, hooks, utilities, and route handlers.
- Add concise rationale comments only for non-obvious logic, invariants, or safety constraints.
- Apply this contract even when using template snippets below; expand templates as needed.
-
Do not emit telemetry that leaks credentials, tokens, or full request bodies by default.
-
Prefer one well-supported observability dependency over a dependency-free but hand-rolled logger.
-
Keep
observability/logging.*limited to configuring, adapting, and binding the chosen package; do not create a homemade logger implementation. -
Do not use raw
console.login production TypeScript or Next.js server code except in tests or local-only scripts. -
Do not use raw
print()in production Python service code except for explicit CLI UX or local-only scripts. -
If the runtime already has a clear ecosystem default, use it unless the user explicitly requests an alternative and the tradeoff is documented.
-
Use correlation IDs consistently across logs, metrics labels, and traces where supported.
-
Prefer stable event names over free-form log spam.
-
Missing health diagnostics for critical dependencies should be treated as incomplete.
-
Keep observability dependencies bounded; avoid heavyweight infra by default, but do not treat "bounded" as "dependency-free."
Validation Checklist
- Confirm generated code includes required docstrings/JSDoc and rationale comments for non-obvious logic.
test -f docs/OBSERVABILITY.md
rg -n "health|ready|trace|metric|correlation" src || true
rg -n "console\\.log" src || true
test -e scripts/ops/check_health.sh || test -e scripts/ops/check_health.ts || test -e scripts/ops/check_health.py || true
Manual checks:
- Health endpoint reports degraded status for dependency failures.
- Logs and traces include correlation identifiers without leaking secrets.
observability/logging.*configures a real package (pino,structlog,loguru, or explicit user-approved equivalent) instead of implementing logging from scratch.
Decision Justification Rule
- Every non-trivial decision must include a concrete justification.
- Capture the alternatives considered and why they were rejected.
- State tradeoffs and residual risks for the chosen option.
- If justification is missing, treat the task as incomplete and surface it as a blocker.