# Observability
If a boundary matters to delivery, debugging, cost, or safety, instrument it deliberately and name the signal contract explicitly.
## Context
This skill covers cross-cutting instrumentation design, not just runtime dashboards. Use it when the system needs durable telemetry contracts for:
- application events and structured logs
- metrics and traces at important boundaries
- AI execution telemetry such as skill invocation, runner execution, model usage, and token accounting
- workflow-level observability that must survive handoff across phases
This skill is intentionally distinct from monitoring-observability:

- observability -- defines what signals should exist and how they are structured
- monitoring-observability -- turns important production signals into dashboards, alerts, and responder workflows
The repository's current execution contract uses append-only execution-observability.jsonl artifacts plus periodic summaries from scripts/summarize_execution_observability.py. That keeps the runtime loop concrete without committing to a heavier backend too early.
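The append-only contract can be kept very small. The sketch below shows one way to emit events into `execution-observability.jsonl`; the `append_event` helper and its extra fields are illustrative names, not part of the repository's actual API.

```python
import json
import time
import uuid

def append_event(path, event_type, **fields):
    """Append one structured event to an append-only JSONL artifact.

    Fields beyond the shared bookkeeping keys are illustrative,
    not a normative schema.
    """
    event = {
        "schema_version": 1,
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        "timestamp": time.time(),
        **fields,
    }
    # Append-only: one JSON object per line, never rewritten in place.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")

append_event(
    "execution-observability.jsonl",
    "skill_invocation",
    skill="observability",
    route="dispatch",
)
```

Because each line is a self-contained JSON object, a periodic summarizer can stream the file without any backend or database.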
## Inputs
- Current code or workflow entry points where behavior, failure, or cost must be visible
- Existing execution boundaries such as CLI runners, background jobs, request handlers, or workflow dispatchers
- Any external platform constraints on usage accounting or token reporting
## Process
### Step 1: Map the Boundary That Must Become Visible
Start from the operational or product question, not from logging APIs.
Examples:
- which skill was invoked, by what route, and what happened next
- which model was used, with what token input and output
- which runner or wrapper failed, timed out, or retried
- which execution chain consumed the most cost or latency
If the question cannot change debugging, product decisions, or operational behavior, it is probably noise.
### Step 2: Define a Small, Versioned Event Schema
Write a stable schema before instrumenting code. At minimum, define:
- event types
- required fields shared by every event
- field names for model and token accounting
- null or unavailable rules for data the runner cannot provide
- versioning rules for future extensions
For AI execution telemetry, prefer canonical field names: `model_name`, `token_input`, `token_output`, `token_total`.
Do not invent token values. If the runner cannot provide usage, record null and state the source limitation explicitly.
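One minimal way to pin these rules down is a typed event definition. The sketch below assumes Python; `ModelUsageEvent` and `usage_source` are illustrative names chosen here, but the canonical usage fields match the ones listed above.

```python
from dataclasses import dataclass, asdict
from typing import Optional

SCHEMA_VERSION = 1  # bump on any field addition or semantic change

@dataclass
class ModelUsageEvent:
    """Required fields shared by every event, plus canonical usage fields.

    The token_* fields stay None when the runner does not expose usage.
    They are never estimated or guessed.
    """
    event_type: str        # e.g. "model_usage", "skill_invocation"
    timestamp: float
    schema_version: int = SCHEMA_VERSION
    model_name: Optional[str] = None
    token_input: Optional[int] = None
    token_output: Optional[int] = None
    token_total: Optional[int] = None
    # Where the numbers came from, or why they are null
    # (e.g. "provider_api", "runner_no_usage").
    usage_source: Optional[str] = None

ev = ModelUsageEvent(
    event_type="model_usage",
    timestamp=0.0,
    model_name="example-model",
    usage_source="runner_no_usage",
)
assert ev.token_total is None  # unavailable usage stays null, not estimated
```

Recording `usage_source` alongside the null makes the source limitation explicit in the data itself, so downstream summaries can distinguish "zero tokens" from "usage unknown".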
### Step 3: Instrument at Shared Entry Points
Prefer wrappers, decorators, middleware, or context managers around execution boundaries instead of scattering ad hoc logging across many call sites.
Good candidates:
- model runner adapters
- skill dispatch or invocation boundaries
- benchmark or eval execution wrappers
- workflow orchestration seams
The goal is to capture execution once per boundary with a consistent schema.
For skill systems, capture two different signals:
- real model usage from the provider or runner when exposed
- exact byte/character measurements for what the skill loaded or deferred
Do not mix the two. Exact provider or runner usage answers token and billing questions. Skill-context byte/character counts answer context-size questions until a model-specific tokenizer or provider token-count API is available.
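A decorator around the runner adapter is one way to capture the boundary exactly once. This is a sketch under stated assumptions: `observe_model_call` and `emit` are hypothetical names, and the wrapped runner is assumed to return a `(result, usage)` pair where `usage` is the provider's usage dict or `None` -- adapt to your runner's real signature.

```python
import functools
import time

def observe_model_call(emit):
    """Decorator for a model-runner adapter: one event per call at the boundary.

    `emit` is any callable that persists one event dict
    (e.g. a JSONL appender).
    """
    def wrap(runner):
        @functools.wraps(runner)
        def inner(*args, **kwargs):
            start = time.monotonic()
            status, usage = "ok", None
            try:
                result, usage = runner(*args, **kwargs)
                return result
            except Exception:
                status = "error"
                raise
            finally:
                emit({
                    "event_type": "runner_execution",
                    "status": status,
                    "duration_s": time.monotonic() - start,
                    # Real provider/runner usage only; stays None when
                    # the runner exposes nothing. Never estimated here.
                    "token_total": (usage or {}).get("total_tokens"),
                })
        return inner
    return wrap
```

Because the event is emitted in `finally`, failures and timeouts raised by the runner still produce a record, which is exactly the signal Step 1 asked for.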
### Step 4: Separate Signal Capture from Signal Consumption
Keep the instrumentation contract independent from dashboards and alerts.
- this skill defines and emits structured events
- downstream operational skills consume those events for triage and alerting
That separation keeps instrumentation stable even when dashboard tools or operational needs change.
### Step 5: Validate with Real Failure and Cost Questions
Before calling the design complete, verify that a reviewer can answer:
- which skill ran
- which model and runner were used
- how many tokens were consumed
- how many skill-context bytes and characters were loaded or deferred
- where the time went
- where the chain failed or stopped
If those questions still require ad hoc grep or guesswork, the instrumentation boundary is incomplete.
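A quick way to test the boundary is to answer those questions from the JSONL artifact alone. The `summarize` function below is a hypothetical sketch (the real repository script is `scripts/summarize_execution_observability.py`), assuming the event fields shown earlier in this skill.

```python
import json
from collections import Counter

def summarize(path):
    """Answer the reviewer questions from the JSONL artifact alone:
    which skills ran, where failures happened, total known token spend,
    and how many events lack usage data entirely.
    """
    skills, failures = Counter(), Counter()
    token_total, missing_usage = 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            ev = json.loads(line)
            if ev.get("event_type") == "skill_invocation":
                skills[ev.get("skill", "unknown")] += 1
            if ev.get("status") == "error":
                failures[ev.get("event_type", "unknown")] += 1
            tokens = ev.get("token_total")
            if tokens is None:
                missing_usage += 1  # null usage is counted, never imputed
            else:
                token_total += tokens
    return {
        "skills": dict(skills),
        "failures": dict(failures),
        "token_total": token_total,
        "events_missing_usage": missing_usage,
    }
```

If a summary like this cannot be produced from the events alone, the schema or the instrumentation boundary is missing something.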
### Step 6: Close the Runtime Feedback Loop
Do not stop at event emission. Summarize recurring failures, missing usage data, and high-risk actions on a regular cadence, then feed the findings back into:
- skill gotchas
- benchmark plans
- routing rules
- approval points for risky actions
## Outputs
- observability-spec -- The written boundary definition: what is instrumented, why it matters, and who consumes it
- execution-event-schema -- Versioned event definitions for execution telemetry, including skill invocation and model usage fields
## Quality Gate
- Important execution boundaries are explicitly identified before instrumentation begins
- Event schema is versioned and uses stable field names
- Skill invocation, runner execution, and model usage are distinguishable event types
- Model and token accounting fields use canonical names and never fabricate unavailable values
- Estimated runner usage is kept separate from exact provider or runner usage
- Skill-context measurements use exact byte/character counts unless a model-specific tokenizer or provider token-count API is available
- Runtime summaries can compare baseline and with-skill exact token usage before any token-saving claim is accepted
- Ownership and downstream consumption path are documented
- Runtime summaries can identify recurring failures, missing usage, and risky actions
## Anti-Patterns
- Metric soup -- Capturing everything because it is easy, without clear questions the telemetry answers.
- Inline logging everywhere -- Repeating custom logging at each call site instead of instrumenting shared boundaries.
- Conflating instrumentation with dashboards -- Hard-coding alerting or UI assumptions into the event model.
- Invented usage data -- Estimating token counts and then aggregating them with exact provider or runner usage.
- No schema versioning -- Breaking downstream consumers every time fields evolve.
- Optimization before measurement -- Shortening or compressing skill instructions before proving the change preserves benchmark quality and actually reduces loaded context.
## Related Skills
- monitoring-observability -- turns important runtime signals into dashboards, alerts, and responder workflows
- documentation -- records ADRs, schemas, and operational guidance for the observability layer
- ci-cd -- supplies release and rollout boundaries that should remain visible in telemetry
- docs/observability/runtime-feedback-loop.md -- explains how execution JSONL evidence feeds back into the skills system
## Distribution
- Public install surface: skills/.curated
- Canonical authoring source: skills/cross-cutting/observability/SKILL.md
- This package is exported for `npx skills add/update` compatibility.
- Packaging stability: beta
- Capability readiness: beta