Data Observability
Use this skill when the problem is not "is the code deployed?" but "is the data still arriving, complete, and on time?"
Goal
Detect broken or degraded data delivery before downstream users notice.
Working approach
- Identify the unit of delay or completeness:
  - batch run
  - partition
  - row count
  - stream offset
  - commit version
- Define the consumer-facing SLA first.
- Track both freshness and volume, not just task success.
- Alert on states that require action, not on every anomaly.
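The approach above can be sketched as a single check that pairs a freshness signal with a volume signal and returns only actionable alerts. This is a minimal illustration, not a prescribed implementation; the SLA window, row-fraction threshold, and function names are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds; tune per dataset and consumer SLA.
FRESHNESS_SLA = timedelta(hours=6)   # consumer-facing deadline
MIN_ROW_FRACTION = 0.5               # drop below this fraction of baseline is actionable

def check_dataset(last_updated: datetime, row_count: int,
                  baseline_row_count: int) -> list[str]:
    """Return actionable alerts; an empty list means no page is needed."""
    alerts = []
    now = datetime.now(timezone.utc)
    # Freshness: task success is not checked here on purpose; only data arrival is.
    if now - last_updated > FRESHNESS_SLA:
        alerts.append(f"stale: last update {last_updated.isoformat()}")
    # Volume: compare against the last healthy baseline, not the previous run.
    if baseline_row_count and row_count < baseline_row_count * MIN_ROW_FRACTION:
        alerts.append(f"volume drop: {row_count} rows vs baseline {baseline_row_count}")
    return alerts
```

Keeping both signals in one check makes it harder to ship a pipeline that alerts on task failure but silently serves stale data.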
Default procedure
- Identify the dataset, its consumer, and the promised update time.
- Define a formal SLA for each critical dataset: cadence, deadline, owner, escalation tiers.
- Define one freshness signal and one volume or completeness signal.
- Compare anomalies against the last healthy baseline, not just the last run.
- Account for false positives: add seasonal baselines, maintenance-window suppression, and alert deduplication before enabling pages.
- Add an owner and escalation path for every critical SLA.
- Expose `last_updated` wherever users consume the dataset.
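A formal SLA per the procedure above can live as a small structured record that ties cadence, deadline, owner, and escalation together. The field names and values here are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field

# Hypothetical SLA record; adapt field names to your catalog or config store.
@dataclass
class DatasetSLA:
    dataset: str
    consumer: str
    cadence: str                 # e.g. "daily"
    deadline_utc: str            # e.g. "06:00"
    owner: str
    escalation: list[str] = field(default_factory=list)

sla = DatasetSLA(
    dataset="orders_mart",          # assumed dataset name
    consumer="finance dashboard",
    cadence="daily",
    deadline_utc="06:00",
    owner="data-platform",
    escalation=["on-call", "team-lead"],
)
```

Declaring the consumer and escalation path alongside the deadline keeps "who gets paged, and by when" answerable from one place.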
Minimum observability set
- Last successful update timestamp
- Freshness SLA per critical dataset
- Volume/skew comparison to prior healthy runs
- Lag indicator for streaming or incremental consumers
- Basic lineage for high-value marts and dashboards
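For the lag indicator in the set above, one sketch (assumed function and partition names) is to report the worst partition rather than the average, since an average can sit under the threshold while one partition is badly behind:

```python
# Per-partition lag check: surface the worst offenders, not the mean.
def worst_lag(partition_lags: dict[str, int],
              threshold: int) -> list[tuple[str, int]]:
    """Return partitions whose lag exceeds the threshold, worst first."""
    breaches = [(p, lag) for p, lag in partition_lags.items() if lag > threshold]
    return sorted(breaches, key=lambda item: -item[1])
```

With lags `{"p0": 10, "p1": 5000, "p2": 20}` and a threshold of 2000, the average (~1677) looks healthy, but `p1` is flagged.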
Gotchas
- A green orchestrator job can still produce stale or incomplete data.
- Average lag often hides the partition or consumer that is actually failing.
- Comparing skew to the immediately previous run can create alert loops after a bad run.
- Users need dataset freshness in the interface they already use, not only in an ops dashboard.
Reference
- Read references/observability-patterns.md when the task involves freshness signals, lag units, skew thresholds, SLA misses, or lineage scope.