Data Engineering
Use this skill when the user is building or fixing a data platform, analytics stack, or warehouse-backed reporting workflow.
What this skill covers
- Reasoning through the full data engineering lifecycle (generation, ingestion, storage, transformation, serving) and the six undercurrents (security, data management, DataOps, data architecture, orchestration, software engineering)
- Calibrating architecture complexity to the organization's data maturity stage
- Designing dbt-style staging/intermediate/mart layers with explicit grain and update patterns
- Picking data models for analytics workloads (Kimball, Inmon, Data Vault, wide tables) with concrete trade-offs
- Defining metrics before building dashboards or features, using a four-tier hierarchy and six-step decision framework
- Choosing serving patterns: BI, embedded analytics, operational analytics, reverse ETL, ML feature serving
- Balancing centralized execution with domain ownership ("data mesh lite")
Boundaries
- Use `jimmy-skills@data-pipeline-reliability` when the main problem is retries, duplicates, backfills, replay behavior, overwrite vs merge, or ordered delivery.
- Use `jimmy-skills@data-architecture-strategy` when the main problem is choosing between RDW, MDW, lakehouse, data fabric, or data mesh.
- Use `jimmy-skills@data-quality` when the main problem is publish gates, contracts, reconciliation, or schema drift.
- Use `jimmy-skills@data-observability` when the main problem is freshness, lag, skew, stale dashboards, or SLA alerting.
- Use `jimmy-skills@data-stack-delivery` when the main question is how Airflow, Snowflake, dbt, Spark, Kafka, and delivery automation fit together in practice.
Working approach
- Start from the decision the user needs to enable, not from the tool.
- Identify the lifecycle stage involved: generation, ingestion, storage, transformation, or serving.
- Prefer boring architecture:
- Batch before streaming unless latency requirements are explicit
- ELT before bespoke ETL when a warehouse can handle transforms
- Centralized warehouse + domain-reviewed definitions before pure data mesh
- Make grain, ownership, and freshness explicit before writing pipelines.
- Treat every mart, metric, or dashboard as a data product with an owner, SLA, and quality checks.
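One way to make the "data product" framing concrete is a small contract object that forces grain, owner, SLA, and checks to be written down before any pipeline code. This is an illustrative sketch — the field names and example values are assumptions, not part of any standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    """Minimal contract for a mart, metric, or dashboard treated as a product."""
    name: str                      # table or dashboard identifier
    grain: str                     # what one row represents
    owner: str                     # accountable team or person
    freshness_sla_hours: int       # maximum acceptable staleness
    quality_checks: tuple          # checks that must pass before publishing

# Hypothetical example product for an engagement mart
daily_engagement = DataProduct(
    name="mart_daily_engagement",
    grain="one row per user per day",
    owner="analytics-team",
    freshness_sla_hours=24,
    quality_checks=("not_null_user_id", "unique_user_day", "row_count_vs_yesterday"),
)
```

Freezing the dataclass makes the contract immutable once declared, which mirrors the idea that changing grain or SLA should be a deliberate, reviewed decision rather than a drive-by edit.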
Default procedure
- Define the business question or decision first.
- Write down the grain, freshness target, and owner.
- Choose the simplest ingestion mode that satisfies the latency requirement.
- Keep source-shaped data in
raw, reusable business entities inintermediate, and team-facing outputs inmarts. - Define shared metrics once before building dashboards, alerts, or feature logic.
- Validate that the serving layer matches the consumer:
- BI for reviews and planning
- embedded analytics for product experiences
- operational analytics for fast response
- reverse ETL for action in external tools
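The "simplest ingestion mode that satisfies the latency requirement" step can be sketched as a tiny decision function. The thresholds below are illustrative defaults consistent with the batch-first bias above, not fixed rules:

```python
def choose_ingestion_mode(latency_requirement_minutes: float) -> str:
    """Pick the simplest ingestion mode that satisfies the stated latency need.

    Prefers batch; only escalates toward streaming when the requirement forces it.
    Thresholds are illustrative, not prescriptive.
    """
    if latency_requirement_minutes >= 24 * 60:
        return "daily batch"
    if latency_requirement_minutes >= 60:
        return "hourly batch"
    if latency_requirement_minutes >= 5:
        return "micro-batch"
    return "streaming"
```

The point of writing it down, even informally, is that "streaming" becomes the answer only when someone has stated a latency number that rules everything else out.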
Default architecture bias
- `raw` captures source-shaped data with minimal logic
- `intermediate` expresses reusable business entities and joins
- `marts` serve a specific team or decision
- Shared dimensions and semantic metrics are defined once and reused
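The layering above can be sketched end to end with plain Python standing in for warehouse SQL. The event shapes and function names are hypothetical; the point is the separation of concerns between layers:

```python
# raw: source-shaped events, captured with minimal logic
raw_events = [
    {"user_id": 1, "event": "lesson_completed", "ts": "2024-05-01"},
    {"user_id": 1, "event": "lesson_completed", "ts": "2024-05-02"},
    {"user_id": 2, "event": "lesson_started", "ts": "2024-05-01"},
]

def int_user_completions(events):
    """intermediate: reusable business entity — completion counts per user."""
    counts = {}
    for e in events:
        if e["event"] == "lesson_completed":
            counts[e["user_id"]] = counts.get(e["user_id"], 0) + 1
    return counts

def mart_active_completers(events):
    """mart: team-facing output answering one decision —
    which users completed at least one lesson?"""
    return sorted(u for u, n in int_user_completions(events).items() if n >= 1)
```

Because the mart consumes the intermediate entity rather than raw events directly, a second mart (say, a retention report) can reuse `int_user_completions` instead of re-deriving it.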
Heuristics
- If the question is "should this be real-time?", challenge it. Most analytics use cases should start with hourly or daily batch.
- If analytics queries are hitting an OLTP database, move them to an OLAP store.
- If metrics differ across teams, add a metrics/semantic layer before adding more dashboards.
- If the organization is small or mid-sized, borrow domain ownership ideas without adopting a full data mesh.
- If the user is in edtech or B2C learning, prioritize completion, retention, engagement, and content-quality feedback loops.
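The metrics/semantic-layer heuristic can be illustrated with a single shared registry that every dashboard and alert calls into, instead of each one re-implementing `active_user` its own way. The definitions below are placeholders, not real business logic:

```python
# One shared registry of metric definitions, reused everywhere.
METRICS = {
    "active_user": lambda row: row["events_7d"] > 0,
    "completion_rate": lambda row: (
        row["completed"] / row["enrolled"] if row["enrolled"] else 0.0
    ),
}

def evaluate(metric_name: str, row: dict):
    """Every consumer resolves metrics through this one function."""
    return METRICS[metric_name](row)
```

When two teams disagree about a number, the argument moves to the one definition in the registry rather than to a diff between two dashboards.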
Gotchas
- A successful pipeline run is not the same as a trustworthy dataset. Quality and freshness checks still need to pass.
- Do not design marts before the row grain is explicit. Most reporting errors start there.
- Do not let each dashboard redefine core metrics such as `active_user`, `completion_rate`, or `churn_rate`.
- For small and mid-sized teams, "data mesh" usually means domain review of definitions plus centralized platform execution, not decentralized infrastructure.
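The first gotcha — a green pipeline run is not a trustworthy dataset — can be made mechanical with a check that requires freshness and volume in addition to run success. A minimal sketch, with assumed thresholds:

```python
from datetime import datetime, timedelta, timezone

def is_trustworthy(load_succeeded: bool,
                   last_loaded_at: datetime,
                   row_count: int,
                   min_rows: int = 1,
                   max_staleness: timedelta = timedelta(hours=24)) -> bool:
    """A successful run alone is not enough: freshness and volume must also pass."""
    fresh = datetime.now(timezone.utc) - last_loaded_at <= max_staleness
    return load_succeeded and fresh and row_count >= min_rows
```

A dataset that loaded successfully two days ago still fails this gate, which is exactly the case a "pipeline succeeded" dashboard light hides.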
References
- Read references/platform-patterns.md when the task requires lifecycle framing, modeling trade-offs, serving choices, or multi-team platform strategy.