data-stack-delivery
Data Stack Delivery
Use this skill when the question is not only "what architecture should we choose?" but also "how do these common data-stack tools fit together in practice?"
Official docs for Airflow, Snowflake, dbt, Spark, Kafka, and Deequ remain authoritative. Use this skill for pragmatic wiring, examples, and trade-offs.
What this skill covers
- Airflow setup patterns for local learning and production-like orchestration
- Snowflake basics for warehouses, stages, file loading, and quick validation
- dbt project shape, build flow, tests, and team-facing model organization
- Spark batch-processing patterns and the default optimization checklist
- Kafka basics for topics, partitions, late data, and stream-processing choices
- Data quality checkpoints across Python, SQL, dbt, and Deequ
- Automation principles such as container-first delivery, slim CI/CD, and idempotent reruns
Boundaries
- Use `jimmy-skills@data-engineering` when the main decision is platform shape, semantic metrics, marts, or multi-team ownership.
- Use `jimmy-skills@data-pipeline-reliability` when retries, replay, deduplication, or backfill safety are the primary risk.
- Use `jimmy-skills@data-quality` when the main task is designing validation rules, contracts, reconciliation, or publish gates.
- Use `jimmy-skills@data-observability` when the main task is freshness, lag, stale dashboards, or SLA alerting after delivery.
Working approach
- Separate learning-stack guidance from production-like guidance before suggesting tools.
- Pick the simplest end-to-end flow that proves the data path works.
- Make grain, idempotency, quality checks, and ownership explicit before adding more tooling.
- Verify with SQL, tests, or consumer-facing outputs instead of trusting a green orchestrator alone.
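The last point can be made concrete with a quick grain-uniqueness query run directly against the table the orchestrator claims it loaded. A minimal sketch, using an in-memory SQLite database as a stand-in for the warehouse (table and column names are illustrative):

```python
import sqlite3

def grain_is_unique(conn, table, key_cols):
    """Return True if no duplicate rows exist for the declared grain."""
    cols = ", ".join(key_cols)
    sql = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {cols} FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    (dupes,) = conn.execute(sql).fetchone()
    return dupes == 0

# Demo: a table whose declared grain is one row per order_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (2, "2024-01-02")],
)
print(grain_is_unique(conn, "orders", ["order_id"]))  # False: order_id 2 repeats
```

A green DAG run would not have caught the duplicate; the explicit grain check does.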
Default procedure
- Identify which stage the user is working on:
  - orchestration
  - warehouse loading
  - transformation
  - batch processing
  - stream processing
  - quality gate
  - automation
- Choose the narrowest toolchain that fits the requirement.
- Define the input, output, and success check for that stage.
- Add one explicit quality or correctness check before calling the stage done.
- Document the handoff to the next stage in the stack.
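One lightweight way to carry this procedure into code is a small stage contract that records the input, output, and success check together. This is a hedged sketch, not an API from any of the tools above; all names, paths, and counts are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class StageContract:
    """Minimal record of what a stage consumes, produces, and how to verify it."""
    name: str
    inputs: str
    outputs: str
    success_check: Callable[[], bool]  # e.g. a row-count or freshness query

    def verify(self) -> bool:
        ok = self.success_check()
        print(f"[{self.name}] {'PASS' if ok else 'FAIL'}")
        return ok

# Hypothetical warehouse-loading stage: success means the staged file row
# count matches the loaded table row count (hard-coded here as a stand-in
# for a real comparison query).
stage = StageContract(
    name="warehouse-loading",
    inputs="staged CSV files for one load date",
    outputs="RAW.ORDERS (one row per order line)",  # grain stated explicitly
    success_check=lambda: 1000 == 1000,
)
stage.verify()  # prints "[warehouse-loading] PASS"
```

Writing the contract down first makes the handoff to the next stage explicit: its `outputs` line is the next stage's `inputs` line.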
Defaults
- Prefer a lightweight local Airflow setup for learning, then move to the official or managed stack for production-like behavior.
- Start Snowflake with small warehouses, staged files, and explicit load-validation steps.
- Use `dbt build` and `dbt test` as the default transformation and publish workflow.
- Start with Spark batch before Kafka streaming unless consumer latency truly requires real time.
- Treat automation as delivery safety, CI/CD, and rerun discipline, not just cron scheduling.
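Rerun discipline usually comes down to idempotent partition loads: replace the date partition in one transaction instead of blindly appending. A minimal sketch, again with SQLite standing in for the warehouse and a hypothetical `sales` table:

```python
import sqlite3

def load_partition(conn, table, load_date, rows):
    """Idempotent daily load: delete-then-insert the date partition in one
    transaction, so rerunning the same date yields the same result."""
    with conn:  # BEGIN ... COMMIT: delete and insert succeed or fail together
        conn.execute(f"DELETE FROM {table} WHERE load_date = ?", (load_date,))
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# Demo: running the same load twice must not duplicate rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (load_date TEXT, sku TEXT, qty INT)")
day1 = [("2024-01-01", "A", 3), ("2024-01-01", "B", 1)]
load_partition(conn, "sales", "2024-01-01", day1)
load_partition(conn, "sales", "2024-01-01", day1)  # rerun after a "failure"
(count,) = conn.execute("SELECT COUNT(*) FROM sales").fetchone()
print(count)  # 2
```

The same shape applies to warehouse `MERGE`/overwrite-partition statements; the point is that an orchestrator retry is only safe if the task it retries behaves like this.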
Gotchas
- A successful Airflow run does not prove the warehouse table is correct.
- `inferSchema` is convenient for demos, but explicit schemas are safer for production batch jobs.
- A dbt model that builds without tests is not a trustworthy publish gate.
- The public Week 8 automation material is directional, not a complete implementation guide.
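The `inferSchema` gotcha generalizes beyond Spark: any loader that guesses types from sample values can silently change a column's type between daily files. The sketch below mirrors in plain Python what an explicit Spark `StructType` buys you; the schema and CSV contents are illustrative:

```python
import csv
import io

# Explicit schema: column name -> parser. A bad value fails the load loudly
# instead of silently drifting a column's inferred type between runs.
SCHEMA = {"order_id": int, "amount": float, "order_date": str}

def load_with_schema(text):
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({col: cast(raw[col]) for col, cast in SCHEMA.items()})
        except (KeyError, ValueError) as exc:
            raise ValueError(f"schema violation in row {raw}") from exc
    return rows

good = "order_id,amount,order_date\n1,9.99,2024-01-01\n"
print(load_with_schema(good))
# [{'order_id': 1, 'amount': 9.99, 'order_date': '2024-01-01'}]
```

With inference, a file where `amount` happens to contain `N/A` would load as a string column; with the declared schema it fails the batch, which is the safer production default.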
Reference
- Read references/tooling-playbook.md when the task needs practical Airflow, Snowflake, dbt, Spark, Kafka, quality, and automation examples in one path.
- Read references/sources/README.md when you need the in-repo Datacamping source notes instead of external pages.