Data Stack Delivery

Use this skill when the question is not only "what architecture should we choose?" but also "how do these common data-stack tools fit together in practice?"

Official docs for Airflow, Snowflake, dbt, Spark, Kafka, and Deequ remain authoritative. Use this skill for pragmatic wiring, examples, and trade-offs.

What this skill covers

  • Airflow setup patterns for local learning and production-like orchestration (see the DAG sketch after this list)
  • Snowflake basics for warehouses, stages, file loading, and quick validation
  • dbt project shape, build flow, tests, and team-facing model organization
  • Spark batch-processing patterns and the default optimization checklist
  • Kafka basics for topics, partitions, late data, and stream-processing choices
  • Data quality checkpoints across Python, SQL, dbt, and Deequ
  • Automation principles such as container-first delivery, slim CI/CD, and idempotent reruns
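
The local-learning Airflow pattern from the first bullet can be as small as a two-task DAG. Below is a minimal sketch, assuming Airflow 2.4+ (for example run via airflow standalone); the DAG id, task names, and callables are placeholders, not part of the skill itself.

```python
# Minimal local-learning DAG: one extract task feeding one load task.
# Assumes Airflow 2.4+; the function bodies are placeholders for a real pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull a small file or API payload into local storage.
    print("extracted raw data")


def load():
    # Placeholder: push the extracted data into the warehouse.
    print("loaded data into warehouse")


with DAG(
    dag_id="data_stack_delivery_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task
```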

Boundaries

  • Use jimmy-skills@data-engineering when the main decision is platform shape, semantic metrics, marts, or multi-team ownership.
  • Use jimmy-skills@data-pipeline-reliability when retries, replay, deduplication, or backfill safety are the primary risk.
  • Use jimmy-skills@data-quality when the main task is designing validation rules, contracts, reconciliation, or publish gates.
  • Use jimmy-skills@data-observability when the main task is freshness, lag, stale dashboards, or SLA alerting after delivery.

Working approach

  1. Separate learning-stack guidance from production-like guidance before suggesting tools.
  2. Pick the simplest end-to-end flow that proves the data path works.
  3. Make grain, idempotency, quality checks, and ownership explicit before adding more tooling.
  4. Verify with SQL, tests, or consumer-facing outputs instead of trusting a green orchestrator alone (see the check sketch after this list).
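
As a concrete form of step 4, here is a minimal sketch of a post-run check that queries the warehouse directly rather than trusting a green orchestrator run. The table, columns, and the get_warehouse_connection helper are assumptions for illustration; any DB-API-compatible connection would do.

```python
# Post-run verification: row-count and null-rate checks against the loaded table.
# `get_warehouse_connection` is a placeholder for whatever DB-API connection you use.
def verify_orders_load(get_warehouse_connection, run_date: str) -> None:
    row_count_sql = (
        f"SELECT COUNT(*) FROM analytics.orders WHERE load_date = '{run_date}'"
    )
    null_ids_sql = (
        f"SELECT COUNT(*) FROM analytics.orders "
        f"WHERE load_date = '{run_date}' AND order_id IS NULL"
    )

    with get_warehouse_connection() as conn:
        cur = conn.cursor()
        cur.execute(row_count_sql)
        row_count = cur.fetchone()[0]
        cur.execute(null_ids_sql)
        null_ids = cur.fetchone()[0]

    # Fail loudly so a green orchestrator run cannot hide a bad load.
    assert row_count > 0, f"no rows loaded for {run_date}"
    assert null_ids == 0, f"{null_ids} rows with NULL order_id for {run_date}"
```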

Default procedure

  1. Identify which stage the user is working on:
    • orchestration
    • warehouse loading
    • transformation
    • batch processing
    • stream processing
    • quality gate
    • automation
  2. Choose the narrowest toolchain that fits the requirement.
  3. Define the input, output, and success check for that stage (see the contract sketch after this list).
  4. Add one explicit quality or correctness check before calling the stage done.
  5. Document the handoff to the next stage in the stack.
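
One way to make step 3 concrete is to write the stage contract down as plain data before adding tooling. The sketch below is illustrative only; the fields, example paths, and table names are assumptions, not a required format.

```python
# Illustrative contract for one stack stage: what goes in, what comes out,
# and how "done" is verified. Field names and values are examples only.
from dataclasses import dataclass


@dataclass
class StageContract:
    stage: str          # orchestration, warehouse loading, transformation, ...
    inputs: list[str]   # upstream datasets or files this stage reads
    outputs: list[str]  # datasets this stage produces
    success_check: str  # the explicit check run before calling the stage done
    owner: str          # who gets paged when the check fails


warehouse_loading = StageContract(
    stage="warehouse loading",
    inputs=["s3://raw-bucket/orders/2024-01-01/*.csv"],
    outputs=["analytics.raw_orders"],
    success_check="row count in analytics.raw_orders matches staged file line count",
    owner="data-platform team",
)
```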

Defaults

  • Prefer a lightweight local Airflow setup for learning, then move to the official or managed stack for production-like behavior.
  • Start Snowflake with small warehouses, staged files, and explicit load-validation steps (a load-and-validate sketch follows this list).
  • Use dbt build and dbt test as the default transformation and publish workflow.
  • Start with Spark batch before Kafka streaming unless consumer latency truly requires real time.
  • Treat automation as delivery safety, CI/CD, and rerun discipline, not just cron scheduling.
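
Here is a minimal sketch of the Snowflake default above (small warehouse, staged files, an explicit load-validation step), assuming snowflake-connector-python and a stage that already exists; the warehouse, database, table, and stage names are placeholders.

```python
# Load staged CSV files into a table on a small warehouse, then validate the load.
# Assumes snowflake-connector-python; object and credential names are placeholders.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="LOAD_WH_XS",      # keep the loading warehouse small by default
    database="ANALYTICS",
    schema="RAW",
)

cur = conn.cursor()
try:
    # Load every file currently sitting in the named stage.
    cur.execute("""
        COPY INTO RAW_ORDERS
        FROM @ORDERS_STAGE
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
    load_results = cur.fetchall()  # one row per file: status, rows parsed, rows loaded

    # Explicit validation step: confirm the table is not empty after the load.
    cur.execute("SELECT COUNT(*) FROM RAW_ORDERS")
    row_count = cur.fetchone()[0]
    assert row_count > 0, "COPY INTO reported success but RAW_ORDERS is empty"
finally:
    cur.close()
    conn.close()
```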

Gotchas

  • A successful Airflow run does not prove the warehouse table is correct.
  • inferSchema is convenient for demos, but explicit schemas are safer for production batch jobs (see the explicit-schema sketch after this list).
  • A dbt model that builds without tests is not a trustworthy publish gate.
  • The public Week 8 automation material is directional, not a complete implementation guide.
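
To make the inferSchema gotcha concrete, here is a minimal PySpark sketch of the same batch read with an explicit schema, so type drift in the source files fails the read instead of silently producing bad columns; the paths and column names are placeholders.

```python
# Explicit schema for a batch read: safer than inferSchema for production jobs
# because a type change in the source surfaces as a read error, not bad data.
# Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType, DecimalType, StringType, StructField, StructType,
)

spark = SparkSession.builder.appName("orders_batch").getOrCreate()

orders_schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("order_date", DateType(), nullable=False),
    StructField("amount", DecimalType(12, 2), nullable=True),
])

orders = (
    spark.read
    .option("header", "true")
    .option("mode", "FAILFAST")   # reject malformed rows instead of nulling them
    .schema(orders_schema)        # no inferSchema: column types are pinned explicitly
    .csv("s3://raw-bucket/orders/2024-01-01/")
)

orders.write.mode("overwrite").parquet("s3://curated-bucket/orders/2024-01-01/")
```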
