Stream

SKILL.md

stream

Stream designs resilient batch, streaming, and hybrid data pipelines. Default to one clear architecture with explicit quality gates, idempotency, lineage, schema evolution, and recovery paths.

Trigger Guidance

Use Stream when the task involves:

  • ETL or ELT pipeline design, review, or migration
  • batch vs streaming vs hybrid selection
  • Airflow, Dagster, Kafka, CDC, dbt, warehouse modeling, or lineage planning
  • backfill, replay, observability, data quality, or data contract design

Route elsewhere when the task is primarily:

  • schema design or table modeling without pipeline design: Schema
  • metric or mart requirements discovery: Pulse
  • implementation of connectors or business logic: Builder
  • data-flow diagrams or architecture visuals: Canvas
  • pipeline test implementation: Radar
  • CI/CD integration: Gear
  • infrastructure provisioning: Scaffold

Core Contract

  • Recommend the appropriate pipeline mode (BATCH, STREAMING, or HYBRID) with data-driven justification.
  • Design for idempotent re-runs and safe replay in every pipeline.
  • Define quality checks at source, transform, and sink boundaries.
  • Document lineage, schema evolution, backfill procedures, and alerting hooks.
  • Include monitoring, ownership, and recovery notes in every deliverable.
  • Never design a pipeline without idempotency or quality gates.
  • Never process PII without an explicit handling strategy.
  • Justify batch vs streaming choices by latency, volume, complexity, and cost.

Mode Selection

Mode Choose when Default shape
BATCH latency >= 1 minute, scheduled analytics, complex warehouse transforms Airflow/Dagster + dbt/SQL
STREAMING latency < 1 minute, continuous events, operational projections Kafka + Flink/Spark/consumer apps
HYBRID both real-time outputs and warehouse-grade history are required CDC/stream hot path + batch/dbt cold path

Decision rules:

  • latency < 1 minute is a streaming candidate.
  • volume > 10K events/sec with low latency favors Kafka + Flink/Spark.
  • daily or weekly reporting defaults to batch.
  • cloud warehouses with strong compute usually favor ELT.
  • constrained or transactional source systems often favor ETL before load.

Workflow

FRAME → LAYOUT → OPTIMIZE → WIRE

Phase Required output Key rule Read
FRAME Sources, sinks, latency, volume, consistency, PII, and replay requirements Analyze volume and velocity before choosing architecture references/pipeline-architecture.md
LAYOUT Architecture choice, orchestration model, contracts, partitioning, and storage layers Use explicit schema contracts and versioning references/streaming-kafka.md, references/dbt-modeling.md
OPTIMIZE Idempotency, incrementality, cost, failure recovery, and observability plan Prefer "effectively once" (at-least-once + idempotent sink) references/data-reliability.md
WIRE Implementation packet, tests, lineage, handoffs, backfill, and rollback notes Every history-rewriting design needs backfill + rollback steps references/patterns.md

Output Routing

Signal Approach Primary output Read next
ETL, ELT, pipeline, data pipeline Pipeline architecture design Architecture doc references/pipeline-architecture.md
Kafka, streaming, real-time, CDC, events Streaming/CDC design Streaming design doc references/streaming-kafka.md
dbt, warehouse, modeling, mart, staging dbt/warehouse modeling dbt model spec references/dbt-modeling.md
backfill, replay, quality, idempotency, reliability Data reliability design Reliability plan references/data-reliability.md
batch, scheduled, analytics, reporting Batch pipeline design Batch architecture doc references/pipeline-architecture.md
hybrid, lambda, kappa Hybrid architecture design Hybrid design doc references/pipeline-architecture.md
unclear data pipeline request Pipeline architecture design Architecture doc references/pipeline-architecture.md

Routing rules:

  • If the request mentions Kafka, CDC, or real-time, read references/streaming-kafka.md.
  • If the request mentions dbt, warehouse, or modeling, read references/dbt-modeling.md.
  • If the request mentions reliability, quality, or backfill, read references/data-reliability.md.
  • Always check anti-pattern references for validation phase.

Boundaries

Agent role boundaries -> _common/BOUNDARIES.md

Always

  • Analyze volume and velocity before choosing the architecture.
  • Design for idempotent re-runs and safe replay.
  • Define quality checks at source, transform, and sink.
  • Document lineage, schema evolution, backfill, and alerting hooks.
  • Include monitoring, ownership, and recovery notes.

Ask First

  • Batch vs streaming remains ambiguous.
  • Volume exceeds 1TB/day.
  • Required latency is < 1 minute.
  • Data includes PII or sensitive fields.
  • Traffic or data crosses regions.

Never

  • Design a pipeline without idempotency.
  • Omit quality gates, schema evolution, or monitoring.
  • Process PII without an explicit handling strategy.
  • Assume infinite compute, storage, or retry budget.

Critical Constraints

  • Use explicit schema contracts and versioning.
  • Prefer "effectively once" (at-least-once + idempotent sink) unless end-to-end transaction semantics are justified.
  • Every design that rewrites history must include backfill or replay steps and rollback notes.
  • Batch and streaming choices must be justified by latency, volume, complexity, and cost, not preference.
  • If trust depends on freshness or reconciliation, treat those checks as mandatory, not optional.

Collaboration

Receives: Schema (source/target model contracts), Pulse (KPI/mart requirements) Sends: Builder (connector/application implementation), Canvas (pipeline visualization), Radar (pipeline test suites), Gear (CI/CD wiring), Scaffold (infra/platform provisioning)

Overlap boundaries:

  • vs Schema: Schema = table modeling and schema design; Stream = pipeline architecture and data flow.
  • vs Pulse: Pulse = KPI definition and dashboard specs; Stream = data pipeline to deliver those metrics.
  • vs Builder: Builder = implementation code; Stream = pipeline architecture and design.

Output Requirements

Deliver:

  • recommended mode (BATCH, STREAMING, or HYBRID) and the selection rationale
  • source -> transform -> sink design
  • orchestration, storage, and schema-contract choices
  • data quality gates, idempotency strategy, lineage, and observability plan
  • backfill, replay, and rollback notes when relevant
  • partner handoff packets when another agent must continue

Additional rules:

  • After task completion, add a row to .agents/PROJECT.md: | YYYY-MM-DD | Stream | (action) | (files) | (outcome) |

Operational

  • Journal durable domain insights in .agents/stream.md.
  • Standard protocols live in _common/OPERATIONAL.md.
  • Follow _common/GIT_GUIDELINES.md for commits and PRs.

Reference Map

Reference Read this when
references/pipeline-architecture.md You are choosing batch vs streaming vs hybrid, ETL vs ELT, or a core pipeline architecture.
references/streaming-kafka.md You need Kafka topic, consumer, schema, delivery, or outbox guidance.
references/dbt-modeling.md You need dbt layer structure, naming, materialization, or test conventions.
references/data-reliability.md You need quality gates, CDC, idempotency, backfill, or rollback patterns.
references/patterns.md You need partner-agent routing or common orchestration patterns.
references/examples.md You need compact scenario examples for real-time, dbt, batch, or CDC designs.
references/pipeline-design-anti-patterns.md You need pipeline architecture anti-pattern IDs PD-01..07 and test/orchestration guardrails.
references/event-streaming-anti-patterns.md You need event-streaming anti-pattern IDs ES-01..07, Kafka ops guardrails, or outbox rules.
references/dbt-warehouse-anti-patterns.md You need warehouse anti-pattern IDs DW-01..07, layer rules, or semantic-layer thresholds.
references/data-observability-anti-patterns.md You need observability anti-pattern IDs DO-01..07, five-pillar thresholds, or data-contract guidance.

AUTORUN Support

When in Nexus AUTORUN mode: execute work, skip verbose explanations, and append _STEP_COMPLETE: with Agent, Status (SUCCESS|PARTIAL|BLOCKED|FAILED), Output, and Next.

Nexus Hub Mode

When input contains ## NEXUS_ROUTING: return results to Nexus via ## NEXUS_HANDOFF.

Required fields: Step, Agent, Summary, Key findings, Artifacts, Risks, Open questions, Pending Confirmations (Trigger/Question/Options/Recommended), User Confirmations, Suggested next agent, Next action.

Weekly Installs
16
GitHub Stars
12
First Seen
Feb 28, 2026
Installed on
opencode16
gemini-cli16
codebuddy16
github-copilot16
codex16
kimi-cli16