event-streaming

Installation

SKILL.md

Streaming Data Skill

Expert guidance for real-time and near-real-time data pipelines: continuous stream processing, event-driven architectures, and batch-vs-streaming decisions.

When to Use

Activate when:

Building or troubleshooting Kafka pipelines (producers, consumers, Connect)
Implementing stream processing with Flink, Spark Streaming, or Kafka Streams
Designing event-driven architectures or real-time analytics
Configuring warehouse streaming ingestion (Snowpipe, BigQuery Storage Write API)
Creating materialized views or dynamic tables
Evaluating latency requirements (batch vs streaming)
Handling schema evolution, exactly-once semantics, or idempotent processing
Debugging consumer lag, backpressure, or checkpoint failures

Do NOT use for: batch ETL (use dbt-transforms), static data modeling, SQL optimization on analytical queries, data quality on static datasets, one-time migrations.

Scope Constraints

This skill covers architecture decisions, configuration patterns, and code generation for streaming systems.
It does NOT manage infrastructure provisioning (Terraform, CloudFormation) or CI/CD pipelines.
For Kafka security details, production tuning, and connector configs, load references/kafka-deep-dive.md.
For Flink/Spark framework-specific APIs, load references/flink-spark-streaming.md.
For warehouse-native streaming (Snowpipe, BigQuery, Dynamic Tables), load references/warehouse-streaming-ingestion.md.
For testing and replay strategies, load references/stream-testing-patterns.md.
Load only the reference file relevant to the current task.

Model Routing

reasoning_demand	preferred	acceptable	minimum
medium	Sonnet	Sonnet, Opus	Sonnet

Core Principles

Event Time over Processing Time. Design for event time. Use watermarks with WATERMARK FOR event_timestamp AS event_timestamp - INTERVAL 'N' SECOND to handle out-of-order and late-arriving data.

Exactly-Once vs At-Least-Once. At-least-once with idempotent consumers is simpler and often sufficient. Use exactly-once for financial transactions; at-least-once for analytics dashboards.

Backpressure Awareness. Monitor consumer lag and implement rate limiting. Use Kafka buffering or Flink backpressure mechanisms to smooth traffic spikes. Alert when lag exceeds thresholds.

Schema Evolution. Use schema registries (Confluent, AWS Glue) from day one. Enforce BACKWARD compatibility for most production systems. Add new fields with defaults; never remove required fields without multi-phase migration.

Idempotent Consumers. Store offsets transactionally with output, use unique keys for upserts, avoid operations that accumulate state incorrectly on replay.

Architecture Decision Matrix

Architecture	Latency	Complexity	Best For
Traditional Batch	Hours-days	Low	Historical reporting, large aggregations
Micro-Batch (Spark Streaming)	Seconds-minutes	Medium	Near-real-time analytics, unified batch/stream
True Streaming (Flink, Kafka Streams)	Milliseconds-seconds	High	Real-time dashboards, fraud detection, alerting
Kappa Architecture	Milliseconds-seconds	Medium	Stream-first, immutable event log, reprocessing
Warehouse-Native Streaming	Seconds-minutes	Low	SQL-first teams, simple ingestion, BI integration

Stream Processing Framework Selection

Framework	Latency	SQL	Managed	Best For
Kafka Streams	ms	KSQL (separate)	No	Microservices, JVM apps
Apache Flink	ms	Flink SQL	AWS KDA, Confluent	Complex event processing, large state
Spark Structured Streaming	seconds	Spark SQL	Databricks, EMR	Unified batch/stream, ML integration
ksqlDB	ms	Streaming SQL	Confluent Cloud	SQL-first simple transforms
Apache Beam/Dataflow	seconds	Limited	GCP Dataflow	Multi-cloud, GCP native

Windowing Patterns

Pattern	Description	Use Case
Tumbling	Fixed, non-overlapping intervals	Hourly aggregations, regular reporting
Sliding	Fixed, overlapping intervals	Moving averages, trend detection
Session	Gap-based, variable size	User sessions, activity bursts
Global	Custom trigger-controlled	Accumulate until condition met

Configure watermarks to balance latency vs completeness. Shorter watermark delay = lower latency but may drop late data. Longer delay = higher completeness but delayed results. Handle late arrivals with side outputs (Flink) or allowed lateness windows.

State Management & Checkpointing

State backends: RocksDB (disk, large state GB-TB), in-memory (fast, limited by heap), managed (cloud platforms).

Checkpointing: Periodic snapshots of state and positions. Shorter intervals (30s-1min) = faster recovery + more overhead. Longer intervals (5-10min) = less overhead + slower recovery.

State TTL: Expire old state to prevent unbounded growth. Set TTL based on business logic.

Savepoints: Manual checkpoints for planned downtime. Always take a savepoint before production deployments.

Schema Compatibility Modes

Mode	Allowed Changes	Use When
BACKWARD	Delete fields, add optional	Consumers upgrade first (most common)
FORWARD	Add fields, delete optional	Producers upgrade first
FULL	Backward + Forward only	Upgrade order unpredictable

Monitoring Essentials

Metric	Alert Threshold
Consumer Lag	> 1M messages or > 5 min
Throughput	< 50% baseline
Error Rate	> 0.1% for critical pipelines
Checkpoint Duration	> 2x interval
Backpressure Ratio	> 10% sustained
Partition Skew	Max/min ratio > 3x

Alert tiers: Critical (page) = consumer stopped, error spike, EOS violation. Warning (hours) = elevated lag, slow checkpoints. Info (trends) = rebalancing, schema changes.

Security Posture

Generates Kafka configs, stream processing code, and warehouse streaming pipelines. See Security & Compliance Patterns.

Credentials: Kafka broker auth (SASL/mTLS), Schema Registry auth, warehouse connections. All secrets via environment variables.

Kafka auth: Use SASL/SCRAM or mTLS. Never PLAINTEXT in production. Use ConfigProvider for connector secrets from Vault/AWS SM.

Capability	Tier 1 (Cloud-Native)	Tier 2 (Regulated)	Tier 3 (Air-Gapped)
Kafka producer/consumer	Deploy to dev	Generate for review	Generate only
Flink/Spark jobs	Submit to dev	Generate for review	Generate only
Warehouse streaming	Configure dev	Generate configs	Generate only

Reference Files

Kafka Deep Dive — Architecture, EOS, Connect, ksqlDB, security, production tuning
Flink & Spark Streaming — DataStream API, Flink SQL, watermarks, state backends, deployment
Warehouse Streaming Ingestion — Snowpipe Streaming, Dynamic Tables, BigQuery Storage Write API
Stream Testing Patterns — Embedded Kafka, testcontainers, stream replay, backfill patterns

Related skills

More from dtsong/data-engineering-skills

Installs

Repository

dtsong/data-eng…g-skills

GitHub Stars

First Seen

Mar 10, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykPass