ML Data Pipeline Architecture

Patterns for efficient ML data pipelines using Polars, Arrow, and ClickHouse.

ADR: 2026-01-22-polars-preference-hook (efficiency preferences framework)

Note: A PreToolUse hook enforces the Polars preference. To use Pandas instead, add the comment # polars-exception: <reason> at the top of the file.
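To make the exception comment format concrete, here is a minimal sketch of how such a comment could be detected. The hook's real matching rules are not shown in this document, so the helper name find_polars_exception, the regular expression, and the 5-line "file top" window are all assumptions made for illustration only.

```python
import re
from typing import Optional

# Hypothetical pattern: the real hook's matching rules are not documented here.
# It accepts "# polars-exception: <reason>" and requires a non-empty reason.
_EXCEPTION_RE = re.compile(r"^#\s*polars-exception:\s*(?P<reason>\S.*)$")


def find_polars_exception(source: str, max_lines: int = 5) -> Optional[str]:
    """Return the stated reason if an exception comment appears near the file top.

    max_lines is an assumed definition of "file top", not the hook's own.
    """
    for line in source.splitlines()[:max_lines]:
        match = _EXCEPTION_RE.match(line.strip())
        if match:
            return match.group("reason").strip()
    return None


# A file that opts out of the Polars preference, with a stated reason.
pandas_file = (
    "# polars-exception: upstream library returns pandas objects\n"
    "import pandas as pd\n"
)

# A file that imports Pandas without declaring an exception.
plain_file = "import pandas as pd\n"

print(find_polars_exception(pandas_file))
print(find_polars_exception(plain_file))
```

The first call returns the reason text and the second returns None. The sketch treats an empty reason as no exception at all, on the assumption that the reason is meant to be mandatory.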

Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters have drifted, or a workaround was needed, fix this file immediately rather than deferring. Only update for real, reproducible issues.

When to Use This Skill

Use this skill when:

Deciding between Polars and Pandas for a data pipeline
Optimizing memory usage with zero-copy Arrow patterns
Loading data from ClickHouse into PyTorch DataLoaders
Implementing lazy evaluation for large datasets
Migrating existing Pandas code to Polars
