data-engineering-storage-remote-access-libraries-fsspec
fsspec: Universal Filesystem Interface
fsspec provides a unified API for local and remote filesystems, integrating seamlessly with pandas, xarray, Dask, and many other Python data tools.
Installation
# Core only (no remote support)
pip install fsspec
# With specific backends
pip install fsspec[s3] # S3 via s3fs
pip install fsspec[gcs] # GCS via gcsfs
pip install fsspec[s3,gcs,abfs] # Multiple backends
# Or install backends directly
pip install s3fs gcsfs adlfs
Basic Usage
import fsspec
import pandas as pd
# List available protocols
print(fsspec.available_protocols())
# ['file', 'memory', 'http', 'https', 's3', 's3a', 'gcs', 'gs', 'abfss', ...]
# Create filesystem instances
local_fs = fsspec.filesystem('file')
s3_fs = fsspec.filesystem('s3', anon=False) # Uses the standard AWS credential chain
gcs_fs = fsspec.filesystem('gcs') # Uses GCP credentials
# Basic operations
s3_fs.ls('my-bucket/data/') # List files
s3_fs.exists('my-bucket/data/file.csv') # Check existence
s3_fs.mkdir('my-bucket/new-folder') # Create directory
# Read file as bytes
with s3_fs.open('s3://my-bucket/data/file.txt', 'rb') as f:
    content = f.read()
# Read a gzipped CSV directly into pandas
with s3_fs.open('s3://my-bucket/data/large.csv.gz', 'rb') as f:
    df = pd.read_csv(f, compression='gzip')
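For one-off reads and writes, fsspec.open() picks the backend from the URL protocol, so no explicit filesystem object is needed. A minimal sketch (the output key below is a placeholder):
# Write a small text file straight to S3 via a URL
with fsspec.open('s3://my-bucket/output/summary.csv', 'w') as f:
    f.write('a,b,c\n1,2,3\n')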
Protocol Chaining & Caching
# SimpleCache: Cache remote files locally for faster repeated access
import fsspec
# First read downloads, subsequent reads use cache
cached_file = fsspec.open_local(
    "simplecache::s3://my-bucket/large-file.nc",
    simplecache={'cache_storage': '/tmp/fsspec_cache', 'compression': None}
)  # open_local returns the path of the local cached copy
# Chain multiple protocols
# Read from HTTPS, cache locally, decompress on the fly
with fsspec.open(
    "simplecache::https://example.com/data.csv.gz",
    compression='gzip'
) as f:
    df = pd.read_csv(f)
# Other useful chain steps:
# - "filecache::" - persistent disk cache (keeps metadata between sessions)
# - "blockcache::" - caches only the byte ranges actually read
# - "zip::" / "tar::" - access files inside archives
# (gzip/bz2/zstd decompression is handled by the compression= argument, not a chain step)
Advanced S3 Features
import s3fs
# Detailed S3 configuration
fs = s3fs.S3FileSystem(
    key='AKIA...',
    secret='...',
    token='...',  # Temporary session token
    client_kwargs={
        'region_name': 'us-east-1',
        'endpoint_url': 'https://s3-compatible.local',  # MinIO, etc.
    },
    config_kwargs={
        'max_pool_connections': 50,
        'retries': {'max_attempts': 5}
    },
    skip_instance_cache=True  # Always build a fresh instance instead of reusing fsspec's cached one
)
# Async operations
import asyncio
async def read_multiple():
    fs = s3fs.S3FileSystem(asynchronous=True)
    await fs.set_session()  # Establish async session
    # Concurrent reads (use _cat_file for bytes)
    data = await asyncio.gather(
        fs._cat_file('bucket/file1.parquet'),
        fs._cat_file('bucket/file2.parquet'),
        fs._cat_file('bucket/file3.parquet')
    )
    return data
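# Usage sketch: run the coroutine from synchronous code
# (the bucket/key names above are placeholders, not real objects)
results = asyncio.run(read_multiple())
print([len(b) for b in results])  # each element holds one object's bytes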
# S3-specific features
fs.find('my-bucket', prefix='data/2024') # List with prefix
fs.du('my-bucket/data') # Disk usage
fs.rm('my-bucket/temp/', recursive=True) # Recursive delete
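The same filesystem object also handles globbing and bulk transfer. A minimal sketch (bucket, prefix, and local paths are placeholders):
# Pattern-match keys under a prefix
parquet_files = fs.glob('my-bucket/data/2024/**/*.parquet')
# Download a prefix recursively, then upload a single file back
fs.get('my-bucket/data/2024/', '/tmp/local-copy/', recursive=True)
fs.put('/tmp/local-copy/part-0.parquet', 'my-bucket/backup/part-0.parquet')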
Authentication
fsspec backends resolve credentials through the standard cloud mechanisms, in the usual order of precedence (see the sketch after this list):
- Explicit credentials (passed to constructor)
- Environment variables (AWS_ACCESS_KEY_ID, GOOGLE_APPLICATION_CREDENTIALS, etc.)
- Config files (~/.aws/credentials, gcloud CLI)
- IAM roles / managed identities
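A rough sketch of the explicit-credential style for two backends (the profile name and key path are placeholders; s3fs accepts profile=, gcsfs accepts token=):
import fsspec
# AWS: use a named profile from ~/.aws/credentials
s3 = fsspec.filesystem('s3', profile='analytics')
# GCS: point gcsfs at a service-account key file (token='anon' works for public buckets)
gcs = fsspec.filesystem('gcs', token='/path/to/service-account.json')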
See @data-engineering-storage-authentication for detailed patterns.
When to Use fsspec
Choose fsspec when:
- You need broad ecosystem compatibility (pandas, xarray, Dask)
- Working with multiple storage backends (S3, GCS, Azure, HTTP)
- You need protocol chaining and caching features
- Your workflow involves diverse data formats beyond Parquet
Performance Considerations
- ✅ Use filecache:: instead of simplecache:: for persistent caching across sessions (example below)
- ✅ Increase max_pool_connections for high concurrency
- ✅ Use the async API for many concurrent small-file operations
- ⚠️ For pure Parquet workflows with high throughput, consider pyarrow.fs instead
- ⚠️ For maximum performance on large concurrent operations, consider obstore
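For instance, a persistent on-disk cache that survives across sessions might look like this (a sketch; the cache directory and object path are placeholders):
import fsspec
import pandas as pd
# filecache keeps downloaded files plus their metadata on disk, so a new
# process can reuse them without re-downloading
with fsspec.open(
    "filecache::s3://my-bucket/data/large.csv",
    filecache={'cache_storage': '/var/cache/fsspec'}
) as f:
    df = pd.read_csv(f)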
Integration with Data Engineering Tools
- Polars: pl.read_parquet("s3://bucket/file.parquet", storage_options={...})
- DuckDB: duckdb.register_filesystem(fsspec.filesystem('s3'))
- Pandas: pd.read_csv("s3://bucket/file.csv") (auto-detects fsspec)
- PyArrow: wrap fsspec with pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(fs)) (sketched below)
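The PyArrow and DuckDB hooks look roughly like this (a sketch with placeholder bucket names):
import duckdb
import fsspec
import pyarrow.dataset as ds
import pyarrow.fs as pafs

s3 = fsspec.filesystem('s3')
# PyArrow: wrap the fsspec filesystem so datasets can read through it
arrow_fs = pafs.PyFileSystem(pafs.FSSpecHandler(s3))
dataset = ds.dataset('my-bucket/data/', format='parquet', filesystem=arrow_fs)
# DuckDB: register the same fsspec filesystem, then query s3:// paths
con = duckdb.connect()
con.register_filesystem(s3)
con.sql("SELECT count(*) FROM read_parquet('s3://my-bucket/data/*.parquet')")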
For detailed integration patterns, see:
- @data-engineering-storage-remote-access/integrations/polars
- @data-engineering-storage-remote-access/integrations/duckdb
- @data-engineering-storage-remote-access/integrations/pandas