Designing Data Storage

Comprehensive guide to selecting and using modern data storage formats for analytics and machine learning. Covers six file formats (Parquet, Arrow, Lance, Zarr, Avro, ORC) and three lakehouse table formats (Delta Lake, Apache Iceberg, Apache Hudi).

Quick Format Comparison

File Formats

| Format | Type | Best For | Compression | Schema Evolution | Random Access |
| --- | --- | --- | --- | --- | --- |
| Parquet | Columnar | Analytics, data lakes | ✅ (Snappy, Zstd, LZ4) | ✅ (add/drop) | ✅ (row groups) |
| Arrow/Feather | Columnar | In-memory, IPC, ML | ✅ (LZ4, Zstd) | Limited | ✅ (record batches) |
| Lance | Columnar | ML pipelines, vectors | ✅ (Zstd, LZ4) | ✅ (multi-modal) | ✅ (row-level) |
| Zarr | Chunked arrays | ML, geospatial, N-dim | ✅ (Blosc, gzip) | ✅ (chunks) | ✅ (chunk-level) |
| Avro | Row-based | Streaming, Kafka | ✅ (deflate, snappy) | ✅ (full schema) | ❌ (sequential) |
| ORC | Columnar | Hive, Hadoop | ✅ (ZLIB, Snappy) | Limited | ✅ (stripe-level) |

Lakehouse Table Formats

| Feature | Delta Lake | Apache Iceberg | Apache Hudi |
| --- | --- | --- | --- |
| ACID Transactions | ✅ | ✅ | ✅ |
| Time Travel | ✅ | ✅ | ✅ |
| Schema Evolution | ✅ | ✅ Advanced (branching) | ✅ |
| Primary Ecosystem | Spark/Databricks | Engine-agnostic | Spark (CDC focus) |
| Write Optimization | Copy-on-Write | CoW, Merge-on-Read | CoW, Merge-on-Read |
| Python API | deltalake (pure), PySpark | pyiceberg (pure) | PySpark only |
| Best For | Spark ecosystems | Multi-engine analytics | Change data capture |

When to Use Which?

File Formats

Choose Parquet when:

  • You need broad compatibility (Spark, DuckDB, Polars, pandas)
  • Analytics queries with filtering/aggregation
  • Data lake storage with partitioning
  • Mature ecosystem with best compression support

Choose Arrow/Feather when:

  • Zero-copy sharing between processes (IPC)
  • Fast serialization/deserialization for ML training
  • In-memory format persistence
  • Need the Arrow ecosystem (compute kernels, CUDA, etc.)

Choose Lance when:

  • Machine learning pipelines with embeddings/vectors
  • Need multi-modal data (text + images + audio + vectors)
  • Versioned datasets with Git-like branching
  • Cloud-native (S3/GCS) with no metadata catalog required

Choose Zarr when:

  • N-dimensional arrays (tensors, satellite imagery, medical scans)
  • Chunked, compressed storage for ML
  • Parallel reads/writes across chunks
  • Cloud-optimized (s3://, gs:// with fsspec)

Choose Avro when:

  • Streaming to Kafka/Kinesis
  • Schema evolution is critical (backward/forward)
  • Row-based access pattern
  • Need to serialize objects/records

Choose ORC when:

  • Working primarily with Hive/Hadoop
  • Hive ACID transactions
  • Legacy big data pipelines

Lakehouse Table Formats

Choose Delta Lake when:

  • You're in the Spark/Databricks ecosystem
  • Need mature tooling with pure-Python deltalake library
  • Want simplest time travel and schema evolution

Choose Apache Iceberg when:

  • You need engine-agnostic tables (Spark, Trino, Flink, DuckDB)
  • Advanced schema branching and partition evolution
  • Prefer explicit catalog abstraction

Choose Apache Hudi when:

  • You're building CDC pipelines from Kafka/DB logs
  • Need upsert/delete support with streaming
  • Low-latency incremental processing

Format Selection Matrix

| Use Case | Recommended Format | Reason |
| --- | --- | --- |
| Data lake analytics | Parquet or Delta Lake | Mature, partitioning, ecosystem |
| ML training data | Arrow/Feather or Lance | Zero-copy, vector support |
| Geospatial arrays | Zarr | Chunked, N-dimensional, cloud-optimized |
| Streaming/Kafka | Avro | Schema evolution, row-based |
| Legacy Hive | ORC | Compatibility |
| Feature stores | Lance or Delta Lake | Versioning, vectors |
| IPC between processes | Arrow IPC or Feather | Zero-copy, fast |
| Multi-engine analytics | Apache Iceberg | Engine-agnostic, catalog flexibility |
| CDC pipelines | Apache Hudi | Built for upserts/streaming |
| Quick exports | Parquet (Zstd) | Good compression/decompression speed |

Detailed Reference Guides

File Format References

  • references/parquet.md - Deep dive on Parquet layout, writer settings, partitioning patterns
  • references/format-selection-guide.md - Comprehensive decision matrix and selection criteria

Lakehouse Table Format References

  • references/delta-lake.md - Pure-Python API, PySpark integration, time travel, cloud storage
  • references/iceberg.md - Catalog configuration, schema evolution, partition evolution
  • references/hudi.md - CDC patterns, upserts, Copy-on-Write vs Merge-on-Read

Code Examples

Parquet (Columnar Analytics)

import polars as pl
import pyarrow.parquet as pq
import pyarrow as pa

# Write with Polars
df = pl.DataFrame({"id": [1, 2, 3], "value": [100.0, 200.0, 150.0]})
df.write_parquet("data.parquet", compression="zstd")

# Write with PyArrow (more control)
table = df.to_arrow()  # convert the Polars frame to an Arrow table
pq.write_table(
    table,
    "data.parquet",
    compression="ZSTD",
    compression_level=3,
    row_group_size=100000,
    use_dictionary=True
)

# Read with column pruning
df = pl.read_parquet("data.parquet", columns=["id", "value"])

# Dataset scanning with predicate pushdown
lazy_df = pl.scan_parquet("s3://bucket/dataset/**/*.parquet")
result = lazy_df.filter(pl.col("value") > 100).collect()

Delta Lake (ACID Table Format)

from deltalake import DeltaTable, write_deltalake
import pyarrow as pa

# Create Delta table
data = pa.table({
    "id": [1, 2, 3],
    "value": [100.0, 200.0, 150.0],
    "timestamp": ["2024-01-01", "2024-01-02", "2024-01-03"]
})

write_deltalake("data/delta-table", data, mode="overwrite")

# Read with time travel
dt = DeltaTable("data/delta-table")
dt.load_as_version(0)  # load a specific version (only version 0 exists after one write)
df = dt.to_pandas()

# Upsert (merge)
new_data = pa.table({"id": [2, 4], "value": [250.0, 400.0]})
dt.merge(
    source=new_data,
    predicate="target.id = source.id",
    source_alias="source",
    target_alias="target"
).when_matched_update_all().when_not_matched_insert_all().execute()

Apache Iceberg (Engine-Agnostic)

from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, LongType, StringType
import pyarrow as pa

# Load a catalog (AWS Glue here; REST, Hive, and SQL catalogs work the same way)
catalog = load_catalog("glue", **{
    "type": "glue",
    "s3.region": "us-east-1",
})

# Create and write a table
schema = Schema(
    NestedField(1, "id", LongType()),
    NestedField(2, "name", StringType()),
)
table = catalog.create_table("default.events", schema=schema)
table.append(pa.table({"id": [1, 2], "name": ["a", "b"]}))

# Read with a filter; time travel by pinning a snapshot ID
df = table.scan(
    row_filter="id > 1",
    snapshot_id=table.current_snapshot().snapshot_id,
).to_pandas()

Lance (ML-Native)

import lancedb
import polars as pl

# Create Lance dataset
db = lancedb.connect("./data.lance")
df = pl.DataFrame({
    "id": [1, 2, 3],
    "text": ["Hello", "World", "ML"],
    "vector": [[0.1] * 128, [0.2] * 128, [0.3] * 128]
})

table = db.create_table("my_table", df)

# Vector search
results = table.search([0.1] * 128).limit(5).to_pandas()

Zarr (N-Dimensional Arrays)

import zarr
import numpy as np
from numcodecs import Blosc

# Create chunked, compressed array (zarr-python 2.x API; 3.x takes compressors= instead)
z = zarr.open(
    'data.zarr',
    mode='w',
    shape=(1000000, 1000),
    chunks=(10000, 1000),
    dtype='f4',
    compressor=Blosc(cname='zstd', clevel=3)
)

# Write and read partial (only loads needed chunks)
z[:10000, :] = np.random.rand(10000, 1000).astype('f4')
slice_data = z[5000:6000, :]

Best Practices

File Formats

  1. Default to Parquet - Broadest compatibility, good compression
  2. Use Arrow/Feather for ML staging - Zero-copy between frameworks
  3. Use Lance when vectors are first-class - No separate vector DB needed
  4. Use Zarr for N-dim arrays - Geospatial, video, 3D data
  5. Compress everything - Snappy (fast) or Zstd (balanced)
  6. Partition wisely - By date/region/tenant to enable pruning
  7. Don't use Avro for analytics - No column pruning, row-based
  8. Don't use ORC unless in Hive/Hadoop world

Lakehouse Formats

  1. Use partitions - Partition by date/region for predicate pushdown
  2. Vacuum regularly - Clean up old files to avoid storage bloat
  3. Optimize layouts - Compact small files for better performance
  4. Catalog choice - AWS Glue for AWS, Hive for on-prem, REST for SaaS
  5. Monitor metadata - Table metadata grows with operations
  6. Don't vacuum too aggressively - Keep time travel window
  7. Don't skip OPTIMIZE - Small files hurt performance

Compression Codec Comparison

| Codec | Compression Ratio | Speed | Best For |
| --- | --- | --- | --- |
| Snappy | Low (~2:1) | ⚡⚡⚡ Fast | Fast analytics, default |
| Zstd | Medium-High (~4:1) | ⚡⚡ Fast | General purpose |
| LZ4 | Low-Medium (~2.5:1) | ⚡⚡⚡ Very fast | Real-time streaming |
| Gzip | High (~5:1) | ⚡ Slow | Archival, cold storage |
| Blosc (zstd) | Medium | ⚡⚡ | Zarr arrays |

Related Skills

  • @accessing-cloud-storage - S3, GCS, Azure authentication and I/O patterns
  • @building-data-pipelines - ETL patterns using these storage formats
  • @engineering-ai-pipelines - Vector storage, Lance, embeddings workflows
  • @managing-data-catalogs - Hive, Glue, REST catalogs for Iceberg
