data-engineering-core
No SKILL.md available for this skill.
More from legout/data-agent-skills
data-engineering-storage-remote-access-libraries-fsspec
Comprehensive guide to fsspec: the universal filesystem interface for Python. Covers S3, GCS, Azure via s3fs, gcsfs, adlfs; protocol chaining, caching, async operations, and integration with the data ecosystem.
data-engineering-storage-remote-access-integrations-duckdb
Using DuckDB with remote cloud storage via HTTPFS extension, fsspec, and Delta Lake integration. Covers S3, GCS, Azure, and S3-compatible endpoints.
data-engineering-storage-formats
Modern data serialization formats: Parquet, Apache Arrow (Feather/IPC), Lance (ML-native), Zarr (chunked arrays), Avro, and ORC. Covers compression, partitioning, and format selection.
data-engineering-storage-remote-access
Cloud storage access in Python: the fsspec, pyarrow.fs, and obstore libraries, plus integrations with Polars, DuckDB, PyArrow, Delta Lake, and Iceberg.
data-engineering-catalogs
Data catalogs: Iceberg catalogs (Hive Metastore, AWS Glue, Tabular), using DuckDB as a lightweight multi-source catalog, comparisons of Amundsen/DataHub/OpenMetadata, and patterns for unified data access.
building-data-apps
Build interactive web applications for data science and ML: Streamlit, Panel, Gradio, Dash, and NiceGUI. Use for creating stakeholder-facing dashboards, ML model demos, and internal data tools that non-technical users can interact with.