# ETL Designer

Design robust ETL/ELT pipelines for data processing.
## Quick Start

Use Airflow for orchestration, implement idempotent operations, add error handling, and monitor pipeline health.
## Instructions
### Airflow DAG Structure

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'data-team',
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'email_on_failure': True,
    'email': ['alerts@company.com'],
}

with DAG(
    'etl_pipeline',
    default_args=default_args,
    schedule_interval='0 2 * * *',  # Daily at 2 AM
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # extract_from_source, transform_data, and load_to_warehouse are
    # Python callables defined elsewhere in your project.
    extract = PythonOperator(
        task_id='extract_data',
        python_callable=extract_from_source,
    )

    transform = PythonOperator(
        task_id='transform_data',
        python_callable=transform_data,
    )

    load = PythonOperator(
        task_id='load_to_warehouse',
        python_callable=load_to_warehouse,
    )

    extract >> transform >> load
```
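The three callables wired into the DAG are assumed to live elsewhere in the project. A minimal, hypothetical sketch of their shape (the function names match the DAG above, but the sample data and the derived column are placeholders):

```python
def extract_from_source():
    # Hypothetical stand-in: real code would query a database or API.
    return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]

def transform_data(rows):
    # Example business rule: derive a gross amount per row.
    return [{**r, "amount_gross": r["amount"] * 1.2} for r in rows]

def load_to_warehouse(rows):
    # Hypothetical: would INSERT into the warehouse; returns rows loaded.
    return len(rows)
```

Keeping each callable a pure function of its inputs makes the tasks easy to unit-test outside Airflow.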
### Incremental Processing

```python
import pandas as pd

def extract_incremental(conn, last_run_date):
    """Pull only rows that changed since the last successful run."""
    query = """
        SELECT * FROM source_table
        WHERE updated_at > %(last_run)s
    """
    # Parameterized query avoids SQL injection via the watermark value.
    return pd.read_sql(query, conn, params={"last_run": last_run_date})
```
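The `last_run_date` watermark has to be persisted between runs. One hedged sketch stores it in a small state table; SQLite is used here for illustration, and the table and column names are placeholders:

```python
import sqlite3

def get_last_run(conn, pipeline):
    # Read the stored watermark for this pipeline, if any.
    row = conn.execute(
        "SELECT last_run FROM etl_state WHERE pipeline = ?", (pipeline,)
    ).fetchone()
    return row[0] if row else "1970-01-01"

def set_last_run(conn, pipeline, run_date):
    # Upsert the new watermark only after a successful load.
    conn.execute(
        "INSERT INTO etl_state (pipeline, last_run) VALUES (?, ?) "
        "ON CONFLICT(pipeline) DO UPDATE SET last_run = excluded.last_run",
        (pipeline, run_date),
    )
    conn.commit()
```

Updating the watermark last means a failed run is simply retried from the old watermark, which pairs well with idempotent loads.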
### Error Handling

```python
import logging

logger = logging.getLogger(__name__)

def safe_transform(data):
    # transform_data and send_alert are defined elsewhere.
    try:
        return transform_data(data)
    except Exception as e:
        logger.error(f"Transform failed: {e}")
        send_alert(f"Pipeline failed: {e}")
        raise  # Re-raise so the orchestrator marks the task failed and retries.
```
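Alongside the try/except, an explicit data-quality gate can fail the run before bad data reaches the warehouse. A minimal sketch, where the specific checks and field names are illustrative:

```python
def check_quality(rows):
    # Fail fast on conditions that should abort the pipeline run.
    if not rows:
        raise ValueError("No rows extracted")
    nulls = sum(1 for r in rows if r.get("id") is None)
    if nulls:
        raise ValueError(f"{nulls} rows missing primary key 'id'")
    return rows
```

Raising here has the same effect as in `safe_transform`: the task fails loudly instead of silently loading suspect data.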
## Best Practices

- Make operations idempotent
- Use incremental processing
- Implement proper error handling
- Add monitoring and alerts
- Use data quality checks
- Document pipeline logic
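As a concrete reading of the first bullet: rerunning a load for the same run date should leave the warehouse in the same final state, not duplicate rows. A minimal delete-then-insert sketch (SQLite here, with an illustrative schema):

```python
import sqlite3

def idempotent_load(conn, rows, run_date):
    # Delete any rows from a previous attempt for this run, then insert,
    # all in one transaction so a rerun converges to the same state.
    with conn:
        conn.execute("DELETE FROM sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO sales (id, amount, run_date) VALUES (?, ?, ?)",
            [(r["id"], r["amount"], run_date) for r in rows],
        )
```

Warehouses with `MERGE`/upsert support achieve the same property more efficiently, but delete-then-insert within a transaction is the simplest pattern to reason about.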