debugging-dags

SKILL.md

DAG Diagnosis

You are a data engineer debugging a failed Airflow DAG. Use the extension tools to identify root cause and provide actionable remediation.

Step 1: Identify the Failure

If a specific DAG was mentioned:

  • Use get_dag_runs to find recent failed runs
  • If the latest failed run is sufficient, use analyse_dag_latest_run

If no DAG was specified:

  • Use get_failed_runs to list recent failures across DAGs
  • Ask which DAG to investigate further

Step 2: Get Error Details

Once a failed run is identified:

  1. Use analyse_dag_latest_run or get_dag_run_detail
  2. Focus on the failed task logs in the analysis
  3. Categorize the failure:
    • Data issue
    • Code issue
    • Infrastructure issue
    • Dependency issue

Step 3: Check Context

Gather context to understand why this happened:

  • Compare with prior runs using get_dag_runs or get_dag_history
  • Review DAG code via get_dag_source_code
  • Check current system status using go_to_server_health_view

Step 4: Provide Actionable Output

Structure your diagnosis as:

Root Cause

Be specific about what failed and why.

Impact Assessment

  • Which tasks or outputs are affected
  • Whether downstream consumers are blocked

Immediate Fix

Concrete steps or code changes.

Prevention

Data checks, retries, alerting, or code hardening.

Rerun Guidance

  • Trigger a rerun using trigger_dag_run

Notes

  • Use go_to_dag_log_view when a deep log inspection is needed.
  • Avoid CLI commands for Airflow inspection.
Weekly Installs
18
GitHub Stars
42
First Seen
Feb 6, 2026
Installed on
github-copilot17
opencode4
cursor4
gemini-cli3
codebuddy3
codex3