tracing-upstream-lineage
Upstream Lineage: Sources
Trace the origins of data and answer "Where does this data come from?"
Lineage Investigation
Step 1: Identify the Target Type
Determine what we are tracing:
- Table
- Column
- DAG
Step 2: Find the Producing DAG
- List DAGs: use list_active_dags and list_paused_dags
- Read DAG source: use get_dag_source_code
- If a run exists, use analyse_dag_latest_run to see tasks and logs
Step 3: Trace Data Sources
From the DAG code, identify source tables and systems:
- SQL sources in FROM or JOIN clauses
- External sources via operator hooks or connection IDs
- Files in object storage
Use go_to_connections_view to inspect connection metadata.
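As a rough illustration of the first bullet, the sketch below pulls table names out of a SQL string found in DAG code. It is a regex heuristic, not a SQL parser: it ignores subqueries, CTE names, and quoted identifiers, and it would also pick up the table in a DELETE FROM statement. The function name extract_source_tables and the sample query are hypothetical.

```python
import re

# Matches the identifier that follows FROM or JOIN (schema-qualified names allowed).
# Heuristic only: subqueries, CTEs, and quoted identifiers are not handled.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_source_tables(sql: str) -> list[str]:
    """Return the distinct table names that follow FROM or JOIN, in first-seen order."""
    seen: list[str] = []
    for name in _TABLE_REF.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical query, as it might appear inside a DAG task.
example_sql = """
INSERT INTO analytics.orders_daily
SELECT o.order_date, c.region, COUNT(*) AS orders
FROM raw.orders AS o
JOIN dim.customers AS c ON c.id = o.customer_id
GROUP BY 1, 2
"""

print(extract_source_tables(example_sql))
```

For anything beyond simple queries, a real SQL parser is likely a better fit than a regex.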
Step 4: Build the Lineage Chain
Example:
TARGET: analytics.orders_daily
    ^
    +-- DAG: etl_daily_orders
        ^
        +-- SOURCE: raw.orders
        |
        +-- SOURCE: dim.customers
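One possible way to hold such a chain in code is a small tree of nodes, each pointing at the nodes upstream of it. The sketch below is illustrative only: LineageNode, render, and leaf_sources are hypothetical names, not part of the extension.

```python
from dataclasses import dataclass, field


@dataclass
class LineageNode:
    kind: str  # "TARGET", "DAG", or "SOURCE"
    name: str
    upstream: list["LineageNode"] = field(default_factory=list)


def render(node: LineageNode, depth: int = 0) -> list[str]:
    """Return the chain as indented text lines, one node per line."""
    prefix = "" if depth == 0 else "    " * (depth - 1) + "+-- "
    lines = [f"{prefix}{node.kind}: {node.name}"]
    for parent in node.upstream:
        lines.extend(render(parent, depth + 1))
    return lines


def leaf_sources(node: LineageNode) -> list[str]:
    """Collect the names of nodes that have nothing further upstream."""
    if not node.upstream:
        return [node.name]
    names: list[str] = []
    for parent in node.upstream:
        names.extend(leaf_sources(parent))
    return names


chain = LineageNode("TARGET", "analytics.orders_daily", [
    LineageNode("DAG", "etl_daily_orders", [
        LineageNode("SOURCE", "raw.orders"),
        LineageNode("SOURCE", "dim.customers"),
    ]),
])

print("\n".join(render(chain)))
```

Keeping the chain as data rather than as text makes it straightforward to extend when a source turns out to be produced by yet another DAG.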
Step 5: Check Source Health
- Use get_dag_runs or get_dag_history on upstream DAGs
- For logs, use go_to_dag_log_view
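Once run history is in hand, a simple freshness check can be sketched as follows. The record shape here is an assumption: each run is taken to be a dict with a "state" string and an "end_date" timezone-aware datetime, which may not match what the tools actually return. The 24-hour threshold and the function name source_health are likewise hypothetical.

```python
from datetime import datetime, timedelta, timezone


def source_health(runs, now, max_age=timedelta(hours=24)):
    """Classify an upstream DAG by the age of its latest successful run.

    `runs` is assumed to be a list of dicts with keys "state" and "end_date".
    """
    finished = [
        r["end_date"] for r in runs
        if r["state"] == "success" and r["end_date"] is not None
    ]
    if not finished:
        return "no successful run"
    age = now - max(finished)
    return "fresh" if age <= max_age else "stale"


# Hypothetical run history for an upstream DAG.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
runs = [
    {"state": "success", "end_date": datetime(2024, 1, 8, 3, 0, tzinfo=timezone.utc)},
    {"state": "failed", "end_date": datetime(2024, 1, 10, 3, 0, tzinfo=timezone.utc)},
]

print(source_health(runs, now))
```

In this sample the most recent run failed and the last success is more than two days old, so the source would be reported as stale.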
Lineage for Columns
- Find the column in the target table schema
- Search DAG source for references
- Trace transformations and mappings
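The search step can be approximated with a whole-word scan over the DAG source text. This is a minimal sketch: find_column_references is a hypothetical helper, and the sample source is made up for illustration. A text match only locates candidate lines; the transformation logic still has to be read by hand.

```python
import re


def find_column_references(dag_source: str, column: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line mentioning the column as a whole word."""
    pattern = re.compile(rf"\b{re.escape(column)}\b")
    return [
        (number, line.strip())
        for number, line in enumerate(dag_source.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical SQL embedded in a DAG file.
dag_source = """\
SELECT
    o.amount * fx.rate AS amount_usd,
    o.order_date
FROM raw.orders AS o
JOIN raw.fx_rates AS fx ON fx.currency = o.currency
"""

for number, line in find_column_references(dag_source, "amount_usd"):
    print(number, line)
```

Here the match shows that amount_usd is derived from o.amount and fx.rate, which gives the next two columns to trace upstream.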
Output: Lineage Report
Include:
- Summary of sources
- Lineage diagram
- Source details (connections, freshness)
- Transformation chain
- Data quality implications
Related Skills
- checking-freshness
- debugging-dags
- tracing-downstream-lineage
- annotating-task-lineage
- creating-openlineage-extractors