tracing-upstream-lineage

Originally from astronomer/agents

Upstream Lineage: Sources

Trace the origins of data and answer "Where does this data come from?"

Lineage Investigation

Step 1: Identify the Target Type

Determine what we are tracing:

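  • Table: trace what populates the table
  • Column: trace where a specific column comes from
  • DAG: trace which data sources the DAG reads from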

Step 2: Find the Producing DAG

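  1. List candidate DAGs: use list_active_dags and list_paused_dags (a paused DAG may still be the producer)
  2. Read the DAG source: use get_dag_source_code to confirm which DAG writes to the target
  3. If a run exists, use analyse_dag_latest_run to see its tasks and logs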
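
A minimal sketch of the "read the DAG source" step, assuming the source code of each candidate DAG has already been collected into a dictionary keyed by DAG ID. The function name `find_producing_dags` and the `dag_sources` shape are hypothetical and not part of the skill's tools. The regex is a heuristic that only recognises a few common SQL write statements.

```python
import re


def find_producing_dags(target_table, dag_sources):
    """Return the sorted IDs of DAGs whose source appears to write to target_table.

    dag_sources is a hypothetical mapping of dag_id -> DAG source text.
    """
    # Heuristic: a write statement immediately followed by the table name.
    pattern = re.compile(
        r"\b(?:INSERT\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?TABLE|MERGE\s+INTO)\s+"
        + re.escape(target_table)
        + r"\b",
        re.IGNORECASE,
    )
    return sorted(
        dag_id for dag_id, source in dag_sources.items() if pattern.search(source)
    )


dag_sources = {
    "etl_daily_orders": "INSERT INTO analytics.orders_daily SELECT * FROM raw.orders",
    "orders_report": "SELECT * FROM analytics.orders_daily",
}
producers = find_producing_dags("analytics.orders_daily", dag_sources)
```

A DAG that only reads the target (such as `orders_report` here) is not reported. Writes issued through templated SQL files or operator parameters will not be caught by this pattern and still need a manual read.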

Step 3: Trace Data Sources

From the DAG code, identify source tables and systems:

  • SQL sources in FROM or JOIN clauses
  • External sources via operator hooks or connection IDs
  • Files in object storage

Use go_to_connections_view to inspect connection metadata.
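
The FROM and JOIN scan described above can be approximated with a regular expression. This is a rough sketch under the assumption that the SQL is available as a plain string. `extract_source_tables` is a hypothetical helper, and the pattern will also pick up CTE names or expressions such as `EXTRACT(day FROM ts)`, so results should be reviewed by hand.

```python
import re

# Capture the identifier that follows FROM or JOIN (optionally schema-qualified).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_source_tables(sql):
    """Return table names referenced after FROM/JOIN, in first-seen order."""
    seen = []
    for name in TABLE_REF.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen


sql = """
INSERT INTO analytics.orders_daily
SELECT o.id, c.name
FROM raw.orders o
JOIN dim.customers c ON o.customer_id = c.id
"""
sources = extract_source_tables(sql)
```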

Step 4: Build the Lineage Chain

Example:

TARGET: analytics.orders_daily
    ^
    +-- DAG: etl_daily_orders
            ^
            +-- SOURCE: raw.orders
            |
            +-- SOURCE: dim.customers
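
A chain like the one above can be generated from structured findings. The sketch below assumes the producing DAGs and their sources have been gathered into a dictionary. `render_lineage` and the `producers` shape are hypothetical names chosen for illustration.

```python
def render_lineage(target, producers):
    """Render a text lineage diagram.

    producers is a hypothetical mapping of dag_id -> list of source names.
    """
    lines = [f"TARGET: {target}"]
    for dag_id, sources in producers.items():
        lines.append("    ^")
        lines.append(f"    +-- DAG: {dag_id}")
        if sources:
            lines.append("            ^")
        for index, source in enumerate(sources):
            if index > 0:
                # Connector between sibling sources.
                lines.append("            |")
            lines.append(f"            +-- SOURCE: {source}")
    return "\n".join(lines)


diagram = render_lineage(
    "analytics.orders_daily",
    {"etl_daily_orders": ["raw.orders", "dim.customers"]},
)
print(diagram)
```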

Step 5: Check Source Health

  • Use get_dag_runs or get_dag_history on upstream DAGs to confirm their recent runs succeeded
  • If a run failed, inspect its logs with go_to_dag_log_view
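
A small sketch of how run history might be condensed into a health verdict. It assumes each run is a dictionary with a `state` key holding an Airflow run state such as `success` or `failed`, and that the list is ordered newest first. The exact shape returned by the tools above is not specified here, so treat `summarize_runs` and the record layout as hypothetical.

```python
from collections import Counter


def summarize_runs(runs):
    """Summarize run states for one upstream DAG (runs ordered newest first)."""
    states = Counter(run["state"] for run in runs)
    latest_state = runs[0]["state"] if runs else None
    return {
        "total": len(runs),
        "failed": states.get("failed", 0),
        "latest_state": latest_state,
        # Treat the source as healthy only if the most recent run succeeded.
        "healthy": latest_state == "success",
    }


runs = [{"state": "failed"}, {"state": "success"}, {"state": "success"}]
summary = summarize_runs(runs)
```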

Lineage for Columns

  1. Find the column in the target table schema
  2. Search DAG source for references
  3. Trace transformations and mappings
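
The "search DAG source for references" step can be sketched as a whole-word search that reports line numbers, which makes it easier to follow a column back through its transformations. `find_column_references` is a hypothetical helper, and the DAG source is assumed to be available as a string.

```python
import re


def find_column_references(column, source):
    """Return (line_number, stripped_line) pairs where column appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(column) + r"\b")
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


source = """SELECT
  o.amount * 1.1 AS total_amount,
  o.id
FROM raw.orders o"""
references = find_column_references("total_amount", source)
```

Here the single hit shows that `total_amount` is derived from `o.amount`, which is the next column to trace upstream.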

Output: Lineage Report

Include:

  • Summary of sources
  • Lineage diagram
  • Source details (connections, freshness)
  • Transformation chain
  • Data quality implications

Related Skills

  • checking-freshness
  • debugging-dags
  • tracing-downstream-lineage
  • annotating-task-lineage
  • creating-openlineage-extractors

First seen: Feb 6, 2026