data-lineage-tracer
Data Lineage Tracer Protocol
This skill answers the critical question: "If I change this column, who dies?" It traces the flow of data backward (Upstream) to its source, and forward (Downstream) to its consumers.
Core assumption: Data does not exist in a vacuum. It flows through Views, Materialized Views, Stored Procedures, Triggers, and ORM boundaries.
1. Upstream & Downstream Mapping (Static vs Dynamic)
- Default (Static): Analyze based on provided
.sql, schema files, and application ORM code (e.g. Prisma models). - Dynamic (On-Demand): Only connect to the database to inspect dynamic dependencies (e.g. live
pg_viewsorinformation_schema.triggers) if requested explicitly. If assessingreporting.daily_sales.total_amount: - Is it a physical physical or a computed column?
- Does a
VIEWproject it directly frompublic.orders? - Is it populated via a daily
INSERT INTO ... SELECTcron job?
Downstream Tracing (Who uses it?)
If dropping public.users.phone_number:
- Is there a
VIEWthat will become invalid (breaking the DB)? - Is there a Database Trigger relying on it?
- Is there an ORM/GraphQL endpoint or REST Serializer exposing it?
2. Dependency Risk Assessment
Categorize the objects affected by a hypothetical change:
- Hard Dependencies (DB Level): Views, Materialized Views, Foreign Keys, Stored Procedures. (Will hard-crash the database operations).
- Soft Dependencies (App Level): ORM models, API serializers, BI tool dashboards. (Will crash the app layer).
3. Visual Output Generation
Provide a clear trace map using Markdown or Mermaid.js so the developer can visualize the blast radius.
Required Outputs (Must write BOTH to docs/database-report/):
- Human-Readable Markdown (
docs/database-report/data-lineage-report.md)
### 🗺️ Data Lineage for `orders.customer_id`
**Upstream (Writers):**
- ⬅️ Written by `process_payment()` Stored Procedure.
- ⬅️ Populated via API Endpoint `POST /api/v1/checkout`.
**Downstream (Readers/Dependents):**
- ➡️ `VIEW analytics.monthly_cohorts` (Hard dependency - will BREAK if column is dropped).
- ➡️ `TRIGGER sync_to_crm_on_update` (Hard dependency).
- ➡️ Metabase BI Dashboard (Soft dependency).
### ⚠️ Blast Radius Warning
If you rename or modify the type of `customer_id`:
1. The `monthly_cohorts` view must be re-created via `CREATE OR REPLACE VIEW`.
2. The trigger `sync_to_crm_on_update` must be updated to reference the new name.
- Machine-Readable JSON (
docs/database-report/data-lineage-output.json)
{
"skill": "data-lineage-tracer",
"target": "orders.customer_id",
"upstream": ["process_payment()", "POST /api/v1/checkout"],
"downstream": [
{"type": "hard", "entity": "VIEW analytics.monthly_cohorts"},
{"type": "hard", "entity": "TRIGGER sync_to_crm_on_update"}
],
"blast_radius_risk": "High"
}
(Optional: Output a Mermaid graph if requested for complex ETL chains)
graph TD;
API-->public.orders;
public.orders-->VIEW_monthly_sales;
VIEW_monthly_sales-->BI_Dashboard;
Guardrails
- Incomplete Context: Database artifacts (views, triggers) are reliable to trace inside static SQL files. Remind the user that external soft dependencies (like Metabase or random Python scripts) cannot be guaranteed by DB introspection alone.
- Transitive Dependencies: Ensure you trace at least 2 levels deep (e.g., Table A drives View B, View B drives View C).
More from fatih-developer/fth-skills
tool-selector
Before executing a task, analyze the available tool set (web search, code execution, file read/write, API calls, database queries, memory tools), select the optimal tools, plan execution order, and prevent unnecessary tool calls. Triggers on multi-step tasks, 'which tool should I use', 'plan the tools', or whenever multiple tools could apply.
10rate-limit-strategist
Selects the optimal rate limiting strategy (sliding window, token bucket, leaky bucket) for per-user, per-IP, or global levels.
4ecosystem-api
Comprehensive map and workflows for the API domain. Triggers when users ask to 'design an API', 'secure the APIs', 'update endpoints', 'view the API ecosystem', or want to see all available API orchestration skills.
4mastermind-logger
Compiles and extracts session knowledge into a living, interconnected LLM-Wiki. Instead of writing isolated logs, it identifies key entities, updates cross-referenced topic files in docs/knowledgelib/, and maintains an index and chronological log. Use this to ensure persistent, compounding project knowledge.
3webhook-architect
Architects the provider side of the webhook infrastructure. Not only sends data but designs event schemas and robust delivery mechanisms.
3claw-integration-design
Design and implement secure APIs and integration points for external AI bots and agents (OpenAI tool-calling, LangChain, OpenClaw). Use this skill whenever the user mentions agent integrations, tool manifests, bot API access, OAuth2 scopes for bots, webhook integrations, or designing endpoints for agentic workflows. Also trigger when users ask about securing bot writes or designing APIs for AI-powered automation.
2