bigquery-pipeline-audit
Summary
Audits Python + BigQuery pipelines for cost safety, idempotency, and production readiness with exact patch locations.
- Analyzes every BigQuery job trigger and external API call to identify cost exposure, loop-driven query multiplication, and missing maximum_bytes_billed limits
- Enforces dry-run and execute modes with explicit prod confirmation, partition filter validation, and scan-size optimization
- Validates idempotent writes using MERGE, staging tables, or dedup logic; flags unsafe append patterns and duplicate-prone reruns
- Generates structured reports with PASS/FAIL verdicts per section, ranked patch list, and worst-case cost estimates in job count and bytes
SKILL.md
BigQuery Pipeline Audit: Cost, Safety and Production Readiness
You are a senior data engineer reviewing a Python + BigQuery pipeline script. Your goals: catch runaway costs before they happen, ensure reruns do not corrupt data, and make sure failures are visible.
Analyze the codebase and respond in the structure below (A to F + Final). Reference exact function names and line locations. Suggest minimal fixes, not rewrites.
A) COST EXPOSURE: What will actually get billed?
Locate every BigQuery job trigger (client.query, load_table_from_*,
extract_table, copy_table, DDL/DML via query) and every external call
(APIs, LLM calls, storage writes).
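As a rough illustration of the "locate every job trigger" step, the sketch below walks a script's syntax tree with Python's standard `ast` module and lists calls whose method name matches the triggers named above, noting whether each call sits under a loop. It is a heuristic, not part of the skill itself: the function name `find_job_triggers`, the `SAMPLE` script, and its bucket and table names are hypothetical, and matching on method name alone will also flag unrelated `.query(...)` calls on other objects.

```python
import ast

# Method names taken from the trigger list above.
EXACT_TRIGGERS = {"query", "extract_table", "copy_table"}
PREFIX_TRIGGERS = ("load_table_from_",)


def find_job_triggers(source: str):
    """Return sorted (line, method_name, in_loop) tuples for suspected job triggers."""
    tree = ast.parse(source)
    findings = []

    def visit(node, in_loop):
        # Match attribute calls such as client.query(...) by method name only.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            name = node.func.attr
            if name in EXACT_TRIGGERS or name.startswith(PREFIX_TRIGGERS):
                findings.append((node.lineno, name, in_loop))
        # Heuristic: everything nested under a for/while statement (including
        # its header expression) is treated as running once per iteration.
        child_in_loop = in_loop or isinstance(
            node, (ast.For, ast.AsyncFor, ast.While)
        )
        for child in ast.iter_child_nodes(node):
            visit(child, child_in_loop)

    visit(tree, False)
    return sorted(findings)


# Hypothetical pipeline script used only to exercise the scanner.
# It is parsed, never imported or executed.
SAMPLE = '''\
from google.cloud import bigquery

client = bigquery.Client()

def run(tables):
    for t in tables:
        client.query(f"SELECT * FROM {t}")
    client.load_table_from_uri("gs://bucket/file.csv", "ds.tbl")
'''

if __name__ == "__main__":
    for line, name, in_loop in find_job_triggers(SAMPLE):
        flag = "inside a loop" if in_loop else "runs once"
        print(f"line {line}: {name} ({flag})")
```

A scan like this only narrows down where to look. Each hit still needs a manual read to confirm that the receiver is a BigQuery client and to estimate how many times the enclosing loop runs.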