data-quality-audit
When to use
- A data pipeline has just loaded new data and needs validation before downstream reports consume it
- A stakeholder has flagged data quality concerns (wrong totals, unexpected nulls, stale data)
- You need to produce a formal data quality scorecard for a data asset as part of a data governance process
- You are onboarding a new data source and need to understand its quality profile before building on it
Process
- Null and completeness audit — run `scripts/null_counter.py` for a column-by-column null profile. Flag columns above acceptable thresholds for the business context.
- Duplicate detection — run `scripts/duplicate_finder.py` to identify full-row and key-level duplicates. Determine whether duplicates are intentional (versioning) or errors (pipeline fan-out).
- Referential integrity check — run `scripts/referential_integrity.py` to validate that foreign key values in child tables exist in parent tables. Report the orphan rate per relationship.
- Value range validation — run `scripts/value_range_validator.py` with business rules defined in `references/business_rule_patterns.md`. Flag values outside acceptable ranges.
- Freshness check — run `scripts/freshness_check.py` to verify the dataset is up to date — compare the latest record timestamp against the expected lag for this pipeline.
- Score and classify findings — map each finding to a quality dimension using `references/quality_dimensions.md`. Assign severity (CRITICAL / HIGH / MEDIUM / LOW).
- Produce deliverables — fill `assets/audit_report_template.html` for a shareable report; fill `assets/quality_rubric.md` for a concise scorecard.
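The skill's scripts are not reproduced here, but the null, duplicate, referential-integrity, and freshness checks above can be sketched in plain pandas. All column names, sample rows, and expected values below are illustrative, not part of the skill:

```python
# Minimal sketch (not the skill's actual scripts) of four of the checks.
from datetime import datetime
import pandas as pd

def null_profile(df: pd.DataFrame) -> pd.Series:
    """Fraction of null values per column, worst column first."""
    return df.isna().mean().sort_values(ascending=False)

def duplicate_counts(df: pd.DataFrame, key_cols: list[str]) -> dict:
    """Full-row duplicates vs duplicates on the key columns only."""
    return {
        "full_row": int(df.duplicated().sum()),
        "key_level": int(df.duplicated(subset=key_cols).sum()),
    }

def orphan_rate(child_fk: pd.Series, parent_keys: pd.Series) -> float:
    """Share of child foreign-key values with no matching parent key."""
    return float((~child_fk.isin(parent_keys)).mean())

def freshness_lag_hours(latest_record_ts: datetime, now: datetime) -> float:
    """Hours between the newest record and now; compare against the SLA."""
    return (now - latest_record_ts).total_seconds() / 3600

# Hypothetical sample data: one full-row duplicate, one orphan FK, two nulls.
orders = pd.DataFrame({
    "order_id":    [1, 2, 2, 3],
    "customer_id": [10, 11, 11, 99],      # 99 has no parent row
    "amount":      [50.0, None, None, 20.0],
})
customers = pd.DataFrame({"customer_id": [10, 11, 12]})

print(null_profile(orders))                    # amount -> 0.5, others 0.0
print(duplicate_counts(orders, ["order_id"]))  # {'full_row': 1, 'key_level': 1}
print(orphan_rate(orders["customer_id"], customers["customer_id"]))  # 0.25
print(freshness_lag_hours(datetime(2025, 1, 2, 6), datetime(2025, 1, 2, 12)))  # 6.0
```

Each function returns a single number or profile per check, which is the shape the scoring step needs: a rate that can be compared against a threshold and mapped to a severity.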
Inputs the skill needs
- Required: dataset (CSV / Parquet / database table reference)
- Required: schema relationships — which columns are primary keys, which are foreign keys to which tables
- Required: business rules — acceptable value ranges, expected value sets, freshness SLA
- Optional: acceptable error rates — at what threshold does a failure become CRITICAL vs. HIGH
- Optional: pipeline schedule — to assess freshness relative to expected update frequency
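The skill leaves the concrete input format open. One plausible shape for the schema-relationship and business-rule inputs, with the optional error-rate thresholds applied as severity cutoffs, might look like this (every name and number here is hypothetical):

```python
# Hypothetical input shape for the audit; the skill does not prescribe
# a concrete format, so all keys, ranges, and thresholds are illustrative.
business_rules = {
    "keys": {
        "primary": "order_id",
        "foreign": {"customer_id": "customers.customer_id"},
    },
    "ranges": {"amount": {"min": 0, "max": 100_000}},
    "allowed_values": {"status": ["pending", "shipped", "delivered"]},
    "freshness_sla_hours": 24,
    "severity_thresholds": {"CRITICAL": 0.05, "HIGH": 0.01},  # error-rate cutoffs
}

def classify(error_rate: float, thresholds: dict) -> str:
    """Map an observed error rate to a severity bucket (illustrative rule)."""
    if error_rate >= thresholds["CRITICAL"]:
        return "CRITICAL"
    if error_rate >= thresholds["HIGH"]:
        return "HIGH"
    return "LOW" if error_rate == 0 else "MEDIUM"

print(classify(0.25, business_rules["severity_thresholds"]))  # CRITICAL
print(classify(0.0, business_rules["severity_thresholds"]))   # LOW
```

Keeping the thresholds in the input rather than hard-coding them is what lets the same audit run against datasets with different tolerances.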
Output
- `assets/audit_report_template.html` (filled) — full quality report, shareable with stakeholders
- `assets/quality_rubric.md` (filled) — one-page quality scorecard with dimension scores
- Script console output — per-check pass/fail counts for each validation script