data-quality-audit
When to use
- A data pipeline has just loaded new data and needs validation before downstream reports consume it
- A stakeholder has flagged data quality concerns (wrong totals, unexpected nulls, stale data)
- You need to produce a formal data quality scorecard for a data asset as part of a data governance process
- You are onboarding a new data source and need to understand its quality profile before building on it
Process
- Null and completeness audit — run scripts/null_counter.py for a column-by-column null profile. Flag columns above acceptable thresholds for the business context.
- Duplicate detection — run scripts/duplicate_finder.py to identify full-row and key-level duplicates. Determine whether duplicates are intentional (versioning) or errors (pipeline fan-out).
- Referential integrity check — run scripts/referential_integrity.py to validate that foreign-key values in child tables exist in parent tables. Report the orphan rate per relationship.
- Value range validation — run scripts/value_range_validator.py with the business rules defined in references/business_rule_patterns.md. Flag values outside acceptable ranges.
- Freshness check — run scripts/freshness_check.py to verify the dataset is up to date: compare the latest record timestamp against the expected lag for this pipeline.
- Score and classify findings — map each finding to a quality dimension using references/quality_dimensions.md. Assign a severity (CRITICAL / HIGH / MEDIUM / LOW).
- Produce deliverables — fill assets/audit_report_template.html for a shareable report; fill assets/quality_rubric.md for a concise scorecard.
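The five validation steps above can be sketched as standalone pandas checks. This is a minimal illustration of what each script computes, assuming the dataset fits in a DataFrame; the function names and signatures are illustrative assumptions, not the skill's actual script interfaces.

```python
import pandas as pd

def null_profile(df: pd.DataFrame) -> dict:
    # Fraction of nulls per column (the idea behind null_counter.py).
    return df.isna().mean().to_dict()

def duplicate_counts(df: pd.DataFrame, key_cols: list) -> dict:
    # Full-row duplicates vs. duplicates on the business key (duplicate_finder.py).
    return {
        "full_row": int(df.duplicated().sum()),
        "by_key": int(df.duplicated(subset=key_cols).sum()),
    }

def orphan_rate(child: pd.DataFrame, fk: str, parent: pd.DataFrame, pk: str) -> float:
    # Share of child foreign-key values with no matching parent row
    # (referential_integrity.py reports this per relationship).
    return float((~child[fk].isin(parent[pk])).mean())

def range_violations(df: pd.DataFrame, rules: dict) -> dict:
    # Non-null values outside [lo, hi] per column (value_range_validator.py);
    # nulls are already covered by the null audit, so they are skipped here.
    return {
        col: int((~df[col].dropna().between(lo, hi)).sum())
        for col, (lo, hi) in rules.items()
    }

def is_fresh(df: pd.DataFrame, ts_col: str, max_lag: pd.Timedelta) -> bool:
    # Compare the latest record timestamp against the expected lag
    # (freshness_check.py).
    return (pd.Timestamp.now() - df[ts_col].max()) <= max_lag
```

Each function returns a plain value, which maps directly onto the per-check pass/fail counts the scripts print to the console.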
Inputs the skill needs
- Required: dataset (CSV / Parquet / database table reference)
- Required: schema relationships — which columns are primary keys, which are foreign keys to which tables
- Required: business rules — acceptable value ranges, expected value sets, freshness SLA
- Optional: acceptable error rates — at what threshold does a failure become CRITICAL vs. HIGH
- Optional: pipeline schedule — to assess freshness relative to expected update frequency
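For concreteness, the required and optional inputs listed above might be bundled like this. Every key name and the overall structure here are assumptions made for illustration; the skill does not prescribe an input format.

```python
# Hypothetical input bundle for one audit run; all names are illustrative.
audit_inputs = {
    "dataset": "warehouse/orders.parquet",  # CSV / Parquet / table reference
    "schema": {
        "primary_key": ["order_id"],
        # child column -> (parent table, parent column)
        "foreign_keys": {"customer_id": ("customers", "customer_id")},
    },
    "business_rules": {
        "ranges": {"amount": (0, 10_000)},
        "value_sets": {"status": {"new", "paid", "refunded"}},
        "freshness_sla_hours": 24,
    },
    # Optional: error-rate thresholds that map failures to severities.
    "severity_thresholds": {"CRITICAL": 0.05, "HIGH": 0.01},
    # Optional: pipeline schedule, used to judge freshness.
    "pipeline_schedule": "daily @ 06:00 UTC",
}
```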
Output
- assets/audit_report_template.html (filled) — full quality report, shareable with stakeholders
- assets/quality_rubric.md (filled) — one-page quality scorecard with dimension scores
- Script console output — per-check pass/fail counts for each validation script
More from nimrodfisher/data-analytics-skills
- dashboard-specification — Design specifications for effective dashboards. Use when planning new dashboards, improving existing ones, or documenting dashboard requirements before development starts.
- root-cause-investigation — Systematic investigation of metric changes and anomalies. Use when a metric unexpectedly changes, investigating business metric drops, explaining performance variations, or drilling into aggregated metric drivers.
- executive-summary-generator — Create concise executive summaries from detailed analysis. Use when preparing board decks, executive briefings, or condensing complex analysis into decision-ready formats for senior audiences.
- query-validation — SQL query review for correctness, performance, and best practices. Activate when a query needs review before production use, shows unexpected results, or runs too slowly.
- segmentation-analysis — Customer/user segmentation with actionable insights. Use when identifying distinct customer groups, analyzing segment-specific behavior, profiling high-value segments, or testing segmentation hypotheses.
- data-narrative-builder — Build compelling data-driven narratives. Use when presenting analysis results, creating stakeholder reports, or transforming a set of findings into a story that drives a specific decision or action.