programmatic-eda
When to use
- You receive a new dataset and need to understand its shape and quality before analysis
- An analysis produces surprising numbers and you want to verify the underlying data first
- A stakeholder asks "is this data reliable?" or "what's in this table?"
- You're about to run a model or statistical test and need data-quality assurance
Process
- Load and overview — run `scripts/data_overview.py` to get row count, dtypes, memory usage, and a sample. Confirm grain (what one row represents).
- Null profile — run `scripts/null_profiler.py`; compare output against thresholds in `references/quality_thresholds.md` and flag columns above limits.
- Outlier detection — run `scripts/outlier_detector.py` (IQR + z-score) on numeric columns; document flagged values and decide: real signal or data error?
- Distribution summary — run `scripts/distribution_summary.py` for descriptive stats and univariate histograms on each numeric column.
- Correlation exploration — run `scripts/correlation_explorer.py`; flag pairs with |r| > 0.8 as potential multicollinearity or redundancy.
- EDA checklist sign-off — work through `references/eda_checklist.md` and confirm each item before declaring the dataset profiled.
- Write findings — fill `assets/eda_report_template.md` with full profiling output; distil top issues into `assets/findings_summary.md`.
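The null-profile, outlier-detection, and correlation steps above can be sketched in pandas roughly as follows. This is an illustration of the techniques named (null fractions vs. a threshold, IQR and z-score flagging, |r| > 0.8 pairs), not the actual contents of the referenced scripts; the function names and default thresholds are assumptions.

```python
import numpy as np
import pandas as pd

def null_profile(df: pd.DataFrame, max_null_frac: float = 0.2) -> list[str]:
    """Return columns whose null fraction exceeds the threshold."""
    frac = df.isna().mean()
    return frac[frac > max_null_frac].index.tolist()

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]

def zscore_outliers(s: pd.Series, z: float = 3.0) -> pd.Series:
    """Values more than z sample standard deviations from the mean."""
    return s[((s - s.mean()).abs() / s.std()) > z]

def high_corr_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list[tuple]:
    """Column pairs with absolute Pearson correlation above the threshold."""
    corr = df.corr(numeric_only=True).abs()
    pairs = []
    for i, a in enumerate(corr.columns):
        for b in corr.columns[i + 1:]:
            if corr.loc[a, b] > threshold:
                pairs.append((a, b, float(corr.loc[a, b])))
    return pairs
```

Flagging with both IQR and z-score is deliberate: IQR is robust to the very outliers it hunts, while z-score assumes roughly normal data, so comparing the two lists helps separate heavy tails from data errors.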
For pattern recipes (e.g. polars vs pandas equivalents, chunked reads for large files), see `references/pandas_polars_recipes.md`.
Inputs the skill needs
- Required: dataset path (CSV / Parquet / Excel) or a DataFrame already in scope
- Required: business context — what does one row represent?
- Optional: quality threshold overrides (defaults in `references/quality_thresholds.md`)
- Optional: columns to skip (PII, binary blobs, high-cardinality IDs)
Output
- `assets/eda_report_template.md` (filled) — full profiling report with per-column stats
- `assets/findings_summary.md` (filled) — top 3–5 quality issues and recommended next steps
- Console output / plots from scripts for interactive inspection