data-validation-first

Installation
SKILL.md

Data Validation First

Before writing any analysis code, understand the data:

# Always run these first
df.shape          # rows x columns
df.dtypes         # column types
df.isnull().sum() # missing values per column
df.describe()     # statistics for numeric columns
df.head()         # sample rows

Key questions:

  • Are there nulls in columns you'll join or filter on?
  • Are numeric columns stored as strings? (parse_dates, astype)
  • Are there unexpected duplicates (check primary key uniqueness)?
  • Does the row count match your expectation from the source?

Anti-pattern: Running .groupby().sum() without first checking for nulls in the groupby key.

Related skills
Installs
9
GitHub Stars
3.3K
First Seen
Mar 13, 2026