Data Validation First

Before writing any analysis code, understand the data:

# Always run these first
df.shape          # rows x columns
df.dtypes         # column types
df.isnull().sum() # missing values per column
df.describe()     # statistics for numeric columns
df.head()         # sample rows

Key questions:

  • Are there nulls in columns you'll join or filter on?
  • Are numeric or date columns stored as strings? (convert with astype or pd.to_numeric; use parse_dates or pd.to_datetime for dates)
  • Are there unexpected duplicates (check primary key uniqueness)?
  • Does the row count match your expectation from the source?
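
The questions above can be turned into quick checks. This is a minimal sketch, assuming a hypothetical orders table: the column names (order_id, customer_id, amount), the sample values, and the expected row count are illustrative and not from the source.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "customer_id": ["a", None, "b", "c"],
    "amount": ["10.5", "20", "n/a", "7"],
})

# Nulls in a column you plan to join or filter on
null_keys = int(df["customer_id"].isnull().sum())

# Numeric column stored as strings: coerce, then count values that failed to parse
amount_num = pd.to_numeric(df["amount"], errors="coerce")
unparseable = int(amount_num.isnull().sum() - df["amount"].isnull().sum())

# Primary key uniqueness: count rows that repeat an earlier order_id
duplicate_ids = int(df["order_id"].duplicated().sum())

# Row count against what the source reported (assumed value)
expected_rows = 4
row_count_ok = len(df) == expected_rows

print(null_keys, unparseable, duplicate_ids, row_count_ok)
```

Any nonzero count here is a reason to stop and inspect the offending rows before writing analysis code.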

Anti-pattern: Running .groupby().sum() without first checking for nulls in the groupby key. By default pandas drops rows whose key is null, so those rows silently vanish from the totals.
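
To see why this matters, here is a small sketch with made-up data (the region and sales columns are hypothetical). It compares the grouped total with the true total, and assumes pandas 1.1 or later for the dropna=False option.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing groupby key
df = pd.DataFrame({
    "region": ["east", "west", np.nan, "east"],
    "sales": [100, 200, 50, 25],
})

# Check the key first
missing_keys = int(df["region"].isnull().sum())

# Default groupby drops the row with the null key
grouped_total = int(df.groupby("region")["sales"].sum().sum())
true_total = int(df["sales"].sum())

# dropna=False keeps null keys as their own group
kept_total = int(df.groupby("region", dropna=False)["sales"].sum().sum())

print(missing_keys, grouped_total, true_total, kept_total)
```

The gap between grouped_total and true_total is exactly the sales on the row whose key is missing. Checking missing_keys first makes the decision to drop, fill, or keep those rows an explicit one.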
