# Data Wizard

Full-stack data science and ML engineering — from exploratory data analysis through model deployment strategy. Adapts approach based on complexity classification.

## Canonical Vocabulary

| Term | Definition |
|---|---|
| EDA | Exploratory Data Analysis — systematic profiling and summarization of a dataset |
| feature | An individual measurable property used as input to a model |
| feature engineering | Creating, transforming, or selecting features to improve model performance |
| hypothesis test | A statistical procedure to determine if observed data supports a claim |
| p-value | Probability of observing data at least as extreme as the actual results, assuming the null hypothesis is true |
| effect size | Magnitude of a difference or relationship, independent of sample size |
| power analysis | Determining sample size needed to detect an effect of a given size |
| CUPED | Controlled-experiment Using Pre-Experiment Data — variance reduction technique for A/B tests |
| MLOps maturity | Level 0 (manual), Level 1 (ML pipeline), Level 2 (CI/CD + CT), Level 3 (full automation) |
| data quality score | Composite metric across completeness, consistency, accuracy, timeliness, uniqueness |
| profile | Statistical summary of a dataset: types, distributions, missing patterns, correlations |
| anomaly | Data point or pattern deviating significantly from expected behavior |

## Dispatch

| $ARGUMENTS | Action |
|---|---|
| `eda <data>` | EDA — profile dataset, summary stats, missing patterns, distributions |
| `model <task>` | Model Selection — recommend models, libraries, training plan for task |
| `features <data>` | Feature Engineering — suggest transformations, encoding, selection pipeline |
| `stats <question>` | Stats — select and design statistical hypothesis test |
| `viz <data>` | Visualization — recommend chart types, encodings, layout for data |
| `experiment <hypothesis>` | Experiment Design — A/B test design, power analysis, CUPED |
| `timeseries <data>` | Time Series — forecasting approach, decomposition, model selection |
| `anomaly <data>` | Anomaly Detection — detection approach, algorithm selection, threshold strategy |
| `mlops <model>` | MLOps — serving strategy, deployment pipeline, monitoring plan |
| Natural language about data | Auto-detect — classify intent, route to appropriate mode |
| Empty | Gallery — show common data science tasks with mode recommendations |

## Auto-Detection Heuristic

If no mode keyword matches:

  1. Mentions dataset, CSV, columns, rows, missing values → EDA
  2. Mentions predict, classify, regression, recommend → Model Selection
  3. Mentions transform, encode, scale, normalize, one-hot → Feature Engineering
  4. Mentions test, significant, p-value, hypothesis, correlation → Stats
  5. Mentions chart, plot, graph, visualize, dashboard → Visualization
  6. Mentions A/B, experiment, control group, treatment, lift → Experiment Design
  7. Mentions forecast, seasonal, trend, time series, lag → Time Series
  8. Mentions outlier, anomaly, fraud, unusual, deviation → Anomaly Detection
  9. Mentions deploy, serve, pipeline, monitor, retrain → MLOps
  10. Ambiguous → ask: "Which area: EDA, modeling, stats, or something else?"
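The heuristic above is an ordered, first-match keyword scan. A minimal sketch (a hypothetical helper, not one of the skill's bundled scripts) makes the rule ordering explicit:

```python
# First-match keyword router mirroring the auto-detection heuristic.
# Rules are checked in listed order; the first rule with a matching
# keyword wins, so earlier modes take precedence on overlapping terms.
RULES = [
    ("EDA", ["dataset", "csv", "columns", "rows", "missing values"]),
    ("Model Selection", ["predict", "classify", "regression", "recommend"]),
    ("Feature Engineering", ["transform", "encode", "scale", "normalize", "one-hot"]),
    ("Stats", ["test", "significant", "p-value", "hypothesis", "correlation"]),
    ("Visualization", ["chart", "plot", "graph", "visualize", "dashboard"]),
    ("Experiment Design", ["a/b", "experiment", "control group", "treatment", "lift"]),
    ("Time Series", ["forecast", "seasonal", "trend", "time series", "lag"]),
    ("Anomaly Detection", ["outlier", "anomaly", "fraud", "unusual", "deviation"]),
    ("MLOps", ["deploy", "serve", "pipeline", "monitor", "retrain"]),
]

def route(query: str) -> str:
    """Return the first matching mode, or 'ask' when the query is ambiguous."""
    q = query.lower()
    for mode, keywords in RULES:
        if any(k in q for k in keywords):
            return mode
    return "ask"  # ambiguous: ask which area the user means
```

Note that because matching is ordered, a query like "is this lift significant?" routes to Stats (rule 4) rather than Experiment Design (rule 6).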

## Gallery (Empty Arguments)

Present common data science tasks:

| # | Task | Mode | Example |
|---|---|---|---|
| 1 | Profile a dataset | `eda` | `/data-wizard eda customer_data.csv` |
| 2 | Choose a model | `model` | `/data-wizard model "predict churn from usage features"` |
| 3 | Engineer features | `features` | `/data-wizard features sales_data.csv` |
| 4 | Pick a stat test | `stats` | `/data-wizard stats "is conversion rate different between groups?"` |
| 5 | Choose visualizations | `viz` | `/data-wizard viz time_series_metrics.csv` |
| 6 | Design an experiment | `experiment` | `/data-wizard experiment "new checkout flow increases conversion"` |
| 7 | Forecast time series | `timeseries` | `/data-wizard timeseries monthly_revenue.csv` |
| 8 | Detect anomalies | `anomaly` | `/data-wizard anomaly server_metrics.csv` |
| 9 | Plan deployment | `mlops` | `/data-wizard mlops "churn prediction model"` |

Pick a number or describe your data science task.

## Skill Awareness

Before starting, check if another skill is a better fit:

| Signal | Redirect |
|---|---|
| Database schema, SQL optimization, indexing | Suggest `database-architect` |
| Frontend dashboard code, React/D3 components | Suggest relevant frontend skill |
| Data pipeline, ETL, orchestration (Airflow, dbt) | Out of scope — suggest data engineering tools |
| Production infrastructure, Kubernetes, scaling | Suggest `devops-engineer` or `infrastructure-coder` |

## Complexity Classification

Score the query on 4 dimensions (0-2 each, total 0-8):

| Dimension | 0 | 1 | 2 |
|---|---|---|---|
| Data complexity | Single table, clean | Multi-table, some nulls | Messy, multi-source, mixed types |
| Analysis depth | Descriptive stats | Inferential / predictive | Multi-stage pipeline, iteration |
| Domain specificity | General / well-known | Domain conventions apply | Deep domain expertise needed |
| Tooling breadth | Single library suffices | 2-3 libraries needed | Full ML stack integration |

| Total | Tier | Strategy |
|---|---|---|
| 0-2 | Quick | Single inline analysis — `eda`, `viz`, `stats` |
| 3-5 | Standard | Multi-step workflow — `features`, `model`, `experiment`, `timeseries`, `anomaly` |
| 6-8 | Full Pipeline | Orchestrated — `mlops`, complex multi-stage analysis |

Present the scoring to the user; the user may override the tier.
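The scoring rule above reduces to a sum and a threshold lookup. A minimal sketch:

```python
def classify_tier(data: int, depth: int, domain: int, tooling: int) -> tuple[int, str]:
    """Sum the four 0-2 dimension scores and map the total to a tier."""
    for s in (data, depth, domain, tooling):
        assert 0 <= s <= 2, "each dimension is scored 0-2"
    total = data + depth + domain + tooling
    if total <= 2:
        tier = "Quick"
    elif total <= 5:
        tier = "Standard"
    else:
        tier = "Full Pipeline"
    return total, tier
```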

## Mode Protocols

### EDA (Quick)

  1. If file path provided, run: `!uv run python skills/data-wizard/scripts/data-profiler.py "$1"`
  2. Parse JSON output — present: row/col counts, dtypes, missing patterns, top correlations
  3. Highlight: data quality issues, distribution skews, potential target leakage
  4. Recommend next steps: cleaning, feature engineering, or modeling
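The kind of per-column summary the profiler emits can be sketched with the standard library alone (an illustrative stand-in, not the bundled `data-profiler.py`):

```python
import statistics

def profile_column(values):
    """Minimal per-column profile: count, missing rate, and, for numeric
    columns, mean/stdev/min/max. Missing values are represented as None."""
    present = [v for v in values if v is not None]
    prof = {
        "count": len(values),
        "missing_rate": round(1 - len(present) / len(values), 3) if values else 0.0,
    }
    if present and all(isinstance(v, (int, float)) for v in present):
        prof.update(
            mean=statistics.fmean(present),
            stdev=statistics.stdev(present) if len(present) > 1 else 0.0,
            min=min(present),
            max=max(present),
        )
    return prof
```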

### Model Selection (Standard)

  1. Run: `!uv run python skills/data-wizard/scripts/model-recommender.py` with task JSON input
  2. Present ranked model recommendations with rationale
  3. Read references/model-selection.md for detailed guidance by data size and type
  4. Suggest: train/val/test split strategy, evaluation metrics, baseline approach
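Step 4's split strategy for non-temporal data can be sketched as a single shuffled partition of row indices (temporal data needs walk-forward splits instead; see Time Series):

```python
import random

def train_val_test_split(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices once with a fixed seed, then carve off disjoint
    validation and test sets. Returns (train, val, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing the baseline model against candidates.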

### Feature Engineering (Standard)

  1. If file path, run data profiler first for column analysis
  2. Read references/feature-engineering.md for patterns by data type
  3. Load data/feature-engineering-patterns.json for structured recommendations
  4. Suggest: transformations, encodings, interaction features, selection methods
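One of the suggested encodings, one-hot encoding of a categorical column, illustrates the leakage-safe pattern: fit the category set on training data only, then reuse it at inference time (a minimal sketch, not the bundled patterns file):

```python
def one_hot(values, categories=None):
    """One-hot encode a categorical column. Fit the category list on
    training data and pass it back in at inference time so the feature
    space stays stable; unseen categories map to the all-zeros row."""
    if categories is None:
        categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:
            row[index[v]] = 1
        rows.append(row)
    return categories, rows
```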

### Stats (Quick)

  1. Run: `!uv run python skills/data-wizard/scripts/statistical-test-selector.py` with question parameters
  2. Load data/statistical-tests-tree.json for decision tree
  3. Read references/statistical-tests.md for assumptions and interpretation guidance
  4. Present: recommended test, alternatives, assumptions to verify, interpretation template
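For the common "is conversion different between groups?" case, the output pairs a test statistic with an effect size, per Critical Rule 4. A stdlib-only sketch of a two-sided two-proportion z-test with Cohen's h (assumes independent samples and counts large enough for the normal approximation):

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided two-proportion z-test plus Cohen's h effect size.
    x1/x2 are success counts, n1/n2 group sizes. Valid only when each
    group has roughly >= 10 successes and failures."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    cohens_h = 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))
    return z, p_value, cohens_h
```

With 12% vs 10% conversion on 1000 users per group, z is about 1.43 and p about 0.15: not significant at alpha = 0.05, and Cohen's h of about 0.06 marks the effect as small regardless.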

### Visualization (Quick)

  1. Load data/visualization-grammar.json for chart type selection
  2. Match data characteristics to visualization types
  3. Recommend: chart type, encoding channels, color palette, layout
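The matching in step 2 amounts to a lookup keyed on field types. A toy stand-in for the fuller grammar in `data/visualization-grammar.json` (the rule table here is illustrative, not the file's actual contents):

```python
# Hypothetical chart-type lookup keyed on (x type, y type).
CHART_RULES = {
    ("categorical", "quantitative"): "bar chart",
    ("temporal", "quantitative"): "line chart",
    ("quantitative", "quantitative"): "scatter plot",
    ("categorical", "categorical"): "heatmap of counts",
    ("quantitative", None): "histogram",
}

def recommend_chart(x_type, y_type=None):
    """Return a chart type for the field-type pair, or defer to the grammar."""
    return CHART_RULES.get((x_type, y_type), "consult visualization grammar")
```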

### Experiment Design (Standard)

  1. Read references/experiment-design.md for A/B test patterns
  2. Design: hypothesis, metrics, sample size (power analysis), duration
  3. Address: novelty effects, multiple comparisons, CUPED variance reduction
  4. Output: experiment brief with decision criteria
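The power analysis in step 2 can be sketched for a two-proportion test using the standard normal approximation (stdlib only; the inverse normal CDF is obtained by bisection rather than a library call):

```python
import math

def sample_size_per_group(p_base, mde_abs, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided two-proportion test
    (normal approximation). p_base is the control conversion rate,
    mde_abs the absolute minimum detectable effect."""
    def z(q):
        # Inverse standard normal CDF via bisection on erfc.
        lo, hi = -10.0, 10.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if 0.5 * math.erfc(-mid / math.sqrt(2)) < q:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    p2 = p_base + mde_abs
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    var = p_base * (1 - p_base) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var / mde_abs ** 2)
```

Detecting a 2-point absolute lift on a 10% base rate at standard alpha/power needs roughly 3,800 users per arm, which is why underpowered experiments waste resources (Critical Rule 7).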

### Time Series (Standard)

  1. If file path, run data profiler for temporal patterns
  2. Assess: stationarity, seasonality, trend, autocorrelation
  3. Recommend: decomposition method, forecasting model, validation strategy
  4. Address: cross-validation for time series (walk-forward), feature lags
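The walk-forward validation in step 4 means every fold trains only on observations that precede its test window. A minimal expanding-window sketch:

```python
def walk_forward_splits(n, n_splits=3, test_size=None):
    """Expanding-window walk-forward splits over n time-ordered rows.
    Each fold trains on all rows before its test window, so no fold
    ever sees the future."""
    if test_size is None:
        test_size = n // (n_splits + 1)
    splits = []
    for k in range(n_splits):
        test_end = n - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        splits.append((list(range(test_start)), list(range(test_start, test_end))))
    return splits
```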

### Anomaly Detection (Standard)

  1. Classify: point anomalies, contextual anomalies, collective anomalies
  2. Recommend: algorithm (Isolation Forest, LOF, DBSCAN, autoencoder, etc.)
  3. Address: threshold selection, false positive management, interpretability
  4. Suggest: alerting strategy, root cause investigation framework
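For the simplest case, point anomalies in a univariate series, a robust threshold rule illustrates the threshold-selection concern in step 3 (a sketch, not a substitute for the algorithms listed above):

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Flag point anomalies via the modified z-score based on the median
    absolute deviation (MAD). Robust to the anomalies themselves, unlike
    mean/stdev; 3.5 is a commonly used default threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)  # degenerate: no spread to score against
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]
```

Lowering the threshold catches more anomalies at the cost of more false positives, which is exactly the trade-off the false-positive-management step must make explicit.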

### MLOps (Full Pipeline)

  1. Read references/mlops-maturity.md for maturity model
  2. Assess current maturity level (0-3)
  3. Design: serving strategy (batch vs real-time), monitoring, retraining triggers
  4. Address: model versioning, A/B testing in production, rollback strategy
  5. Output: deployment architecture brief

## Data Quality Assessment

Run: `!uv run python skills/data-wizard/scripts/data-quality-scorer.py <path>`

Dimensions scored:

| Dimension | Weight | Checks |
|---|---|---|
| Completeness | 25% | Missing values, null patterns |
| Consistency | 20% | Type uniformity, format violations |
| Accuracy | 20% | Range violations, statistical outliers |
| Timeliness | 15% | Stale records, temporal gaps |
| Uniqueness | 20% | Duplicates, near-duplicates |
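The composite score is the weighted sum of the five per-dimension scores (each in [0, 1]); a minimal sketch of the aggregation, assuming the scorer script's per-dimension outputs:

```python
# Weights from the dimensions table above; they sum to 1.
WEIGHTS = {
    "Completeness": 0.25,
    "Consistency": 0.20,
    "Accuracy": 0.20,
    "Timeliness": 0.15,
    "Uniqueness": 0.20,
}

def quality_score(dimension_scores):
    """Weighted composite of per-dimension scores in [0, 1].
    All five dimensions must be present."""
    assert set(dimension_scores) == set(WEIGHTS)
    return round(sum(WEIGHTS[d] * s for d, s in dimension_scores.items()), 3)
```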

## Reference File Index

| File | Content | Read When |
|---|---|---|
| references/statistical-tests.md | Decision tree for test selection, assumptions, interpretation | Stats mode |
| references/model-selection.md | Model catalog by task type, data size, interpretability needs | Model Selection mode |
| references/feature-engineering.md | Patterns by data type: numeric, categorical, temporal, text, geospatial | Feature Engineering mode |
| references/experiment-design.md | A/B test patterns, CUPED, power analysis, multiple comparison corrections | Experiment Design mode |
| references/mlops-maturity.md | Maturity levels 0-3, deployment patterns, monitoring strategy | MLOps mode |
| references/data-quality.md | Quality framework, scoring dimensions, remediation strategies | EDA mode, Data Quality Assessment |

Loading rule: Load ONE reference at a time per the "Read When" column. Do not preload.

## Critical Rules

  1. Always run data profiler before recommending models or features — never guess at data characteristics without evidence
  2. Present classification scoring before executing analysis — user must see and can override complexity tier
  3. Never recommend a statistical test without stating its assumptions — untested assumptions invalidate results
  4. Always specify effect size alongside p-values — statistical significance without practical significance is misleading
  5. Model recommendations must include a baseline — always start with the simplest viable model (logistic regression, linear regression, naive forecast)
  6. Never skip train/test split strategy — leakage is the most common ML mistake
  7. Experiment designs must include power analysis — underpowered experiments waste resources
  8. Feature engineering must address target leakage risk — flag any feature derived from post-outcome data
  9. Time series cross-validation must use walk-forward — random splits violate temporal ordering
  10. MLOps recommendations must assess current maturity — do not recommend Level 3 automation for Level 0 teams
  11. Load ONE reference file at a time — do not preload all references into context
  12. Data quality scores must be computed, not estimated — run the scorer script on actual data

Canonical terms (use these exactly throughout):

  • Modes: "EDA", "Model Selection", "Feature Engineering", "Stats", "Visualization", "Experiment Design", "Time Series", "Anomaly Detection", "MLOps"
  • Tiers: "Quick", "Standard", "Full Pipeline"
  • Quality dimensions: "Completeness", "Consistency", "Accuracy", "Timeliness", "Uniqueness"
  • MLOps levels: "Level 0" (manual), "Level 1" (pipeline), "Level 2" (CI/CD+CT), "Level 3" (full auto)