# Create Experiment Design (`create-experiment-design`)

## Overview
Design A/B tests and experiments with scientific rigor. Includes a falsifiable hypothesis, pre-registered analysis plan, sample size calculation, guardrail metrics, and clear decision criteria to prevent p-hacking and HARKing.
## Workflow
- **Read product context** — Scan `.chalk/docs/product/` for the product profile, relevant PRDs, and any existing experiment docs. Check for a metrics framework that defines standard metrics and their baseline values.
- **Define the hypothesis** — Parse `$ARGUMENTS` and work with the user to formulate a hypothesis in the format: "If we [change], then [primary metric] will [direction] by [minimum detectable effect], because [rationale]." The hypothesis must be falsifiable.
- **Select metrics** — Define:
  - **Primary metric**: The single metric that determines success or failure. Must be measurable within the experiment duration.
  - **Secondary metrics**: Additional metrics to monitor for deeper understanding. These do not determine the outcome.
  - **Guardrail metrics**: Metrics that must NOT degrade (e.g., error rate, page load time, support ticket volume). If a guardrail is breached, the experiment is stopped regardless of the primary metric.
- **Calculate sample size** — Based on: baseline conversion rate, minimum detectable effect (MDE), statistical significance level (default: 95%), and statistical power (default: 80%). State the required sample size per variant.
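The sample-size step above can be sketched with the standard two-proportion normal-approximation formula. This is a minimal illustration, not part of the skill itself; the 10% baseline and 2-percentage-point MDE are made-up example numbers, and it uses only the Python standard library.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline: float, mde_abs: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Required sample size per variant for a two-proportion z-test.

    baseline: control conversion rate (e.g. 0.10 for 10%)
    mde_abs:  minimum detectable effect as an absolute lift (e.g. 0.02)
    alpha:    significance level (0.05 = 95% significance, two-sided)
    power:    statistical power (0.80 = 80%)
    """
    p1, p2 = baseline, baseline + mde_abs
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# 10% baseline, +2pp absolute MDE, defaults of 95% significance / 80% power
print(sample_size_per_variant(0.10, 0.02))  # 3841 per variant
```

Note that the required n grows quadratically as the MDE shrinks, which is why the hypothesis step pins down a minimum detectable effect before the experiment runs.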
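The guardrail rule in the metrics step (a breached guardrail stops the experiment regardless of the primary metric) can be captured as a small pre-registered decision helper. This is a hedged sketch: the outcome labels and function shape are illustrative assumptions, not defined by the skill.

```python
def decide(primary_lift: float, primary_significant: bool,
           guardrail_breaches: list[str]) -> str:
    """Apply pre-registered decision criteria in priority order."""
    # Guardrails take precedence over everything else.
    if guardrail_breaches:
        return "stop: guardrail breached ({})".format(", ".join(guardrail_breaches))
    if primary_significant and primary_lift > 0:
        return "ship"
    if primary_significant:  # significant but negative lift
        return "revert"
    return "inconclusive: do not ship"


print(decide(0.021, True, []))                    # ship
print(decide(0.021, True, ["error rate"]))        # stop: guardrail breached (error rate)
```

Writing the criteria down as code before looking at results is one way to keep the decision rule fixed and avoid post-hoc rationalization.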