# Create Experiment Design (`create-experiment-design`)

## Overview
Design A/B tests and experiments with scientific rigor. Includes a falsifiable hypothesis, pre-registered analysis plan, sample size calculation, guardrail metrics, and clear decision criteria to prevent p-hacking and HARKing.
## Workflow
- **Read product context** — Scan `.chalk/docs/product/` for the product profile, relevant PRDs, and any existing experiment docs. Check for a metrics framework that defines standard metrics and their baseline values.
- **Define the hypothesis** — Parse `$ARGUMENTS` and work with the user to formulate a hypothesis in the format: "If we [change], then [primary metric] will [direction] by [minimum detectable effect], because [rationale]." The hypothesis must be falsifiable.
- **Select metrics** — Define:
  - **Primary metric**: The single metric that determines success or failure. Must be measurable within the experiment duration.
  - **Secondary metrics**: Additional metrics to monitor for deeper understanding. These do not determine the outcome.
  - **Guardrail metrics**: Metrics that must NOT degrade (e.g., error rate, page load time, support ticket volume). If a guardrail is breached, the experiment is stopped regardless of the primary metric.
- **Calculate sample size** — Based on: baseline conversion rate, minimum detectable effect (MDE), statistical significance level (default: 95%), and statistical power (default: 80%). State the required sample size per variant.
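The sample-size step above can be sketched with the standard two-proportion normal-approximation formula. This is a minimal illustration, not part of the skill itself; the 10% baseline and 2-percentage-point MDE are made-up example numbers, and it uses only the Python standard library.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline: float, mde_abs: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Required sample size per variant for a two-proportion z-test.

    baseline: control conversion rate (e.g. 0.10 for 10%)
    mde_abs:  minimum detectable effect as an absolute lift (e.g. 0.02)
    alpha:    significance level (0.05 = 95% significance, two-sided)
    power:    statistical power (0.80 = 80%)
    """
    p1, p2 = baseline, baseline + mde_abs
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# 10% baseline, +2pp absolute MDE, defaults of 95% significance / 80% power
print(sample_size_per_variant(0.10, 0.02))  # 3841 per variant
```

Note that the required n grows quadratically as the MDE shrinks, which is why the hypothesis step pins down a minimum detectable effect before the experiment runs.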
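The guardrail rule in the metrics step (a breached guardrail stops the experiment regardless of the primary metric) can be captured as a small pre-registered decision helper. This is a hedged sketch: the outcome labels and function shape are illustrative assumptions, not defined by the skill.

```python
def decide(primary_lift: float, primary_significant: bool,
           guardrail_breaches: list[str]) -> str:
    """Apply pre-registered decision criteria in priority order."""
    # Guardrails take precedence over everything else.
    if guardrail_breaches:
        return "stop: guardrail breached ({})".format(", ".join(guardrail_breaches))
    if primary_significant and primary_lift > 0:
        return "ship"
    if primary_significant:  # significant but negative lift
        return "revert"
    return "inconclusive: do not ship"


print(decide(0.021, True, []))                    # ship
print(decide(0.021, True, ["error rate"]))        # stop: guardrail breached (error rate)
```

Writing the criteria down as code before looking at results is one way to keep the decision rule fixed and avoid post-hoc rationalization.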