ab-test-setup

A/B Test Setup Skill

Overview

A/B testing toolkit for calculating sample sizes, designing test plans, and analyzing results with statistical significance tests. It is intended for growth teams, product managers, and marketers who make decisions from controlled experiments.

Quick Start

# Calculate required sample sizes for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json

Tools Overview

| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| sample_size_calculator.py | Sample size calculation | Baseline rate, MDE, power | Required samples + duration |
| test_designer.py | Test plan design | JSON test config | Complete test plan document |
| results_analyzer.py | Results analysis | JSON with test results | Statistical analysis + recommendation |

Workflows

Workflow 1: New A/B Test Setup

  1. Define hypothesis and success metric
  2. Run sample_size_calculator.py with the baseline conversion rate, minimum detectable effect (MDE), and desired power
  3. Create test configuration JSON (see Common Patterns)
  4. Run test_designer.py to generate complete test plan
  5. Share plan with stakeholders for alignment before launch
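
To make step 2 concrete, here is a minimal sketch of the standard sample size approximation for a two-sided, two-proportion z-test. It is not the code inside sample_size_calculator.py; the function names are hypothetical, and it assumes the MDE is a relative lift over the baseline and that all daily traffic is split evenly across variants.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # ASSUMPTION: MDE is a relative lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def estimated_days(n_per_variant, daily_traffic, n_variants=2):
    """Days to reach the target, ASSUMING all daily traffic enters the test."""
    return ceil(n_per_variant * n_variants / daily_traffic)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.05, relative_mde=0.10, power=0.80)
    print("visitors per variant:", n)
    print("estimated days at 5000 visitors/day:", estimated_days(n, 5000))
```

Halving the MDE roughly quadruples the required sample, which is why the MDE is usually the first number to negotiate with stakeholders.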

Workflow 2: Test Results Analysis

  1. Collect test results into JSON format
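  2. Run results_analyzer.py to test for statistical significance
  3. Review confidence interval, p-value, and effect size
  4. Check for segment-level effects if the overall result is inconclusive, treating any segment finding as exploratory, since testing many segments inflates the false-positive rate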
  5. Make ship/no-ship decision based on analysis
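
The quantities reviewed in step 3 can be sketched with a pooled two-proportion z-test. This is an illustration under stated assumptions, not the implementation of results_analyzer.py: the function name and the returned keys are hypothetical, the test is two-sided, and the confidence interval uses the unpooled standard error.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Compare variant B against variant A (control) on a conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error is used for the hypothesis test (H0: p_a == p_b).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    # Unpooled standard error is used for the interval on the difference.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "absolute_diff": diff,
        "relative_lift": diff / p_a,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled),
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Counts taken from the Test Results JSON pattern in this document.
    result = two_proportion_z_test(563, 12500, 625, 12500)
    for key, value in result.items():
        print(key, value)
```

With the example counts from the Test Results JSON pattern, the relative lift is about 11% but the p-value lands slightly above 0.05 and the interval includes zero, so this sketch would report the result as not significant at the 0.05 level.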

Workflow 3: Experimentation Program Review

  1. Compile results from multiple past tests
  2. Run results_analyzer.py --batch on all results
  3. Review win rate, average effect size, and velocity
  4. Identify patterns in winning vs losing tests
  5. Optimize test pipeline based on learnings
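
The metrics in step 3 can be approximated as follows. The record format, the field names (`relative_lift`, `p_value`), and the definition of a "win" as a significant positive lift are assumptions made for illustration; the actual input and output of `results_analyzer.py --batch` may differ. The sample records are placeholder values, not real test results.

```python
from statistics import mean


def summarize_program(tests, alpha=0.05, period_days=90):
    """Summarize past tests: win rate, average winning lift, and velocity."""
    # ASSUMPTION: a "win" is a statistically significant positive lift.
    wins = [t for t in tests if t["p_value"] < alpha and t["relative_lift"] > 0]
    return {
        "tests_run": len(tests),
        "win_rate": len(wins) / len(tests) if tests else 0.0,
        "avg_winning_lift": mean(t["relative_lift"] for t in wins) if wins else 0.0,
        # Velocity expressed as tests per 30-day month over the review period.
        "tests_per_month": len(tests) / (period_days / 30),
    }


if __name__ == "__main__":
    # Placeholder records for illustration only.
    past_tests = [
        {"name": "test_a", "relative_lift": 0.08, "p_value": 0.01},
        {"name": "test_b", "relative_lift": -0.02, "p_value": 0.40},
        {"name": "test_c", "relative_lift": 0.12, "p_value": 0.03},
        {"name": "test_d", "relative_lift": 0.01, "p_value": 0.70},
    ]
    print(summarize_program(past_tests, period_days=60))
```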

Reference Documentation

See references/ab-testing-guide.md for comprehensive methodology covering:

  • Statistical foundations (z-tests, confidence intervals)
  • Sample size theory and trade-offs
  • Common experimentation pitfalls
  • Multi-variant and sequential testing
  • Bayesian vs frequentist approaches

Common Patterns

Pattern: Test Configuration JSON

{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
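
Before passing a configuration to test_designer.py, a few sanity checks can catch common mistakes. The sketch below is a hypothetical helper, not part of the toolkit; the rules it enforces (rates strictly between 0 and 1, allocation keys matching variant names, allocation summing to 1) are reasonable assumptions about the format above rather than a documented schema.

```python
from math import isclose


def validate_test_config(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    for key in ("baseline_rate", "significance_level", "power"):
        value = config.get(key)
        if not isinstance(value, (int, float)) or not 0 < value < 1:
            problems.append(f"{key} must be a number strictly between 0 and 1")

    mde = config.get("minimum_detectable_effect")
    if not isinstance(mde, (int, float)) or mde <= 0:
        problems.append("minimum_detectable_effect must be a positive number")

    variant_names = {v.get("name") for v in config.get("variants", [])}
    allocation = config.get("allocation", {})
    if len(variant_names) < 2:
        problems.append("at least two variants are required")
    if variant_names != set(allocation):
        problems.append("allocation keys must match variant names")
    if not isclose(sum(allocation.values()), 1.0, abs_tol=1e-9):
        problems.append("allocation shares must sum to 1.0")

    daily_traffic = config.get("daily_traffic", 0)
    if not isinstance(daily_traffic, (int, float)) or daily_traffic <= 0:
        problems.append("daily_traffic must be a positive number")

    return problems


if __name__ == "__main__":
    config = {
        "baseline_rate": 0.045,
        "minimum_detectable_effect": 0.10,
        "significance_level": 0.05,
        "power": 0.80,
        "variants": [{"name": "control"}, {"name": "treatment"}],
        "daily_traffic": 5000,
        "allocation": {"control": 0.50, "treatment": 0.50},
    }
    print(validate_test_config(config) or "config looks consistent")
```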

Pattern: Test Results JSON

{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
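
As a quick check on a results file before running the full analysis, the observed conversion rate and relative lift per variant can be computed directly from this structure. The helper name `variant_rates` is hypothetical, and the sketch assumes the control variant is keyed as "control", as in the example above.

```python
import json

RESULTS_JSON = """
{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
"""


def variant_rates(results, control="control"):
    """Return the conversion rate and lift relative to control for each variant."""
    rates = {
        name: counts["conversions"] / counts["visitors"]
        for name, counts in results["variants"].items()
    }
    base = rates[control]
    return {
        name: {"rate": rate, "relative_lift": (rate - base) / base}
        for name, rate in rates.items()
    }


if __name__ == "__main__":
    summary = variant_rates(json.loads(RESULTS_JSON))
    for name, stats in summary.items():
        print(f"{name}: rate={stats['rate']:.4f}, lift={stats['relative_lift']:+.2%}")
```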

Quick Reference: Common Effect Sizes

| Context | Small Effect | Medium Effect | Large Effect |
|---------|--------------|---------------|--------------|
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |
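
The table can be encoded as a small lookup for labeling observed effects. This is an illustrative sketch: the names are hypothetical, effects are passed as fractions (0.11 for 11%), and because the table's ranges share their boundary values, it assumes a boundary value falls into the larger band, except the top threshold, which must be strictly exceeded to count as large.

```python
# Thresholds from the table: (small lower bound, medium lower bound, large lower bound).
EFFECT_SIZE_BANDS = {
    "conversion_rate": (0.02, 0.05, 0.15),
    "revenue_per_user": (0.01, 0.03, 0.08),
    "engagement_rate": (0.03, 0.05, 0.10),
}


def classify_effect(context, effect):
    """Label the magnitude of an effect as small, medium, or large for a context."""
    small, medium, large = EFFECT_SIZE_BANDS[context]
    size = abs(effect)  # direction is ignored; only magnitude is classified
    if size > large:
        return "large"
    if size >= medium:  # ASSUMPTION: shared boundaries go to the larger band
        return "medium"
    if size >= small:
        return "small"
    return "below small"


if __name__ == "__main__":
    print(classify_effect("conversion_rate", 0.11))
    print(classify_effect("revenue_per_user", 0.02))
```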

First Seen: Mar 10, 2026