A/B Testing Framework

Design, run, and analyze conversion experiments with statistical rigor.

Install

git clone https://github.com/thatrebeccarae/claude-marketing.git && mkdir -p ~/.claude/skills && cp -r claude-marketing/skills/ab-testing-framework ~/.claude/skills/

Test Design Process

Step 1: Hypothesis

Template: If we [change X], then [metric Y] will [increase/decrease] by [Z%] because [reason].

Good hypothesis: "If we change the CTA from Get Started to Start Free Trial, then signup rate will increase by 15% because it reduces uncertainty about cost."

Bad hypothesis: "If we change the button color, conversions will improve." (No reasoning, no expected magnitude.)

Step 2: Sample Size Calculation

To determine how long to run a test:

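Required sample per variation = 16 * (p * (1-p)) / (MDE^2)

Where:
  p = baseline conversion rate (as decimal)
  MDE = minimum detectable effect, as an absolute difference in conversion rate (as decimal); for a relative lift, MDE = p * relative lift
  16 = rule-of-thumb constant for roughly 5% two-sided significance and 80% power

Approximate sample per variation, by baseline rate and relative MDE:

Baseline Rate	10% MDE	20% MDE	30% MDE
1%	253,414	63,354	28,157
3%	82,369	20,592	9,152
5%	48,640	12,160	5,404
10%	23,040	5,760	2,560
20%	10,240	2,560	1,138

The table figures are about 1.6 times what the formula returns (for a 10% baseline and a 10% relative MDE, the formula gives 14,400), so treat the table as the more conservative estimate.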

Minimum test duration: 2 full business weeks (to capture day-of-week effects), even if sample size is reached sooner.
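The calculation above can be sketched in Python. This is a minimal illustration, not part of the skill itself: the function names and the `daily_visitors` parameter are hypothetical, the MDE is assumed to be a relative lift over baseline, and "2 full business weeks" is assumed to mean 14 calendar days.

```python
import math


def sample_size_per_variation(baseline: float, relative_mde: float) -> int:
    """Rule-of-thumb sample size per variation: 16 * p * (1 - p) / delta^2."""
    if not 0 < baseline < 1:
        raise ValueError("baseline must be between 0 and 1")
    if relative_mde <= 0:
        raise ValueError("relative_mde must be positive")
    # Assumption: MDE is relative, so the absolute difference is p * MDE.
    delta = baseline * relative_mde
    return math.ceil(16 * baseline * (1 - baseline) / delta ** 2)


def duration_days(n_per_variation: int, daily_visitors: int,
                  variations: int = 2, min_days: int = 14) -> int:
    """Days to run, rounded up to whole weeks and never below min_days."""
    days = math.ceil(n_per_variation * variations / daily_visitors)
    weeks = math.ceil(max(days, min_days) / 7)
    return weeks * 7


if __name__ == "__main__":
    n = sample_size_per_variation(baseline=0.10, relative_mde=0.20)
    print(n, duration_days(n, daily_visitors=1000))
```

Rounding up to whole weeks keeps every weekday equally represented, which is the point of the minimum-duration rule.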

Step 3: Test Execution Rules

Random assignment — visitors must be randomly assigned to control/variant
No peeking — do not check results before reaching sample size
No mid-test changes — do not modify variants during the test
Even traffic split — 50/50 for A/B, even splits for multivariate
Single variable — change only one thing per test (unless multivariate)
Full duration — run for the pre-calculated duration, not until significance
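One common way to meet the random-assignment and even-split rules is to hash a stable visitor ID, so the same visitor always sees the same variant. The sketch below is an assumption about how this could be done, not the skill's prescribed method; `assign_variant`, `user_id`, and `experiment` are hypothetical names.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment: str,
                   variants: Sequence[str] = ("control", "variant")) -> str:
    """Deterministically map a visitor to one of the variants."""
    # Salting with the experiment name keeps assignments independent
    # across experiments for the same visitor.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an integer and spread it evenly over variants.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    print(assign_variant("visitor-123", "cta-copy-test"))
```

Because the assignment depends only on the ID and the experiment name, it needs no stored state and does not change mid-test.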

Step 4: Statistical Analysis

Frequentist Approach

Z-test for proportions:

Z = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1/n1 + 1/n2))

Where:
  p1, p2 = conversion rates of control and variant (x1/n1 and x2/n2)
  x1, x2 = number of conversions in control and variant
  p_pooled = (x1 + x2) / (n1 + n2)
  n1, n2 = sample sizes

p-value interpretation:

p < 0.05: Statistically significant at the 5% level (the conventional 95% confidence threshold)
p < 0.01: Highly significant at the 1% level (99% confidence threshold)
p >= 0.05: Not significant — do not declare a winner
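The Z-test above can be computed with the Python standard library alone. This is a minimal sketch; the function name is hypothetical, and the p-value is assumed to be two-sided.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p-value) for control (x1, n1) vs variant (x2, n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; no variation in the data")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(x1=500, n1=10_000, x2=580, n2=10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

With control listed first, a negative Z means the variant converted at a higher rate than control.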

Bayesian Approach

When to use Bayesian:

Low traffic (small sample sizes)
Need to make decisions faster
Want probability of each variant being best (not just "significant or not")

Interpretation: a Bayesian result reads "There is a 94% probability that Variant B is better than Control," whereas a frequentist result reads "We reject the null hypothesis at the 5% significance level."
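A probability like the one quoted above can be estimated with a Beta-Binomial model and Monte Carlo sampling. The sketch below assumes a uniform Beta(1, 1) prior on each conversion rate, which is a modeling choice the source does not specify; the function name and defaults are hypothetical.

```python
import random


def prob_variant_beats_control(x_c: int, n_c: int, x_v: int, n_v: int,
                               draws: int = 20_000, seed: int = 0) -> float:
    """Estimate P(variant rate > control rate) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(1 + conversions, 1 + non-conversions).
        control = rng.betavariate(1 + x_c, 1 + n_c - x_c)
        variant = rng.betavariate(1 + x_v, 1 + n_v - x_v)
        if variant > control:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    prob = prob_variant_beats_control(x_c=50, n_c=1000, x_v=65, n_v=1000)
    print(f"P(variant is better) = {prob:.2f}")
```

The output is a direct probability statement about the variants, which is what makes this approach easier to communicate on low-traffic tests.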

Step 5: Decision Framework

Result	Significance	Action
Variant wins	p < 0.05	Implement variant
Control wins	p < 0.05	Keep control, learn from failure
No difference	p >= 0.05	Keep control, test something bigger
Variant leads	p = 0.05-0.10	Inconclusive; consider traffic, the test may need more time
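The decision table can also be written as a small function, which makes the order of the checks explicit. This is an illustrative sketch; the function name, the returned strings, and the treatment of exactly p = 0.05 as marginal are assumptions.

```python
def decide(control_rate: float, variant_rate: float, p_value: float,
           alpha: float = 0.05, marginal: float = 0.10) -> str:
    """Map a test result to an action, following the decision table."""
    variant_ahead = variant_rate > control_rate
    if p_value < alpha:
        if variant_ahead:
            return "implement variant"
        return "keep control, learn from failure"
    # Not significant: a leading variant in the marginal band may need more data.
    if p_value < marginal and variant_ahead:
        return "inconclusive, may need more time"
    return "keep control, test something bigger"


if __name__ == "__main__":
    print(decide(control_rate=0.050, variant_rate=0.058, p_value=0.012))
```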

Common Testing Pitfalls

Peeking — checking results early inflates false positive rate from 5% to 26%+
Stopping early — reaching significance != reaching required sample size
Testing too many variants — each variant needs full sample size
Ignoring segments — overall winner may be loser for key segments
Too small an effect — testing for 2% lift needs enormous sample sizes
Not accounting for seasonality — run full weeks, avoid holidays
Multiple metrics — primary metric must be pre-declared; secondary are directional
Survivorship bias — only measuring users who complete, not those who abandon
Simpson's paradox — segment-level winners can reverse at aggregate level
Novelty effect — new designs get temporary lift; re-test after 2-4 weeks
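The peeking pitfall can be checked with a small A/A simulation, where both arms share the same true conversion rate and any "significant" result is a false positive. The sketch below is illustrative only: the rate, batch size, number of looks, and number of runs are arbitrary assumptions, and the exact inflation depends on how often results are checked.

```python
import math
import random


def p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value from a pooled two-proportion Z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_peeking(rate: float = 0.10, batch: int = 200, looks: int = 10,
                     runs: int = 1000, alpha: float = 0.05, seed: int = 1):
    """Return (false positive rate with peeking, rate at final look only)."""
    rng = random.Random(seed)
    any_look_hits = 0
    final_look_hits = 0
    for _ in range(runs):
        x1 = x2 = n = 0
        flagged_any = False
        p = 1.0
        for _ in range(looks):
            # Both arms draw from the same true rate (an A/A test).
            x1 += sum(rng.random() < rate for _ in range(batch))
            x2 += sum(rng.random() < rate for _ in range(batch))
            n += batch
            p = p_value(x1, n, x2, n)
            if p < alpha:
                flagged_any = True
        any_look_hits += flagged_any
        final_look_hits += p < alpha
    return any_look_hits / runs, final_look_hits / runs


if __name__ == "__main__":
    peeking, final_only = simulate_peeking()
    print(f"any look: {peeking:.1%}, final look only: {final_only:.1%}")
```

Stopping at the first significant look corresponds to the "any look" rate, which is why the execution rules require waiting for the pre-calculated sample size.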

What to Test (Prioritized by Impact)

High Impact

Value proposition / headline
CTA text and placement
Pricing and offer structure
Form length (fields removed)
Page layout (single column vs multi)
Social proof presence and placement

Medium Impact

Image/video vs static
Testimonial format (text vs video)
Navigation presence on landing pages
Trust badges and security signals
Urgency elements (countdown, stock)

Low Impact (Usually Not Worth Testing)

Button color (unless extreme contrast issue)
Font changes
Minor copy tweaks
Icon styles
Footer content

Integration with Other Skills

cro-auditor — CRO audit generates test hypotheses; this skill designs the experiments
google-analytics — GA4 for experiment data and segment analysis
copywriting-frameworks — Generate variant copy using proven frameworks

First seen: Apr 8, 2026
