# Marketing A/B Testing

## Framework

### IRON LAW: One Variable at a Time

If you change the headline AND the image AND the CTA simultaneously,
you cannot know which change caused the result. Test ONE variable per
experiment. If you need to test multiple changes, use sequential tests
or multivariate testing (MVT) with sufficient traffic.

## What to Test (by Impact)

| Element | Expected Lift | Traffic Needed | Priority |
|---------|---------------|----------------|----------|
| Offer/Pricing | 10-50% | Medium | Highest |
| Headline/Subject line | 5-30% | Low | High |
| CTA (text, color, placement) | 5-20% | Low | High |
| Page layout | 5-15% | Medium | Medium |
| Image/Video | 3-15% | Medium | Medium |
| Form fields | 5-25% (fewer fields = higher CVR) | Low | Medium |
| Social proof placement | 3-10% | Medium | Lower |

## Test Design

1. **Hypothesis:** "Changing [variable] from [A] to [B] will increase [metric] by [X%] because [reasoning]"
2. **Primary metric:** ONE metric that determines the winner (conversion rate, revenue per visitor, signup rate)
3. **Guardrail metrics:** Metrics that must NOT degrade (bounce rate, page load time, revenue per user)
4. **Traffic split:** 50/50 between control and variant (standard)
5. **Sample size:** Calculate before starting (see stat-ab-testing for the formula; a sketch follows this list)
6. **Duration:** Minimum 1-2 full weeks, to capture day-of-week effects
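
For step 5, a back-of-envelope sample-size calculation can be sketched with only the Python standard library. The function name and the baseline/target rates below are illustrative assumptions, not the interface of scripts/ab_test.py:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # average rate under H0
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)

# Illustrative numbers: 4% baseline conversion, hoping to detect a lift to 5%
print(sample_size_per_variant(0.04, 0.05))  # ~6,745 per variant
```

Because the required n scales with 1/(p2 - p1)^2, halving the detectable lift roughly quadruples the sample size, which is why low-lift elements (layout, images) need far more traffic than offer tests.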

## Common Marketing Tests

| Test | Control (A) | Variant (B) | Metric |
|------|-------------|-------------|--------|
| Email subject | "Your weekly update" | "3 trends you missed this week" | Open rate |
| Landing page CTA | "Sign Up" | "Start Free Trial" | Click rate |
| Pricing page | Show 3 plans | Show 2 plans + "most popular" badge | Conversion rate |
| Ad creative | Product photo | Lifestyle photo with product | CTR → conversion |
| Form length | 8 fields | 4 fields | Form completion rate |

## Analysis & Decision

| Result | Decision | Action |
|--------|----------|--------|
| B wins, p < 0.05, meaningful lift | Ship B | Deploy variant, start next test |
| B wins, p < 0.05, tiny lift (<1%) | Don't ship | Lift not worth the change risk |
| No significant difference | Keep A | A is the known quantity; test something else |
| B wins on primary but loses on guardrail | Investigate | May need to redesign variant |
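
A minimal sketch of the two-proportion z-test behind the p < 0.05 rows above, using only the standard library; names and numbers are illustrative, not the actual scripts/ab_test.py implementation:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: rate_A == rate_B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Illustrative: 4.0% control vs 4.6% variant, 10,000 visitors each
z, p = two_proportion_ztest(400, 10_000, 460, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # z ≈ 2.09, p ≈ 0.037 -> significant at 5%
```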

## Output Format

```markdown
# A/B Test Plan: {Test Name}

## Hypothesis
Changing {variable} from {A} to {B} will increase {metric} by {X%} because {reasoning}.

## Design
- Primary metric: {metric}
- Guardrail: {metric(s)}
- Split: 50/50
- Sample size: {N per variant}
- Duration: {days/weeks}

## Results
| Metric | Control | Variant | Diff | CI (95%) | Significant? |
|--------|---------|---------|------|----------|-------------|
| {primary} | {value} | {value} | {±%} | [{lower}, {upper}] | Y/N |

## Decision
{Ship / Don't ship / Extend} — {rationale}
```
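
For the CI (95%) column, one common choice is a Wald interval on the difference in conversion rates with an unpooled standard error; a sketch under that assumption, with hypothetical counts:

```python
from math import sqrt
from statistics import NormalDist

def diff_ci(conv_a: int, n_a: int, conv_b: int, n_b: int, level: float = 0.95):
    """Wald confidence interval for rate_B - rate_A (unpooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf((1 + level) / 2)  # 1.96 for a 95% interval
    diff = p_b - p_a
    return diff - z * se, diff + z * se

lo, hi = diff_ci(400, 10_000, 460, 10_000)
print(f"[{lo:+.4f}, {hi:+.4f}]")  # ≈ [+0.0004, +0.0116]
```

If the interval excludes zero, the result is significant at the 5% level, matching the Y/N column.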

## Gotchas

- **Don't stop early because it "looks good":** Peeking at results and stopping the moment you see significance inflates false positive rates to 30%+ (see the simulation sketch after this list). Run to the planned sample size.
- **Day-of-week effects:** Monday visitors behave differently from Saturday visitors. Always run tests for at least 1-2 complete weeks.
- **Novelty effect:** A new design may get a temporary lift from curiosity. Wait 2+ weeks to see whether the effect sustains.
- **Winner's curse:** The estimated lift from a test is often larger than the true lift due to statistical noise. Expect the actual impact after deployment to be smaller.
- **Don't test everything; test what matters:** Running 20 small tests on button colors while ignoring the pricing page misallocates effort. Test high-impact elements first.
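
The inflation from peeking is easy to reproduce: simulate an A/A test (identical true rates in both arms), check significance at every peek, and stop at the first hit. A rough sketch; the trial count, traffic, and peek schedule are arbitrary assumptions:

```python
import random
from math import sqrt
from statistics import NormalDist

def peeking_false_positive_rate(trials: int = 1_000, n: int = 2_000,
                                peeks: int = 10, rate: float = 0.05) -> float:
    """A/A test (no true difference); stop at the first 'significant' peek."""
    z_crit = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold
    step = n // peeks
    hits = 0
    for _ in range(trials):
        a = b = 0
        for i in range(1, peeks + 1):
            a += sum(random.random() < rate for _ in range(step))
            b += sum(random.random() < rate for _ in range(step))
            n_i = i * step
            pool = (a + b) / (2 * n_i)
            se = sqrt(pool * (1 - pool) * 2 / n_i)  # pooled SE of the difference
            if se > 0 and abs(a - b) / n_i / se > z_crit:
                hits += 1  # false positive: declared a winner in an A/A test
                break
    return hits / trials

print(peeking_false_positive_rate())  # well above the nominal 0.05
```

More frequent peeking inflates the rate further, which is where the 30%+ figures come from.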

## Scripts

| Script | Description | Usage |
|--------|-------------|-------|
| scripts/ab_test.py | Two-proportion z-test with effect size and sample-size planning | `python scripts/ab_test.py --help` |

Run `python scripts/ab_test.py --verify` to execute the built-in sanity tests.

## References

- For statistical methodology (sample size, p-values), see the stat-ab-testing skill
- For multivariate testing design, see references/mvt-design.md