A/B Testing Framework
Design, run, and analyze conversion experiments with statistical rigor.
Install
git clone https://github.com/thatrebeccarae/claude-marketing.git && cp -r claude-marketing/skills/ab-testing-framework ~/.claude/skills/
Test Design Process
Step 1: Hypothesis
Template: If we [change X], then [metric Y] will [increase/decrease] by [Z%] because [reason].
Good hypothesis: "If we change the CTA from Get Started to Start Free Trial, then signup rate will increase by 15% because it reduces uncertainty about cost."
Bad hypothesis: "If we change the button color, conversions will improve." (No reasoning, no expected magnitude.)
Step 2: Sample Size Calculation
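To determine how many visitors each variation needs (and therefore how long to run a test):
Required sample per variation = 16 * (p * (1-p)) / (MDE^2)
Where:
p = baseline conversion rate (as decimal)
MDE = minimum detectable effect as an absolute difference in conversion rate (as decimal); for a relative lift, use p * relative lift
The constant 16 is a rule of thumb for roughly 80% power at a two-sided 5% significance level. The table below lists MDE as a relative lift, and its figures run about 1.6 times higher than this rule of thumb, so treat them as the more conservative estimate.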
| Baseline Rate | 10% MDE | 20% MDE | 30% MDE |
|---|---|---|---|
| 1% | 253,414 | 63,354 | 28,157 |
| 3% | 82,369 | 20,592 | 9,152 |
| 5% | 48,640 | 12,160 | 5,404 |
| 10% | 23,040 | 5,760 | 2,560 |
| 20% | 10,240 | 2,560 | 1,138 |
Minimum test duration: 2 full business weeks (to capture day-of-week effects), even if sample size is reached sooner.
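A minimal Python sketch of the sample size formula and the minimum-duration rule. It assumes the MDE is given as a relative lift and converts it to an absolute difference, and it reads "2 full business weeks" as 14 calendar days. The function names and the daily_visitors parameter are illustrative, not part of the skill. Because it uses the rule-of-thumb constant 16, it returns smaller numbers than the table above.

```python
import math


def sample_size_per_variation(baseline_rate: float, relative_mde: float) -> int:
    """Rule-of-thumb sample size per variation: 16 * p * (1 - p) / delta^2."""
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if relative_mde <= 0:
        raise ValueError("relative_mde must be positive")
    # Assumption: MDE is a relative lift, so convert it to an absolute difference.
    absolute_delta = baseline_rate * relative_mde
    raw = 16 * baseline_rate * (1 - baseline_rate) / absolute_delta ** 2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(raw, 6))


def planned_duration_days(n_per_variation: int, daily_visitors: int,
                          variations: int = 2, minimum_days: int = 14) -> int:
    """Days needed to reach the sample size, never shorter than minimum_days."""
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    days_for_sample = math.ceil(n_per_variation * variations / daily_visitors)
    # Assumption: "2 full business weeks" is treated as 14 calendar days.
    return max(minimum_days, days_for_sample)


if __name__ == "__main__":
    n = sample_size_per_variation(baseline_rate=0.10, relative_mde=0.20)
    print(n, planned_duration_days(n, daily_visitors=1000))
```

With a high-traffic page the sample size is often reached within days, which is where the 14-day floor takes over.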
Step 3: Test Execution Rules
- Random assignment — visitors must be randomly assigned to control/variant
- No peeking — do not check results before reaching sample size
- No mid-test changes — do not modify variants during the test
- Even traffic split — 50/50 for A/B, even splits for multivariate
- Single variable — change only one thing per test (unless multivariate)
- Full duration — run for the pre-calculated duration, not until significance
Step 4: Statistical Analysis
Frequentist Approach
Z-test for proportions:
Z = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1/n1 + 1/n2))
Where:
p1, p2 = conversion rates of control and variant
p_pooled = (x1 + x2) / (n1 + n2)
n1, n2 = sample sizes
p-value interpretation:
- p < 0.05: Statistically significant (95% confidence)
- p < 0.01: Highly significant (99% confidence)
- p >= 0.05: Not significant — do not declare a winner
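The Z-test above can be sketched in Python using only the standard library. This is an illustrative helper, not the skill's own implementation; the function name is hypothetical, and the p-value is two-sided, computed from the normal distribution with math.erfc.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p-value) for control (x1/n1) vs variant (x2/n2)."""
    p1 = x1 / n1  # control conversion rate
    p2 = x2 / n2  # variant conversion rate
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    if standard_error == 0:
        raise ValueError("standard error is zero; no conversions or all conversions")
    z = (p1 - p2) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(x1=100, n1=1000, x2=130, n2=1000)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

Because Z is defined as p1 - p2 with p1 as the control, a negative Z means the variant converted better.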
Bayesian Approach
When to use Bayesian:
- Low traffic (small sample sizes)
- Need to make decisions faster
- Want probability of each variant being best (not just "significant or not")
Interpretation: "There is a 94% probability that Variant B is better than Control" vs frequentist "We reject the null hypothesis at 95% confidence."
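One common way to obtain a "probability that Variant B is better" figure is to sample from Beta posteriors for each conversion rate. The sketch below assumes a uniform Beta(1, 1) prior and a Monte Carlo estimate; both are choices made here for illustration, since the skill does not specify a particular Bayesian model. The function name is hypothetical.

```python
import random


def probability_variant_beats_control(x_control: int, n_control: int,
                                      x_variant: int, n_variant: int,
                                      draws: int = 20000, seed: int = 0) -> float:
    """Estimate P(variant rate > control rate) from Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    wins = 0
    for _ in range(draws):
        # Assumption: uniform Beta(1, 1) prior, so the posterior is
        # Beta(1 + conversions, 1 + non-conversions).
        control_rate = rng.betavariate(1 + x_control, 1 + n_control - x_control)
        variant_rate = rng.betavariate(1 + x_variant, 1 + n_variant - x_variant)
        if variant_rate > control_rate:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    prob = probability_variant_beats_control(100, 1000, 130, 1000)
    print(f"P(variant beats control) is about {prob:.3f}")
```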
Step 5: Decision Framework
| Result | Significance | Action |
|---|---|---|
| Variant wins | p < 0.05 | Implement variant |
| Control wins | p < 0.05 | Keep control, learn from failure |
| No difference | p >= 0.05 | Keep control, test something bigger |
| Variant wins | p = 0.05-0.10 | Consider traffic — may need more time |
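The decision table can be expressed as a small lookup function. This is a sketch under the assumption that "wins" simply means the higher observed conversion rate; the function name and return strings are illustrative.

```python
def recommend_action(control_rate: float, variant_rate: float, p_value: float) -> str:
    """Map a test result to the action suggested by the decision table."""
    variant_ahead = variant_rate > control_rate
    if p_value < 0.05:
        if variant_ahead:
            return "Implement variant"
        return "Keep control, learn from failure"
    # Borderline result with the variant ahead: more data may settle it.
    if p_value <= 0.10 and variant_ahead:
        return "Consider traffic: may need more time"
    return "Keep control, test something bigger"


if __name__ == "__main__":
    print(recommend_action(control_rate=0.10, variant_rate=0.13, p_value=0.036))
```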
Common Testing Pitfalls
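- Peeking — checking results repeatedly and stopping at the first significant reading inflates the false positive rate from the nominal 5% to 26% or more, depending on how often you look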
- Stopping early — reaching significance != reaching required sample size
- Testing too many variants — each variant needs full sample size
- Ignoring segments — overall winner may be loser for key segments
- Too small an effect — testing for 2% lift needs enormous sample sizes
- Not accounting for seasonality — run full weeks, avoid holidays
- Multiple metrics — primary metric must be pre-declared; secondary are directional
- Survivorship bias — only measuring users who complete, not those who abandon
- Simpson's paradox — segment-level winners can reverse at aggregate level
- Novelty effect — new designs get temporary lift; re-test after 2-4 weeks
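The peeking pitfall can be illustrated with a small A/A simulation, where both arms share the same true conversion rate and any "significant" result is a false positive. The sketch below compares declaring a winner at any of several interim looks against testing once at the end. All parameters (10% rate, 1,000 visitors per arm, 10 looks, 1,000 trials) are assumptions chosen for speed, and the size of the inflation depends on how many looks are taken.

```python
import math
import random


def z_test_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of the pooled two-proportion Z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if standard_error == 0:
        return 1.0  # no variation observed yet, so nothing to detect
    z = (x1 / n1 - x2 / n2) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_false_positive_rates(rate: float = 0.10, n_per_arm: int = 1000,
                                  looks: int = 10, trials: int = 1000,
                                  seed: int = 0):
    """Return (false positive rate with peeking, false positive rate at final look)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = 0
    final_hits = 0
    for _trial in range(trials):
        x_a = x_b = 0
        flagged_at_any_look = False
        p_value = 1.0
        for look in range(1, looks + 1):
            # Both arms draw from the same true rate (an A/A test).
            for _visitor in range(step):
                x_a += rng.random() < rate
                x_b += rng.random() < rate
            n = look * step
            p_value = z_test_p_value(x_a, n, x_b, n)
            if p_value < 0.05:
                flagged_at_any_look = True
        peeking_hits += flagged_at_any_look
        final_hits += p_value < 0.05  # p_value here is from the last look only
    return peeking_hits / trials, final_hits / trials


if __name__ == "__main__":
    with_peeking, final_only = simulate_false_positive_rates()
    print(f"with peeking: {with_peeking:.3f}, final look only: {final_only:.3f}")
```

The final-look rate should sit near the nominal 5%, while the any-look rate climbs as the number of looks grows.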
What to Test (Prioritized by Impact)
High Impact
- Value proposition / headline
- CTA text and placement
- Pricing and offer structure
- Form length (fields removed)
- Page layout (single column vs multi)
- Social proof presence and placement
Medium Impact
- Image/video vs static
- Testimonial format (text vs video)
- Navigation presence on landing pages
- Trust badges and security signals
- Urgency elements (countdown, stock)
Low Impact (Usually Not Worth Testing)
- Button color (unless extreme contrast issue)
- Font changes
- Minor copy tweaks
- Icon styles
- Footer content
Integration with Other Skills
- cro-auditor — CRO audit generates test hypotheses; this skill designs the experiments
- google-analytics — GA4 for experiment data and segment analysis
- copywriting-frameworks — Generate variant copy using proven frameworks