mkt-ab-testing
Marketing A/B Testing
Framework
IRON LAW: One Variable at a Time
If you change the headline AND the image AND the CTA simultaneously,
you cannot know which change caused the result. Test ONE variable per
experiment. If you need to test multiple changes, run them as separate tests
one after another, or use multivariate testing (MVT) if you have enough traffic to power every combination.
What to Test (by Impact)
| Element | Expected Lift | Traffic Needed | Priority |
|---|---|---|---|
| Offer/Pricing | 10-50% | Medium | Highest |
| Headline/Subject line | 5-30% | Low | High |
| CTA (text, color, placement) | 5-20% | Low | High |
| Page layout | 5-15% | Medium | Medium |
| Image/Video | 3-15% | Medium | Medium |
| Form fields | 5-25% (fewer fields = higher CVR) | Low | Medium |
| Social proof placement | 3-10% | Medium | Lower |
Test Design
- Hypothesis: "Changing [variable] from [A] to [B] will increase [metric] by [X%] because [reasoning]"
- Primary metric: ONE metric that determines winner (conversion rate, revenue per visitor, signup rate)
- Guardrail metrics: Metrics that must NOT degrade (bounce rate, page load time, revenue per user)
- Traffic split: 50/50 between control and variant (standard)
- Sample size: Calculate before starting (see stat-ab-testing for formula)
- Duration: At least 1 full calendar week, ideally 2, in whole-week increments (captures day-of-week effects)
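The sample-size and duration items above can be sketched in Python. This is a minimal sketch using the standard normal-approximation formula for comparing two proportions. It is not the formula from stat-ab-testing or the code in scripts/ab_test.py. The function names, the 80% power default, and the example traffic figures are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided test, 50/50 split)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # rate the hypothesis predicts for B
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_in_weeks(n_per_variant, daily_visitors, variants=2):
    """Round the run time up to whole weeks so each weekday is covered equally."""
    days = math.ceil(n_per_variant * variants / daily_visitors)
    return max(1, math.ceil(days / 7))


# Hypothetical inputs: 5% baseline conversion, hoping for a 20% relative lift,
# with 2,000 eligible visitors per day across both variants.
n_needed = sample_size_per_variant(baseline_rate=0.05, relative_lift=0.20)
weeks_needed = duration_in_weeks(n_needed, daily_visitors=2000)
print(f"{n_needed} visitors per variant, about {weeks_needed} week(s)")
```

Smaller expected lifts drive the required sample up quickly, because the detectable difference appears squared in the denominator. That is one reason to prioritize high-impact elements when traffic is limited.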
Common Marketing Tests
| Test | Control (A) | Variant (B) | Metric |
|---|---|---|---|
| Email subject | "Your weekly update" | "3 trends you missed this week" | Open rate |
| Landing page CTA | "Sign Up" | "Start Free Trial" | Click rate |
| Pricing page | Show 3 plans | Show 2 plans + "most popular" badge | Conversion rate |
| Ad creative | Product photo | Lifestyle photo with product | CTR → conversion |
| Form length | 8 fields | 4 fields | Form completion rate |
Analysis & Decision
| Result | Decision | Action |
|---|---|---|
| B wins, p < 0.05, meaningful lift | Ship B | Deploy variant, start next test |
| B wins, p < 0.05, tiny lift (<1%) | Don't ship | Lift not worth the change risk |
| No significant difference | Keep A | A is the known quantity; test something else |
| B wins on primary but loses on guardrail | Investigate | May need to redesign variant |
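The decision table can be expressed as a small function. The following is a hedged sketch, not the logic of scripts/ab_test.py: the names `two_proportion_z_test` and `decide` are hypothetical, the 1% threshold is read as relative lift (the table does not say whether it is relative or absolute), and a variant that loses to control is mapped to "Keep A", a case the table leaves implicit.

```python
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (relative lift of B over A, two-sided p-value) using a pooled z-test."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (rate_b - rate_a) / rate_a, p_value


def decide(relative_lift, p_value, guardrail_ok, alpha=0.05, min_lift=0.01):
    """Map a test result onto the decisions in the table above."""
    if p_value >= alpha or relative_lift <= 0:
        return "Keep A"        # no significant win for B (assumption: covers B losing)
    if not guardrail_ok:
        return "Investigate"   # primary metric wins but a guardrail degraded
    if relative_lift < min_lift:
        return "Don't ship"    # significant but too small to justify the change
    return "Ship B"


# Hypothetical counts: 400/8000 conversions for control, 480/8000 for the variant.
lift, p = two_proportion_z_test(400, 8000, 480, 8000)
decision = decide(lift, p, guardrail_ok=True)
print(f"lift={lift:.1%}, p={p:.4f}, decision={decision}")
```

Checking the guardrail before the minimum-lift threshold is a design choice: a degraded guardrail is worth investigating even when the primary lift is small.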
Output Format
# A/B Test Plan: {Test Name}
## Hypothesis
Changing {variable} from {A} to {B} will increase {metric} by {X%} because {reasoning}.
## Design
- Primary metric: {metric}
- Guardrail: {metric(s)}
- Split: 50/50
- Sample size: {N per variant}
- Duration: {days/weeks}
## Results
| Metric | Control | Variant | Diff | CI (95%) | Significant? |
|--------|---------|---------|------|----------|-------------|
| {primary} | {value} | {value} | {±%} | [{lower}, {upper}] | Y/N |
## Decision
{Ship / Don't ship / Extend} — {rationale}
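One way to fill in a row of the Results table is sketched below. It assumes the Diff column holds the absolute difference in rates (percentage points) and uses a normal-approximation (Wald) interval with unpooled standard error. Both are assumptions, since the template does not specify them, and `results_row` is a hypothetical helper rather than part of scripts/ab_test.py.

```python
from statistics import NormalDist


def results_row(metric, conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Format one Results row: rates, absolute difference, CI, significance flag."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    diff = rate_b - rate_a
    # Unpooled standard error of the difference between two proportions.
    se = (rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = diff - z * se, diff + z * se
    # Significant when the interval excludes zero.
    significant = "Y" if lower > 0 or upper < 0 else "N"
    return (
        f"| {metric} | {rate_a:.2%} | {rate_b:.2%} | {diff:+.2%} "
        f"| [{lower:+.2%}, {upper:+.2%}] | {significant} |"
    )


# Hypothetical counts: 400/8000 conversions for control, 480/8000 for the variant.
row = results_row("Conversion rate", 400, 8000, 480, 8000)
print(row)
```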
Gotchas
- Don't stop early because it "looks good": Repeatedly peeking at results and stopping at the first significant reading can inflate the false positive rate from the nominal 5% to 30% or more, depending on how often you look. Run to the planned sample size.
- Day-of-week effects: Monday visitors behave differently from Saturday visitors. Always run tests for at least 1-2 complete weeks.
- Novelty effect: A new design may get a temporary lift from curiosity. Run the test for 2+ weeks to see whether the effect persists.
- Winner's curse: The estimated lift from a test is often larger than the true lift due to statistical noise. Expect the actual impact after deployment to be smaller.
- Don't test everything — test what matters: Running 20 small tests on button colors while ignoring the pricing page is misallocating effort. Test high-impact elements first.
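The early-stopping gotcha above can be illustrated with a small simulation of A/A tests, where both arms share the same true conversion rate and every "significant" result is a false positive. This is a rough sketch under assumed parameters (10 looks, 500 visitors per arm per look, 5% conversion rate, fixed seed). The exact rates depend on those choices, so treat the output as illustrative rather than as a reproduction of the 30% figure.

```python
import random
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0, 1):
        return 1.0  # no variation observed yet, so nothing to test
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(runs=400, looks=10, visitors_per_look=500, rate=0.05, seed=7):
    """Return (false positive rate when peeking, false positive rate at fixed horizon)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        flagged = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            p = p_value(conv_a, n, conv_b, n)
            flagged = flagged or p < 0.05  # would have stopped at this look
        peeking_hits += flagged
        fixed_hits += p < 0.05  # only the final, planned look counts
    return peeking_hits / runs, fixed_hits / runs


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"peeking: {peeking_rate:.1%} false positives, fixed horizon: {fixed_rate:.1%}")
```

Because the final look is also one of the peeks, the peeking rate can never be lower than the fixed-horizon rate. Adding more looks gives noise more chances to cross the threshold.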
Scripts
| Script | Description | Usage |
|---|---|---|
| scripts/ab_test.py | Two-proportion z-test with effect size and sample-size planning | python scripts/ab_test.py --help |
Run python scripts/ab_test.py --verify to execute built-in sanity tests.
References
- For statistical methodology (sample size, p-values), see the stat-ab-testing skill
- For multivariate testing design, see references/mvt-design.md