ab-testing-ecommerce
A/B Testing for E-commerce
Overview
A/B testing (split testing) runs controlled experiments where a random subset of visitors sees a variant while the rest see the control. Statistical analysis then determines whether any difference is real or due to chance. Good testing disciplines — calculating required sample size before starting, running tests for at least two full weeks, and never stopping early — separate genuine insights from noise.
This skill guides you through running A/B tests on your specific platform, choosing the right tools, and interpreting results correctly.
When to Use This Skill
- When making product page, checkout, or pricing changes and wanting data-driven validation
- When migrating from a client-side A/B testing tool to server-side assignment for accuracy
- When needing statistical power calculations before starting an experiment
- When analyzing experiment results and determining when to ship or kill a variant
- When running a pricing test and needing to ensure consistent pricing per customer
- When wanting to understand what sample size is needed before a test is meaningful
Core Instructions
Step 1: Determine your platform and choose the right testing tool
| Platform | Recommended Tool | Why |
|---|---|---|
| Shopify | Google Optimize (free, sunsetting) → Convert.com or Intelligems | Intelligems is built specifically for Shopify and supports pricing tests with sticky assignment; Convert integrates via Shopify's theme |
| Shopify (pricing tests) | Intelligems | The only tool that does true server-side price testing on Shopify without flickering |
| WooCommerce | Nelio A/B Testing plugin or Google Optimize | Nelio integrates natively with WordPress/WooCommerce; tracks WooCommerce conversion events automatically |
| BigCommerce | Convert.com or VWO (via script injection) | Both integrate via the BigCommerce storefront script manager |
| Custom / Headless | LaunchDarkly (feature flags + experiments) or build with GrowthBook (open source) | Server-side assignment with no flickering; GrowthBook is free and self-hostable |
Step 2: Calculate required sample size before launching
Never launch a test without knowing how many visitors each variant needs. Running a test without a pre-determined stopping rule leads to peeking and false positives.
Use the free calculator at https://www.evanmiller.org/ab-testing/sample-size.html or follow this guide:
- Baseline conversion rate: Pull your current CVR from your platform analytics (last 30 days)
- Minimum detectable effect: The smallest lift you care about detecting (typically 0.3–1 percentage point)
- Statistical power: 80% is standard
- Significance level: 95% confidence (alpha = 0.05)
Example: A Shopify store with 2.5% CVR wanting to detect a 0.3pp lift needs approximately 8,600 sessions per variant. At 500 sessions/day, that is 17 days per variant minimum.
Write down the required sample size before the test starts. This is your mandatory stopping rule.
Step 3: Set up the experiment on your platform
Shopify
Option A: Theme-based tests with Convert.com
- Install Convert.com and add the tracking script via Online Store → Themes → Edit code → theme.liquid
- In Convert.com, go to Experiences → Create Experience → A/B Test
- Use the visual editor to create your variant (change button color, headline, layout)
- Set goals: Add to Cart or Purchase (Convert tracks Shopify purchase events automatically)
- Set traffic allocation (50/50 for most tests)
- Set the minimum sample size you calculated as the stopping condition
Option B: Pricing tests with Intelligems
- Install Intelligems from the Shopify App Store
- Go to Intelligems → Price Tests → New Test
- Select the product(s) to test and set variant prices
- Intelligems handles sticky assignment server-side — the same customer always sees the same price
- Set the test duration to your pre-calculated sample size
- Review results in Intelligems' dashboard: it shows revenue per visitor (not just CVR) as the primary metric
For Shopify checkout tests (Shopify Plus only):
- Use Checkout Extensibility or Shopify Functions to create checkout variants
- Shopify's built-in A/B testing via Checkout profiles is available on Plus
WooCommerce
Using Nelio A/B Testing (recommended)
- Install Nelio A/B Testing from the WordPress plugin directory
- Go to Nelio A/B Testing → Add New Test
- Choose the test type:
- Page Test: Test different landing page or product page variants
- WooCommerce Test: Test product pricing, descriptions, or images
- Headline Test: Test page titles or CTAs
- Set your goal to WooCommerce Order (conversion event)
- Nelio tracks statistical significance in real time — do not stop early just because significance is reached; wait for your pre-calculated sample size
- View results at Nelio A/B Testing → Results
Alternative: Google Optimize (free, requires Google Analytics 4)
- Create a Google Optimize account and link it to your GA4 property
- Add the Optimize container ID to your WordPress site via MonsterInsights plugin (simplest method) or manually in the
<head> - Create an A/B test in Optimize pointing to your WooCommerce product or checkout URLs
- Set objectives using GA4 events (e.g.,
purchase)
BigCommerce
- Go to Storefront → Script Manager → Create a Script
- Add your A/B testing tool script (Convert.com, VWO, or Optimizely) with placement Head and All pages
- In your testing tool, create an experiment targeting your BigCommerce product or category page URL
- Set the conversion goal to track the order confirmation page URL (
/order-confirmation) - BigCommerce also has built-in Multivariate Testing under Marketing → Banner Manager for banner-level tests (limited to visual banner content)
Custom / Headless
For headless storefronts, use server-side assignment to avoid flickering and to support pricing tests:
Using GrowthBook (open source, recommended)
- Install GrowthBook:
npm install @growthbook/growthbook - Initialize on the server side with your user ID for sticky assignment:
import { GrowthBook } from "@growthbook/growthbook";
const gb = new GrowthBook({
apiHost: "https://cdn.growthbook.io",
clientKey: process.env.GROWTHBOOK_CLIENT_KEY,
attributes: {
id: userId, // stable user ID for consistent assignment
loggedIn: !!customerId,
},
});
await gb.loadFeatures();
// Assign variant — deterministic for the same userId
const checkoutButtonVariant = gb.getFeatureValue("checkout-button-color", "blue");
- Track exposures and conversions back to GrowthBook:
gb.setTrackingCallback((experiment, result) => {
analytics.track("Experiment Viewed", {
experimentId: experiment.key,
variationId: result.key,
});
});
// On order completion:
analytics.track("Purchase", { revenue: order.total });
- View statistical results in the GrowthBook UI — it runs Bayesian or frequentist significance tests on your data
Step 4: Interpret results correctly
When reviewing results:
- Wait for the pre-calculated sample size — do not stop because it "looks significant"
- Check revenue per visitor, not just CVR — a checkout test might increase CVR but decrease AOV; measure both
- Run for at least 2 full weeks — day-of-week effects distort 7-day tests
- Look at guardrail metrics — even if your primary metric improved, check return rates and customer service ticket volume
| Metric | What to Check |
|---|---|
| Primary | Revenue per visitor (not CVR alone) |
| Guardrail | Return rate (variant should not increase returns) |
| Guardrail | Cart abandonment rate |
| Confidence | p < 0.05 AND minimum sample size reached |
Best Practices
- Calculate sample size before starting — running until it "looks significant" is p-hacking; use the pre-calculated size as your stopping rule
- Use server-side assignment for pricing tests — client-side tools create flickering and can show different prices on page reload, which is a legal and UX risk
- Never run more than 3–4 experiments on the same page simultaneously — interaction effects between experiments contaminate all results
- Exclude internal team traffic — add your office IP to an exclusion list in your testing tool to prevent internal browsing from polluting results
- Document the hypothesis before starting — write down what you expect to happen and why; post-hoc hypothesis generation leads to confirmation bias
- Run experiments for at least 2 full business weeks — account for day-of-week and weekend shopping pattern differences
Common Pitfalls
| Problem | Solution |
|---|---|
| Test ends early because it "looks significant" — then the lift disappears | Use pre-calculated sample size as a mandatory stopping rule; configure your testing tool to lock results until sample size is reached |
| Same user sees different variants on different sessions | Use server-side assignment keyed on a stable user ID (not session ID); Intelligems and GrowthBook handle this correctly by default |
| Checkout test shows lift in CVR but drop in AOV | Always measure revenue per visitor as your primary metric; CVR and AOV can move in opposite directions |
| Price flickering on Shopify pricing tests | Use Intelligems instead of client-side tools — it assigns prices server-side before the page renders |
| Novelty effect inflates variant results in the first week | Report results with and without the first 3 days of data; a large week-1 spike that fades is usually novelty |
Related Skills
- @conversion-rate-optimization
- @product-analytics
- @customer-analytics
- @sales-reporting-dashboard
- @attribution-modeling