How does A/B testing work?
- Define the hypothesis and the Overall Evaluation Criterion (OEC), e.g., checkout Conversion or ROI.
- Instrument events and goals (see Event and Goal), and tag traffic consistently with UTM parameters.
- Randomize users into A and B and hold all else equal.
- Size the sample up front (statistical power and significance level α), run the test, then analyze for statistical significance (frequentist p-value or Bayesian posterior); a sizing sketch follows this list.
- Decide: ship, iterate, or archive. Validate results across segments (e.g., by Cohort or channel) and ensure your Attribution Model doesn’t mask the lift.
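As a rough illustration of the sizing step, here is a minimal sketch in Python using SciPy and the standard two-proportion sample-size approximation. The baseline rate, minimum detectable effect, power, and alpha are assumed example values, not figures from this article:

```python
from scipy.stats import norm

# Assumed example inputs: baseline rate and minimum detectable effect (MDE).
baseline = 0.05                      # control conversion rate
mde_rel = 0.10                       # smallest relative lift worth detecting (+10%)
alpha, power = 0.05, 0.80

p1 = baseline
p2 = baseline * (1 + mde_rel)

# Standard approximation for a two-sided, two-proportion test with equal allocation:
# n per arm = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p2 - p1)^2
z_alpha = norm.ppf(1 - alpha / 2)
z_power = norm.ppf(power)
n_per_arm = (z_alpha + z_power) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2

print(f"~{n_per_arm:,.0f} users per variant")
```

Smaller effects or lower baseline rates push the required sample up quickly, which is why the minimum detectable effect should be agreed on before the test starts.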
Minimal example
| Variant | Users | Conversions | Conv. Rate |
|---|---|---|---|
| A (control) | 10,000 | 500 | 5.00% |
| B (test) | 10,000 | 560 | 5.60% |

Uplift = (5.60% − 5.00%) / 5.00% = +12% (evaluate significance before rolling out).
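One way to evaluate that significance is a two-proportion z-test; this is a minimal sketch in Python with SciPy, using the exact counts from the table:

```python
from math import sqrt
from scipy.stats import norm

# Counts from the table above
users_a, conv_a = 10_000, 500    # A (control)
users_b, conv_b = 10_000, 560    # B (test)

p_a, p_b = conv_a / users_a, conv_b / users_b
p_pool = (conv_a + conv_b) / (users_a + users_b)   # pooled rate under H0

# Two-proportion z-test (two-sided)
se = sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))

print(f"uplift = {(p_b - p_a) / p_a:+.1%}, z = {z:.2f}, p = {p_value:.3f}")
```

With these counts the two-sided p-value lands around 0.06, so despite the +12% relative uplift the result is borderline at α = 0.05; you would want a properly sized run (or more data) before shipping.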
Good practice & gotchas
- Choose one primary metric to avoid fishing; pre-register guardrail metrics (latency, churn, etc.).
- No peeking: repeated significance checks inflate false positives; use sequential methods or a validated stats engine (see the simulation after this list).
- A/A tests catch bucketing or instrumentation bugs.
- Variance reduction and stratification (e.g., CUPED, stratifying by Cohort) increase sensitivity; see the CUPED sketch after this list.
- Tooling: beyond GA4, platforms like Optimizely/VWO and privacy-first analytics (Plausible, Matomo, Simple Analytics) can run or analyze experiments.
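To see why peeking matters, here is a small simulation sketch in Python with NumPy. It runs an A/A test (no true effect) and checks for significance after every batch of traffic; the number of looks, traffic per look, and conversion rate are assumed example values:

```python
import numpy as np

rng = np.random.default_rng(1)

def false_positive_rate(n_looks, users_per_look=1_000, sims=2_000, p=0.05):
    """Simulate an A/A test with a significance check at every look."""
    hits = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        for _ in range(n_looks):
            conv_a += rng.binomial(users_per_look, p)
            conv_b += rng.binomial(users_per_look, p)
            n += users_per_look
            pool = (conv_a + conv_b) / (2 * n)
            se = np.sqrt(pool * (1 - pool) * (2 / n))
            if se > 0 and abs(conv_b / n - conv_a / n) / se > 1.96:
                hits += 1          # declared "significant" at this look and stopped
                break
    return hits / sims

print(f"1 look  : {false_positive_rate(1):.1%}")   # close to the nominal 5%
print(f"20 looks: {false_positive_rate(20):.1%}")  # substantially above 5%
```

Stopping at the first "significant" look pushes the false-positive rate well beyond the nominal 5%; sequential designs (e.g., alpha spending) or always-valid inference are the usual fixes.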
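And as a sketch of the variance-reduction point, a minimal CUPED adjustment in Python with NumPy subtracts the part of the in-experiment metric explained by each user's pre-experiment value. The synthetic pre-period covariate here is an assumption of the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: a pre-experiment metric (covariate) correlated with the
# in-experiment metric for the same users.
pre = rng.gamma(2.0, 10.0, size=20_000)               # pre-period spend per user
post = 0.6 * pre + rng.normal(0.0, 5.0, size=20_000)  # in-experiment spend

# CUPED: theta = cov(pre, post) / var(pre), estimated on pooled data, then
# subtract the covariate-explained component from each user's metric.
theta = np.cov(pre, post)[0, 1] / pre.var(ddof=1)
post_adj = post - theta * (pre - pre.mean())

print(f"variance before: {post.var():.1f}, after CUPED: {post_adj.var():.1f}")
```

In a real experiment you would adjust both arms with the same theta and compare the adjusted means: the expected lift is unchanged, while the variance (and hence the sample needed for a given sensitivity) shrinks in proportion to the squared correlation between the pre- and in-experiment metrics.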
SEO note: Also searched as “What is A/B testing?” and “How does A/B testing work?”