A/B Test Sample Size Calculator

Find out exactly how many visitors your experiment needs. Enter your baseline rate and the lift you want to detect to get visitors per variant, total sample, and test duration — instantly.

Test parameters

Baseline conversion rate (%): your control's current conversion rate.

Minimum detectable effect (%): the smallest relative lift you want to reliably detect.

The variant count includes the control. Daily visitors (across all variants) is optional; add it for a duration estimate.

Example result (enter your own baseline rate and minimum detectable effect to size your test)

Required sample size per variant: 31,234
Total visitors across 2 variants: 62,468
Baseline: 5%
Target: 5.5%
Effect: 10% relative
Estimated test duration: 32 days (≈ 4.6 weeks at 2,000 visitors/day)
Power: 80% (aim for ≥ 80%)
Confidence: 95% (α = 5%)
z (alpha): 1.96
z (power): 0.8416
Method: two-proportion z-test, two-sided, 80% power, 95% confidence

To detect a move from 5% to 5.5% (10% relative lift) with 80% power at 95% confidence, each variant needs 31,234 visitors. At 2,000 visitors/day that's about 32 days (~4.6 weeks).

What to watch

  • Lock your sample size, power, and significance level BEFORE starting, and avoid "peeking" — stopping the moment it looks significant inflates false positives.
The Complete Guide

How to Calculate the Sample Size for an A/B Test

  • Standard power: 80%. The conventional minimum statistical power, meaning an 80% chance of detecting a real effect if one exists.
  • Standard confidence: 95%. A 5% significance level (alpha) is the default tolerance for a false positive across the industry.
  • MDE vs sample: halving the minimum detectable effect roughly quadruples the sample size you need per variant.

Running an A/B test without first calculating the sample size is the single most common way experiments go wrong. You either stop too early and ship a "winner" that was really just noise, or you run forever chasing an effect your traffic could never have detected. A sample size calculation tells you, before you start, exactly how many visitors each variant needs — so you know whether the test is even worth running.

This guide explains what sample size means in A/B testing, the four inputs that drive it, the exact formula behind the numbers, and how to use the result to plan a test you can actually trust.

What Is Sample Size in an A/B Test?

Sample size is the number of visitors (or sessions) each variant needs before your test has enough statistical power to reliably detect the effect you care about. It is not a number you discover after the fact by watching the dashboard — it is a target you commit to before launching, based on four parameters: your baseline conversion rate, the minimum detectable effect, the statistical power, and the significance level.

Get the sample size right and you avoid two expensive mistakes: a false positive (calling a winner that does not exist) and a false negative (missing a real improvement because the test was under-powered).

The Four Inputs That Drive Sample Size

Every sample size calculation — including the one above — rests on the same four levers:

  • Baseline conversion rate. The current conversion rate of your control. For the same relative lift, lower baselines need dramatically more traffic, because conversions are rarer and the absolute difference to detect is smaller.
  • Minimum detectable effect (MDE). The smallest improvement you want to be able to detect, expressed either as a relative lift (e.g. "a 10% lift over baseline") or as an absolute change (e.g. "+0.5 percentage points"). Smaller MDEs require much larger samples.
  • Statistical power (1 − β). The probability of detecting a real effect if one exists. 80% is the conventional minimum; 90% is safer for high-stakes decisions. Higher power means a bigger sample.
  • Significance level (α). Your tolerance for a false positive. 5% (95% confidence) is standard. A stricter 1% (99% confidence) reduces false positives but raises the sample needed.

How the Sample Size Formula Works

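For a two-proportion test comparing a control rate (p1) with a target rate (p2), the required sample size per variant is:

$$
n = \frac{\left( z_{\alpha}\sqrt{2\,\bar{p}\,(1-\bar{p})} + z_{\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_2 - p_1)^2}, \qquad \bar{p} = \frac{p_1 + p_2}{2}
$$

Symbol    Meaning
n         Visitors required per variant
zα        Critical z-value for your significance level and test type
zβ        Critical z-value for your chosen power
p1, p2    Baseline and target conversion rates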

In words: n = (zα·√(2·p̄·(1−p̄)) + zβ·√(p1·(1−p1) + p2·(1−p2)))² ÷ (p2 − p1)², where p̄ is the average of the two rates. The calculator solves this for you, converting your significance level and power into the critical z-values zα and zβ with the inverse normal distribution. Because the denominator squares the effect, halving your MDE roughly quadruples the visitors you need.
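The formula can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the calculator's actual code: the function name `sample_size_per_variant` and its parameters are hypothetical, and rounding up to the next whole visitor is an assumption (it does reproduce the 31,234 shown in the example result).

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, power=0.80, alpha=0.05,
                            two_sided=True):
    """Visitors needed per variant for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # target rate implied by the relative lift
    p_bar = (p1 + p2) / 2               # average of the two rates

    # A two-sided test splits alpha across both tails.
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)

    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)  # assumption: round up


n_10 = sample_size_per_variant(0.05, 0.10)  # 5% baseline, 10% relative lift
n_05 = sample_size_per_variant(0.05, 0.05)  # same baseline, half the MDE
print(n_10)         # 31234
print(n_05 / n_10)  # close to 4: halving the MDE roughly quadruples n
```

The second call shows the squared denominator at work: the ratio comes out a little under 4 rather than exactly 4, because the variance terms also shift slightly as the target rate changes.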

Estimating Test Duration

Sample size only tells you how many visitors you need — duration tells you how long that will take. Divide the required per-variant sample by your daily visitors per variant:

  • Days = sample per variant ÷ (daily visitors ÷ number of variants).
  • Add the optional daily-visitors field in the calculator and it estimates days and weeks automatically.
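The duration rule above can be sketched as follows. The helper name `estimate_duration` is hypothetical, and two choices are assumptions: traffic is split evenly across variants, and days are rounded up before being converted to weeks (which matches the 32 days and 4.6 weeks in the example result).

```python
from math import ceil


def estimate_duration(sample_per_variant, daily_visitors, variants=2):
    """Return (days, weeks) needed to reach the per-variant sample."""
    # Assumption: daily traffic is split evenly across all variants.
    daily_per_variant = daily_visitors / variants
    days = ceil(sample_per_variant / daily_per_variant)  # round up to whole days
    weeks = round(days / 7, 1)
    return days, weeks


days, weeks = estimate_duration(31_234, 2_000, variants=2)
print(days, weeks)  # 32 4.6
```

Adding a third variant with the same 2,000 visitors/day leaves each arm with fewer daily visitors, so the same per-variant sample takes proportionally longer to collect.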

Always run for at least one full week — ideally two — even if you hit the sample sooner, because weekday and weekend behaviour differ. If the estimate stretches past roughly eight weeks, seasonality and cookie churn start to distort results; raise your MDE, send more traffic, or test a higher-converting metric instead.

Common Sample Size Mistakes

  • Peeking and stopping early. Checking the dashboard daily and stopping the moment it looks significant inflates your false-positive rate far above the stated 5%.
  • Chasing tiny MDEs. Wanting to detect a 1% relative lift on a 2% baseline can require several million visitors per arm at 80% power and 95% confidence. Be honest about the effect you can realistically achieve and measure.
  • Ignoring multiple comparisons. Testing A/B/C/D means three comparisons against the control, which pushes the chance of at least one false positive from 5% to roughly 14% unless you correct alpha. This calculator applies a Bonferroni adjustment automatically.
  • Forgetting low baselines. A 0.5% conversion rate needs vastly more traffic than a 10% rate to detect the same relative lift.
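To make the multiple-comparisons point concrete, here is a small sketch of a Bonferroni correction. It assumes each treatment is compared only against the control (so the number of comparisons is variants minus one) and treats the comparisons as independent when estimating the uncorrected risk. The function names are hypothetical, and the calculator's exact adjustment may differ.

```python
from statistics import NormalDist


def familywise_error(alpha, comparisons):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha, variants):
    """Per-comparison alpha when every treatment is tested against control."""
    comparisons = variants - 1  # assumption: control is not compared to itself
    return alpha / comparisons


variants = 4  # A/B/C/D, control included
uncorrected = familywise_error(0.05, variants - 1)
adjusted = bonferroni_alpha(0.05, variants)
z_adjusted = NormalDist().inv_cdf(1 - adjusted / 2)  # two-sided critical value

print(round(uncorrected, 3))  # 0.143
print(round(adjusted, 4))     # 0.0167
print(round(z_adjusted, 2))   # 2.39
```

The stricter critical value (about 2.39 instead of 1.96) is what raises the per-variant sample size when you add variants.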

How to Use the Result

Once the calculator gives you a per-variant sample, do three things before launching: confirm the duration is realistic for your traffic, lock the sample size and stopping rule in your test plan, and resist the urge to peek. If the numbers are impossible (say, six months to reach the required sample), that is valuable information too: pick a bigger MDE, consolidate variants, or move the test higher in the funnel where conversion volume is greater.

Expert Tips

Pick the smallest lift worth shipping

Your minimum detectable effect should be the smallest improvement that would actually change your decision. Set it too small and the test becomes impossibly large; too large and you miss real wins.

Plan duration, not just sample

Divide the total required sample (per-variant sample × number of variants) by your total daily traffic to get a realistic timeline. Run at least one full week to capture weekday/weekend behaviour, and avoid tests that stretch past ~8 weeks.

Frequently Asked Questions

How many visitors do I need for an A/B test?

It depends on your baseline conversion rate and the minimum detectable effect. A 5% baseline detecting a 10% relative lift at 80% power and 95% confidence needs roughly 31,000 visitors per variant. Lower baselines and smaller effects require far more; enter your own numbers above for an exact figure.

What is a good minimum detectable effect for A/B testing?

Choose the smallest lift that would still be worth shipping. Most teams target a 5–20% relative lift. Smaller MDEs are more sensitive but require dramatically more traffic, so balance ambition against the sample size and duration you can afford.

Should I use 80% or 90% power?

80% is the conventional minimum and a fine default. Use 90% (or higher) for high-stakes decisions where missing a real winner is costly — it lowers your false-negative rate at the cost of a larger sample.

Why does a lower baseline conversion rate need a bigger sample?

Rare events carry more relative variance, so it takes more observations to separate a true effect from random noise. Detecting the same relative lift on a 1% baseline can need ten times the traffic of a 10% baseline.
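The baseline effect is easy to check numerically. The sketch below applies the same two-proportion formula used in this guide to a 1% and a 10% baseline, both with a 10% relative lift, 80% power, and 95% confidence (two-sided). The function name `n_per_variant` is hypothetical and rounding up is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_variant(baseline, relative_mde, power=0.80, alpha=0.05):
    """Two-sided two-proportion sample size per variant."""
    p1, p2 = baseline, baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    top = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(top / (p2 - p1) ** 2)  # assumption: round up


low = n_per_variant(0.01, 0.10)   # 1% baseline
high = n_per_variant(0.10, 0.10)  # 10% baseline
print(low, high, round(low / high, 1))
```

Under these settings the 1% baseline needs a little over ten times the visitors of the 10% baseline, in line with the rule of thumb above.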
