In everyday life, as in web analytics, we often have to make decisions based on incomplete information. How much data do I need to understand if this modification to the landing page worked? Are a thousand visits enough? Are ten thousand too many?
We can almost never measure the entire population (for example, all future visitors to a site). We have to work on a sample. And here lies the delicate balance: a sample that is too small leads to wrong conclusions, while one that is unnecessarily large wastes time and resources. So the question becomes: how much data do we really need?
Before figuring out how much data we need, we must understand how to collect it. In practice this means choosing a sampling method — for instance simple random sampling, stratified sampling, or cluster sampling — so that every relevant part of the population has a chance of ending up in the sample.
The intuition is straightforward: the smaller the effect we are looking for (or the more variable the data), the more observations we need to distinguish it from background noise. Does it sound hard to formalise? It is simpler than it seems.
To calculate the exact number, we need three ingredients:
- the confidence level, expressed through its Z-score (e.g. Z = 1.96 for 95%);
- the estimated proportion p (when in doubt, p = 0.5 is the most conservative choice, since it maximises the variance p(1 – p));
- the margin of error E we are willing to accept.
The formula to estimate a proportion (like the Conversion Rate) is:
n = (Z² × p(1 – p)) / E²
Let’s run a quick example. We want to estimate the Conversion Rate of a new page with a margin of error of 1% (0.01) and a confidence level of 95% (Z = 1.96). To stay on the safe side, we set p = 0.5.
The examples below are in both R and Python — pick whichever language feels more familiar.
Let’s calculate it in R:
# Sample size calculation for a proportion
Z <- 1.96
p <- 0.5
E <- 0.01
n <- (Z^2 * p * (1-p)) / E^2
print(paste("Required size:", round(n)))
# Output: Required size: 9604

Let’s verify it in Python:
# Sample size calculation for a proportion
Z = 1.96
p = 0.5
E = 0.01
n = (Z**2 * p * (1-p)) / E**2
print(f"Required size: {round(n)}")
# Output: Required size: 9604

As we can see, around 9,604 users are needed to reach that precision. N.B.: if we accepted a margin of error of 2% (E = 0.02), the number would collapse to about 2,401. That is the effect of E squared in the denominator: doubling the margin of error divides the required sample by four. Worth keeping in mind whenever we decide which margin to accept.
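The quadratic relationship is easy to verify by sweeping a few values of E with the same formula in Python:

```python
# How the required sample size reacts to the margin of error E
Z, p = 1.96, 0.5  # 95% confidence, conservative proportion

for E in (0.01, 0.02, 0.04):
    n = (Z**2 * p * (1 - p)) / E**2
    print(f"E = {E:.2f} -> n = {round(n)}")
```

Each doubling of E cuts the required sample to a quarter of the previous value.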
The formula seen so far estimates a single proportion. But in everyday CRO (Conversion Rate Optimization) work the actual problem is almost always a different one: comparing two proportions, as in an A/B test.
In that case the logic is the same, but the formula gets more complex because two new concepts come into play: the Effect Size (the minimum difference we want to detect) and the Statistical Power.
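As a sketch of how those two ingredients enter the calculation, here is the standard normal-approximation formula for comparing two proportions, in Python. The 3% baseline, the 4% target, the 95% confidence and the 80% power below are illustrative assumptions, not figures from the text:

```python
import math

def ab_test_sample_size(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate sample size per variant for a two-proportion z-test.

    z_alpha = 1.96 corresponds to two-sided 95% confidence;
    z_beta = 0.8416 corresponds to 80% statistical power.
    """
    p_bar = (p1 + p2) / 2  # pooled proportion under H0
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Illustrative: detecting a lift from a 3% to a 4% conversion rate
print(ab_test_sample_size(0.03, 0.04))
```

Note that the smaller the lift we want to detect (the Effect Size), the larger the denominator shrinks and the more users each variant needs.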
To skip the manual calculation, I built an interactive A/B test sample size calculator: it does the dirty work and also indicates how many days the test should run, given the page’s average traffic.
One point worth keeping firmly in mind before closing. Sampling error (the one the formula handles) is inevitable and shrinks as the data grow. But there is a far more insidious enemy, and no formula captures it: bias.
If we test a page only during the weekend, we might collect a million visits (sampling error practically zero), but the sample will not be representative of weekday users. In short: no formula can save a sample that is biased at the source. A thousand observations gathered well beat a million gathered badly.
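To make the point concrete, here is a small Python simulation. All the rates are invented for illustration: weekday users convert at 2%, weekend users at 5%, and weekdays account for 70% of traffic. A million weekend-only visits converge, with great precision, to the wrong number:

```python
import random

random.seed(42)

# Hypothetical population (invented numbers): 70% weekday traffic
# converting at 2%, 30% weekend traffic converting at 5%.
true_cr = 0.7 * 0.02 + 0.3 * 0.05  # overall conversion rate = 2.9%

# A million visits collected only on the weekend: sampling error is
# negligible, but the sample is biased at the source.
n = 1_000_000
weekend_only = sum(random.random() < 0.05 for _ in range(n)) / n

print(f"True CR: {true_cr:.3f} - weekend-only estimate: {weekend_only:.3f}")
```

The estimate is extremely precise — and precisely wrong: it nails the weekend rate, not the population rate.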
A product page receives roughly 10,000 impressions per month on Google, with an observed CTR of 3.5%. We want to estimate the true CTR with a margin of error of 1 percentage point (E = 0.01) and 95% confidence.
Hint: in R, a minimal function is enough — sample_size <- function(Z, p, E) ceiling((Z^2 * p * (1-p)) / E^2) — to be called twice with the two values of p.
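For readers who prefer Python, a minimal equivalent of the R hint:

```python
import math

def sample_size(Z, p, E):
    # n = Z^2 * p * (1 - p) / E^2, rounded up to the next whole unit
    return math.ceil((Z**2 * p * (1 - p)) / E**2)
```

As with the R version, it should be called twice with the two values of p.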
Now we know how to collect an adequate sample and how much data we need. One question remains: how do we use that sample to rigorously compare two versions of the same page? This is where actual A/B testing comes in, and it is the next step of the path.
To dig deeper into sampling, the biases that can distort it, and the logic of statistical inference, The Art of Statistics by David Spiegelhalter is the most suitable companion. Spiegelhalter devotes illuminating pages to real cases — flawed polls, convenience samples, misleading figures — showing how the mathematics of sampling means little without careful thought on how the data are collected.