Bayesian Conversion Rate Estimation: how much can we trust limited data

In the article on the foundations of Bayesian statistics, we saw how Bayesian updating works through simulation: generate samples from the prior, simulate data, filter. The method is intuitive, but it becomes impractical as soon as the dataset grows even moderately.
In this article we move to the elegant analytical solution that the Bayesian approach provides for one of the most common problems in marketing analysis: estimating a conversion rate with limited data.

The problem always starts the same way. A small e-commerce store has collected 23 conversions out of 412 sessions. The raw rate is 23/412 ≈ 5.6%, a seemingly precise number. But how much do we trust it? The true rate could be around 4% or around 8%: the sample alone does not rule out either. The point estimate “5.6%” says nothing about its own uncertainty.

What we will cover:

The Beta-Binomial model: why Beta is the natural distribution for a conversion rate
Non-informative prior: letting the data speak
Informative prior: using historical data without cheating
Today’s posterior is tomorrow’s prior
Try it yourself
Further reading

From single click to rate: the Beta-Binomial model

Each session is a binary event: the user converts or does not.
With \( n \) sessions and \( k \) conversions, the generative mechanism is binomial. The parameter we want to estimate — the true conversion rate \( \theta \) — is a proportion: a value between 0 and 1.

When the prior on \( \theta \) is a Beta distribution and the data are binomial, something very convenient happens: the posterior is also a Beta distribution. This is called a conjugate prior, and it means Bayesian updating reduces to a simple arithmetic operation on the parameters.

The updating rule is: if the prior is Beta(α, β), after observing \( k \) conversions out of \( n \) sessions the posterior is:

\( \mathrm{Beta}(\alpha + k,\ \beta + (n - k)) \)
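
The rule follows from one line of algebra. A brief sketch, writing the Beta prior up to its normalising constant and using the binomial likelihood for \( k \) conversions in \( n \) sessions:

\[
p(\theta \mid k, n) \;\propto\; \underbrace{\theta^{k}(1-\theta)^{n-k}}_{\text{binomial likelihood}} \times \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{Beta prior}} \;=\; \theta^{(\alpha+k)-1}\,(1-\theta)^{(\beta+n-k)-1}
\]

The right-hand side has the same functional form as the prior, with \( \alpha \) replaced by \( \alpha + k \) and \( \beta \) replaced by \( \beta + (n - k) \), so after normalisation it is again a Beta density.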

In plain words: we add the observed conversions to α and the observed non-conversions to β. The prior Beta(α, β) encodes in α the “conversions already seen” (or an equivalent belief) and in β the “non-conversions already seen”. Each new observation increments exactly one of the two counters: α if the session converted, β if it did not.
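
The counting rule can be sketched as a tiny R function. This is an illustration only: the helper name `beta_update` and the prior Beta(2, 8) with 3 conversions out of 10 sessions are made-up values chosen to keep the arithmetic easy to follow.

```r
# Hypothetical helper (not from the article): conjugate Beta-Binomial update.
# alpha, beta: prior parameters; k: conversions; n: sessions.
beta_update <- function(alpha, beta, k, n) {
  stopifnot(k >= 0, n >= k)
  c(alpha = alpha + k, beta = beta + (n - k))
}

# Illustrative numbers only: prior Beta(2, 8), then 3 conversions in 10 sessions
post <- beta_update(alpha = 2, beta = 8, k = 3, n = 10)
print(post)   # alpha = 5, beta = 15

# A single converting session moves only alpha; a non-converting one moves only beta
one_conv   <- beta_update(2, 8, k = 1, n = 1)   # alpha = 3, beta = 8
one_noconv <- beta_update(2, 8, k = 0, n = 1)   # alpha = 2, beta = 9
```

Feeding the sessions in one at a time or all together leads to the same parameters, because the update is just addition.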

Non-informative prior: letting the data speak

A standard neutral prior for a proportion is the uniform distribution on [0, 1], which corresponds to Beta(1, 1): all values of the rate are considered equally plausible before seeing any data.

Our case: 23 conversions out of 412 sessions (389 non-conversions).

We calculate the posterior in R:

# Observed data
conv <- 23; sess <- 412; nonconv <- sess - conv

# NON-INFORMATIVE prior: uniform Beta(1,1)
a0 <- 1; b0 <- 1
a1 <- a0 + conv; b1 <- b0 + nonconv      # posterior Beta(24, 390)

cat("Non-informative: mean =", round(a1/(a1+b1), 4), "\n")
cat("  95% CI =", round(qbeta(c(.025,.975), a1, b1), 4), "\n")

Output: mean = 0.058, 95% CI = [0.0376, 0.0824].

The posterior Beta(24, 390) has mean 5.8% and a 95% credible interval between 3.8% and 8.2%.
The Bayesian credible interval is not an abstract statistical exercise: it means directly that there is a 95% probability that the true conversion rate lies between 3.8% and 8.2%. Not a frequency over infinite repetitions — a direct probability statement on the parameter.

With 412 sessions, the uncertainty is still appreciable: the interval is about 4.5 percentage points wide. The point estimate 5.6% was misleading in its precision.
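
Because the posterior is a full distribution, it can answer direct questions about the rate, not only produce an interval. A minimal sketch with base R's `pbeta`; the 5% and 4% thresholds are arbitrary illustrations, not values taken from the analysis.

```r
# Posterior from the uniform prior: Beta(24, 390)
a1 <- 24; b1 <- 390

# Posterior probability that the true rate exceeds 5% (threshold chosen for illustration)
p_above_5 <- pbeta(0.05, a1, b1, lower.tail = FALSE)

# Posterior probability that the true rate is below 4% (also illustrative)
p_below_4 <- pbeta(0.04, a1, b1)

# Posterior mass inside the reported 95% interval: should be close to 0.95
p_inside <- diff(pbeta(c(0.0376, 0.0824), a1, b1))

cat("P(rate > 5%) =", round(p_above_5, 3), "\n")
cat("P(rate < 4%) =", round(p_below_4, 3), "\n")
cat("P(3.76% < rate < 8.24%) =", round(p_inside, 3), "\n")
```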

A note of caution: the Bayesian credible interval and the frequentist confidence interval often give similar numbers but have profoundly different meanings. The frequentist 95% is a property of the procedure (“if we repeated the experiment many times, about 95% of the intervals built this way would contain the true parameter”); the Bayesian 95% is a direct statement about the parameter in the specific case we are analysing, conditional on the model and the prior.
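
To see how close the two sets of numbers are in this case, one option is to compare the credible interval with the exact (Clopper-Pearson) confidence interval returned by base R's `binom.test`. Choosing Clopper-Pearson is an assumption of this sketch; other frequentist intervals would give slightly different values.

```r
conv <- 23; sess <- 412

# Bayesian 95% credible interval under the uniform Beta(1, 1) prior
bayes_ci <- qbeta(c(.025, .975), 1 + conv, 1 + (sess - conv))

# Frequentist 95% exact (Clopper-Pearson) confidence interval
freq_ci <- as.numeric(binom.test(conv, sess, conf.level = 0.95)$conf.int)

cat("Bayesian credible interval:     ", round(bayes_ci, 4), "\n")
cat("Frequentist confidence interval:", round(freq_ci, 4), "\n")
```

The endpoints are numerically close, but only the first interval supports a probability statement about the rate itself.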

Informative prior: using historical data without cheating

The non-informative prior is the honest starting point when we know nothing. But often we do know something: years of campaigns, sector history, category benchmarks.
Our e-commerce store has four seasons of history with an average conversion rate around 4%. How do we translate this knowledge into a prior?

The Beta(8, 192) distribution has mean exactly 8/(8+192) = 4% and, because α+β = 200, a concentration equivalent to trusting the history as much as 200 fictitious past sessions. It is not an arbitrary number: it is a declared and verifiable choice.
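
The translation from "historical mean plus how many sessions it is worth" to Beta parameters can be sketched as follows. The helper name `beta_from_mean` and the alternative weight of 50 sessions are hypothetical, introduced only to show how the choice of α+β changes the prior.

```r
# Hypothetical helper: Beta parameters from a prior mean and a pseudo-sample size
beta_from_mean <- function(mean, n_eff) {
  stopifnot(mean > 0, mean < 1, n_eff > 0)
  c(alpha = mean * n_eff, beta = (1 - mean) * n_eff)
}

prior_200 <- beta_from_mean(0.04, 200)   # alpha = 8, beta = 192 (the prior used here)
prior_50  <- beta_from_mean(0.04, 50)    # alpha = 2, beta = 48  (a weaker, illustrative alternative)

# Same mean, different confidence: the weaker prior has a wider 95% interval
ci_200 <- qbeta(c(.025, .975), prior_200[["alpha"]], prior_200[["beta"]])
ci_50  <- qbeta(c(.025, .975), prior_50[["alpha"]],  prior_50[["beta"]])
cat("Width with 200 pseudo-sessions:", round(diff(ci_200), 4), "\n")
cat("Width with  50 pseudo-sessions:", round(diff(ci_50), 4), "\n")
```

Making the pseudo-sample size explicit keeps the strength of the prior open to discussion instead of hiding it inside two parameters.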

We calculate the informative posterior in R:

# INFORMATIVE prior from history: mean ~4% -> Beta(8, 192)
a0i <- 8; b0i <- 192
a1i <- a0i + conv; b1i <- b0i + nonconv  # posterior Beta(31, 581)

cat("Informative: mean =", round(a1i/(a1i+b1i), 4), "\n")
cat("  95% CI =", round(qbeta(c(.025,.975), a1i, b1i), 4), "\n")

Output: mean = 0.0507, 95% CI = [0.0347, 0.0694].

The informative posterior Beta(31, 581) gives mean 5.1% and 95% credible interval between 3.5% and 6.9%.
Two things to notice. First: the mean drops from 5.8% to 5.1%, because the prior “pulls” the estimate toward the historical 4%. Second: the interval narrows (from about 4.5 to about 3.5 percentage points), because the historical data acts as additional information, so uncertainty decreases.

With limited data, the informative prior helps: it adds information where data alone is insufficient. With many data points — thousands of sessions — the prior gets overwhelmed by the data and the difference between informative and non-informative priors becomes negligible. This is a fundamental feature of Bayesian inference: the prior matters when data is scarce; data always wins in the end.
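
A quick way to check this is to keep the observed rate fixed and scale the sample up. The sketch below uses hypothetical samples 10 and 100 times larger than the real one (230/4,120 and 2,300/41,200) and compares the posterior means under the two priors.

```r
# Priors: uniform Beta(1, 1) and informative Beta(8, 192)
a_flat <- 1; b_flat <- 1
a_inf  <- 8; b_inf  <- 192

# Real sample (scale 1) plus two hypothetical larger samples with the same raw rate
scale <- c(1, 10, 100)
k <- 23 * scale
n <- 412 * scale

mean_flat <- (a_flat + k) / (a_flat + b_flat + n)
mean_inf  <- (a_inf + k)  / (a_inf + b_inf + n)
gaps <- mean_flat - mean_inf

for (i in seq_along(scale)) {
  cat(sprintf("n = %6.0f  uniform: %.4f  informative: %.4f  gap: %.4f\n",
              n[i], mean_flat[i], mean_inf[i], gaps[i]))
}
```

The gap between the two posterior means shrinks as the sample grows, because the 200 pseudo-sessions of the prior weigh less and less against the real ones.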

Today’s posterior is tomorrow’s prior

The most practical elegance of the Bayesian approach is sequential updating. After one month, new data arrives: 15 conversions from 300 additional sessions. We do not need to start over — the posterior we just calculated becomes the new prior.

We update in R:

# New data: 15 conversions from 300 additional sessions
conv2 <- 15; sess2 <- 300
a2 <- a1i + conv2; b2 <- b1i + (sess2 - conv2)   # Beta(46, 866)

cat("After update: mean =", round(a2/(a2+b2), 4), "\n")
cat("  95% CI =", round(qbeta(c(.025,.975), a2, b2), 4), "\n")

Output: mean = 0.0504, 95% CI = [0.0372, 0.0655].

Three stages compared:

Stage	Accumulated data	Mean	95% CI
Pure prior (before any data)	—	4.0%	[1.8%, 7.1%]
After first month	23/412	5.1%	[3.5%, 6.9%]
After second month	38/712	5.0%	[3.7%, 6.6%]

The credible interval has narrowed at each stage: about 5.3 percentage points for the pure prior, 3.5 after the first month, 2.8 after the update. Between the first and the second month the mean remained stable: the new data confirms the previous estimate instead of shifting it, and uncertainty decreases as expected.
This is the real operational advantage: there is no need to wait for a “large enough” sample to accumulate before making any estimate. We start from an uncertain estimate and refine it progressively, with each new data point.
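
Sequential updating gives the same answer as processing all the data at once, which is what makes it safe to update month by month. A minimal check, using a hypothetical helper `add_batch` that is not part of the code above:

```r
# Hypothetical helper: add one batch of sessions to a Beta prior stored as a named vector
add_batch <- function(prior, conv, sess) {
  c(alpha = prior[["alpha"]] + conv, beta = prior[["beta"]] + (sess - conv))
}

prior <- c(alpha = 8, beta = 192)

# Month by month: first 23/412, then 15/300
month1 <- add_batch(prior, 23, 412)
month2 <- add_batch(month1, 15, 300)

# All at once: 38/712
batch <- add_batch(prior, 23 + 15, 412 + 300)

print(month2)                       # alpha = 46, beta = 866
stopifnot(identical(month2, batch)) # same posterior either way
```

The order of the batches does not matter either, since the update only adds counts.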

Try it yourself

A lead generation website has a historical conversion rate around 2%. After an optimisation campaign, 8 conversions are observed from 150 sessions.

1. Build an informative prior reflecting the historical 2%: try Beta(4, 196), which has mean exactly 2%.
2. Calculate the posterior after 8 conversions from 150 sessions.
3. Calculate the 95% credible interval.
4. Now try a non-informative Beta(1, 1) prior: does the posterior change much? Why?

Hint: the formula is always the same — qbeta(c(.025, .975), a0 + conv, b0 + nonconv). The only thing that changes is the starting point.

What we have built so far is the estimation of a rate for a single variant. The next step is comparing two variants — a control page and a modified page — and calculating the Bayesian probability that one beats the other. That is exactly what we will do in the next article: Bayesian A/B testing.
