In the article on the foundations of Bayesian statistics, we saw how Bayesian updating works through simulation: generate samples from the prior, simulate data, filter. The method is intuitive, but it becomes impractical as soon as the dataset grows beyond a handful of observations.
In this article we move to the elegant analytical solution that the Bayesian approach provides for one of the most common problems in marketing analysis: estimating a conversion rate with limited data.
The problem always starts the same way. A small e-commerce store has collected 23 conversions out of 412 sessions. The raw rate is 23/412 ≈ 5.6%, a seemingly precise number. But how much do we trust it? The true rate could be 3% or it could be 9%: with that sample, we simply do not know. The point estimate “5.6%” says nothing about its own uncertainty.
Each session is a binary event: the user converts or does not.
With \( n \) sessions and \( k \) conversions, the generative mechanism is binomial. The parameter we want to estimate — the true conversion rate \( \theta \) — is a proportion: a value between 0 and 1.
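To make the binomial mechanism concrete, here is a minimal sketch that evaluates how probable the observed 23 conversions out of 412 sessions would be under a few candidate values of θ. The candidate rates are chosen purely for illustration and are not part of the original analysis.

```r
# Observed data from the running example
conv <- 23; sess <- 412

# Candidate true rates (illustrative values; the third one is the raw rate)
theta <- c(0.03, 0.04, conv / sess, 0.07, 0.09)

# Binomial probability of seeing exactly 23 conversions under each candidate
lik <- dbinom(conv, size = sess, prob = theta)

# Relative likelihood: equal to 1 for the best-supported candidate
rel_lik <- lik / max(lik)
print(data.frame(theta = round(theta, 4), rel_lik = round(rel_lik, 3)))
```

The raw rate is the best-supported single value, but the neighbouring rates are not ruled out by the data, which is the uncertainty that the point estimate hides.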
When the prior on \( \theta \) is a Beta distribution and the data are binomial, something very convenient happens: the posterior is also a Beta distribution. This is called a conjugate prior, and it means Bayesian updating reduces to a simple arithmetic operation on the parameters.
The updating rule is: if the prior is Beta(α, β), after observing \( k \) conversions out of \( n \) sessions the posterior is:
\( \mathrm{Beta}(\alpha + k,\ \beta + (n - k)) \)
In plain words: we add the observed conversions to α and the observed failures to β. The prior Beta(α, β) encodes in α the “conversions already seen” (or an equivalent belief) and in β the “non-conversions”. Each new observation updates both counters.
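The rule follows in one line from Bayes' theorem. As a sketch, dropping every factor that does not depend on \( \theta \) (the binomial coefficient and the Beta normalising constant):

\[ p(\theta \mid k, n) \;\propto\; \underbrace{\theta^{k}(1-\theta)^{n-k}}_{\text{binomial likelihood}} \times \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{Beta prior}} \;=\; \theta^{(\alpha+k)-1}\,(1-\theta)^{(\beta+n-k)-1} \]

The right-hand side is the kernel of a Beta density with parameters \( \alpha + k \) and \( \beta + (n - k) \), so the posterior can be read off without computing any integral.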
A common neutral choice of prior for a proportion is the uniform distribution on [0, 1], which corresponds to Beta(1, 1): all values of the rate are considered equally plausible before seeing any data.
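A quick check, offered as a sketch, that Beta(1, 1) really is flat: its density equals 1 at every rate, and the probability it assigns to an interval equals the interval's width. The grid of rates is an arbitrary choice.

```r
# Density of Beta(1, 1) on a grid of rates: constant at 1, i.e. uniform
grid <- seq(0.1, 0.9, by = 0.2)
flat <- dbeta(grid, shape1 = 1, shape2 = 1)
print(flat)

# Prior mean alpha / (alpha + beta)
prior_mean <- 1 / (1 + 1)

# Under a flat prior, P(theta <= 0.10) is simply the width of [0, 0.10]
p_low <- pbeta(0.10, shape1 = 1, shape2 = 1)
cat("Prior mean =", prior_mean, " P(theta <= 10%) =", p_low, "\n")
```

In terms of the updating rule, starting from Beta(1, 1) amounts to adding one pseudo-conversion and one pseudo-non-conversion to the observed counts.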
Our case: 23 conversions out of 412 sessions (389 non-conversions).
We calculate the posterior in R:
# Observed data
conv <- 23; sess <- 412; nonconv <- sess - conv
# NON-INFORMATIVE prior: uniform Beta(1,1)
a0 <- 1; b0 <- 1
a1 <- a0 + conv; b1 <- b0 + nonconv # posterior Beta(24, 390)
cat("Non-informative: mean =", round(a1/(a1+b1), 4), "\n")
cat(" 95% CI =", round(qbeta(c(.025,.975), a1, b1), 4), "\n")
Output: mean = 0.058, 95% CI = [0.0376, 0.0824].
The posterior Beta(24, 390) has mean 5.8% and a 95% credible interval between 3.8% and 8.2%.
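As a cross-check, the generate, simulate, filter approach recalled in the introduction can be compared with the analytical result. This is a minimal sketch: the seed and the number of draws are arbitrary choices, and the simulated figures will vary slightly with both.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

n_sim <- 200000
# 1. Draw candidate rates from the uniform prior Beta(1, 1)
theta_sim <- runif(n_sim)
# 2. Simulate the number of conversions in 412 sessions for each candidate
k_sim <- rbinom(n_sim, size = 412, prob = theta_sim)
# 3. Keep only the candidates that reproduce the observed 23 conversions
kept <- theta_sim[k_sim == 23]

cat("Accepted draws:", length(kept), "of", n_sim, "\n")
cat("Simulated mean:", round(mean(kept), 4),
    " analytic mean:", round(24 / 414, 4), "\n")
cat("Simulated 95% CI:", round(quantile(kept, c(.025, .975)), 4), "\n")
cat("Analytic  95% CI:", round(qbeta(c(.025, .975), 24, 390), 4), "\n")
```

Under the uniform prior only about one draw in 413 matches the observed count exactly, so the vast majority of the simulation is discarded. The conjugate formula reaches the same distribution with two additions.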
The Bayesian credible interval is not an abstract statistical exercise: given the model and the prior, it says directly that there is a 95% probability that the true conversion rate lies between 3.8% and 8.2%. This is not a frequency over infinite repetitions but a direct probability statement about the parameter.
With 412 sessions, the uncertainty is still appreciable: the interval is about 4.4 percentage points wide. The point estimate 5.6% gave a misleading impression of precision.
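Because the posterior is a full distribution, other probability statements about the rate come from the same two parameters. A short sketch follows; the 4% threshold is a hypothetical business target, not a figure from the data.

```r
a1 <- 24; b1 <- 390   # posterior obtained from the uniform prior

# Probability mass between the 2.5% and 97.5% quantiles: 0.95 by construction
ci <- qbeta(c(.025, .975), a1, b1)
mass_in_ci <- pbeta(ci[2], a1, b1) - pbeta(ci[1], a1, b1)

# Probability that the true rate exceeds the (illustrative) 4% threshold
p_above_4 <- 1 - pbeta(0.04, a1, b1)

cat("P(theta in CI) =", round(mass_in_ci, 3), "\n")
cat("P(theta > 4%)  =", round(p_above_4, 3), "\n")
```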
A note of caution: the Bayesian credible interval and the frequentist confidence interval often have similar numbers but profoundly different meanings. The frequentist 95% is a property of the procedure (“repeating the experiment 100 times, about 95 of the resulting intervals would contain the true parameter”); the Bayesian 95% is a direct statement about the parameter in the specific case we are analysing.
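To see how close the numbers are in this case, the two intervals can be computed side by side. This sketch uses the exact (Clopper-Pearson) interval returned by `binom.test` as the frequentist counterpart; other frequentist intervals exist and would give slightly different bounds.

```r
conv <- 23; sess <- 412

# Frequentist exact (Clopper-Pearson) 95% confidence interval
freq_ci <- as.numeric(binom.test(conv, sess, conf.level = 0.95)$conf.int)

# Bayesian 95% credible interval from the posterior Beta(24, 390)
bayes_ci <- qbeta(c(.025, .975), 1 + conv, 1 + sess - conv)

print(rbind(frequentist = round(freq_ci, 4),
            bayesian    = round(bayes_ci, 4)))
```

The bounds are numerically close, yet only the second row supports the statement "the rate lies in this range with 95% probability".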
The non-informative prior is the honest starting point when we know nothing. But often we do know something: years of campaigns, sector history, category benchmarks.
Our e-commerce store has four seasons of history with an average conversion rate around 4%. How do we translate this knowledge into a prior?
The Beta(8, 192) distribution has mean exactly 8/(8+192) = 4% and, because α+β = 200, a concentration equivalent to trusting that history as much as 200 fictitious sessions. It is not an arbitrary number: it is a declared and verifiable choice.
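One way to make that translation explicit is to start from the two quantities we can actually reason about, the prior mean and the number of pseudo-sessions, and derive α and β from them. The helper `beta_from_mean` below is a hypothetical name introduced for this sketch, and the weaker prior with 50 pseudo-sessions is an invented comparison.

```r
# Hypothetical helper: convert a prior mean and a pseudo-sample size
# into the two Beta parameters
beta_from_mean <- function(prior_mean, strength) {
  stopifnot(prior_mean > 0, prior_mean < 1, strength > 0)
  c(alpha = prior_mean * strength, beta = (1 - prior_mean) * strength)
}

prior <- beta_from_mean(prior_mean = 0.04, strength = 200)
print(prior)  # alpha = 8, beta = 192

# 95% interval implied by the prior alone, before seeing any data
prior_ci <- qbeta(c(.025, .975), prior[["alpha"]], prior[["beta"]])
cat("Prior 95% interval        =", round(prior_ci, 4), "\n")

# Same mean, weaker belief: only 50 pseudo-sessions (illustrative)
weak <- beta_from_mean(0.04, 50)
weak_ci <- qbeta(c(.025, .975), weak[["alpha"]], weak[["beta"]])
cat("Weaker prior 95% interval =", round(weak_ci, 4), "\n")
```

Looking at the interval implied by the prior is a useful sanity check: if it excludes rates we consider plausible, the strength is too high.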
We calculate the informative posterior in R:
# INFORMATIVE prior from history: mean ~4% -> Beta(8, 192)
a0i <- 8; b0i <- 192
a1i <- a0i + conv; b1i <- b0i + nonconv # posterior Beta(31, 581)
cat("Informative: mean =", round(a1i/(a1i+b1i), 4), "\n")
cat(" 95% CI =", round(qbeta(c(.025,.975), a1i, b1i), 4), "\n")
Output: mean = 0.0507, 95% CI = [0.0347, 0.0694].
The informative posterior Beta(31, 581) gives mean 5.1% and 95% credible interval between 3.5% and 6.9%.
Two things to notice. First: the mean drops from 5.8% to 5.1%, because the prior “pulls” the estimate toward the historical 4%. Second: the interval narrows (from 4.4 to 3.4 percentage points), because the historical data acts as additional information and uncertainty decreases.
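The size of the “pull” can be made explicit: the posterior mean is a weighted average of the prior mean and the raw rate, with weights proportional to the pseudo-sessions and the real sessions. A small sketch of that identity:

```r
a0i <- 8; b0i <- 192          # informative prior
conv <- 23; sess <- 412       # observed data

prior_mean <- a0i / (a0i + b0i)   # 4% from history
raw_rate   <- conv / sess         # about 5.6% from the data

# Weight of the prior = pseudo-sessions / (pseudo-sessions + real sessions)
w_prior <- (a0i + b0i) / (a0i + b0i + sess)

post_mean_weighted <- w_prior * prior_mean + (1 - w_prior) * raw_rate
post_mean_direct   <- (a0i + conv) / (a0i + b0i + sess)

cat("Weight of prior =", round(w_prior, 3), "\n")
cat("Weighted mean   =", round(post_mean_weighted, 4), "\n")
cat("Direct mean     =", round(post_mean_direct, 4), "\n")
```

With 200 pseudo-sessions against 412 real ones, the history carries roughly a third of the weight, which is why the estimate moves toward 4% without reaching it.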
With limited data, the informative prior helps: it adds information where data alone is insufficient. With many data points — thousands of sessions — the prior gets overwhelmed by the data and the difference between informative and non-informative priors becomes negligible. This is a fundamental feature of Bayesian inference: the prior matters when data is scarce; data always wins in the end.
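The sketch below illustrates the point by comparing the two priors on the observed data and on a hypothetical store with 50 times the traffic at the same raw rate. The large sample is invented for illustration, and `post_mean` is a helper name introduced here.

```r
# Hypothetical helper returning the posterior mean of a Beta-binomial update
post_mean <- function(a0, b0, conv, sess) (a0 + conv) / (a0 + b0 + sess)

# Row 1: observed data. Row 2: invented sample, 50 times larger, same raw rate
cases <- data.frame(conv = c(23, 23 * 50), sess = c(412, 412 * 50))

cases$flat_prior        <- post_mean(1, 1,   cases$conv, cases$sess)
cases$informative_prior <- post_mean(8, 192, cases$conv, cases$sess)
cases$gap               <- cases$flat_prior - cases$informative_prior

print(round(cases, 4))
```

The `gap` column shrinks sharply in the second row: the same 200 pseudo-sessions that mattered against 412 real sessions are almost irrelevant against 20,600.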
The most practical elegance of the Bayesian approach is sequential updating. After one month, new data arrives: 15 conversions from 300 additional sessions. We do not need to start over — the posterior we just calculated becomes the new prior.
We update in R:
# New data: 15 conversions from 300 additional sessions
conv2 <- 15; sess2 <- 300
a2 <- a1i + conv2; b2 <- b1i + (sess2 - conv2) # Beta(46, 866)
cat("After update: mean =", round(a2/(a2+b2), 4), "\n")
cat(" 95% CI =", round(qbeta(c(.025,.975), a2, b2), 4), "\n")
Output: mean = 0.0504, 95% CI = [0.0372, 0.0655].
Three stages compared:
| Stage | Accumulated data | Mean | 95% CI |
|---|---|---|---|
| Pure prior (before any data) | — | 4.0% | [1.8%, 7.1%] |
| After first month | 23/412 | 5.1% | [3.5%, 6.9%] |
| After second month | 38/712 | 5.0% | [3.7%, 6.6%] |
The credible interval has narrowed at each stage: 5.3 percentage points for the pure prior, 3.4 after the first month, 2.8 after the update. The mean remained stable: the new data confirms the previous estimate instead of shifting it, and uncertainty decreases as expected.
This is the real operational advantage: there is no need to wait for a “large enough” sample to accumulate before making any estimate. We start from an uncertain estimate and refine it progressively, with each new data point.
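A property worth verifying is that sequential updating loses nothing: folding the two months in one at a time gives exactly the same posterior as a single update on the pooled data. In this sketch `update_beta` is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: one Beta-binomial update step on a c(alpha, beta) vector
update_beta <- function(params, batch) {
  c(params[1] + batch[["conv"]],
    params[2] + batch[["sess"]] - batch[["conv"]])
}

prior   <- c(8, 192)
batches <- list(c(conv = 23, sess = 412),   # first month
                c(conv = 15, sess = 300))   # second month

# Sequential: fold the batches into the prior one at a time
sequential <- Reduce(update_beta, batches, prior)

# Batch: a single update with the pooled data (38 conversions, 712 sessions)
pooled <- update_beta(prior, c(conv = 23 + 15, sess = 412 + 300))

print(sequential)                      # parameters of Beta(46, 866)
print(identical(sequential, pooled))   # same posterior either way
```

The order of the batches does not matter either, since the update only adds counts.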
A lead generation website has a historical conversion rate around 2%. After an optimisation campaign, 8 conversions are observed from 150 sessions.
1. Build an informative prior reflecting the historical 2%: try Beta(4, 196), which has mean exactly 2%.
2. Calculate the posterior after 8 conversions from 150 sessions.
3. Calculate the 95% credible interval.
4. Now try a non-informative Beta(1, 1) prior: does the posterior change much? Why?
Hint: the formula is always the same — qbeta(c(.025, .975), a0 + conv, b0 + nonconv). The only thing that changes is the starting point.
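If you prefer to wrap the formula in a reusable function before attempting the exercise, one possible shape is sketched below. `beta_posterior` is a hypothetical helper name, and it is demonstrated on the article's first example so that the exercise itself stays unsolved.

```r
# Hypothetical helper wrapping the update and its summary in one call
beta_posterior <- function(a0, b0, conv, sess, level = 0.95) {
  stopifnot(a0 > 0, b0 > 0, conv >= 0, sess >= conv)
  a1 <- a0 + conv
  b1 <- b0 + (sess - conv)
  tail_p <- (1 - level) / 2
  list(alpha = a1, beta = b1,
       mean  = a1 / (a1 + b1),
       ci    = qbeta(c(tail_p, 1 - tail_p), a1, b1))
}

# Demonstrated on the first example (uniform prior, 23 conversions, 412 sessions)
res <- beta_posterior(a0 = 1, b0 = 1, conv = 23, sess = 412)
str(res)
```

Calling it twice with the exercise data, once per prior, is enough to answer all four points.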
What we have built so far is the estimation of a rate for a single variant. The next step is comparing two variants — a control page and a modified page — and calculating the Bayesian probability that one beats the other. That is exactly what we will do in the next article: Bayesian A/B testing.
If you want to understand how Bayesian reasoning enters practical decisions — from market forecasting to uncertainty estimation in real data — The Signal and the Noise by Nate Silver is the book I recommend. Silver devotes explicit chapters to Bayesian updating, showing it in concrete contexts (weather forecasting, politics, sports) that make the idea of “updating beliefs with new data” immediately intuitive. Rigorous but written like a story, it is a rare kind of book that leaves you thinking differently about uncertainty.