The Bayesian Approach: a Complete Learning Path, from the Foundations to Machine Learning

Every time we measure something (a page’s conversion rate, the effectiveness of a variant, the intent behind a search) we can face uncertainty in two ways. There is the frequentist road, which asks the data for a blunt verdict and hands us a p-value; and there is the Bayesian road, which starts from a different question. The Bayesian approach begins with what we already believe, tests that belief against the data we observe, and comes out with an updated belief: not a yes/no, but a probability we can weigh, such as “what is the probability that this variant is genuinely better?”. It is a way of reasoning closer to how we actually decide under uncertainty, when the data are scarce and the stakes are concrete.

Learning the Bayesian approach, however, does not mean memorising Bayes’ theorem and setting it aside. It means walking a road that starts from understanding how a belief is updated in the light of the data, passes through estimating and comparing alternatives (what a conversion rate really is once its uncertainty is counted, which of two variants to choose) and arrives at the most operational uses: optimising in real time where to send the traffic, and automatically classifying the intent of a search. It is the same logic, from the theoretical building block to the machine-learning application.

This page is that road, in order. We do not re-explain the theory here: each stage is an article on the blog, and the order in which we have arranged them is the order in which it makes sense to read them. Anyone starting from scratch can follow them in sequence; anyone with some grounding can jump to the group they need. The three sections that follow — the foundations, estimating and comparing, optimising and classifying — are the three movements of a single path. We start with the foundations.

The foundations

Before applying the Bayesian method we need to understand what it rests on. The two stages in this section answer the basic questions: what it means to update a belief with the data, and what the mathematical tool is that lets us represent the uncertainty about a proportion.
These are the bricks that hold up everything else: without them the applications that follow remain recipes to copy, not tools to understand.

Bayesian statistics: the foundations is the non-negotiable starting point. It explains the heart of the method — the prior, the data, the posterior — and shows how Bayes’ theorem is not an esoteric formula but the natural way to learn from experience one step at a time. It is the article to read first, because everything else on the path does nothing but apply this same idea to increasingly concrete problems.
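The prior, data, posterior cycle can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the two hypotheses, their rates and the helper name `bayes_update` are hypothetical and do not come from the article itself.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior: prior times likelihood, rescaled to sum to 1."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())
    return {h: p / evidence for h, p in unnormalised.items()}


# Hypothetical setup: the page converts either at 2% or at 10%,
# and before seeing any data we consider the two equally plausible.
prior = {"low": Fraction(1, 2), "high": Fraction(1, 2)}

# Probability of the observed event (a visitor converts) under each hypothesis.
likelihood_conversion = {"low": Fraction(2, 100), "high": Fraction(10, 100)}

# First visitor converts: the prior becomes a posterior.
posterior_1 = bayes_update(prior, likelihood_conversion)

# Second visitor converts: the previous posterior is the new prior.
posterior_2 = bayes_update(posterior_1, likelihood_conversion)

print(posterior_1["high"], posterior_2["high"])
```

Each call takes the previous posterior as the new prior, which is the sense in which the method learns one step at a time.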

The Beta distribution explained simply introduces the tool we will use in almost all the later applications. When what we want to estimate is a proportion (a conversion rate, a click-through rate), the Beta is the distribution that describes its uncertainty, and updating it with success/failure data is elegant: the observed successes and failures are simply added to its two parameters. Understanding it here means having already done half the work for the articles that come afterwards.
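As a minimal sketch of that update, assuming a uniform Beta(1, 1) prior and success/failure data (the function names and the figures are ours, chosen for illustration):

```python
def beta_update(alpha, beta, successes, failures):
    """Conjugate update: add observed successes and failures to the parameters."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed prior: Beta(1, 1), i.e. every rate between 0 and 1 equally plausible.
alpha, beta = 1, 1

# Hypothetical data: 3 conversions out of 100 visits.
alpha, beta = beta_update(alpha, beta, successes=3, failures=97)
print(alpha, beta, round(beta_mean(alpha, beta), 4))

# Updating in two batches gives the same posterior as updating all at once.
a2, b2 = beta_update(1, 1, successes=1, failures=49)
a2, b2 = beta_update(a2, b2, successes=2, failures=48)
```

The posterior is again a Beta, so it can serve directly as the prior for the next batch of data.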

Estimating and comparing

With the foundations in place, we step into the real work. This section tackles the two questions that recur every day in SEO and marketing: how far a conversion rate can be trusted when the data are scarce, and which of two variants to choose.
Here the Bayesian advantage is plain to see: instead of a binary verdict we obtain a direct probability, the one we actually need in order to decide.

Bayesian conversion rate estimation is the first concrete application. It shows how to move from the raw number (“3 conversions out of 100”) to an honest estimate that accounts for how little data there is, returning not a bare point estimate but a credible interval. It is the Bayesian way of not being fooled by small numbers, the trap that catches anyone who reads a conversion rate without asking how much uncertainty it hides.
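To make the “3 conversions out of 100” case concrete, here is a rough sketch that approximates a 95% credible interval by sampling from the posterior. It assumes a uniform Beta(1, 1) prior and uses Monte Carlo draws from the standard library rather than an exact quantile function; `credible_interval` is a name of our own.

```python
import random


def credible_interval(conversions, visits, level=0.95, draws=100_000, seed=0):
    """Approximate an equal-tailed credible interval for a conversion rate.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + conversions, 1 + visits - conversions).
    """
    rng = random.Random(seed)
    alpha = 1 + conversions
    beta = 1 + visits - conversions
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = min(int((1 + level) / 2 * draws), draws - 1)
    return samples[lo_idx], samples[hi_idx]


low, high = credible_interval(3, 100)
print(f"raw rate: {3 / 100:.3f}, 95% credible interval: [{low:.3f}, {high:.3f}]")

# Same raw rate with 100 times more data: the interval shrinks.
low_big, high_big = credible_interval(300, 10_000)
print(f"with 10,000 visits: [{low_big:.3f}, {high_big:.3f}]")
```

The two calls share the same raw rate of 3%, but the width of the interval shows how much less the first one tells us.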

Bayesian A/B testing brings the same reasoning to the comparison between two variants. Instead of asking “is the difference significant?”, it answers the question that really matters: “what is the probability that B is better than A, and by how much?”. It is worth reading in dialogue with the classic version of the method — frequentist A/B testing, which remains the reference for controlled experiments — to grasp what changes, and what is gained, when we move from the p-value to the direct probability.
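The question “what is the probability that B is better than A, and by how much?” can be approximated by sampling from the two posteriors. The sketch below assumes independent uniform Beta(1, 1) priors and invented traffic figures; `prob_b_beats_a` is a hypothetical helper, not a function from any library.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=50_000, seed=0):
    """Estimate P(rate_B > rate_A) and the mean difference rate_B - rate_A.

    Each variant gets a Beta(1 + conversions, 1 + non-conversions) posterior
    (uniform prior assumed); the two posteriors are sampled independently.
    """
    rng = random.Random(seed)
    wins = 0
    diff_sum = 0.0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
        diff_sum += rate_b - rate_a
    return wins / draws, diff_sum / draws


# Hypothetical test: A converts 30 of 1000 visitors, B converts 45 of 1000.
p_better, mean_diff = prob_b_beats_a(30, 1000, 45, 1000)
print(f"P(B > A) = {p_better:.3f}, mean difference = {mean_diff:.4f}")
```

The first number is the direct probability the paragraph refers to; the second gives a sense of the size of the improvement rather than only its direction.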

Optimising and classifying

The last section takes the Bayesian method where it becomes almost invisible: inside systems that decide and classify on their own. The two stages show the two most operational uses of the Bayesian idea — allocating traffic in real time among several alternatives, and assigning a label to a text.
It is the point where Bayesian statistics spills over into machine learning, without ever ceasing to be the same starting idea.

The multi-armed bandit and Thompson sampling is the natural evolution of the A/B test. Instead of waiting for the end of the experiment to choose the winning variant, the bandit shifts the traffic towards what works while the test is running, reducing the cost of keeping the losing alternatives alive. Thompson sampling is the Bayesian strategy that makes all this elegant: it samples from the posteriors and lets uncertainty guide the exploration.
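A minimal simulation of Thompson sampling with Beta posteriors might look like the following. The true conversion rates are invented for the simulation (in a real test they are unknown), a uniform Beta(1, 1) prior is assumed for every arm, and the function name is ours.

```python
import random


def thompson_sampling(true_rates, rounds=5000, seed=0):
    """Simulate Thompson sampling on Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(rounds):
        # Draw one plausible rate per arm from its current Beta posterior.
        sampled = [
            rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)
        ]
        # Send this visitor to the arm whose sampled rate is highest.
        arm = max(range(k), key=sampled.__getitem__)
        # Simulated outcome: the visitor converts with the arm's true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


# Hypothetical variants with true rates of 2%, 5% and 15%.
pulls = thompson_sampling([0.02, 0.05, 0.15])
print(pulls)
```

Arms with little data have wide posteriors and so still win the draw now and then, which is how uncertainty drives exploration; as evidence accumulates, traffic concentrates on the variant that converts best.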

Naive Bayes for search intent closes the path by applying Bayes’ theorem to a classification problem. It shows how a surprisingly simple model can assign a query its most probable intent — informational, transactional, navigational — starting from the words that make it up. It is the proof that the same logic of the foundations, scaled up to thousands of examples, becomes a machine-learning tool in its own right.
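The idea can be sketched with a tiny multinomial Naive Bayes written from scratch. The training queries below are invented for illustration, Laplace (add-one) smoothing is our assumption, and words never seen in training are simply ignored; a real classifier would be trained on thousands of labelled queries.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count classes and per-class word frequencies from (query, intent) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(query, model):
    """Return the intent with the highest log posterior for the query."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in query.lower().split():
            if word in vocab:  # words unseen in training are skipped
                # Add-one smoothing keeps unseen (word, class) pairs above zero.
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical, deliberately tiny training set.
examples = [
    ("how to bake bread", "informational"),
    ("what is a sourdough starter", "informational"),
    ("buy bread maker online", "transactional"),
    ("bread maker price discount", "transactional"),
    ("facebook login", "navigational"),
    ("youtube homepage", "navigational"),
]
model = train(examples)
print(classify("buy cheap bread maker", model))
```

The “naive” part is the assumption that words are independent given the intent, which is why the score is a plain sum of per-word log probabilities.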

Where to start

If this is a first contact with the method, there is only one entry point: Bayesian statistics: the foundations and, right after it, the Beta distribution. These are the two stages that give everything else its meaning; anyone who tackles the applications without them ends up, sooner or later, coming back here. Anyone arriving instead from a practical need (estimating a rate, comparing two pages) can start from the section they need and work back to the foundations when they want to understand why it works.

This is one of the thematic paths we are building to navigate the blog’s articles. It sits alongside the one on inferential statistics, which walks the frequentist road: two different ways of answering the same question about uncertainty, and it is worth knowing both. The Bayesian approach, though, has a particular charm of its own — it is the closest to the way that, outside the textbooks, we update our ideas every time the world brings us a new piece of data.