Bayesian Statistics: How to Learn from Data, One Step at a Time

In previous articles, we’ve examined statistical inference from a precise and coherent perspective: formulate a hypothesis, collect data, calculate a p-value, construct a confidence interval. We’ve conducted hypothesis tests, compared variants with A/B testing, and seen with the Central Limit Theorem why all of this works even when data isn’t normal.

This approach—called frequentist—has a clear logic: the parameter we want to estimate is a fixed value (even if unknown), and we “chase” it with data. But there’s another way to think about uncertainty, one that allows us to update our beliefs as new data arrives. It’s called the Bayesian approach, and in this article we’ll build its foundations.

Let’s start with a concrete example. Imagine we’ve just launched an advertising campaign and we don’t know the true click rate. We have an initial opinion based on experience (“click rates usually fall between 0% and 20%”), and then data starts coming in. The Bayesian approach lets us combine our initial opinion with the observed data to get an updated estimate—and repeat this process every time new information arrives.
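The updating loop described above can be sketched with the classic Beta-Binomial conjugate pair: the prior belief about the click rate is a Beta distribution, and each batch of clicks/impressions updates its parameters. The specific numbers (prior, clicks, impressions) below are illustrative assumptions, not data from the article.

```python
# Beta-Binomial update sketch: prior Beta(alpha, beta) + observed data
# -> posterior Beta(alpha + clicks, beta + non-clicks).

def update_beta(alpha, beta, clicks, impressions):
    """Return the updated Beta parameters after observing new data."""
    return alpha + clicks, beta + (impressions - clicks)

# Weak prior roughly concentrated below 20% (prior mean = 2/20 = 10%)
alpha, beta = 2, 18

# First batch of data arrives: 30 clicks out of 400 impressions
alpha, beta = update_beta(alpha, beta, 30, 400)
posterior_mean = alpha / (alpha + beta)
print(round(posterior_mean, 4))  # posterior mean pulled toward the data
```

Calling `update_beta` again on the next batch repeats the process, which is exactly the "update as new information arrives" idea.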

Continue reading “Bayesian Statistics: How to Learn from Data, One Step at a Time”

How to Use Decision Trees to Classify Data

Decision Trees are a type of machine learning algorithm that uses a tree structure to split data according to logical rules and predict the class of new observations. They are easy to interpret and adaptable to different types of data, but can also suffer from problems such as overfitting, excessive complexity, and sensitivity to imbalanced classes.
Let’s understand a bit more about them and examine a simple example of use in R.
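The article's worked example is in R, but the core idea (splitting data with a logical rule chosen to minimize impurity) can be sketched in a few lines of self-contained Python with a hand-rolled one-level tree (a "decision stump"); the toy data below is made up for illustration.

```python
# Minimal decision stump: find the rule "x <= t" that best separates
# the two classes by minimizing weighted Gini impurity. Real libraries
# (rpart in R, scikit-learn in Python) grow deeper trees the same way.

def gini(labels):
    """Gini impurity of a list of binary class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = labels.count(1) / n
    return 1 - p1**2 - (1 - p1)**2

def best_threshold(xs, ys):
    """Pick the split x <= t that minimizes weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
t = best_threshold(xs, ys)
print(t)  # the rule "x <= 3.0 -> class 0, else class 1" separates perfectly
```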

Continue reading “How to Use Decision Trees to Classify Data”

The Gradient Descent Algorithm Explained Clearly: From Intuition to Practice

A blindfolded person on a mountain

Imagine standing on a mountainous terrain, completely blindfolded. Your goal: reach the lowest point in the valley. You can’t see anything, but you can feel the slope of the ground beneath your feet. What do you do? You move in the direction where the ground goes down, one step at a time. If it slopes more steeply to the left, you go left. If it drops more to the right, you go right. With each step, you feel the slope again and redirect yourself.

This strategy, so simple and natural, is exactly what neural networks use to learn. Every time an AI model improves — learning to recognize a face, translate a sentence, or generate text — it does so by descending through a mathematical landscape, one step at a time, following the slope.

It’s called gradient descent, and it’s arguably the most important algorithm in modern machine learning.
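The "feel the slope, step downhill" loop can be sketched in a few lines for a one-dimensional landscape, here f(x) = (x − 3)², whose slope is 2(x − 3); the starting point, learning rate, and iteration count are illustrative choices.

```python
# Gradient descent on f(x) = (x - 3)^2: repeatedly step against the slope.

def grad(x):
    """Slope (derivative) of f(x) = (x - 3)^2 at x."""
    return 2 * (x - 3)

x = 0.0    # starting point on the "mountain"
lr = 0.1   # step size (learning rate)
for _ in range(100):
    x -= lr * grad(x)  # step in the downhill direction

print(round(x, 4))  # converges toward the minimum at x = 3
```

Neural networks do the same thing, except x is millions of weights and the slope is computed by backpropagation.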

Infographic: the blindfolded explorer metaphor for gradient descent, with three steps: Sensor, Action, Cycle
Continue reading “The Gradient Descent Algorithm Explained Clearly: From Intuition to Practice”

The Negative Binomial Distribution (or Pascal Distribution)

The negative binomial distribution describes the number of trials needed to achieve a certain number of successes in a series of independent trials. For example, it could be used to calculate the probability that the third head appears exactly on the fifth flip of a coin, assuming the coin is fair and therefore the probability of getting heads on each flip is 50%. (Note that this differs from the binomial distribution, which would give the probability of some number of heads in a fixed number of flips.)
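As a quick sketch, the probability that the r-th success occurs exactly on trial n, with success probability p per trial, is C(n−1, r−1) · pʳ · (1−p)ⁿ⁻ʳ, since the first n−1 trials must contain exactly r−1 successes and trial n must be a success:

```python
from math import comb

# Negative binomial pmf: probability that the r-th success occurs
# exactly on trial n, with success probability p per independent trial.
def neg_binom_pmf(n, r, p):
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

# Probability the 3rd head appears exactly on the 5th flip of a fair coin:
print(neg_binom_pmf(5, 3, 0.5))  # C(4, 2) * 0.5^5 = 6/32 = 0.1875
```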

The negative binomial distribution is useful in many fields, including statistics, economics, biology, and physics. And also in “our” SEO.

Continue reading “The Negative Binomial Distribution (or Pascal Distribution)”