A random variable (also called a stochastic variable) is a variable whose numerical value depends on the outcome of some random phenomenon. In many statistics textbooks it is simply abbreviated as r.v.
When probability values are assigned to all the possible numerical values of a random variable x, the result is a probability distribution.
In even simpler terms: a random variable is a variable whose values are each associated with a probability of being observed. The set of all possible values of a random variable and their associated probabilities is called a probability distribution. The sum of all probabilities is 1.
There are two main types of random variables: discrete and continuous.
Depending on the case, we deal with various types of distributions. These are the most common:
| Discrete distributions | Continuous distributions |
|---|---|
| Bernoulli | Beta |
| Binomial | |
| Poisson | |
Consider a trial in which we are only interested in verifying whether a certain event has occurred or not. The random variable generated by such a trial will take the value 1 if the event has occurred, 0 otherwise. This r.v. is called a Bernoulli random variable.
Any dichotomous trial can be represented by a Bernoulli random variable.
A bit of notation. We denote a Bernoulli r.v. as follows:
\( x \sim Bernoulli(\pi) \)

Its mean is:

\( E(x) = \pi \)

And its variance is:

\( V(x) = \pi(1-\pi) \)

All trials that produce only 2 possible outcomes generate Bernoulli random variables (for example, tossing a coin). Starting from this simple assumption, it is a very short step to the Binomial Distribution.
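To make this concrete, here is a small R sketch (the value of \( \pi \) and the variable names are mine, chosen for illustration) that simulates a Bernoulli r.v. and checks the empirical mean and variance against \( \pi \) and \( \pi(1-\pi) \):

```r
set.seed(42)                    # for reproducibility
pi_true <- 0.3                  # illustrative success probability
# 100,000 Bernoulli draws: rbinom with size = 1 gives 0/1 outcomes
x <- rbinom(100000, size = 1, prob = pi_true)

mean(x)   # close to pi = 0.3
var(x)    # close to pi * (1 - pi) = 0.21
```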
Rather than dwelling on the conceptual aspects—important as they are, and for which I refer to specialised texts—what I want to do here is show in practice, and as clearly as possible, what we are talking about. Let us start with a definition and then look at the characteristics and a few practical examples.
The Binomial random variable can be understood as a sum of Bernoulli random variables.
What does this mean? Simply that if we repeat the success–failure dichotomy of the Bernoulli random variable n times under the same conditions, the result will be a sequence of n independent sub-trials, each of which can be associated with a Bernoulli random variable.
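This "sum of Bernoullis" view is easy to check empirically in R (a sketch, with illustrative numbers):

```r
set.seed(1)
n <- 10; p <- 0.5
# Draw n Bernoulli trials 50,000 times and sum each set of n
bernoulli_sums <- replicate(50000, sum(rbinom(n, size = 1, prob = p)))
# Compare with direct draws from a Binomial(n, p)
binomial_draws <- rbinom(50000, size = n, prob = p)

table(bernoulli_sums) / 50000  # empirical distribution of the sums
table(binomial_draws) / 50000  # essentially the same distribution
```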
What are the characteristics of the binomial distribution?

- There is a fixed number n of trials.
- Each trial has only two possible outcomes (success or failure).
- The probability of success p is the same in every trial.
- The trials are independent of one another.

If even one of these characteristics is absent, the binomial distribution does not apply.
From a practical standpoint, the binomial distribution allows us to calculate the probability of obtaining r successes in n independent trials.
The probability of a certain number of successes, r, depends on r itself, on the number of trials n, and on the individual probability, which we denote by p.
The probability of r successes in n trials is given by:
\( \frac{n!}{r!(n-r)!} \times p^r (1-p)^{n-r} \)

Looks difficult? It really is not (and in practice it turns out to be useful and even fun!).
First, let us recall that the symbol ! in mathematics denotes the factorial. As you will certainly remember, the factorial of 3, i.e. 3!, is: 3 × 2 × 1 = 6; the factorial of 4, i.e. 4!, is: 4 × 3 × 2 × 1 = 24; and so on (it will not escape notice that the factorial grows very, very quickly as the number increases…).
The factorial of a natural number n is the product of all the positive integers from 1 up to n (by convention, 0! = 1).
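R has a built-in `factorial()` function, so we can verify these values directly:

```r
factorial(3)   # 3 * 2 * 1 = 6
factorial(4)   # 4 * 3 * 2 * 1 = 24
factorial(10)  # 3628800: the factorial grows very quickly indeed
```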
With that said, let us first see how to find the mean—the centre of our distribution—and the variance. This way, we will have everything we need for a few practical examples.
Let us call x our random variable. We can write our problem as follows:
\( x \sim Binomial(n, p) \)

The mean is:

\( E(x) = n \times p \)

The variance is:

\( Var(x) = n \times p \times (1 - p) \)

At this point, an example is in order. Let us calculate the variance of a distribution with size n = 10 and individual probability p = 0.5 (i.e. 50%). For instance, this could represent ten coin tosses.
\( x \sim Binomial(10, 0.5) \)

So the variance will be:

\( Var(x) = 10 \times 0.5 \times (1 - 0.5) = 2.5 \)

And the mean, naturally, will be:

\( E(x) = 10 \times 0.5 = 5 \)

Side note: it is intuitive that if p = 1 - p = 0.5, the probability distribution will be symmetric. If p < 0.5, it will be right-skewed, and if p > 0.5, it will be left-skewed.
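We can verify both values in R straight from the definition, weighting each possible outcome 0, 1, …, 10 by its probability (here I use R's `choose()` for the binomial coefficient):

```r
n <- 10; p <- 0.5
k <- 0:n
# Binomial probabilities computed from the formula n! / (r!(n-r)!) * p^r * (1-p)^(n-r)
probs <- choose(n, k) * p^k * (1 - p)^(n - k)

mu <- sum(k * probs)        # mean: 5
sum((k - mu)^2 * probs)     # variance: 2.5
```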
Let us now introduce the concept of probability mass (which R, like many texts, loosely calls "density" even in the discrete case). This is what we will use most often in real-world applications: when, for example, we want to know the probability that exactly two out of ten coin tosses come up heads.
To explain this more clearly, let us take a problem from a textbook:
If I cross a black mouse with a white one, there is a 3/4 probability that the offspring will be black and 1/4 that it will be white. What is the probability that out of 7 offspring, exactly 3 are white?
Let us write down the data straight away:

- n = 7 (the number of offspring);
- r = 3 (the number of white offspring we want);
- p = 0.25 (the probability that a single offspring is white).
And now? Shall we do the calculations by hand? Why not:
\( \frac{n!}{r!(n-r)!} \times p^r (1-p)^{n-r} \)

therefore:

\( \frac{7!}{3!\,4!} \times 0.25^{3} \times 0.75^{4} = 35 \times 0.0049438 = 0.173 \)

That is, 17.3%.
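The same arithmetic can be reproduced step by step in R:

```r
n <- 7; r <- 3; p <- 0.25
# Binomial coefficient: 7! / (3! * 4!) = 35
coef <- factorial(n) / (factorial(r) * factorial(n - r))
coef * p^r * (1 - p)^(n - r)   # about 0.173
```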
Doing calculations by hand is fun, but we are lazy and have R at our disposal. In R, this probability is computed by a simple function: `dbinom()`.
The problem is therefore solved with the simple instruction:

```r
dbinom(3, 7, 0.25)
# [1] 0.1730347
```

which gives us 0.173, so the answer is 17.3%.
There are equally interesting questions that call upon other discrete distributions, such as the Poisson.
As we can see, this is a vast and fascinating topic, which we will explore (lightly) across several articles. In the next one, we will look at another important distribution: the beta distribution, which plays a central role in Bayesian statistics.
For an accessible yet thorough introduction to probability distributions and the reasoning behind them, Finalmente ho capito la statistica by Maurizio De Pra covers discrete distributions—including the binomial—in a clear and approachable style, ideal for building solid intuition before moving on to more advanced topics.