A random variable (also called a stochastic variable) is a variable whose numerical value depends on the outcome of some random phenomenon. In many statistics textbooks it is simply abbreviated as r.v.
When probability values are assigned to all the possible numerical values of a random variable x, the result is a probability distribution.
In even simpler terms: a random variable is a variable whose values are each associated with a probability of being observed. The set of all possible values of a random variable and their associated probabilities is called a probability distribution. The sum of all probabilities is 1.
There are two main types of random variables: discrete and continuous.
Depending on the case, we deal with various types of distributions. These are the most common:
| Discrete distributions | Continuous distributions |
|---|---|
| Bernoulli | Beta |
| Binomial | |
| Poisson | |
Consider a trial in which we are only interested in verifying whether a certain event has occurred or not. The random variable generated by such a trial will take the value 1 if the event has occurred, 0 otherwise. This r.v. is called a Bernoulli random variable.
Any dichotomous trial can be represented by a Bernoulli random variable.
A bit of notation. We denote a Bernoulli r.v. as follows:
\( x \sim Bernoulli(\pi) \)

Its mean is:

\( E(x) = \pi \)

And its variance is:

\( V(x) = \pi(1-\pi) \)

All trials that produce only 2 possible outcomes generate Bernoulli random variables (for example, tossing a coin). Starting from this simple assumption, it is a very short step to the Binomial Distribution.
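To make this concrete, here is a small R sketch (the value of \( \pi \) and the variable names are mine, chosen for illustration) that simulates a Bernoulli r.v. and checks the empirical mean and variance against \( \pi \) and \( \pi(1-\pi) \):

```r
set.seed(42)                    # for reproducibility
pi_true <- 0.3                  # illustrative success probability
# 100,000 Bernoulli draws: rbinom with size = 1 gives 0/1 outcomes
x <- rbinom(100000, size = 1, prob = pi_true)

mean(x)   # close to pi = 0.3
var(x)    # close to pi * (1 - pi) = 0.21
```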
Rather than dwelling on the conceptual aspects—important as they are, and for which I refer to specialised texts—what I want to do here is show in practice, and as clearly as possible, what we are talking about. Let us start with a definition and then look at the characteristics and a few practical examples.
The Binomial random variable can be understood as a sum of Bernoulli random variables.
What does this mean? Simply that if we repeat the success–failure dichotomy of the Bernoulli random variable n times under the same conditions, the result will be a sequence of n independent sub-trials, each of which can be associated with a Bernoulli random variable.
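This "sum of Bernoullis" view is easy to check empirically in R (a sketch, with illustrative numbers):

```r
set.seed(1)
n <- 10; p <- 0.5
# Draw n Bernoulli trials 50,000 times and sum each set of n
bernoulli_sums <- replicate(50000, sum(rbinom(n, size = 1, prob = p)))
# Compare with direct draws from a Binomial(n, p)
binomial_draws <- rbinom(50000, size = n, prob = p)

table(bernoulli_sums) / 50000  # empirical distribution of the sums
table(binomial_draws) / 50000  # essentially the same distribution
```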
What are the characteristics of the binomial distribution?

- There is a fixed number n of trials.
- Each trial has only two possible outcomes (success or failure).
- The probability of success p is the same in every trial.
- The trials are independent of one another.

If even one of these characteristics is absent, the binomial distribution does not apply.
From a practical standpoint, the binomial distribution allows us to calculate the probability of obtaining r successes in n independent trials.
The probability of a certain number of successes, r, depends on r itself, on the number of trials n, and on the individual probability, which we denote by p.
The probability of r successes in n trials is given by:
\( \frac{n!}{r!(n-r)!} \times p^r (1-p)^{n-r} \)

Looks difficult? It really is not (and in practice it turns out to be useful and even fun!).
First, let us recall that the symbol ! in mathematics denotes the factorial. As you will certainly remember, the factorial of 3, i.e. 3!, is: 3 × 2 × 1 = 6; the factorial of 4, i.e. 4!, is: 4 × 3 × 2 × 1 = 24; and so on (it will not escape notice that the factorial grows very, very quickly as the number increases…).
The factorial of a natural number n is the product of all the positive integers from 1 up to n (by convention, 0! = 1).
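R has a built-in `factorial()` function, so we can verify these values directly:

```r
factorial(3)   # 3 * 2 * 1 = 6
factorial(4)   # 4 * 3 * 2 * 1 = 24
factorial(10)  # 3628800: the factorial grows very quickly indeed
```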
With that said, let us first see how to find the mean—the centre of our distribution—and the variance. This way, we will have everything we need for a few practical examples.
Let us call x our random variable. We can write our problem as follows:
\( x \sim Binomial(n, p) \)

The mean is:

\( E(x) = n \times p \)

The variance is:

\( Var(x) = n \times p \times (1 - p) \)

At this point, an example is in order. Let us calculate the variance of a distribution with size n = 10 and individual probability p = 0.5 (i.e. 50%). For instance, this could represent ten coin tosses.
\( x \sim Binomial(10, 0.5) \)

So the variance will be:

\( Var(x) = 10 \times 0.5 \times (1 - 0.5) = 2.5 \)

And the mean, naturally, will be:

\( E(x) = 10 \times 0.5 = 5 \)

Side note: it is intuitive that if p = 1 - p = 0.5, the probability distribution will be symmetric. If p < 0.5, it will be right-skewed, and if p > 0.5, it will be left-skewed.
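We can verify both values in R straight from the definition, weighting each possible outcome 0, 1, …, 10 by its probability (here I use R's `choose()` for the binomial coefficient):

```r
n <- 10; p <- 0.5
k <- 0:n
# Binomial probabilities computed from the formula n! / (r!(n-r)!) * p^r * (1-p)^(n-r)
probs <- choose(n, k) * p^k * (1 - p)^(n - k)

mu <- sum(k * probs)        # mean: 5
sum((k - mu)^2 * probs)     # variance: 2.5
```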
Let us now introduce the concept of probability mass (which R, like many texts, loosely calls "density" even in the discrete case). This is what we will use most often in real-world applications: when, for example, we want to know the probability that exactly two out of ten coin tosses come up heads.
To explain this more clearly, let us take a problem from a textbook:
If I cross a black mouse with a white one, there is a 3/4 probability that the offspring will be black and 1/4 that it will be white. What is the probability that out of 7 offspring, exactly 3 are white?
Let us write down the data straight away:

- n = 7 (the number of offspring);
- r = 3 (the number of white offspring we want);
- p = 0.25 (the probability that a single offspring is white).
And now? Shall we do the calculations by hand? Why not:
\( \frac{n!}{r!(n-r)!} \times p^r (1-p)^{n-r} \)

therefore:

\( \frac{7!}{3!\,4!} \times 0.25^{3} \times 0.75^{4} = 35 \times 0.0049438 = 0.173 \)

That is, 17.3%.
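The same arithmetic can be reproduced step by step in R:

```r
n <- 7; r <- 3; p <- 0.25
# Binomial coefficient: 7! / (3! * 4!) = 35
coef <- factorial(n) / (factorial(r) * factorial(n - r))
coef * p^r * (1 - p)^(n - r)   # about 0.173
```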
Doing calculations by hand is fun, but we are lazy and have R at our disposal. In R, this probability is computed by a simple function: `dbinom()`.
The problem is therefore solved with the simple instruction:

```r
dbinom(3, 7, 0.25)
# [1] 0.1730347
```

which gives us 0.173, so the answer is 17.3%.
There are equally interesting questions that call upon other discrete distributions, such as the Poisson.
As we can see, this is a vast and fascinating topic, which we will explore (lightly) across several articles. In the next one, we will look at another important distribution: the beta distribution, which plays a central role in Bayesian statistics.
For an accessible yet thorough introduction to probability distributions and the reasoning behind them, Finalmente ho capito la statistica by Maurizio De Pra covers discrete distributions—including the binomial—in a clear and approachable style, ideal for building solid intuition before moving on to more advanced topics.