The Beta distribution is a crucial probability distribution in Bayesian statistics.
In theoretical probability problems, we know the exact probability value of a single event, making it relatively straightforward to apply basic probability calculation rules to reach the desired result.
In real life, however, it’s much more common to deal with collections of observations, and it’s from this data that we must derive probability estimates.
Put more plainly: in real life we almost never know the exact probability of an event; what we have instead are data and observations.
Deriving probabilities from observed data is what we call statistical inference.
The Beta distribution is continuous, and in this respect it differs from the binomial distribution, which, as we’ve seen, takes discrete values.
We define it through a probability density function, or PDF (no, not the well-known format created by Adobe…):
\( \mathrm{Beta}(p;\alpha,\beta)=\frac{p^{\alpha-1}(1-p)^{\beta-1}}{\mathrm{B}(\alpha,\beta)} \)
where
p = the probability of the event
α = the number of times we observe our event of interest
β = the number of times our event of interest does NOT occur
and therefore:
α + β = number of trials
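To make the formula concrete, here is a minimal R sketch that evaluates the density by hand and compares it with R's built-in `dbeta`. The helper name `beta_pdf` and the values α = 3, β = 5, p = 0.4 are illustrative assumptions, not taken from the example discussed later.

```r
# Beta density written out from the formula:
# p^(a - 1) * (1 - p)^(b - 1) / B(a, b)
# (beta_pdf is a hypothetical helper name used only for this illustration)
beta_pdf <- function(p, a, b) {
  p^(a - 1) * (1 - p)^(b - 1) / beta(a, b)
}

# Illustrative values (assumed): 3 occurrences, 5 non-occurrences
a <- 3
b <- 5
p <- 0.4

manual   <- beta_pdf(p, a, b)  # formula evaluated by hand
built_in <- dbeta(p, a, b)     # R's built-in Beta density

print(manual)
print(built_in)
print(isTRUE(all.equal(manual, built_in)))  # do the two values agree?
```

Here `beta(a, b)` is base R's beta function, which plays the role of the denominator in the formula.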
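The beta function in the denominator (the function, not the parameter β) is a normalizing constant: it ensures that the density integrates to 1 over the interval from 0 to 1. The density itself can exceed 1 at some points; it is the total area under the curve that equals 1.
The beta function is defined as the integral of \( p^{\alpha-1}(1-p)^{\beta-1} \) between 0 and 1. In practice it is evaluated through gamma functions or by numerical integration.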
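As a rough check of the normalization, the sketch below computes the beta function in three ways (numerical integration, the gamma-function identity, and R's built-in `beta`) and then confirms that the resulting density has total area 1. The values α = 3 and β = 5 are assumed purely for illustration.

```r
# Illustrative parameters (assumed)
a <- 3
b <- 5

# 1. Beta function as an integral of the unnormalized density over [0, 1]
by_integral <- integrate(function(t) t^(a - 1) * (1 - t)^(b - 1),
                         lower = 0, upper = 1)$value

# 2. Same quantity through the gamma-function identity
by_gamma <- gamma(a) * gamma(b) / gamma(a + b)

# 3. Base R's built-in beta function
built_in <- beta(a, b)

print(c(by_integral, by_gamma, built_in))

# Once divided by this constant, the density integrates to 1 over [0, 1]
area <- integrate(function(p) dbeta(p, a, b), lower = 0, upper = 1)$value
print(area)
```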
The Beta distribution is a probability distribution over probabilities, and since it models a probability, its domain is restricted to the interval from 0 to 1.
Imagine that an online game organizer claims that at least 1 in 10 players wins a prize. We have the data, and we know that among the last 800 players, there were 65 winners.
The question we ask ourselves is: is the game organizer telling the truth based on our data? Can we consider that a player has at least a 10% chance of winning a prize when buying a ticket based on our sample?
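Before bringing in the distribution, it may help to lay out the numbers. The sketch below follows the convention used in this article (α counts occurrences, β counts non-occurrences); the variable names are illustrative assumptions.

```r
# Data from the example
players <- 800   # number of observed players (trials)
wins    <- 65    # players who won a prize

# Parameters under the article's convention
alpha_param <- wins             # times the event occurred
beta_param  <- players - wins   # times the event did NOT occur

# Observed win rate versus the organizer's claim
observed_rate <- wins / players
claimed_rate  <- 0.10

print(beta_param)
print(observed_rate)
print(observed_rate < claimed_rate)  # is the observed rate below the claim?
```

The observed rate alone does not settle the question, because a sample of 800 players still leaves some uncertainty about the true win probability; quantifying that uncertainty is the job of the Beta distribution.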
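The answer to our question can be derived from the Beta distribution with our data, that is, with α = 65 (winners) and β = 735 (non-winners).
We use the cumulative Beta distribution evaluated at 0.1:
β (.1, 65, 735, TRUE)
This gives the probability that p is at most 0.1; the probability we are interested in, that p is at least 0.1, is its complement.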
In R, it takes just one line to integrate the density between 0.1 and 1, which gives the probability that the chance of winning a prize when buying a ticket is above 10%:
integrate(function(x) dbeta(x, 65, 735), 0.1, 1)
0.03170546 with absolute error < 2.3e-06
The answer is right before our eyes. Given the data, the probability that the true chance of winning is at least 10% is just 3.17%. In light of the data, the game organizer's claim is very unlikely to be true.
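The same tail probability can also be obtained without explicit integration, using R's cumulative distribution function `pbeta`. This is a minimal sketch with α = 65 and β = 735 as in the example; the variable names are illustrative.

```r
# Parameters from the example
alpha_param <- 65    # winners
beta_param  <- 735   # non-winners
threshold   <- 0.1   # the organizer's claimed win probability

# Upper tail: probability that p is at least the threshold
p_at_least <- pbeta(threshold, alpha_param, beta_param, lower.tail = FALSE)

# Lower tail: probability that p is at most the threshold
p_below <- pbeta(threshold, alpha_param, beta_param)

print(p_at_least)
print(p_below)
print(p_at_least + p_below)  # the two tails add up to 1
```

Using `lower.tail = FALSE` avoids computing `1 - pbeta(...)` by hand, which is generally the safer choice when the tail probability is very small.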