We have seen that the binomial distribution assumes an infinite population, a condition that can be reproduced in practice by sampling from a finite population of size N with replacement.
When this is not the case, that is, when we sample from a finite population without replacement, we must use the hypergeometric distribution. (In practice, if N is large relative to the sample size, the hypergeometric probability mass function tends towards the binomial.)
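The convergence towards the binomial for large N can be checked numerically. The following is a minimal sketch, assuming an illustrative 20% share of successes and a sample of 5 (these values and the variable names are assumptions, not taken from the text). It compares `dhyper` with `dbinom` while the population grows and the share of successes stays fixed.

```r
# Illustrative values (assumptions, not taken from the text)
p <- 0.2   # share of "success" items in the population
n <- 5     # sample size
x <- 2     # number of successes of interest

# Population sizes to compare
N_values <- c(30, 300, 3000, 30000)

# Hypergeometric probability for each N, with M = p * N success items
hyper <- sapply(N_values, function(N) {
  M <- round(p * N)
  dhyper(x, M, N - M, n)
})

# Binomial probability (sampling with replacement); it does not depend on N
binom <- dbinom(x, size = n, prob = p)

# Side-by-side comparison; the absolute difference shrinks as N grows
data.frame(N = N_values,
           hypergeometric = hyper,
           binomial = binom,
           abs_diff = abs(hyper - binom))
```

For small populations the two distributions differ noticeably, which is why sampling without replacement needs its own model.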
The hypergeometric distribution is used to calculate the probability of obtaining a certain number of successes in a series of binary trials (yes or no) that are dependent on one another: because each drawn element is not put back, the probability of success changes from one draw to the next.
The hypergeometric distribution allows us to answer questions like:
If I have a population of size N, in which M elements meet certain requirements, and I draw a sample of size n without replacement, what is the probability that exactly x of the drawn elements meet those requirements?
Let’s start with the formula
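I express my distribution in the form of a formula:
\( f(x|N,M,n)=\frac{C^{N-M}_{n-x}\times C^M_x}{C^N_n} \)
where \( C^a_b \) denotes the binomial coefficient, that is, the number of ways to choose b elements out of a.
The hypergeometric distribution explained with examples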
We know that a batch of 30 pieces contains 6 malfunctioning pieces.
If I take a sample of 5 pieces, what is the probability of finding exactly 2 defective pieces?
I’ll immediately write down the data:
- N=30 (the total number of pieces in my batch)
- M=6 (the total malfunctioning pieces present in the batch)
- x=2 (I want to know the probability of finding 2 defective pieces)
- n=5 (the size of my sample)
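Substituting these values into the formula gives the calculation by hand. The worked steps below are a sketch that keeps the same C notation for the binomial coefficients:

```latex
\[
f(2 \mid 30, 6, 5)
  = \frac{C^{24}_{3} \times C^{6}_{2}}{C^{30}_{5}}
  = \frac{2024 \times 15}{142506}
  = \frac{30360}{142506}
  \approx 0.2130
\]
```

Here 24 = N - M is the number of working pieces and 3 = n - x is the number of working pieces in the sample.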
Let’s see how to solve the same problem in R:
# Definition of the hypergeometric distribution parameters
x <- 2  # I want to know the probability of finding 2 defective pieces
n <- 5  # the size of my sample
M <- 6  # the total malfunctioning pieces present in the batch
N <- 30 # the total number of pieces in my batch

# Probability calculation with the dhyper function
prob <- dhyper(x, M, N - M, n)
prob
and I get the output:
[1] 0.2130437
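As a cross-check, the same probability can be obtained directly from the formula with R's `choose` function. The sketch below is illustrative rather than part of the original solution; the names `manual`, `builtin`, `probs` and `at_least_2` are my own. It also shows two related quantities: the full distribution over x and the probability of finding at least 2 defective pieces.

```r
# Parameters of the batch example
N <- 30  # total pieces in the batch
M <- 6   # malfunctioning pieces in the batch
n <- 5   # sample size
x <- 2   # defective pieces in the sample

# Direct evaluation of the formula with binomial coefficients
manual <- choose(N - M, n - x) * choose(M, x) / choose(N, n)

# Built-in hypergeometric probability mass function
builtin <- dhyper(x, M, N - M, n)

# The two values agree up to floating-point error
all.equal(manual, builtin)

# Full distribution over x = 0, ..., 5; the probabilities sum to 1
probs <- dhyper(0:n, M, N - M, n)
sum(probs)

# Probability of at least 2 defective pieces: P(X >= 2) = 1 - P(X <= 1)
at_least_2 <- 1 - phyper(1, M, N - M, n)
at_least_2
```

Note the argument order of `dhyper(x, m, n, k)`: the second argument is the number of success items in the population, the third is the number of failure items (here N - M), and the fourth is the sample size.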
Could we really leave out an example with an urn and balls?
Let’s now work through another example: let’s calculate the probability that, drawing 4 balls without replacement from an urn with 10 white balls and 5 black ones, we get 3 white and 1 black. Since x counts the white balls, M must be the number of white balls in the urn. So:
- x=3 (the number of white balls drawn)
- n=4 (the number of balls drawn)
- M=10 (the number of white balls in the urn)
- N=15 (the total number of balls)
We have seen that in R it’s possible to use the dhyper function to calculate the probability of drawing 3 white balls and 1 black ball from the described urn.
Here’s the R code:
# Definition of the hypergeometric distribution parameters
x <- 3  # Number of white balls drawn
n <- 4  # Number of balls drawn
M <- 10 # Number of white balls in the urn
N <- 15 # Total number of balls

# Probability calculation with the dhyper function
prob <- dhyper(x, M, N - M, n)
prob
The probability of drawing 3 white balls and 1 black ball is therefore 0.4395604, or about 43.96%.
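Because `dhyper(x, m, n, k)` takes the number of success items in the urn as its second argument, it is worth checking which color is being counted. The following is a rough sketch: the variable names, the seed and the number of simulated draws are illustrative choices of mine. It computes the probability with each color treated as the success, then checks the white-ball case by simulating draws with `sample`.

```r
# Urn composition from the example
white <- 10
black <- 5
k <- 4  # balls drawn without replacement

# Counting white balls as successes: P(3 white, 1 black)
p_white3 <- dhyper(3, white, black, k)

# Counting black balls as successes: P(3 black, 1 white)
p_black3 <- dhyper(3, black, white, k)

# Monte Carlo check of P(3 white, 1 black) by actually drawing from the urn
set.seed(42)  # arbitrary seed, for reproducibility only
urn <- c(rep("white", white), rep("black", black))
draws <- replicate(100000, sum(sample(urn, k) == "white"))
p_sim <- mean(draws == 3)

c(white_3 = p_white3, black_3 = p_black3, simulated_white_3 = p_sim)
```

With white as the success the exact value is 600/1365, about 0.4396; with black as the success it is 100/1365, about 0.0733. Swapping the two counts therefore answers a different question, and the simulated frequency should land close to the first value.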