We have seen that the binomial distribution rests on the assumption of an infinite population N, a condition that in practice is reproduced by sampling from a finite population with replacement.
If this is not the case, meaning we are sampling from a finite population without replacement, we must use the hypergeometric distribution. (In practice, when N is large compared with the sample size, the hypergeometric probability mass function tends towards the binomial one.)
The hypergeometric distribution is used to calculate the probability of obtaining a certain number of successes in a series of binary trials (yes or no) that are dependent on one another: since drawn elements are not put back, the probability of success changes from one draw to the next.
The hypergeometric distribution allows us to answer questions like:
If I have a population of size N, in which M elements meet certain requirements, and I take a sample of size n, what is the probability of drawing x elements that meet those requirements?
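I can express the distribution in the form of a formula:
\( f(x|N,M,n)=\frac{C^{N-M}_{n-x}\times C^M_x}{C^N_n} \)
Here N is the size of the population, M the number of elements in it that meet the requirements, n the size of the sample, and x the number of elements in the sample that meet the requirements.
Let’s take an example. We know that a batch of 30 pieces contains 6 malfunctioning pieces.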
If I take a sample of 5 pieces, what is the probability of finding exactly 2 defective pieces?
I’ll immediately write down the data:
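Reading the values off the problem statement: N = 30 pieces in the batch, M = 6 malfunctioning pieces, n = 5 pieces in the sample, x = 2 defective pieces wanted. As a worked sketch, and assuming each \(C^a_b\) in the formula is read as the binomial coefficient \(\binom{a}{b}\), the substitution goes roughly like this:

```latex
\[
f(2 \mid 30, 6, 5)
  = \frac{\binom{24}{3}\,\binom{6}{2}}{\binom{30}{5}}
  = \frac{2024 \times 15}{142506}
  = \frac{30360}{142506}
  \approx 0.2130
\]
```

So the probability of finding exactly 2 defective pieces is about 21.3%.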
Let’s see how to solve the same problem in R:
    # Definition of the hypergeometric distribution parameters
    x <- 2   # I want to know the probability of finding 2 defective pieces
    n <- 5   # the size of my sample
    M <- 6   # the total malfunctioning pieces present in the batch
    N <- 30  # the total number of pieces in my batch

    # Probability calculation with the dhyper function
    prob <- dhyper(x, M, N - M, n)
    prob
and I get the output:
[1] 0.2130437
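The remark that the hypergeometric distribution tends towards the binomial for large N can be checked numerically with the same function. The sketch below is only an illustration: the batch sizes in `N_values` are hypothetical, and the share of defective pieces is assumed to stay fixed at 6/30 = 20% as the batch grows.

```r
# Sketch: compare dhyper with dbinom as the batch grows (illustrative values)
n <- 5        # size of the sample
x <- 2        # defective pieces wanted in the sample
p <- 6 / 30   # share of defective pieces, assumed fixed at 20%

# Hypothetical batch sizes, chosen only for illustration
N_values <- c(30, 300, 3000, 30000)

# Without replacement: hypergeometric probability for each batch size
hyper <- sapply(N_values, function(N) {
  M <- round(p * N)          # defective pieces in a batch of size N
  dhyper(x, M, N - M, n)
})

# With replacement: binomial probability, which does not depend on N
binom <- dbinom(x, size = n, prob = p)

comparison <- data.frame(
  N = N_values,
  hypergeometric = hyper,
  binomial = binom,
  gap = abs(hyper - binom)
)
print(comparison)
```

The `gap` column should shrink as N grows, which is the sense in which the binomial can stand in for the hypergeometric when the batch is much larger than the sample.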
Let’s look at another example: what is the probability that, drawing 4 balls without replacement from an urn with 10 white balls and 5 black ones, we get 3 white and 1 black?
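Taking the white balls as the successes, the parameters are N = 15 balls in the urn, M = 10 white balls, n = 4 balls drawn, x = 3 white balls wanted. As a worked sketch, again assuming each \(C^a_b\) is read as the binomial coefficient \(\binom{a}{b}\):

```latex
\[
f(3 \mid 15, 10, 4)
  = \frac{\binom{5}{1}\,\binom{10}{3}}{\binom{15}{4}}
  = \frac{5 \times 120}{1365}
  = \frac{600}{1365}
  = \frac{40}{91}
  \approx 0.4396
\]
```

Drawing 3 white balls out of 4 automatically means the fourth ball is black, which is why the single term \(\binom{5}{1}\) covers the black ball.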
In R, the dhyper function gives this probability directly. Its first argument is the number of successes in the sample, so the white balls must be counted as the successes in the urn (M = 10).
Here’s the R code:
    # Definition of the hypergeometric distribution parameters
    x <- 3   # Number of white balls drawn
    n <- 4   # Number of balls drawn
    M <- 10  # Number of white balls in the urn
    N <- 15  # Total number of balls

    # Probability calculation with the dhyper function
    prob <- dhyper(x, M, N - M, n)
    prob
The probability of drawing 3 white balls and 1 black ball is therefore 0.4395604, or about 43.96%.