{"id":3269,"date":"2023-03-24T14:15:00","date_gmt":"2023-03-24T13:15:00","guid":{"rendered":"https:\/\/www.gironi.it\/blog\/?p=3269"},"modified":"2024-12-08T12:02:24","modified_gmt":"2024-12-08T11:02:24","slug":"the-hypergeometric-distribution","status":"publish","type":"post","link":"https:\/\/www.gironi.it\/blog\/en\/the-hypergeometric-distribution\/","title":{"rendered":"The Hypergeometric Distribution"},"content":{"rendered":"\n<p>We have seen that the <a href=\"https:\/\/www.gironi.it\/blog\/distribuzioni-di-probabilita-distribuzioni-discrete-la-binomiale\/\" data-type=\"post\" data-id=\"807\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>binomial distribution<\/strong><\/a> is based on the hypothesis of an infinite population N, a condition that can be practically realized by sampling from a finite population <strong>with replacement<\/strong>.<\/p>\n\n\n\n<p>If this does not occur, meaning if we are sampling from a population <strong>without replacement<\/strong>, we must use the <strong>hypergeometric distribution<\/strong>. 
(In practice, if N is large relative to the sample size, the hypergeometric probability mass function approaches the binomial.)<\/p>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">The hypergeometric distribution is used to calculate the probability of obtaining a certain number of successes in a series of binary trials (yes or no) that are dependent on one another, so that the probability of success changes from one draw to the next.<\/p>\n\n\n\n<p>The hypergeometric distribution allows us to answer questions like:<\/p>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">If I have a population of size N, in which M elements meet certain requirements, and I draw a sample of n elements without replacement, what is the probability that exactly x of the drawn elements meet those requirements?<\/p>\n\n\n\n<!--more-->\n\n\n\t\t\t\t<div class=\"wp-block-uagb-table-of-contents uagb-toc__align-left uagb-toc__columns-1  uagb-block-f5fe3cc3      \"\n\t\t\t\t\tdata-scroll= \"1\"\n\t\t\t\t\tdata-offset= \"30\"\n\t\t\t\t\tstyle=\"\"\n\t\t\t\t>\n\t\t\t\t<div class=\"uagb-toc__wrap\">\n\t\t\t\t\t\t<div class=\"uagb-toc__title\">\n\t\t\t\t\t\t\tWhat we will discuss\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"uagb-toc__list-wrap \">\n\t\t\t\t\t\t<ol class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#lets-start-with-the-formula\" class=\"uagb-toc-link__trigger\">Let&#039;s start with the formula<\/a><li class=\"uagb-toc__list\"><a href=\"#the-hypergeometric-distribution-explained-with-examples\" class=\"uagb-toc-link__trigger\">The hypergeometric distribution explained with examples<\/a><li class=\"uagb-toc__list\"><a href=\"#can-an-example-with-an-urn-and-balls-be-missing\" class=\"uagb-toc-link__trigger\">Can an example with an urn and balls be missing?<\/a><li class=\"uagb-toc__list\"><a href=\"#further-examination-of-the-hypergeometric-distribution\" class=\"uagb-toc-link__trigger\">Further Examination of the Hypergeometric Distribution<\/a><\/ol>\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\n\n\n<h2 
class=\"wp-block-heading\">Let&#8217;s start with the formula<\/h2>\n\n\n\n<p>The probability of finding exactly x successes in a sample of n elements, drawn without replacement from a population of N elements of which M are successes, is given by the formula:<\/p>\n\n\n\n\\(\nf(x|N,M,n)=\\frac{C^{N-M}_{n-x}\\times C^M_x}{C^N_n} \\\n\\)\n\n\n\n<h2 class=\"wp-block-heading\">The hypergeometric distribution explained with examples<\/h2>\n\n\n\n<p>We know that a batch of 30 pieces contains 6 malfunctioning pieces.<br>If I take a sample of 5 pieces, what is the probability of finding exactly 2 defective pieces?<\/p>\n\n\n\n<p>I&#8217;ll immediately write down the data:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>N=30 (<em>the total number of pieces in my batch<\/em>)<\/li>\n\n\n\n<li>M=6 (<em>the total malfunctioning pieces present in the batch<\/em>)<\/li>\n\n\n\n<li>x=2 (<em>I want to know the probability of finding 2 defective pieces<\/em>)<\/li>\n\n\n\n<li>n=5 (<em>the size of my sample<\/em>)<\/li>\n<\/ul>\n\n\n\n<p>Let&#8217;s see how to solve this problem in R:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Definition of the hypergeometric distribution parameters\nx &lt;- 2 # I want to know the probability of finding 2 defective pieces\nn &lt;- 5 # the size of my sample\nM &lt;- 6 # the total malfunctioning pieces present in the batch\nN &lt;- 30 # the total number of pieces in my batch\n\n# Probability calculation with the dhyper function\nprob &lt;- dhyper(x, M, N - M, n)\nprob<\/pre>\n\n\n\n<p>and I get the output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[1] 0.2130437<\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Can an example with an urn and balls be missing?<\/h2>\n\n\n\n<div class=\"wp-block-uagb-image aligncenter uagb-block-eb7e4992 wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-center\"><figure class=\"wp-block-uagb-image__figure\"><img decoding=\"async\" 
srcset=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2023\/03\/07bed749-7708-4f32-8b92-d46342b9f532-300x300.jpeg \" src=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2023\/03\/07bed749-7708-4f32-8b92-d46342b9f532-300x300.jpeg\" alt=\"Hypergeometric distribution: drawing white or black balls from an urn.\" class=\"uag-image-2945\" width=\"300\" height=\"300\" title=\"\" loading=\"lazy\"\/><\/figure><\/div>\n\n\n\n<p>Let&#8217;s look at another example: let&#8217;s compute the probability that, drawing 4 balls without replacement from an urn with 10 white balls and 5 black ones, we get 3 white and 1 black. So:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>x=3 Number of white balls drawn<\/li>\n\n\n\n<li>n=4 Number of balls drawn<\/li>\n\n\n\n<li>M=10 Number of white balls in the urn (x counts white balls, so M must count them too)<\/li>\n\n\n\n<li>N = 15 Total number of balls<\/li>\n<\/ul>\n\n\n\n<p>We have seen that in R, it&#8217;s possible to use the <code>dhyper<\/code> function to calculate the probability of drawing 3 white balls and 1 black ball from the described urn.<\/p>\n\n\n\n<p>Here&#8217;s the R code:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Definition of the hypergeometric distribution parameters\nx &lt;- 3 # Number of white balls drawn\nn &lt;- 4 # Number of balls drawn\nM &lt;- 10 # Number of white balls in the urn\nN &lt;- 15 # Total number of balls\n\n# Probability calculation with the dhyper function\nprob &lt;- dhyper(x, M, N - M, n)\nprob<\/pre>\n\n\n\n<p>The probability of drawing 3 white balls and 1 black ball is therefore 0.4395604, or about 43.96%.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Further Examination of the Hypergeometric Distribution<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/it.wikipedia.org\/wiki\/Distribuzione_ipergeometrica\" target=\"_blank\" rel=\"noreferrer noopener\">Hypergeometric Distribution &#8211; Wikipedia<\/a><\/li>\n\n\n\n<li><a 
href=\"https:\/\/www.webtutordimatematica.it\/materie\/statistica-e-probabilita\/distribuzioni-di-probabilita-discrete\/distribuzione-ipergeometrica\" target=\"_blank\" rel=\"noreferrer noopener\">Hypergeometric Distribution &#8211; WebTutorDiMatematica.it<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.okpedia.it\/distribuzione-ipergeometrica\" target=\"_blank\" rel=\"noreferrer noopener\">Hypergeometric Distribution &#8211; Okpedia<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>We have seen that the binomial distribution is based on the hypothesis of an infinite population N, a condition that can be practically realized by sampling from a finite population with replacement. If this does not occur, meaning if we are sampling from a population without replacement, we must use the hypergeometric distribution. (In reality, &hellip; <a href=\"https:\/\/www.gironi.it\/blog\/en\/the-hypergeometric-distribution\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;The Hypergeometric Distribution&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[645],"tags":[1198,1200],"class_list":["post-3269","post","type-post","status-publish","format-standard","hentry","category-probability","tag-distribution","tag-hypergeometric"],"lang":"en","translations":{"en":3269,"it":2933},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false},"uagb_author_info":{"display_name":"paolo","author_link":"https:\/\/www.gironi.it\/blog\/author\/paolo\/"},"uagb_comment_info":35,"uagb_excerpt":"We have seen that the binomial distribution is based on the hypothesis of an infinite population N, a condition that can be practically realized by sampling from a finite 
population with replacement. If this does not occur, meaning if we are sampling from a population without replacement, we must use the hypergeometric distribution. (In reality,&hellip;","_links":{"self":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3269","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/comments?post=3269"}],"version-history":[{"count":2,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3269\/revisions"}],"predecessor-version":[{"id":3336,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3269\/revisions\/3336"}],"wp:attachment":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/media?parent=3269"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/categories?post=3269"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/tags?post=3269"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}