{"id":3470,"date":"2026-03-01T20:31:50","date_gmt":"2026-03-01T19:31:50","guid":{"rendered":"https:\/\/www.gironi.it\/blog\/?p=3470"},"modified":"2026-03-02T09:32:30","modified_gmt":"2026-03-02T08:32:30","slug":"hypothesis-testing-a-step-by-step-guide","status":"publish","type":"post","link":"https:\/\/www.gironi.it\/blog\/en\/hypothesis-testing-a-step-by-step-guide\/","title":{"rendered":"Hypothesis Testing: A Step-by-Step Guide"},"content":{"rendered":"<p>In everyday life, we often have to make decisions based on incomplete information.<\/p>\n<p>We may need to decide, for instance, whether a certain educational procedure is more effective than another, whether a new drug has genuinely positive effects on the course of a disease, and so on.<\/p>\n<p><strong>Hypothesis testing<\/strong> is a statistical procedure that uses sample information to decide, with a controlled risk of error, whether the data are compatible with a given claim about the population.<\/p>\n<p style=\"background-color:#f0f0f0;padding:1em;\">In clearer and more direct terms: is my experimental finding due to chance? 
<strong>Hypothesis testing is precisely a statistical procedure for verifying whether chance is a plausible explanation of an experimental result.<\/strong><\/p>\n<p><!--more--><\/p>\n<div style=\"border: 1px solid #ccc;padding: 1.2em 1.5em;margin: 1.5em 0;border-radius: 6px\">\n<h3 style=\"margin-top: 0\">What We&#8217;ll Cover<\/h3>\n<ul>\n<li><a href=\"#a-premise\">A Premise: Probability vs Inference<\/a><\/li>\n<li><a href=\"#statistical-hypotheses\">Statistical Hypotheses<\/a><\/li>\n<li><a href=\"#type-i-type-ii\">Type I and Type II Errors<\/a><\/li>\n<li><a href=\"#one-or-two-tails\">One or Two Tails?<\/a><\/li>\n<li><a href=\"#step-by-step\">Step-by-Step Summary<\/a><\/li>\n<li><a href=\"#example\">A Worked Example<\/a><\/li>\n<li><a href=\"#r-function\">Writing a Z-Test Function in R<\/a><\/li>\n<li><a href=\"#type-ii-probability\">The Probability of a Type II Error<\/a><\/li>\n<li><a href=\"#power\">Statistical Power<\/a><\/li>\n<li><a href=\"#sample-size\">Determining the Required Sample Size<\/a><\/li>\n<li><a href=\"#unknown-sigma\">What If We Don&#8217;t Know the Population Parameters?<\/a><\/li>\n<li><a href=\"#further-reading\">Further Reading<\/a><\/li>\n<\/ul>\n<\/div>\n<h2 id=\"a-premise\">A Premise&#8230;<\/h2>\n<hr \/>\n<p>We need to understand the difference between <strong>probability<\/strong> and <strong>inference<\/strong>.<\/p>\n<ul>\n<li>If we <strong>know the population parameters<\/strong> and want to know the probability of obtaining a particular result, we are in the realm of <strong><em>probability<\/em><\/strong>.<\/li>\n<li>If from a <strong>sample<\/strong> we try to infer the population values, we are in the territory of <strong><em>inference<\/em><\/strong>.<\/li>\n<\/ul>\n<h2 id=\"statistical-hypotheses\">Statistical Hypotheses<\/h2>\n<p>In hypothesis testing we always have two hypotheses to &#8220;weigh up.&#8221; The <em>status quo<\/em> is called the <strong>null hypothesis<\/strong> and is denoted 
H<sub>0<\/sub>.<\/p>\n<p>We test the null hypothesis against an <strong>alternative hypothesis<\/strong>, denoted H<sub>a<\/sub>.<\/p>\n<p style=\"background-color:#f0f0f0;padding:1em;\"><em>N.B. In general, the alternative hypothesis is the one we are hoping to find evidence for!<\/em><\/p>\n<p>We then choose a <strong>significance level<\/strong>, or <strong>alpha level<\/strong>, &alpha;. The most common choice is &alpha; = 0.05, i.e. a <strong>5% significance level<\/strong> (equivalently, a 95% confidence level). The alpha level determines one or more <strong>critical regions<\/strong>.<\/p>\n<p><strong>If the test statistic we compute falls in a critical region, we reject the null hypothesis in favour of the alternative hypothesis.<\/strong><\/p>\n<p>A simple graphical example: suppose we set up a test in which the alternative hypothesis is that the mean is greater than the null-hypothesis mean. There is then a single critical region in the right tail, beyond the critical value that leaves an area &alpha; to its right. To reject the null hypothesis, our test statistic must fall in the shaded area:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2019\/01\/regione-critica.gif\" alt=\"Critical region in a one-tailed hypothesis test\" \/><\/figure>\n<h2 id=\"type-i-type-ii\">Type I and Type II Errors<\/h2>\n<p>The conclusion we reach is, of course, never a certainty.<\/p>\n<p>The significance level of the test (&alpha; = 0.05 in our example) is the probability of committing a <strong>Type I error<\/strong>, that is, of <strong>rejecting the null hypothesis when it is actually true<\/strong> and accepting the alternative hypothesis in its place.<\/p>\n<p style=\"background-color:#f0f0f0;padding:1em;\">In other words, we choose the significance level ourselves: it is the maximum probability of a Type I error that we are willing to accept.<\/p>\n<p>If instead we <strong>fail to reject the null hypothesis when it is actually false<\/strong>, we 
commit a <strong>Type II error<\/strong>.<\/p>\n<p>The clearest way I have found to explain the concept is this:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2024\/03\/typeItypeIIerrors-1024x412.png\" alt=\"Type I and Type II errors explained visually\" \/><\/figure>\n<p>Unlike the probability of a Type I error, the probability of a Type II error is not fixed by our choice of &alpha;: it depends on the true value of the parameter, which we do not know. We will <a href=\"#type-ii-probability\">address it, in a somewhat simplified way, further on<\/a>.<\/p>\n<h2 id=\"one-or-two-tails\">One or Two Tails? That Is the Question&#8230;<\/h2>\n<p>The test can be <strong>one-tailed<\/strong>, for example if the alternative hypothesis is that a mean is greater than the null-hypothesis mean:<\/p>\n<figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/11\/una-coda.gif\" alt=\"One-tailed hypothesis test - the critical region\" \/><figcaption>The critical region: one-tailed test<\/figcaption><\/figure>\n<p>Or it can be <strong>two-tailed<\/strong>, if the alternative hypothesis is simply that the mean differs from the null-hypothesis mean, in either direction.<\/p>\n<p>In a two-tailed test there are two critical regions, one at each extreme of the curve, each with an area of &alpha;\/2:<\/p>\n<figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/11\/due-code.gif\" alt=\"Two-tailed hypothesis test - critical regions at the 5% significance level\" \/><figcaption>The critical regions. 
Two-tailed test with a 5% significance level<\/figcaption><\/figure>\n<h2 id=\"step-by-step\">Step-by-Step Summary<\/h2>\n<ol>\n<li>State the null hypothesis and the alternative hypothesis.<\/li>\n<li>Set the significance level (alpha level).<\/li>\n<li>Choose the distribution: normal or t.<\/li>\n<li>Collect and analyse the data.<\/li>\n<li>Draw conclusions.<\/li>\n<\/ol>\n<p>We must ask ourselves a fundamental question: <strong>which distribution should we use?<\/strong><\/p>\n<p>The answer can be found by <strong>looking at sigma<\/strong> (the <a href=\"https:\/\/www.gironi.it\/blog\/en\/descriptive-statistics-measures-of-variability-or-dispersion\/#standard-deviation\">standard deviation<\/a>) and the sample size. We ask ourselves:<\/p>\n<p><strong>Do we know the population sigma?<\/strong> (In practice, a rather rare case&#8230;) And do we have a sufficiently large sample (n &gt; 30)?<\/p>\n<p>If the answer to both questions is <strong>YES<\/strong>, then we use the <strong>normal distribution<\/strong> (and compute the Z-score).<\/p>\n<p>If the answer is <strong>NO<\/strong>, that is, if we do not know the population sigma or are working with a small sample, then we use the <strong>t distribution<\/strong>, also called <strong>Student&#8217;s distribution<\/strong>.<\/p>\n<p><em>N.B. As the sample grows larger, the t distribution approximates the normal more and more closely&#8230;<\/em><\/p>\n<h2 id=\"example\">A Worked Example<\/h2>\n<p>We want to conduct a hypothesis test in a situation where we know the population sigma. Let us follow our steps.<\/p>\n<h4>1 \u2014 State the null and alternative hypotheses<\/h4>\n<p>If:<\/p>\n\\(<br \/>\nH_{0}: \\mu = \\mu_{0} \\\\<br \/>\nH_{a}: \\mu \\neq \\mu_{0} \\\\<br \/>\n\\)\n<p>then, with &mu;<sub>0<\/sub> the hypothesised value of the mean, we have a two-tailed test. 
We will have two critical regions to consider.<\/p>\n<p>If instead:<\/p>\n\\(<br \/>\nH_{0}: \\mu = \\mu_{0} \\\\<br \/>\nH_{a}: \\mu > \\mu_{0} \\\\<br \/>\n\\)\n<p>then the test is one-tailed.<\/p>\n<h4>2 \u2014 Set the significance level (alpha level)<\/h4>\n<p>Let us choose the most typical case, a 5% significance level, so:<\/p>\n\\(<br \/>\n\\alpha = 0.05 \\\\<br \/>\n\\)\n<h4>3 and 4 \u2014 Choose the distribution, then collect and analyse the data<\/h4>\n<p>Suppose we have collected the data. We now ask which distribution we should use for our test. The question is always the same: do we know the population sigma?<\/p>\n<p>In our example, let us say yes, so we use the normal distribution.<\/p>\n<p>We compute the standard error of the mean:<\/p>\n\\(<br \/>\n\\sigma_{\\bar{x}}= \\frac{\\sigma}{\\sqrt{n}} \\\\<br \/>\n\\)\n<p>Now we can find Z:<\/p>\n\\(<br \/>\nZ = \\frac{\\bar{x} - \\mu_{0}}{\\sigma_{\\bar{x}}} \\\\<br \/>\n\\)\n<h4>5 \u2014 (Finally) Draw conclusions<\/h4>\n<p>Suppose the test is:<\/p>\n\\(<br \/>\nH_{0}: \\mu = \\mu_{0} \\\\<br \/>\nH_{a}: \\mu \\neq \\mu_{0} \\\\<br \/>\n\\)\n<p>So it is two-tailed. The chosen significance level is 5%, which we split equally between the two tails: we look up 2.5% (5%\/2) in the table and find the value 1.96.<\/p>\n<p><em>N.B. We could also have used R:<\/em><\/p>\n<pre><code class=\"language-r\">qnorm(0.025)  # lower critical value, about -1.96\nqnorm(0.975)  # upper critical value, about +1.96<\/code><\/pre>\n<p>Whichever tool we use, the value we find will be (rounded) 1.96 in absolute value.<\/p>\n<p><strong>Therefore -1.96 and +1.96 are the critical values.<\/strong><\/p>\n<p>If our Z-score turns out to be, say, 2.50, we see immediately that it falls within the critical region (2.50 &gt; 1.96). <strong>We can then reject the null hypothesis and accept the alternative.<\/strong><\/p>\n<p style=\"background-color:#f0f0f0;padding:1em;\">Two quick tips:<br \/>1) <strong>Always draw the graph.<\/strong> It will help enormously in avoiding mistakes.<br \/>2) The most commonly used significance levels are <strong>5%<\/strong> and <strong>1%<\/strong>. 
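<\/p>\n<p>Rather than reading them from a table, we can compute these critical values with R's <code>qnorm()<\/code>. The following is a minimal sketch (the variable names are illustrative): the only inputs are the two significance levels and the example Z-score of 2.50 used above, and the decision rule is the two-tailed one at &alpha; = 0.05.<\/p>\n<pre><code class=\"language-r\"># Significance levels used most often
alpha <- c(0.05, 0.01)

# One-tailed critical value: leaves alpha in the right tail
one.tailed <- qnorm(1 - alpha)
# Two-tailed critical value: leaves alpha / 2 in each tail
two.tailed <- qnorm(1 - alpha / 2)

print(data.frame(alpha,
                 one.tailed = round(one.tailed, 3),
                 two.tailed = round(two.tailed, 3)))

# Two-tailed decision at alpha = 0.05 for the example Z-score
z <- 2.50
reject.h0 <- abs(z) > two.tailed[1]
print(reject.h0)  # TRUE: 2.50 lies beyond 1.96<\/code><\/pre>\n<p>For a left-tailed test the critical value is simply the negative of the one-tailed value, for example <code>qnorm(0.05)<\/code>, which is about -1.645.<\/p>\n<p style=\"background-color:#f0f0f0;padding:1em;\">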
The critical values for one-tailed and two-tailed tests that we will encounter most often are:<\/p>\n<table>\n<thead>\n<tr>\n<th>Significance level<\/th>\n<th>One-tailed<\/th>\n<th>Two-tailed<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>5%<\/strong> (alpha 0.05)<\/td>\n<td>1.65 (+ or -, depending on the tail)<\/td>\n<td>&plusmn; 1.96<\/td>\n<\/tr>\n<tr>\n<td><strong>1%<\/strong> (alpha 0.01)<\/td>\n<td>2.33 (+ or -, depending on the tail)<\/td>\n<td>&plusmn; 2.58<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2 id=\"r-function\">Making Life Easier: Writing a Function in R<\/h2>\n<p>Let us simplify things by writing a small R function, which we will call z.test. It takes the sample <code>x<\/code>, the hypothesised mean <code>mu<\/code> and the known population standard deviation <code>sigma<\/code>:<\/p>\n<pre><code class=\"language-r\"># Z-test for a mean when the population standard deviation is known\nz.test <- function(x, mu, sigma) {\n  se <- sigma \/ sqrt(length(x))  # standard error of the mean\n  z.score <- (mean(x) - mu) \/ se\n  one.tail.p <- pnorm(abs(z.score), lower.tail = FALSE)\n  cat(\"z =\", round(z.score, 3), \"\\n\",\n      \"one-tailed probability =\", round(one.tail.p, 3), \"\\n\",\n      \"two-tailed probability =\", round(2 * one.tail.p, 3), \"\\n\")\n  invisible(list(z = z.score, one.tailed.p = one.tail.p, two.tailed.p = 2 * one.tail.p))\n}<\/code><\/pre>\n<h2 id=\"type-ii-probability\">The Probability of a Type II Error<\/h2>\n<p>As we have seen, the probability of committing a Type I error is fixed in advance by our choice of the significance level, alpha.<\/p>\n<p>Suppose, for example, that we are testing a population mean with the null hypothesis that it is equal to or greater than 260 (H<sub>0<\/sub>: &mu; &ge; 260), so that the alternative hypothesis is that it is less than 260 (H<sub>a<\/sub>: &mu; &lt; 260). We further establish that a true mean of 240 or less would constitute an important deviation. 
In our example, the significance level is set at 5% (alpha = 0.05), the sample consists of 36 observations, and the population standard deviation is 43. Since the test is lower-tailed, the critical value of z is -1.65, and the critical value of the sample mean is:<\/p>\n\\(<br \/>\n\\bar{X}_{critical}=\\mu_0 + z\\sigma_{\\bar{x}}= 260+(-1.65)(7.17)=248.17 \\\\<br \/>\n\\)\n<p>where:<\/p>\n\\(<br \/>\n\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}=\\frac{43}{\\sqrt{36}}=\\frac{43}{6}=7.17 \\\\ \\\\<br \/>\n\\)\n<p>As we have repeated several times, the probability of a Type I error equals the significance level, thus 0.05 (5%).<\/p>\n<p>A Type II error occurs when we fail to reject the null hypothesis although it is false. Here we fail to reject it whenever the sample mean is &ge; 248.17, so the probability of a Type II error is the probability of obtaining a sample mean &ge; 248.17 when the true mean is actually the alternative value.<\/p>\n<p>If the true mean is in fact &mu;<sub>1<\/sub> = 240, we standardise the critical value with respect to this alternative mean:<\/p>\n\\(<br \/>\nz=\\frac{\\bar{X}_{critical}-\\mu_1}{\\sigma_{\\bar{x}}}= \\frac{248.17-240}{7.17}=\\frac{8.17}{7.17}=1.14 \\\\ \\\\<br \/>\n\\)\n<p>Therefore: P(Type II error) = P(z &ge; 1.14) = 0.1271, that is, approximately 0.13, or 13%.<\/p>\n<p>If we keep the significance level and sample size constant but move the specific alternative mean value further from the null hypothesis value, the probability of a Type II error decreases; conversely, it increases if the alternative value is set closer to the null hypothesis value.<\/p>\n<h2 id=\"power\">Statistical Power<\/h2>\n<p>In hypothesis testing, the <strong>power<\/strong> of a test is the probability of rejecting the null hypothesis when it is false, given a specific alternative value of the parameter (in our example, the population mean).<\/p>\n<p>Denoting the probability of a Type II error by &beta;, the power of the test is always 1 - &beta;.<\/p>\n<p>A graph showing the power of the test for a range of alternative values of the mean is called a <strong>power curve<\/strong>.<\/p>\n<p>Returning to our example, with the alternative mean value of 240:<\/p>\n<p><strong>&beta;<\/strong> = P<sub>(Type II error)<\/sub> = 0.13<br \/>\n<strong>Power<\/strong> = 1 - &beta; = 1 - 0.13 = 0.87<br \/>\nThis is the probability of 
correctly rejecting the null hypothesis when &mu; = 240.<\/p>\n<h2 id=\"sample-size\">Determining the Required Sample Size for a Mean Test<\/h2>\n<p>Before drawing a sample, we can determine the required sample size by specifying:<\/p>\n<ol>\n<li>The hypothesised value of the mean<\/li>\n<li>The alternative value of the mean, such that its difference from the null hypothesis value is considered important<\/li>\n<li>The significance level of the test<\/li>\n<li>The accepted probability of a Type II error<\/li>\n<li>The population standard deviation, sigma<\/li>\n<\/ol>\n<p>The formula is:<\/p>\n\\(<br \/>\nn=\\frac{(z_0-z_1)^2\\sigma^2}{(\\mu_1-\\mu_0)^2} \\\\ \\\\<br \/>\n\\)\n<p>where z<sub>0<\/sub> is the critical value of z under the null hypothesis for the chosen &alpha;, and z<sub>1<\/sub> is the value of z, measured from the alternative mean &mu;<sub>1<\/sub>, that leaves the accepted probability &beta; on the non-rejection side.<\/p>\n<p>In our example, we accept a Type I error probability of 0.05 and a Type II error probability of 0.10, with sigma = 43, &mu;<sub>0<\/sub> = 260 and &mu;<sub>1<\/sub> = 240. Since the test is lower-tailed, z<sub>0<\/sub> = -1.65 and z<sub>1<\/sub> = +1.28:<\/p>\n\\(<br \/>\nn=\\frac{(z_0-z_1)^2\\sigma^2}{(\\mu_1-\\mu_0)^2}= \\frac{(-1.65-1.28)^2(43)^2}{(240-260)^2}= \\frac{8.5849 \\times 1849}{400}= 39.68 \\approx 40 \\\\<br \/>\n\\)\n<p><strong>Since the sample size must be a whole number, we always round up: the value we were looking for is n = 40.<\/strong><\/p>\n<h2 id=\"unknown-sigma\">At the End of All This... What If We Don't Know the Population Parameters?<\/h2>\n<p>If we do not know the population sigma, or if we are working with small samples (fewer than 30 values), we use the <strong>t distribution<\/strong>, also called <strong>Student's distribution<\/strong>. 
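<\/p>\n<p>To see why this choice matters with small samples, and to check the earlier note that the t distribution approaches the normal as the sample grows, here is a minimal R sketch comparing two-tailed critical values at &alpha; = 0.05. The degrees of freedom are arbitrary illustration values (for a test on a single mean they would be n - 1).<\/p>\n<pre><code class=\"language-r\"># Two-tailed critical values at alpha = 0.05
alpha <- 0.05
dfs <- c(5, 10, 30, 100, 1000)  # illustrative degrees of freedom

t.crit <- qt(1 - alpha / 2, df = dfs)  # Student's t critical values
z.crit <- qnorm(1 - alpha / 2)         # normal critical value, about 1.96

# The t critical value is always larger and shrinks towards 1.96 as df grows
print(data.frame(df = dfs,
                 t.crit = round(t.crit, 3),
                 z.crit = round(z.crit, 3)))<\/code><\/pre>\n<p>With only 5 degrees of freedom the t critical value is about 2.57 rather than 1.96: using the normal critical value with a small sample and an estimated sigma would make it too easy to reject the null hypothesis, which is why the t distribution is needed here.<\/p>\n<p>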
But that will be the subject of the next article...<\/p>\n<hr \/>\n<h3>You might also like<\/h3>\n<ul>\n<li><a href=\"https:\/\/www.gironi.it\/blog\/en\/guide-to-statistical-tests-for-a-b-analysis\/\">Guide to Statistical Tests for A\/B Analysis<\/a><\/li>\n<li><a href=\"https:\/\/www.gironi.it\/blog\/en\/confidence-intervals-what-they-are-how-to-calculate-them-and-what-they-do-not-mean\/\">Confidence Intervals: What They Are, How to Calculate Them (and What They Do NOT Mean)<\/a><\/li>\n<li><a href=\"https:\/\/www.gironi.it\/blog\/en\/the-chi-square-test\/\">The Chi-Square Test<\/a><\/li>\n<\/ul>\n<hr \/>\n<h3 id=\"further-reading\">Further Reading<\/h3>\n<p>For a thorough treatment of hypothesis testing, confidence intervals, and the logic of statistical inference, <a href=\"https:\/\/www.amazon.it\/dp\/8891910651?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Statistica<\/em><\/a> by Newbold, Carlson and Thorne is one of the most complete references available. Its step-by-step approach to hypothesis testing\u2014from formulating hypotheses through to interpreting p-values\u2014makes it an invaluable companion for anyone working with inferential statistics.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In everyday life, we often have to make decisions based on incomplete information. We may need to decide, for instance, whether a certain educational procedure is more effective than another, whether a new drug has genuinely positive effects on the course of a disease, and so on. 
Hypothesis testing is a statistical procedure that allows &hellip; <a href=\"https:\/\/www.gironi.it\/blog\/en\/hypothesis-testing-a-step-by-step-guide\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;Hypothesis Testing: A Step-by-Step Guide&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[161],"tags":[],"class_list":["post-3470","post","type-post","status-publish","format-standard","hentry","category-statistics"],"lang":"en","translations":{"en":3470,"it":1190},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false},"uagb_author_info":{"display_name":"autore-articoli","author_link":"https:\/\/www.gironi.it\/blog\/author\/autore-articoli\/"},"uagb_comment_info":2,"uagb_excerpt":"In everyday life, we often have to make decisions based on incomplete information. We may need to decide, for instance, whether a certain educational procedure is more effective than another, whether a new drug has genuinely positive effects on the course of a disease, and so on. 
Hypothesis testing is a statistical procedure that allows&hellip;","_links":{"self":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3470","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/comments?post=3470"}],"version-history":[{"count":1,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3470\/revisions"}],"predecessor-version":[{"id":3475,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3470\/revisions\/3475"}],"wp:attachment":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/media?parent=3470"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/categories?post=3470"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/tags?post=3470"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}