{"id":3830,"date":"2026-06-19T08:34:46","date_gmt":"2026-06-19T07:34:46","guid":{"rendered":"https:\/\/www.gironi.it\/blog\/ab-testing-2\/"},"modified":"2026-06-19T08:35:08","modified_gmt":"2026-06-19T07:35:08","slug":"ab-testing-statistically-valid-experiments","status":"publish","type":"post","link":"https:\/\/www.gironi.it\/blog\/en\/ab-testing-statistically-valid-experiments\/","title":{"rendered":"A\/B Testing: How to Run Statistically Valid Experiments (and the Mistakes to Avoid)"},"content":{"rendered":"<p>Over the previous articles we have looked at how <a href=\"https:\/\/www.gironi.it\/blog\/en\/hypothesis-testing-a-step-by-step-guide\/\">hypothesis testing<\/a> works and how the <a href=\"https:\/\/www.gironi.it\/blog\/en\/the-two-sample-t-test-how-to-test-a-hypothesis-for-dependent-or-independent-samples\/\">two-sample t-test<\/a> lets us compare two groups rigorously. We have also built <a href=\"https:\/\/www.gironi.it\/blog\/en\/confidence-intervals-what-they-are-how-to-calculate-them-and-what-they-do-not-mean\/\">confidence intervals<\/a>, learned to quantify the uncertainty of our estimates, and seen with the <a href=\"https:\/\/www.gironi.it\/blog\/en\/central-limit-theorem\/\">Central Limit Theorem<\/a> why all this works even when the data are not normal.<\/p>\n<p>But there is one question that comes up almost daily for anyone working in SEO and marketing: <strong>which variant performs better?<\/strong> Which title tag brings more clicks? Which landing page converts more? Which meta description draws more attention? It is not an academic question: it is the question that separates data-driven decisions from opinions disguised as strategies.<\/p>\n<p>The good news is that we already have all the tools to answer it. <strong>A\/B testing<\/strong> is nothing more than the direct application of the statistical inference concepts we have built step by step: hypothesis testing, comparison between groups, significance. 
In this article we put it all together.<\/p>\n<p><!--more--><\/p>\n<div style=\"border: 1px solid #ccc; padding: 1.2em 1.5em; margin: 1.5em 0; border-radius: 6px;\">\n<h3 style=\"margin-top: 0;\">What we&#8217;ll cover<\/h3>\n<ul>\n<li><a href=\"#what-is-ab-test\">What an A\/B test is<\/a><\/li>\n<li><a href=\"#formulating-test\">Setting up an A\/B test correctly<\/a><\/li>\n<li><a href=\"#landing-example\">Worked example: conversion rate of two landing pages<\/a><\/li>\n<li><a href=\"#common-mistakes\">The most common mistakes<\/a><\/li>\n<li><a href=\"#frequentist-vs-bayesian\">Frequentist vs Bayesian approach<\/a><\/li>\n<li><a href=\"#seo-example\">Practical SEO example: meta description A\/B test<\/a><\/li>\n<li><a href=\"#try-it-yourself\">Try it yourself<\/a><\/li>\n<\/ul>\n<\/div>\n<h2 id=\"what-is-ab-test\">What an A\/B test is<\/h2>\n<p>An A\/B test is, in essence, a <strong>controlled experiment<\/strong>: we take two variants of something (a page, a headline, a call-to-action), randomly assign users to one of the two variants, and measure which one produces better results.<\/p>\n<p>Variant <strong>A<\/strong> is the <strong>control<\/strong> (the current version, the one we are already using). Variant <strong>B<\/strong> is the <strong>treatment<\/strong> (the new version we want to test). The logic is the same as a scientific experiment: we change one variable at a time, keep everything else constant, and observe whether the change produces a measurable effect.<\/p>\n<p>Three elements make an A\/B test reliable. <strong>Randomisation<\/strong>: users are assigned to A or B at random. This is essential, because if we showed A in the morning and B in the afternoon, any observed difference might depend on the time of day, not on the variant. The <strong>control group<\/strong>: without A as a reference, we wouldn&#8217;t know whether B&#8217;s results are good or bad. 
And finally a <strong>success metric<\/strong> defined in advance: CTR, conversion rate, time on page. The metric must be chosen <em>before<\/em> collecting the data, not after (we will come back to this point shortly).<\/p>\n<p>But why do we need statistics? Because data are noisy. If variant A has a CTR of 5.0% and variant B of 5.3%, is that difference real or just random fluctuation? The naked eye cannot tell: we need a formal test. And it is precisely the <strong>two-sample test<\/strong> we have already seen \u2014 applied to proportions rather than means.<\/p>\n<h2 id=\"formulating-test\">Setting up an A\/B test correctly<\/h2>\n<p>Before collecting data, we have to set up the test rigorously. Let&#8217;s see how.<\/p>\n<p><strong>Choosing the metric.<\/strong> The metric must be clear, measurable and directly linked to the goal. For a title tag, the natural metric is the <strong>CTR<\/strong> (Click-Through Rate). For a landing page, the <strong>conversion rate<\/strong>. For a blog article, perhaps the <strong>average time on page<\/strong>. Always keep this in mind: a vague metric (&#8220;people like the page more&#8221;) is not a metric.<\/p>\n<p><strong>Defining the hypotheses.<\/strong> As in every statistical test, we start from a null hypothesis and an alternative hypothesis:<\/p>\n<ul>\n<li>\\(H_0\\): the two variants have the same effect (no difference between A and B)<\/li>\n<li>\\(H_1\\): the two variants have a different effect (a difference exists)<\/li>\n<\/ul>\n<p><strong>The statistical test.<\/strong> When we compare two proportions (such as two CTRs or two conversion rates), the appropriate test is the <strong>two-proportion z-test<\/strong>. The logic is the same as the two-sample t-test, but adapted to binary data (click\/no-click, conversion\/no-conversion).<\/p>\n<p>The test statistic is computed as follows. 
First, we compute the <strong>pooled proportion<\/strong>, which is our best estimate of the common proportion under the null hypothesis:<\/p>\n\\(<br \/>\n\\hat{p} = \\frac{x_1 + x_2}{n_1 + n_2} \\\\<br \/>\n\\)\n<p>where \\(x_1\\) and \\(x_2\\) are the successes (clicks, conversions) in group 1 (the control, A) and group 2 (the treatment, B), \\(n_1\\) and \\(n_2\\) the sample sizes, and \\(\\hat{p}_1 = x_1\/n_1\\) and \\(\\hat{p}_2 = x_2\/n_2\\) the observed proportions.<\/p>\n<p>Then we compute the z statistic:<\/p>\n\\(<br \/>\nz = \\frac{\\hat{p}_2 - \\hat{p}_1}{\\sqrt{\\hat{p}(1-\\hat{p})\\left(\\frac{1}{n_1} + \\frac{1}{n_2}\\right)}} \\\\<br \/>\n\\)\n<p>The numerator is the observed difference between the two proportions (treatment minus control; in a two-tailed test only its absolute value matters); the denominator is the standard error under the null hypothesis. The ratio tells us how many &#8220;standard-error units&#8221; separate the two proportions: the higher it is, the harder the difference is to attribute to chance.<\/p>\n<h3>Example: CTR of two title tags<\/h3>\n<p>Let&#8217;s take a concrete example. We tested two title tag variants for an important page on the site:<\/p>\n<ul>\n<li><strong>Title A<\/strong> (control): 1500 impressions, 75 clicks \u2192 CTR = 5.0%<\/li>\n<li><strong>Title B<\/strong> (treatment): 1500 impressions, 105 clicks \u2192 CTR = 7.0%<\/li>\n<\/ul>\n<p>Title B looks better, but is the difference statistically significant? Let&#8217;s compute it step by step.<\/p>\n<p><strong>Step 1<\/strong>: the pooled proportion:<\/p>\n\\(<br \/>\n\\hat{p} = \\frac{75 + 105}{1500 + 1500} = \\frac{180}{3000} = 0.06 \\\\<br \/>\n\\)\n<p><strong>Step 2<\/strong>: the standard error:<\/p>\n\\(<br \/>\nSE = \\sqrt{0.06 \\times 0.94 \\times \\left(\\frac{1}{1500} + \\frac{1}{1500}\\right)} = \\sqrt{0.0564 \\times 0.00133} \\approx 0.00867 \\\\<br \/>\n\\)\n<p><strong>Step 3<\/strong>: the z statistic:<\/p>\n\\(<br \/>\nz = \\frac{\\hat{p}_2 - \\hat{p}_1}{SE} = \\frac{0.07 - 0.05}{0.00867} \\approx 2.31 \\\\<br \/>\n\\)\n<p><strong>Step 4<\/strong>: the p-value. For a two-tailed test, \\(p = 2\\,[1 - \\Phi(|z|)] \\approx 0.021\\), where \\(\\Phi\\) is the cumulative distribution function of the standard normal.<\/p>\n<p>So: the p-value is below 0.05. 
We can reject the null hypothesis and conclude that the difference between the two title tags is statistically significant. Title B has a significantly higher CTR.<\/p>\n<p>Let&#8217;s run the same test in R:<\/p>\n<pre><code class=\"language-r\"># Data\nn1 &lt;- 1500; x1 &lt;- 75    # Title A\nn2 &lt;- 1500; x2 &lt;- 105   # Title B\np1 &lt;- x1 \/ n1  # 0.05\np2 &lt;- x2 \/ n2  # 0.07\n\n# Pooled proportion and z-test\np_pool &lt;- (x1 + x2) \/ (n1 + n2)\nse &lt;- sqrt(p_pool * (1 - p_pool) * (1\/n1 + 1\/n2))\nz &lt;- (p2 - p1) \/ se\np_value &lt;- 2 * (1 - pnorm(abs(z)))\n\ncat(\"z =\", round(z, 3), \"\\n\")\ncat(\"p-value =\", round(p_value, 4), \"\\n\")<\/code><\/pre>\n<p>Result: z = 2.306, p-value = 0.0211.<\/p>\n<h2 id=\"landing-example\">Worked example: conversion rate of two landing pages<\/h2>\n<p>Let&#8217;s move on to a more elaborate example. An e-commerce store is testing two variants of its landing page:<\/p>\n<ul>\n<li><strong>Page A<\/strong> (current design): 1000 visitors, 35 conversions \u2192 conversion rate = 3.5%<\/li>\n<li><strong>Page B<\/strong> (new design): 1000 visitors, 58 conversions \u2192 conversion rate = 5.8%<\/li>\n<\/ul>\n<p>The difference looks substantial (2.3 percentage points), but with these numbers is it enough to rule out chance?<\/p>\n<p>Let&#8217;s check in R with <code>prop.test()<\/code>, which runs the two-proportion test:<\/p>\n<pre><code class=\"language-r\">result &lt;- prop.test(\n  x = c(35, 58),\n  n = c(1000, 1000)\n)\n\nprint(result)<\/code><\/pre>\n<p>The function returns the p-value of the test and, very usefully, the <strong>confidence interval of the difference<\/strong> between the two proportions. 
In this case the p-value is about 0.019 \u2014 below 0.05, so the difference is statistically significant.<\/p>\n<p>But it is the confidence interval of the difference that gives us the most valuable information: not only <em>whether<\/em> B is better than A, but <em>by how much<\/em>, and with what margin of uncertainty. Note that <code>prop.test()<\/code> reports the difference as A minus B, so here the 95% interval comes out negative, from about -4.2 to -0.4 percentage points. Read the other way round, B beats A by somewhere between about 0.4 and 4.2 points: the interval excludes zero, and the data are compatible with an improvement anywhere in that range. That is far richer information than a simple &#8220;yes, it&#8217;s significant&#8221;.<\/p>\n<p>n.b.: <code>prop.test()<\/code> applies a <strong>continuity correction<\/strong> (Yates&#8217;s correction) by default, which makes the test slightly more conservative. For large samples the difference is negligible; for small samples, it is a welcome caution. To get the uncorrected test, the one we computed by hand for the title tags, pass <code>correct = FALSE<\/code>: the reported X-squared statistic is then exactly \\(z^2\\).<\/p>\n<h2 id=\"common-mistakes\">The most common mistakes<\/h2>\n<p>A\/B testing is a powerful tool, but a treacherous one. The ease with which a test can be set up hides serious methodological pitfalls. Let&#8217;s look at the most frequent ones.<\/p>\n<h3>Stopping the test too early<\/h3>\n<p>It is the strongest temptation: after a few days, B looks clearly better than A. Why wait any longer? Because early results based on few observations are dominated by <strong>noise<\/strong>, not signal.<\/p>\n<p>The problem has a technical name: <strong>peeking<\/strong>. Every time we look at the interim data and decide whether to stop, we increase the probability of a false positive. It&#8217;s like tossing a coin: if we stop every time we get three heads in a row, we&#8217;ll conclude the coin is rigged. But it isn&#8217;t \u2014 we simply haven&#8217;t given it enough tosses.<\/p>\n<p><strong>How to avoid it<\/strong>: define the required sample size <em>beforehand<\/em> and wait until you reach that number before drawing conclusions. 
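<\/p>\n<p>To get a feel for how much peeking inflates the false-positive rate, here is a minimal simulation sketch in R. It runs many A\/A tests, in which both variants share the same true conversion rate, so any &#8220;significant&#8221; result is a false positive by construction. The common rate (5%), the planned size (5000 visitors per variant) and the ten evenly spaced looks are arbitrary assumptions chosen for illustration; at each look the sketch applies the pooled two-proportion z-test described above.<\/p>\n<pre><code class=\"language-r\"># Peeking simulation (a sketch): A\/A tests in which both variants\n# share the same true conversion rate, so every \"significant\"\n# result is a false positive.\nset.seed(123)\nn_experiments &lt;- 2000\np_true &lt;- 0.05                        # assumed common conversion rate\nn_final &lt;- 5000                       # assumed planned visitors per variant\nlooks &lt;- seq(500, n_final, by = 500)  # ten interim looks at the data\n\n# Pooled two-proportion z-test, vectorised over the looks\nz_pvalue &lt;- function(x1, x2, n) {\n  p_pool &lt;- (x1 + x2) \/ (2 * n)\n  se &lt;- sqrt(p_pool * (1 - p_pool) * (2 \/ n))\n  z &lt;- (x2 - x1) \/ n \/ se\n  2 * (1 - pnorm(abs(z)))\n}\n\nstop_any &lt;- logical(n_experiments)    # TRUE if any look gave p &lt; 0.05\nstop_final &lt;- logical(n_experiments)  # TRUE if the final look gave p &lt; 0.05\n\nfor (i in seq_len(n_experiments)) {\n  a &lt;- rbinom(n_final, 1, p_true)     # visitor-level outcomes, variant A\n  b &lt;- rbinom(n_final, 1, p_true)     # visitor-level outcomes, variant B\n  p_values &lt;- z_pvalue(cumsum(a)[looks], cumsum(b)[looks], looks)\n  stop_any[i] &lt;- any(p_values &lt; 0.05, na.rm = TRUE)\n  stop_final[i] &lt;- isTRUE(p_values[length(looks)] &lt; 0.05)\n}\n\ncat(\"False-positive rate, single final look:\", mean(stop_final), \"\\n\")\ncat(\"False-positive rate, peeking at every look:\", mean(stop_any), \"\\n\")<\/code><\/pre>\n<p>With a single look at the planned end, the false-positive rate should stay close to the nominal 5%; stopping at the first look with p &lt; 0.05 pushes it several times higher, even though the two variants are identical.<\/p>\n<p>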
Our <a href=\"https:\/\/www.gironi.it\/blog\/en\/ab-test-sample-size-calculator\/\">sample size calculator<\/a> can tell you how many users you need before launching the test.<\/p>\n<h3>Testing too many variants without correction<\/h3>\n<p>Another frequent mistake: testing three, four, five variants at the same time (A\/B\/C\/D&#8230;) and then comparing them all pairwise. The problem is that of <strong>multiple comparisons<\/strong>: the more comparisons we make, the more likely we are to find at least one significant result by pure chance.<\/p>\n<p>With 5 variants and 10 pairwise comparisons, the probability of finding at least one false positive rises from 5% to about 40% (\\(1 - 0.95^{10} \\approx 0.40\\), treating the comparisons as independent). This is not a detail: it is an error that can invalidate the conclusions of the entire test.<\/p>\n<p><strong>How to avoid it<\/strong>: if multiple comparisons are needed, apply a <strong>Bonferroni correction<\/strong> (divide the \\(\\alpha\\) threshold by the number of comparisons: with 10 comparisons, \\(0.05 \/ 10 = 0.005\\)) or, better still, limit yourself to testing one variant at a time.<\/p>\n<h3>Ignoring the power of the test<\/h3>\n<p>We know the risk of a false positive well (type I error, \\(\\alpha\\)). But there is a mirror risk that is often ignored: the <strong>false negative<\/strong> (type II error, \\(\\beta\\)). It happens when B really is better than A, but our test fails to detect it.<\/p>\n<p>The most common cause? A <strong>sample that is too small<\/strong>. If we have only 100 visitors per variant, the test does not have enough <strong>power<\/strong> (the probability \\(1 - \\beta\\) of detecting a difference that really exists) to pick up small but real differences. We will conclude &#8220;no significant difference&#8221; not because the difference doesn&#8217;t exist, but because we didn&#8217;t have enough data to see it.<\/p>\n<p><strong>How to avoid it<\/strong>: compute the required sample size <em>before<\/em> launching the test, based on the minimum effect we want to detect. 
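<\/p>\n<p>In R, the base <code>stats<\/code> package already includes a function for this calculation, <code>power.prop.test()<\/code>. The sketch below borrows the rates of the landing-page example (3.5% and 5.8%) and treats them, purely as an assumption, as the true rates we want to be able to tell apart, with a two-sided \\(\\alpha = 0.05\\) and a target power of 80%.<\/p>\n<pre><code class=\"language-r\"># Visitors needed per variant to detect 3.5% vs 5.8%\n# (two-sided alpha = 0.05, target power = 80%)\nneeded &lt;- power.prop.test(p1 = 0.035, p2 = 0.058,\n                          sig.level = 0.05, power = 0.80)\nprint(needed)\nceiling(needed$n)  # round up: visitors come in whole numbers\n\n# Power actually available with only 100 visitors per variant\nsmall &lt;- power.prop.test(n = 100, p1 = 0.035, p2 = 0.058,\n                         sig.level = 0.05)\nprint(small)<\/code><\/pre>\n<p>The first call should report a required size of roughly 1,300 visitors per variant; the second shows that with 100 visitors per variant the chance of detecting the same difference is well under 20%. The function relies on a normal approximation, so its output is best read as a planning estimate.<\/p>\n<p>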
This is the subject of <strong>power analysis<\/strong>: use the <a href=\"https:\/\/www.gironi.it\/blog\/en\/ab-test-sample-size-calculator\/\">sample size calculator<\/a> to check whether your test has enough power.<\/p>\n<h3>Confusing statistical significance with practical significance<\/h3>\n<p>A low p-value does not automatically mean the result is <em>important<\/em>. With very large samples, even tiny differences become statistically significant. If we test two variants on 500,000 visitors each, a CTR difference of just 0.1 percentage points (from 5.00% to 5.10%) already comes out significant, with a p-value of about 0.02. Yet in many contexts a difference that small is operationally irrelevant.<\/p>\n<p><strong>Caution<\/strong>: the p-value addresses the question &#8220;could this difference be due to chance alone?&#8221;, not the question &#8220;is the difference big enough to matter to us?&#8221;. For the latter we need a different measure \u2014 the <strong>effect size<\/strong> \u2014 which we cover in a dedicated article.<\/p>\n<h2 id=\"frequentist-vs-bayesian\">Frequentist vs Bayesian approach<\/h2>\n<p>Everything we have seen so far follows the <strong>frequentist<\/strong> approach: we compute a test statistic, compare it with a reference distribution, obtain a p-value and make a binary decision (reject or fail to reject \\(H_0\\)).<\/p>\n<p>It works, and works well. But it has limits that you feel in everyday practice. The p-value does not tell us &#8220;by how much B is better than A&#8221;. It does not tell us &#8220;what the probability is that B is genuinely superior&#8221;. And if we collect new data, we cannot simply update the result: we have to recompute everything from scratch.<\/p>\n<p>There is an alternative approach that directly answers the question we care about most: <strong>what is the probability that B is better than A?<\/strong> It is the <strong>Bayesian<\/strong> approach.<\/p>\n<p>The idea is this. 
Instead of starting from a null hypothesis and trying to reject it, we start from a <strong>prior distribution<\/strong> that represents our initial knowledge about each variant&#8217;s conversion rate. Then, as we collect data, we update that distribution. The result is a <strong>posterior distribution<\/strong> that incorporates both our prior knowledge and the observed data.<\/p>\n<p>For conversion rates, the natural distribution is the <strong>Beta<\/strong>: it is defined between 0 and 1 (like a proportion) and updates very elegantly, since it is conjugate to the binomial, so a Beta prior combined with binomial data gives another Beta. If we start from a prior \\(\\text{Beta}(\\alpha, \\beta)\\) and observe \\(s\\) successes out of \\(n\\) trials, the posterior is:<\/p>\n\\(<br \/>\n\\text{Beta}(\\alpha + s, \\, \\beta + n - s) \\\\<br \/>\n\\)\n<p>Sounds hard? It&#8217;s very easy. Let&#8217;s use the data from the two landing pages in the previous example. We start from a <strong>non-informative prior<\/strong> \\(\\text{Beta}(1, 1)\\) \u2014 which amounts to saying &#8220;we know nothing, any value between 0 and 1 is equally plausible&#8221;:<\/p>\n<ul>\n<li><strong>Page A<\/strong>: 35 conversions out of 1000 \u2192 posterior \\(\\text{Beta}(36, \\, 966)\\)<\/li>\n<li><strong>Page B<\/strong>: 58 conversions out of 1000 \u2192 posterior \\(\\text{Beta}(59, \\, 943)\\)<\/li>\n<\/ul>\n<p>Let&#8217;s compute in R the probability that B is better than A:<\/p>\n<pre><code class=\"language-r\">set.seed(42)\nn_sim &lt;- 100000\n\n# Posterior of the two variants\npost_A &lt;- rbeta(n_sim, shape1 = 36, shape2 = 966)\npost_B &lt;- rbeta(n_sim, shape1 = 59, shape2 = 943)\n\n# Probability that B &gt; A\nprob_B_better &lt;- mean(post_B &gt; post_A)\ncat(\"P(B &gt; A) =\", round(prob_B_better, 4), \"\\n\")\n\n# Distribution of the difference\ndelta &lt;- post_B - post_A\ncat(\"Median difference:\", round(median(delta) * 100, 2), \"pct points\\n\")\ncat(\"95% credible interval of the difference:\",\n    round(quantile(delta, 0.025) * 100, 2), \"-\",\n    round(quantile(delta, 0.975) * 100, 
2), \"pct points\\n\")<\/code><\/pre>\n<p>The result is striking: the probability that B is better than A is above 99%. But the real advantage of the Bayesian approach is that we directly obtain the <strong>distribution of the difference<\/strong>: not only do we know <em>whether<\/em> B is better, but <em>by how much<\/em>, with a credible interval that quantifies our uncertainty.<\/p>\n<p>This is a substantial difference from the frequentist approach. The p-value tells us &#8220;the difference is unlikely under \\(H_0\\)&#8221;; the Bayesian result tells us &#8220;the probability that B is better is over 99%, and the improvement lies between about 0.5 and 4.2 percentage points&#8221;. For an operational decision, the second piece of information is often more useful.<\/p>\n<p>An important note: the full Bayesian approach deserves a dedicated article. Here we have only scratched the surface \u2014 the topic of informative priors, hierarchical models and their systematic application is a path of its own that we will tackle in the section devoted to Bayesian statistics.<\/p>\n<h2 id=\"seo-example\">Practical SEO example: meta description A\/B test<\/h2>\n<p>Let&#8217;s look at one last scenario, very common in everyday practice. We have two meta description variants for a key page on the site. Since we cannot choose which users see which meta description in the search results, we cannot randomise here: instead we alternate the two versions (two weeks each, to limit seasonal effects) and read the Search Console data. Even so, a time split like this remains more exposed to seasonality and ranking fluctuations than a randomised test. The data are:<\/p>\n<ul>\n<li><strong>Meta A<\/strong>: 3200 impressions, 128 clicks \u2192 CTR = 4.0%<\/li>\n<li><strong>Meta B<\/strong>: 3100 impressions, 155 clicks \u2192 CTR = 5.0%<\/li>\n<\/ul>\n<p>Let&#8217;s check in R:<\/p>\n<pre><code class=\"language-r\">prop.test(c(128, 155), c(3200, 3100))<\/code><\/pre>\n<p>The p-value is about 0.064 \u2014 above the 0.05 threshold, so we cannot reject the null hypothesis. The confidence interval of the difference also includes zero, confirming the non-significance. 
It is a borderline result: with these data we don&#8217;t have enough evidence to conclude that Meta B is genuinely better.<\/p>\n<p>Which approach should we use? For a simple test like this, the frequentist approach with <code>prop.test()<\/code> is more than sufficient: the samples are large and the question is clear. The Bayesian approach becomes more valuable when the samples are small, when we want to update the result as new data arrive, or when we have prior knowledge to incorporate (for example, we know that for that type of page the CTR is typically between 3% and 7%).<\/p>\n<p>But the operational decision must not rest on the p-value alone. We have to ask: is the difference (one extra percentage point of CTR) big enough to justify the change? With just over 3,000 impressions per version during its test period, one extra percentage point is worth about 30 additional clicks. Does that matter <em>for our business<\/em>? This is a question statistics cannot resolve on its own \u2014 it is a judgement that falls to us.<\/p>\n<h2 id=\"try-it-yourself\">Try it yourself<\/h2>\n<p>An e-commerce store is testing two call-to-action variants on a product page:<\/p>\n<ul>\n<li><strong>Variant A<\/strong> (&#8220;Add to cart&#8221;): 450 visits, 23 conversions<\/li>\n<li><strong>Variant B<\/strong> (&#8220;Buy it now&#8221;): 430 visits, 31 conversions<\/li>\n<\/ul>\n<ol>\n<li>Compute the conversion rate of each variant<\/li>\n<li>Run the test with <code>prop.test(c(23, 31), c(450, 430))<\/code> and interpret the p-value<\/li>\n<li>Does the confidence interval of the difference include zero?<\/li>\n<li>At the 5% significance level, is the difference statistically significant?<\/li>\n<\/ol>\n<p>Hint: if the p-value is above 0.05, we cannot conclude that one variant is better than the other \u2014 but this does not mean they are equal. It might simply mean we don&#8217;t have enough data. 
This is exactly the problem of statistical power discussed above.<\/p>\n<p>A\/B testing gives us a rigorous framework for making decisions based on data, not intuition. But as we have seen, the p-value of a well-run test tells us <em>whether<\/em> there is a significant difference; on its own it does not tell us how <em>large<\/em> that effect is, nor how much data we need to detect it with confidence. Those are the questions of <strong>effect size<\/strong> and <strong>power analysis<\/strong>, the next tools in our path. For the sample size, the <a href=\"https:\/\/www.gironi.it\/blog\/en\/ab-test-sample-size-calculator\/\">interactive calculator<\/a> gives you the required number in real time.<\/p>\n<hr>\n<h3>Further Reading<\/h3>\n<p>If you want to dig deeper into the methodology of online experiments, <a href=\"https:\/\/www.amazon.it\/dp\/1108724264?tag=consulenzeinf-21&#038;ascsubtag=ab-testing\" target=\"_blank\" rel=\"nofollow sponsored noopener\"><em>Trustworthy Online Controlled Experiments<\/em><\/a> by Ron Kohavi, Diane Tang and Ya Xu is the standard reference on A\/B testing. The authors led experimentation at Microsoft, Google and LinkedIn \u2014 and the book covers everything, from test design to the pitfalls we saw in this article, all the way to the organisational aspects that make the difference between a well-run test and a sterile exercise.<\/p>\n<p>For those who want to explore the Bayesian approach to A\/B testing (which we have just introduced), <a href=\"https:\/\/www.amazon.it\/dp\/1593279566?tag=consulenzeinf-21&#038;ascsubtag=ab-testing\" target=\"_blank\" rel=\"nofollow sponsored noopener\"><em>Bayesian Statistics the Fun Way<\/em><\/a> by Will Kurt is an accessible and surprisingly entertaining introduction. 
It explains priors, posteriors and Bayesian updating with examples that don&#8217;t require a maths degree \u2014 and it uses R for the computational part.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over the previous articles we have looked at how hypothesis testing works and how the two-sample t-test lets us compare two groups rigorously. We have also built confidence intervals, learned to quantify the uncertainty of our estimates, and seen with the Central Limit Theorem why all this works even when the data are not normal. &hellip; <a href=\"https:\/\/www.gironi.it\/blog\/en\/ab-testing-statistically-valid-experiments\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;A\/B Testing: How to Run Statistically Valid Experiments (and the Mistakes to Avoid)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[161],"tags":[],"class_list":["post-3830","post","type-post","status-publish","format-standard","hentry","category-statistics"],"lang":"en","translations":{"en":3830,"it":3385},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false},"uagb_author_info":{"display_name":"Paolo Gironi","author_link":"https:\/\/www.gironi.it\/blog\/author\/autore-articoli\/"},"uagb_comment_info":0,"uagb_excerpt":"Over the previous articles we have looked at how hypothesis testing works and how the two-sample t-test lets us compare two groups rigorously. 
We have also built confidence intervals, learned to quantify the uncertainty of our estimates, and seen with the Central Limit Theorem why all this works even when the data are not normal.&hellip;","_links":{"self":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3830","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/comments?post=3830"}],"version-history":[{"count":1,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3830\/revisions"}],"predecessor-version":[{"id":3831,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3830\/revisions\/3831"}],"wp:attachment":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/media?parent=3830"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/categories?post=3830"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/tags?post=3830"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}