<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>paologironi blog</title>
	<atom:link href="https://www.gironi.it/blog/en/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.gironi.it/blog</link>
	<description>Scattered notes on (retro) computing, data analysis, statistics, SEO, and things that change</description>
	<lastBuildDate>Fri, 19 Jun 2026 07:35:08 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>A/B Testing: How to Run Statistically Valid Experiments (and the Mistakes to Avoid)</title>
		<link>https://www.gironi.it/blog/en/ab-testing-statistically-valid-experiments/</link>
					<comments>https://www.gironi.it/blog/en/ab-testing-statistically-valid-experiments/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Fri, 19 Jun 2026 07:34:46 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/ab-testing-2/</guid>

					<description><![CDATA[Over the previous articles we have looked at how hypothesis testing works and how the two-sample t-test lets us compare two groups rigorously. We have also built confidence intervals, learned to quantify the uncertainty of our estimates, and seen with the Central Limit Theorem why all this works even when the data are not normal. &#8230; <a href="https://www.gironi.it/blog/en/ab-testing-statistically-valid-experiments/" class="more-link">Continue reading<span class="screen-reader-text"> "A/B Testing: How to Run Statistically Valid Experiments (and the Mistakes to Avoid)"</span></a>]]></description>
										<content:encoded><![CDATA[<p>Over the previous articles we have looked at how <a href="https://www.gironi.it/blog/en/hypothesis-testing-a-step-by-step-guide/">hypothesis testing</a> works and how the <a href="https://www.gironi.it/blog/en/the-two-sample-t-test-how-to-test-a-hypothesis-for-dependent-or-independent-samples/">two-sample t-test</a> lets us compare two groups rigorously. We have also built <a href="https://www.gironi.it/blog/en/confidence-intervals-what-they-are-how-to-calculate-them-and-what-they-do-not-mean/">confidence intervals</a>, learned to quantify the uncertainty of our estimates, and seen with the <a href="https://www.gironi.it/blog/en/central-limit-theorem/">Central Limit Theorem</a> why all this works even when the data are not normal.</p>
<p>But there is one question that comes up almost daily for anyone working in SEO and marketing: <strong>which variant performs better?</strong> Which title tag brings more clicks? Which landing page converts more? Which meta description draws more attention? It is not an academic question: it is the question that separates data-driven decisions from opinions disguised as strategies.</p>
<p>The good news is that we already have all the tools to answer it. <strong>A/B testing</strong> is nothing more than the direct application of the statistical inference concepts we have built step by step: hypothesis testing, comparison between groups, significance. In this article we put it all together.</p>
<p><span id="more-3830"></span></p>
<div style="border: 1px solid #ccc; padding: 1.2em 1.5em; margin: 1.5em 0; border-radius: 6px;">
<h3 style="margin-top: 0;">What we&#8217;ll cover</h3>
<ul>
<li><a href="#what-is-ab-test">What an A/B test is</a></li>
<li><a href="#formulating-test">Setting up an A/B test correctly</a></li>
<li><a href="#landing-example">Worked example: conversion rate of two landing pages</a></li>
<li><a href="#common-mistakes">The most common mistakes</a></li>
<li><a href="#frequentist-vs-bayesian">Frequentist vs Bayesian approach</a></li>
<li><a href="#seo-example">Practical SEO example: meta description A/B test</a></li>
<li><a href="#try-it-yourself">Try it yourself</a></li>
</ul>
</div>
<h2 id="what-is-ab-test">What an A/B test is</h2>
<p>An A/B test is, in essence, a <strong>controlled experiment</strong>: we take two variants of something (a page, a headline, a call-to-action), randomly assign users to one of the two variants, and measure which one produces better results.</p>
<p>Variant <strong>A</strong> is the <strong>control</strong> (the current version, the one we are already using). Variant <strong>B</strong> is the <strong>treatment</strong> (the new version we want to test). The logic is the same as a scientific experiment: we change one variable at a time, keep everything else constant, and observe whether the change produces a measurable effect.</p>
<p>Three elements make an A/B test reliable. <strong>Randomisation</strong>: users are assigned to A or B at random. This is essential, because if we showed A in the morning and B in the afternoon, any observed difference might depend on the time of day, not on the variant. The <strong>control group</strong>: without A as a reference, we wouldn&#8217;t know whether B&#8217;s results are good or bad. And finally a <strong>success metric</strong> defined in advance: CTR, conversion rate, time on page. The metric must be chosen <em>before</em> collecting the data, not after (we will come back to this point shortly).</p>
<p>But why do we need statistics? Because data are noisy. If variant A has a CTR of 5.0% and variant B of 5.3%, is that difference real or just random fluctuation? The answer depends on how many impressions each variant received, and the naked eye cannot weigh that: we need a formal test. It is precisely the <strong>two-sample test</strong> we have already seen, applied to proportions rather than means.</p>
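<p>To get a feel for how noisy proportions are, here is a minimal sketch of an <strong>A/A test</strong> in R. Both &#8220;variants&#8221; have the same true CTR, so any gap between them is pure chance. The 5% CTR and the 2,000 impressions per variant are assumptions chosen for illustration.</p>
<pre><code class="language-r">set.seed(1)
n = 2000          # impressions per variant (assumption)
true_ctr = 0.05   # same true CTR for both variants: an A/A test

# 1,000 simulated A/A tests: clicks collected by each "variant"
clicks_a = rbinom(1000, size = n, prob = true_ctr)
clicks_b = rbinom(1000, size = n, prob = true_ctr)

# Observed CTR gap in percentage points, produced by chance alone
gap_pp = abs(clicks_b - clicks_a) / n * 100
summary(gap_pp)

# Share of runs with a gap of at least 0.3 points (6 clicks out of 2,000),
# i.e. at least as large as 5.0% vs 5.3%
mean(abs(clicks_b - clicks_a) >= 6)</code></pre>
<p>Under these assumptions, a gap of 0.3 points or more turns up in a large share of the runs even though the two variants are identical. That is exactly why we need a test rather than a glance at the numbers.</p>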
<h2 id="formulating-test">Setting up an A/B test correctly</h2>
<p>Before collecting data, we have to set up the test rigorously. Let&#8217;s see how.</p>
<p><strong>Choosing the metric.</strong> The metric must be clear, measurable and directly linked to the goal. For a title tag, the natural metric is the <strong>CTR</strong> (Click-Through Rate). For a landing page, the <strong>conversion rate</strong>. For a blog article, perhaps the <strong>average time on page</strong>. Always keep this in mind: a vague metric (&#8220;people like the page more&#8221;) is not a metric.</p>
<p><strong>Defining the hypotheses.</strong> As in every statistical test, we start from a null hypothesis and an alternative hypothesis:</p>
<ul>
<li>\(H_0\): the two variants have the same effect (no difference between A and B)</li>
<li>\(H_1\): the two variants have a different effect (a difference exists)</li>
</ul>
<p><strong>The statistical test.</strong> When we compare two proportions (such as two CTRs or two conversion rates), the appropriate test is the <strong>two-proportion z-test</strong>. The logic is the same as the two-sample t-test, but adapted to binary data (click/no-click, conversion/no-conversion).</p>
<p>The test statistic is computed as follows. First, we compute the <strong>pooled proportion</strong>, which is our best estimate of the common proportion under the null hypothesis:</p>
\(\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\)
<p>where \(x_1\) and \(x_2\) are the successes (clicks, conversions) in the two groups, and \(n_1\) and \(n_2\) the sample sizes.</p>
<p>Then we compute the z statistic:</p>
\(z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\)
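<p>The numerator is the observed difference between the two sample proportions \(\hat{p}_1 = x_1/n_1\) and \(\hat{p}_2 = x_2/n_2\). The denominator is its standard error under the null hypothesis. The ratio tells us how many standard errors separate the two proportions: the larger it is in absolute value, the harder the difference is to attribute to chance. Which group we call 1 only flips the sign of \(z\); in a two-tailed test, what matters is \(|z|\).</p>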
<h3>Example: CTR of two title tags</h3>
<p>Let&#8217;s take a concrete example. We tested two title tag variants for an important page on the site:</p>
<ul>
<li><strong>Title A</strong> (control): 1500 impressions, 75 clicks → CTR = 5.0%</li>
<li><strong>Title B</strong> (treatment): 1500 impressions, 105 clicks → CTR = 7.0%</li>
</ul>
<p>Title B looks better, but is the difference statistically significant? Let&#8217;s compute it step by step.</p>
<p><strong>Step 1</strong>: the pooled proportion:</p>
\(\hat{p} = \frac{75 + 105}{1500 + 1500} = \frac{180}{3000} = 0.06\)
<p><strong>Step 2</strong>: the standard error:</p>
\(SE = \sqrt{0.06 \times 0.94 \times \left(\frac{1}{1500} + \frac{1}{1500}\right)} = \sqrt{0.0564 \times 0.00133} \approx 0.00867\)
<p><strong>Step 3</strong>: the z statistic:</p>
\(z = \frac{0.07 - 0.05}{0.00867} \approx 2.31\)
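<p><strong>Step 4</strong>: the p-value. For a two-tailed test, \(p = 2\,[1 - \Phi(|z|)] \approx 0.021\), where \(\Phi\) is the cumulative distribution function of the standard normal.</p>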
<p>So: the p-value is below 0.05. We can reject the null hypothesis and conclude that the difference between the two title tags is statistically significant. Title B has a significantly higher CTR.</p>
<p>Let&#8217;s run the same test in R:</p>
<pre><code class="language-r"># Data
n1 &lt;- 1500; x1 &lt;- 75    # Title A
n2 &lt;- 1500; x2 &lt;- 105   # Title B
p1 &lt;- x1 / n1  # 0.05
p2 &lt;- x2 / n2  # 0.07

# Pooled proportion and z-test
p_pool &lt;- (x1 + x2) / (n1 + n2)
se &lt;- sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
z &lt;- (p2 - p1) / se
p_value &lt;- 2 * (1 - pnorm(abs(z)))

cat("z =", round(z, 3), "\n")
cat("p-value =", round(p_value, 4), "\n")</code></pre>
<p>Result: z = 2.306, p-value = 0.0211.</p>
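<p>As a cross-check, the same pooled test is available through <code>prop.test()</code> with the continuity correction switched off (<code>correct = FALSE</code>). Its chi-squared statistic is simply \(z^2\), so its square root and its p-value should match the manual computation above.</p>
<pre><code class="language-r"># Title A and Title B, as above
res = prop.test(x = c(75, 105), n = c(1500, 1500), correct = FALSE)

# Without continuity correction, X-squared equals z squared
z_check = sqrt(unname(res$statistic))
cat("sqrt(X-squared) =", round(z_check, 3), "\n")
cat("p-value =", round(res$p.value, 4), "\n")</code></pre>
<p>It should print 2.306 and 0.0211, the same values we obtained by hand.</p>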
<h2 id="landing-example">Worked example: conversion rate of two landing pages</h2>
<p>Let&#8217;s move on to a more elaborate example. An e-commerce store is testing two variants of its landing page:</p>
<ul>
<li><strong>Page A</strong> (current design): 1000 visitors, 35 conversions → conversion rate = 3.5%</li>
<li><strong>Page B</strong> (new design): 1000 visitors, 58 conversions → conversion rate = 5.8%</li>
</ul>
<p>The difference looks substantial (2.3 percentage points), but with these numbers is it enough to rule out chance?</p>
<p>Let&#8217;s check in R with <code>prop.test()</code>, which runs the two-proportion test:</p>
<pre><code class="language-r">result &lt;- prop.test(
  x = c(35, 58),
  n = c(1000, 1000)
)

print(result)</code></pre>
<p>The function returns the p-value of the test and, very usefully, the <strong>confidence interval of the difference</strong> between the two proportions. In this case the p-value is about 0.02, which is below 0.05, so the difference is statistically significant.</p>
<p>But it is the confidence interval of the difference that gives us the most valuable information: not only <em>whether</em> B is better than A, but <em>by how much</em>, and with what margin of uncertainty. Note that <code>prop.test()</code> reports the difference as the first proportion minus the second (A minus B), so here the interval is negative, running from about -4.2 to -0.4 percentage points. Read as B minus A, the improvement is estimated at between about 0.4 and 4.2 points. The interval excludes zero, and it shows that the gain could be modest or substantial. That is far richer information than a simple &#8220;yes, it&#8217;s significant&#8221;.</p>
<p>n.b.: by default <code>prop.test()</code> applies a <strong>continuity correction</strong> (Yates&#8217;s correction, <code>correct = TRUE</code>) that makes the test slightly more conservative. With <code>correct = FALSE</code> it gives the same result as the manual z-test. For large samples the difference is negligible; for small samples, it is a welcome caution.</p>
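<p>Here is a minimal sketch of how to pull the numbers out of <code>prop.test()</code>. The object it returns stores the interval in <code>conf.int</code> and the p-value in <code>p.value</code>. Because the interval refers to A minus B, we flip it to read B minus A. We also compare the result with and without Yates&#8217;s correction.</p>
<pre><code class="language-r">conv = c(35, 58)        # conversions: Page A, Page B
visits = c(1000, 1000)  # visitors: Page A, Page B

with_yates = prop.test(conv, visits)                    # default: correct = TRUE
without_yates = prop.test(conv, visits, correct = FALSE)

# prop.test estimates p1 - p2 (A minus B): flip sign and order to read B minus A
ci_b_minus_a = -rev(with_yates$conf.int) * 100
cat("95% CI of B - A:", round(ci_b_minus_a, 2), "percentage points\n")
cat("p-value with Yates:   ", round(with_yates$p.value, 4), "\n")
cat("p-value without Yates:", round(without_yates$p.value, 4), "\n")</code></pre>
<p>With these data the interval for B minus A runs from roughly 0.4 to 4.2 points. Switching off the correction lowers the p-value a little (from about 0.02 to about 0.015) without changing the conclusion.</p>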
<h2 id="common-mistakes">The most common mistakes</h2>
<p>A/B testing is a powerful tool, but a treacherous one. The ease with which a test can be set up hides serious methodological pitfalls. Let&#8217;s look at the most frequent ones.</p>
<h3>Stopping the test too early</h3>
<p>It is the strongest temptation: after a few days, B looks clearly better than A. Why wait any longer? Because those preliminary results are <strong>noise</strong>, not signal.</p>
<p>The problem has a technical name: <strong>peeking</strong>. Every time we look at the interim data and decide whether to stop, we increase the probability of a false positive. It&#8217;s like tossing a coin: if we stop every time we get three heads in a row, we&#8217;ll conclude the coin is rigged. But it isn&#8217;t — we simply haven&#8217;t given it enough tosses.</p>
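<p>The effect of peeking is easy to see with a simulation. This sketch runs 2,000 A/A experiments, so \(H_0\) is true by construction, with an assumed conversion rate of 5% and 2,000 visitors per variant. It compares two policies: looking only once at the planned end, or checking the two-proportion z-test every 200 visitors and stopping at the first p-value below 0.05. All the figures are assumptions for illustration.</p>
<pre><code class="language-r">set.seed(123)
n_tests = 2000                      # simulated A/A experiments: H0 is true
n_max = 2000                        # planned visitors per variant
looks = seq(200, n_max, by = 200)   # an interim look every 200 visitors
p_true = 0.05                       # same conversion rate for A and B
alpha = 0.05

# p-values of the pooled two-proportion z-test at each interim look
interim_pvalues = function(a, b) {
  ca = cumsum(a)[looks]             # cumulative conversions, variant A
  cb = cumsum(b)[looks]             # cumulative conversions, variant B
  pool = (ca + cb) / (2 * looks)
  se = sqrt(pool * (1 - pool) * 2 / looks)
  z = (cb - ca) / looks / se
  2 * pnorm(-abs(z))
}

stopped_early = logical(n_tests)
significant_at_end = logical(n_tests)
for (i in seq_len(n_tests)) {
  p_vals = interim_pvalues(rbinom(n_max, 1, p_true), rbinom(n_max, 1, p_true))
  stopped_early[i] = any(alpha > p_vals)                  # stop at first p below alpha
  significant_at_end[i] = alpha > p_vals[length(p_vals)]  # single planned look
}

cat("False positive rate, one look at the end:", mean(significant_at_end), "\n")
cat("False positive rate, ten interim looks:  ", mean(stopped_early), "\n")</code></pre>
<p>With a single look, the false positive rate stays close to the nominal 5%. With ten interim looks, it ends up far higher, even though the two variants are identical.</p>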
<p><strong>How to avoid it</strong>: define the required sample size <em>beforehand</em> and wait until you reach that number before drawing conclusions. Our <a href="https://www.gironi.it/blog/en/ab-test-sample-size-calculator/">sample size calculator</a> can help you work out how many users you need before launching the test.</p>
<h3>Testing too many variants without correction</h3>
<p>Another frequent mistake: testing three, four, five variants at the same time (A/B/C/D&#8230;) and then comparing them all pairwise. The problem is that of <strong>multiple comparisons</strong>: the more comparisons we make, the more likely we are to find at least one significant result by pure chance.</p>
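<p>With 5 variants there are 10 pairwise comparisons, and the probability of finding at least one false positive rises from 5% to about 40% (\(1 - 0.95^{10} \approx 0.40\), treating the comparisons as independent, which is only an approximation). This is not a detail: it is an error that invalidates the entire test.</p>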
<p><strong>How to avoid it</strong>: if multiple comparisons are needed, apply a <strong>Bonferroni correction</strong> (divide the \(\alpha\) threshold by the number of comparisons) or, better still, limit yourself to testing one variant at a time.</p>
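<p>The arithmetic behind these numbers fits in a few lines of R. <code>p.adjust()</code> applies the Bonferroni correction to a vector of p-values, which is equivalent to comparing each raw p-value with \(\alpha/m\). The ten p-values below are made up purely for illustration.</p>
<pre><code class="language-r">k = 5                   # number of variants (A, B, C, D, E)
m = choose(k, 2)        # number of pairwise comparisons: 10
alpha = 0.05

# Probability of at least one false positive, treating the m tests as independent
fwer = 1 - (1 - alpha)^m
cat("Comparisons:", m, "- P(at least one false positive):", round(fwer, 3), "\n")

# Bonferroni: each comparison is tested at alpha / m
cat("Bonferroni threshold:", alpha / m, "\n")

# Same correction applied to the p-values (hypothetical values for illustration)
p_raw = c(0.003, 0.012, 0.04, 0.20, 0.51, 0.33, 0.07, 0.65, 0.09, 0.88)
p_adj = p.adjust(p_raw, method = "bonferroni")
which(alpha > p_adj)    # comparisons that remain significant</code></pre>
<p>Only the first comparison survives the correction, because its raw p-value of 0.003 is below the Bonferroni threshold of 0.005. The values 0.012 and 0.04 would look &#8220;significant&#8221; on their own, but they do not survive.</p>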
<h3>Ignoring the power of the test</h3>
<p>We know the risk of a false positive well (type I error, \(\alpha\)). But there is a mirror risk that is often ignored: the <strong>false negative</strong> (type II error, \(\beta\)). It happens when B really is better than A, but our test fails to detect it.</p>
<p>The most common cause? A <strong>sample that is too small</strong>. If we have only 100 visitors per variant, the test does not have enough &#8220;power&#8221; to detect small but real differences. We will conclude &#8220;no significant difference&#8221; not because the difference doesn&#8217;t exist, but because we didn&#8217;t have enough data to see it.</p>
<p><strong>How to avoid it</strong>: compute the required sample size <em>before</em> launching the test, based on the minimum effect we want to detect. This is the subject of <strong>power analysis</strong>: use the <a href="https://www.gironi.it/blog/en/ab-test-sample-size-calculator/">sample size calculator</a> to check whether your test has enough power.</p>
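<p>Base R already includes a function for this, <code>power.prop.test()</code>. Leave out exactly one of its arguments and it solves for that one. The sketch below assumes the lift from the title-tag example (5% to 7%) and asks two questions: how many visitors per variant are needed for 80% power, and how much power we would have with only 100 visitors per variant.</p>
<pre><code class="language-r"># Visitors per variant needed to detect a lift from 5% to 7%
# with 80% power and alpha = 0.05 (two-sided test)
power.prop.test(p1 = 0.05, p2 = 0.07, sig.level = 0.05, power = 0.80)

# Power available with only 100 visitors per variant
power.prop.test(n = 100, p1 = 0.05, p2 = 0.07, sig.level = 0.05)$power

# Power of the title-tag test (1,500 impressions per variant)
power.prop.test(n = 1500, p1 = 0.05, p2 = 0.07, sig.level = 0.05)$power</code></pre>
<p>The first call returns a little over 2,200 visitors per variant. The second returns a power below 10%, so with 100 visitors a real lift of two points would be missed most of the time. Incidentally, with the 1,500 impressions per variant of the title-tag example, the same call gives a power of only about 64%: that test detected the difference, but it was far from guaranteed to.</p>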
<h3>Confusing statistical significance with practical significance</h3>
<p>A low p-value does not automatically mean the result is <em>important</em>. With very large samples, even tiny differences become statistically significant. If we test two variants on 500,000 visitors each, a CTR difference of just 0.1 percentage points (from 5.0% to 5.1%) already comes out significant (\(p \approx 0.02\)). Yet a difference of that size may well be too small to matter in practice.</p>
<p><strong>Caution</strong>: the p-value tells us whether the difference is hard to explain by chance alone, not whether it is big enough to matter to us. For the latter we need a different measure, the <strong>effect size</strong>, which we cover in a dedicated article.</p>
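<p>A quick sketch with illustrative figures makes the point concrete: 500,000 visitors per variant and a CTR of 5.0% versus 5.1%.</p>
<pre><code class="language-r">n = c(500000, 500000)      # visitors per variant (illustrative)
clicks = c(25000, 25500)   # CTR 5.0% for A, 5.1% for B

res = prop.test(clicks, n)
cat("p-value:", signif(res$p.value, 2), "\n")

# prop.test reports A minus B: flip to read B minus A, in percentage points
cat("95% CI of B - A:", round(-rev(res$conf.int) * 100, 3), "points\n")</code></pre>
<p>The p-value falls below 0.05, yet the confidence interval shows a lift of less than two tenths of a percentage point: statistically clear, practically tiny.</p>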
<h2 id="frequentist-vs-bayesian">Frequentist vs Bayesian approach</h2>
<p>Everything we have seen so far follows the <strong>frequentist</strong> approach: we compute a test statistic, compare it with a reference distribution, obtain a p-value and make a binary decision (reject or fail to reject \(H_0\)).</p>
<p>It works, and works well. But it has limits that you feel in everyday practice. The p-value does not tell us &#8220;by how much B is better than A&#8221;. It does not tell us &#8220;what the probability is that B is genuinely superior&#8221;. And if we collect new data, we cannot simply update the result: we have to recompute everything from scratch.</p>
<p>There is an alternative approach that answers directly the question we care about most: <strong>what is the probability that B is better than A?</strong> It is the <strong>Bayesian</strong> approach.</p>
<p>The idea is this. Instead of starting from a null hypothesis and trying to reject it, we start from a <strong>prior distribution</strong> that represents our initial knowledge about each variant&#8217;s conversion. Then, as we collect data, we update that distribution. The result is a <strong>posterior distribution</strong> that incorporates both our prior knowledge and the observed data.</p>
<p>For conversion rates, the natural distribution is the <strong>Beta</strong>: it is defined between 0 and 1 (like a proportion) and updates very elegantly. If we start from a prior \(\text{Beta}(\alpha, \beta)\) and observe \(s\) successes out of \(n\) trials, the posterior is:</p>
\(\text{Beta}(\alpha + s, \, \beta + n - s)\)
<p>Sounds hard? It&#8217;s very easy. Let&#8217;s use the data from the two landing pages in the previous example. We start from a <strong>non-informative prior</strong> \(\text{Beta}(1, 1)\) — which amounts to saying &#8220;we know nothing, any value between 0 and 1 is equally plausible&#8221;:</p>
<ul>
<li><strong>Page A</strong>: 35 conversions out of 1000 → posterior \(\text{Beta}(36, \, 966)\)</li>
<li><strong>Page B</strong>: 58 conversions out of 1000 → posterior \(\text{Beta}(59, \, 943)\)</li>
</ul>
<p>Let&#8217;s compute in R the probability that B is better than A:</p>
<pre><code class="language-r">set.seed(42)
n_sim &lt;- 100000

# Posterior of the two variants
post_A &lt;- rbeta(n_sim, shape1 = 36, shape2 = 966)
post_B &lt;- rbeta(n_sim, shape1 = 59, shape2 = 943)

# Probability that B &gt; A
prob_B_better &lt;- mean(post_B &gt; post_A)
cat("P(B &gt; A) =", round(prob_B_better, 4), "\n")

# Distribution of the difference
diff &lt;- post_B - post_A
cat("Median difference:", round(median(diff) * 100, 2), "pct points\n")
cat("95% CI of the difference:",
    round(quantile(diff, 0.025) * 100, 2), "-",
    round(quantile(diff, 0.975) * 100, 2), "pct points\n")</code></pre>
<p>The result is striking: the probability that B is better than A is above 99%. But the real advantage of the Bayesian approach is that we obtain directly the <strong>distribution of the difference</strong>: not only do we know <em>whether</em> B is better, but <em>by how much</em>, with a credible interval that quantifies our uncertainty.</p>
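<p>Simulation is not the only way to get \(P(B &gt; A)\). Because the two posteriors are independent, the probability can also be written as an integral, \(P(B &gt; A) = \int_0^1 f_B(p)\,F_A(p)\,dp\), where \(f_B\) is the density of B&#8217;s posterior and \(F_A\) is the cumulative distribution function of A&#8217;s. Here is a sketch with R&#8217;s <code>integrate()</code>, as a cross-check of the Monte Carlo estimate.</p>
<pre><code class="language-r"># Posteriors from the example: A ~ Beta(36, 966), B ~ Beta(59, 943)
# P(B > A) = integral of dbeta_B(p) * pbeta_A(p) over p
integrand = function(p) dbeta(p, 59, 943) * pbeta(p, 36, 966)

# Virtually all posterior mass lies below 0.2, so that upper limit is enough
prob_exact = integrate(integrand, lower = 0, upper = 0.2)$value
cat("P(B > A) by numerical integration:", round(prob_exact, 4), "\n")</code></pre>
<p>The result should agree with the simulated value to within Monte Carlo error, confirming a probability above 99%.</p>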
<p>This is a substantial difference from the frequentist approach. The p-value tells us &#8220;a difference this large would be unlikely under \(H_0\)&#8221;; the Bayesian result tells us &#8220;the probability that B is better is 99%, and the improvement lies between about 0.5 and 4.2 percentage points&#8221;. For an operational decision, the second piece of information is often more useful.</p>
<p>An important note: the full Bayesian approach deserves a dedicated article. Here we have only scratched the surface — the topic of informative priors, hierarchical models and their systematic application is a path of its own that we will tackle in the section devoted to Bayesian statistics.</p>
<h2 id="seo-example">Practical SEO example: meta description A/B test</h2>
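<p>Let&#8217;s look at one last scenario, very common in everyday practice. We have two meta description variants for a key page on the site. We alternate the two versions (two weeks each, to limit seasonal effects) and consult the Search Console data. A time-based split is weaker than true randomisation, since part of any difference may come from changes between the two periods. These are the results:</p>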
<ul>
<li><strong>Meta A</strong>: 3200 impressions, 128 clicks → CTR = 4.0%</li>
<li><strong>Meta B</strong>: 3100 impressions, 155 clicks → CTR = 5.0%</li>
</ul>
<p>Let&#8217;s check in R:</p>
<pre><code class="language-r">prop.test(c(128, 155), c(3200, 3100))</code></pre>
<p>The p-value is about 0.064, which is above the 0.05 threshold, so we cannot reject the null hypothesis. The confidence interval of the difference also includes zero, confirming the non-significance. It is a borderline result, and it tells us that with these data we don&#8217;t have enough evidence to conclude that Meta B is genuinely better.</p>
<p>Which approach should we use? For a simple test like this, the frequentist approach with <code>prop.test()</code> is more than sufficient: we have large samples, the question is clear. The Bayesian approach becomes more valuable when the samples are small, when we want to update the result as new data arrive, or when we have prior knowledge to incorporate (for example, we know that for that type of page the CTR is typically between 3% and 7%).</p>
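<p>As an illustration of that last case, here is a sketch that encodes the &#8220;CTR typically between 3% and 7%&#8221; knowledge as a prior. The choice of \(\text{Beta}(24, 450)\), which has a mean of about 5% and most of its mass roughly in that range, is our assumption; other reasonable priors would give slightly different numbers.</p>
<pre><code class="language-r">set.seed(7)
n_sim = 100000

# Hypothetical informative prior: Beta(24, 450), mean about 5%,
# with most of its mass roughly between 3% and 7%
a0 = 24
b0 = 450

# Posteriors with the informative prior (Meta A: 128/3200, Meta B: 155/3100)
post_A = rbeta(n_sim, a0 + 128, b0 + 3200 - 128)
post_B = rbeta(n_sim, a0 + 155, b0 + 3100 - 155)
prob_informative = mean(post_B > post_A)

# Same computation with the flat Beta(1, 1) prior, for comparison
flat_A = rbeta(n_sim, 1 + 128, 1 + 3200 - 128)
flat_B = rbeta(n_sim, 1 + 155, 1 + 3100 - 155)
prob_flat = mean(flat_B > flat_A)

cat("P(B > A), informative prior:", round(prob_informative, 3), "\n")
cat("P(B > A), flat prior:       ", round(prob_flat, 3), "\n")</code></pre>
<p>Both versions put the probability that Meta B is better at around 96-97%. The informative prior gives a slightly lower figure because it pulls Meta A&#8217;s estimate towards 5%, which narrows the gap. This does not contradict the p-value of 0.064, because the two numbers answer different questions.</p>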
<p>But the operational decision must not rest on the p-value alone. We have to ask whether the difference (one extra percentage point of CTR) is big enough to justify the change. With roughly 3,000 impressions every two weeks, one percentage point more means about 30 additional clicks per fortnight, or some 60-70 a month. Is that significant <em>for our business</em>? Statistics cannot resolve this question on its own: it is a judgement that falls to us.</p>
<h2 id="try-it-yourself">Try it yourself</h2>
<p>An e-commerce store is testing two call-to-action variants on a product page:</p>
<ul>
<li><strong>Variant A</strong> (&#8220;Add to cart&#8221;): 450 visits, 23 conversions</li>
<li><strong>Variant B</strong> (&#8220;Buy it now&#8221;): 430 visits, 31 conversions</li>
</ul>
<ol>
<li>Compute the conversion rate of each variant</li>
<li>Run the test with <code>prop.test(c(23, 31), c(450, 430))</code> and interpret the p-value</li>
<li>Does the confidence interval of the difference include zero?</li>
<li>At the 5% significance level, is the difference statistically significant?</li>
</ol>
<p>Hint: if the p-value is above 0.05, we cannot conclude that one variant is better than the other — but this does not mean they are equal. It might simply mean we don&#8217;t have enough data. It is exactly the problem of the power of the test that we discussed.</p>
<p>A/B testing gives us a rigorous framework for making decisions based on data, not intuition. But as we have seen, a p-value on its own tells us <em>whether</em> a difference is statistically significant. It does not tell us whether the effect is <em>large</em> enough to matter, nor how much data we need to detect it with confidence. Those are the questions of <strong>effect size</strong> and <strong>power analysis</strong>, the next tools in our path. For the sample size, the <a href="https://www.gironi.it/blog/en/ab-test-sample-size-calculator/">interactive calculator</a> lets you compute the number you need in real time.</p>
<hr>
<h3>Further Reading</h3>
<p>If you want to dig deeper into the methodology of online experiments, <a href="https://www.amazon.it/dp/1108724264?tag=consulenzeinf-21&#038;ascsubtag=ab-testing" target="_blank" rel="nofollow sponsored noopener"><em>Trustworthy Online Controlled Experiments</em></a> by Ron Kohavi, Diane Tang and Ya Xu is widely regarded as the reference book on A/B testing. The authors led experimentation at Microsoft, Google and LinkedIn. The book covers everything from test design to the pitfalls we saw in this article, all the way to the organisational aspects that make the difference between a well-run test and a sterile exercise.</p>
<p>For those who want to explore the Bayesian approach to A/B testing (which we have just introduced), <a href="https://www.amazon.it/dp/1593279566?tag=consulenzeinf-21&#038;ascsubtag=ab-testing" target="_blank" rel="nofollow sponsored noopener"><em>Bayesian Statistics the Fun Way</em></a> by Will Kurt is an accessible and surprisingly entertaining introduction. It explains priors, posteriors and Bayesian updating with examples that don&#8217;t require a maths degree — and it uses R for the computational part.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/ab-testing-statistically-valid-experiments/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>An Introduction to Principal Component Analysis (PCA)</title>
		<link>https://www.gironi.it/blog/en/principal-component-analysis-pca/</link>
					<comments>https://www.gironi.it/blog/en/principal-component-analysis-pca/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Fri, 19 Jun 2026 07:29:10 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/principal-component-analysis-pca/</guid>

					<description><![CDATA[Principal Component Analysis (PCA) is a widely used statistical technique for reducing the complexity of large datasets. It aims to cut down the number of variables, transforming potentially correlated ones into a smaller set of uncorrelated variables called principal components. This methodology answers the need to represent complex phenomena — described by a large number &#8230; <a href="https://www.gironi.it/blog/en/principal-component-analysis-pca/" class="more-link">Continue reading<span class="screen-reader-text"> "An Introduction to Principal Component Analysis (PCA)"</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Principal Component Analysis (PCA) is a widely used statistical technique for <strong>reducing the complexity of large datasets</strong>. It aims to <strong>cut down the number of variables</strong>, transforming potentially correlated ones into a smaller set of uncorrelated variables called <strong>principal components</strong>.</p>



<span id="more-3828"></span>



<p class="wp-block-paragraph">This methodology answers the need to represent complex phenomena — described by a large number of variables — through a smaller number of variables that retain most of the original information. The primary goal is to maximise the variance captured by these new components, thereby ensuring minimal information loss.</p>



<p class="wp-block-paragraph">In practice, PCA proves particularly useful when we face datasets with many variables that are correlated with one another. In such scenarios, analysing all the variables directly can become complex and hard to interpret. PCA lets us concentrate the information contained in the original variables into a reduced number of principal components, making it easier to spot underlying patterns and trends.</p>



<p class="wp-block-paragraph">To grasp the idea of dimensionality reduction, picture a city with many interconnected streets. PCA works much like an urban-planning system that identifies the main traffic arteries. By focusing on these &#8220;main roads&#8221;, we get a clear view of the city&#8217;s structure and its traffic flows, without having to analyse every single side street.</p>



<p class="wp-block-paragraph">In the specific context of web marketing and data analysis, PCA is a powerful tool for several reasons. It is <strong>effective for visualising and exploring high-dimensional datasets</strong>, making it easy to <strong>spot trends, patterns or outliers</strong>. It is also commonly used in the data pre-processing stage for machine learning algorithms, because it compresses many features into a few components that retain most of the variance. Bear in mind that PCA ignores the target variable, so the highest-variance components are not guaranteed to be the most predictive. A further advantage is that the components are uncorrelated by construction, so PCA <strong>eliminates multicollinearity</strong>. By reducing the number of inputs, it <strong>can also help limit overfitting</strong>. Both are frequent problems in web marketing datasets, which are characterised by many potentially correlated variables.</p>



<h2 class="wp-block-heading">The Mathematical Foundations of PCA</h2>



<p class="wp-block-paragraph">To fully understand how PCA works, it helps to get familiar with a few key mathematical concepts.</p>



<p class="wp-block-paragraph"><strong>Variance</strong> and <strong>covariance</strong> are statistical concepts central to PCA. Variance measures the dispersion of a single variable around its mean, indicating how far its values lie from the central value. Covariance, instead, quantifies how two variables change together: a positive covariance suggests the variables tend to rise or fall at the same time, while a negative covariance indicates an inverse relationship. The goal of PCA is to find components that exhibit the maximum possible variance, since greater variance is often associated with a greater amount of information. The <strong>covariance matrix</strong> is a tool that summarises the covariances between every possible pair of variables in a dataset. Its diagonal elements represent the variances of each variable, while the off-diagonal elements indicate the covariances between pairs. This matrix is a crucial input for the PCA algorithm, because it describes the structure of the linear relationships between the variables.</p>
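<p class="wp-block-paragraph">In R the covariance matrix is one call away, with <code>cov()</code>. Here is a minimal sketch on a toy matrix whose values are invented for illustration: the diagonal holds the variances, and the off-diagonal elements hold the covariances. When metrics are measured in very different units, as sessions and bounce rate are, PCA is usually run on standardised variables. The covariance matrix of standardised variables is the correlation matrix.</p>
<pre><code class="language-r"># Toy values for illustration only: 5 pages, 3 hypothetical metrics
X = cbind(
  sessions = c(120, 150, 90, 200, 170),
  pages    = c(3.1, 3.4, 2.8, 4.0, 3.7),
  bounce   = c(0.55, 0.50, 0.62, 0.41, 0.45)
)

S = cov(X)     # covariance matrix
round(S, 4)    # diagonal: variances; off-diagonal: covariances

diag(S)            # the variances on the diagonal...
apply(X, 2, var)   # ...match the variance of each column

# Covariance of the standardised columns = correlation matrix,
# the usual input when metrics are measured in different units
round(cov(scale(X)), 3)</code></pre>
<p class="wp-block-paragraph">Here sessions and pages per session move together (positive covariance), while sessions and bounce rate move in opposite directions (negative covariance).</p>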



<p class="wp-block-paragraph"><strong>Eigenvalues</strong> and <strong>eigenvectors</strong> are the mathematical heart of PCA. In simple terms, the directions of the principal components are the eigenvectors of the covariance matrix, and the components themselves are the data projected onto those directions. An eigenvector represents a direction in the space of the original data, while its associated eigenvalue equals the variance of the data along that direction. In other words, the eigenvectors identify the directions in which the data vary the most, and the eigenvalues quantify the importance of each of these directions in terms of explained variance.</p>
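<p class="wp-block-paragraph">The link between eigenvectors and principal components can be checked directly. This sketch uses <code>USArrests</code>, a small dataset that ships with R (arrest rates for three violent crimes plus the percentage of urban population in the 50 US states). It standardises the data and compares the eigen-decomposition of the covariance matrix, computed with <code>eigen()</code>, with the output of <code>prcomp()</code>, R&#8217;s built-in PCA function.</p>
<pre><code class="language-r"># USArrests ships with R: 50 US states, 4 variables
X = scale(USArrests)                  # standardise each variable

e = eigen(cov(X))                     # eigen-decomposition of the covariance matrix
p = prcomp(USArrests, scale. = TRUE)  # R's built-in PCA

e$values      # eigenvalues: variance of the data along each direction
p$sdev^2      # the same values, as squared standard deviations of the components

# Eigenvectors and prcomp loadings coincide up to the sign of each column
round(abs(e$vectors) - abs(p$rotation), 10)</code></pre>
<p class="wp-block-paragraph">The eigenvalues coincide with the squared standard deviations of the components, and the eigenvectors coincide with the loadings stored in <code>rotation</code>. The only possible difference is the sign of a column, which is arbitrary: a direction and its opposite describe the same axis.</p>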



<p class="wp-block-paragraph"><strong>Explained variance</strong> is a fundamental metric for assessing the importance of each principal component. It represents the proportion of the original data&#8217;s total variance that is captured by a specific principal component, computed by dividing the component&#8217;s eigenvalue by the sum of all eigenvalues. The <strong>cumulative explained variance</strong> indicates the total amount of variance captured by a given number of principal components, summing their individual proportions. This metric is crucial for deciding how many principal components to keep in order to represent the data adequately without losing a significant amount of information.</p>
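<p class="wp-block-paragraph">Using the standardised <code>USArrests</code> data again, explained and cumulative explained variance follow directly from the eigenvalues. A minimal sketch:</p>
<pre><code class="language-r">p = prcomp(USArrests, scale. = TRUE)
eig = p$sdev^2                    # eigenvalues

explained = eig / sum(eig)        # share of total variance per component
cumulative = cumsum(explained)    # running total
round(rbind(explained, cumulative), 3)

summary(p)                        # prcomp reports the same proportions</code></pre>
<p class="wp-block-paragraph">For <code>USArrests</code>, the first component explains about 62% of the total variance and the first two together about 87%.</p>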



<p class="wp-block-paragraph">A side note: two criteria are useful for guiding the choice of the number of principal components. The <em>Kaiser rule</em> suggests keeping only the components with eigenvalues greater than 1. This threshold makes sense for standardised data, i.e. when PCA is run on the correlation matrix. The <em>scree plot</em> is a chart of the ordered eigenvalues that helps identify the &#8220;elbow&#8221; of the curve as a cut-off point.</p>
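<p class="wp-block-paragraph">Both criteria are easy to apply in R. In this sketch, again on the standardised <code>USArrests</code> data, the Kaiser rule is a simple filter on the eigenvalues, and <code>screeplot()</code> from the base <code>stats</code> package draws the scree plot.</p>
<pre><code class="language-r">p = prcomp(USArrests, scale. = TRUE)
eig = p$sdev^2

# Kaiser rule: keep components with eigenvalue greater than 1
# (on standardised data, 1 is the variance of a single original variable)
round(eig, 3)
which(eig > 1)

# Scree plot of the ordered eigenvalues: look for the "elbow"
screeplot(p, type = "lines", main = "USArrests: scree plot")</code></pre>
<p class="wp-block-paragraph">Here the second eigenvalue is just below 1 (about 0.99), so the Kaiser rule keeps a single component. The cumulative variance, about 87% with two components, argues for two. These criteria are guides to be weighed together, not automatic verdicts.</p>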



<h2 class="wp-block-heading">Practical Applications of PCA Across Different Fields</h2>



<p class="wp-block-paragraph">PCA is a versatile technique with a <strong>wide range of applications across different fields</strong>. In general, it is used for dimensionality reduction, the visualisation of complex data, noise removal and the extraction of relevant features for later analysis or for training machine learning models.</p>



<p class="wp-block-paragraph">In <strong>image processing</strong>, PCA is used for compression, reducing the number of values needed to represent an image (approximately) while keeping its essential features. In <strong>genomics and bioinformatics</strong>, it helps identify the most critical genes that drive variation, reducing the complexity of genomic data. In <strong>finance</strong>, it can be applied to risk analysis and portfolio optimisation, identifying the key economic factors that influence asset performance. In <strong>healthcare</strong>, it is used to analyse medical images such as MRI scans, to improve visualisation and aid diagnosis. In <strong>security</strong>, it finds application in biometric systems for fingerprint recognition, extracting the most relevant features. And in <strong>climatology</strong>, the technique is used to analyse and interpret large environmental datasets.</p>



<p class="wp-block-paragraph">When it comes specifically to <strong>data analysis and marketing</strong>, PCA offers several benefits. It lets us simplify complex datasets, reduce the noise in the data, extract the most significant features for further analysis and improve the performance of predictive models. Its ability to visualise high-dimensional data in a two- or three-dimensional space makes it easier to identify patterns, trends and outliers, rendering the data more accessible to interpret.</p>



<h2 class="wp-block-heading">Concrete Use of PCA in Web Marketing, SEO, SEM and Data Analysis</h2>



<p class="wp-block-paragraph">Principal Component Analysis can be applied effectively across various areas of web marketing, SEO, SEM and data analysis to gain meaningful insights and optimise strategies.</p>



<p class="wp-block-paragraph">In the analysis of <strong>keyword</strong> data, PCA can be used to reduce the dimensionality of word or document embeddings. A keyword dataset can be characterised by numerous metrics such as search volume, competition level, cost per click (CPC) and various semantic features. By applying PCA, we can condense these many dimensions into a smaller number of principal components that capture the underlying themes or features of the keywords. This can simplify the analysis, for example by identifying groups of keywords with similar performance profiles.</p>



<p class="wp-block-paragraph">For the <strong>analysis of web traffic metrics</strong>, PCA can help identify meaningful patterns. Traffic metrics such as sessions, bounce rate, time on page and conversions from different sources can be analysed with PCA to uncover latent variables that drive website performance. For instance, a principal component related to user engagement might emerge, alongside a second component tied to the effectiveness of the different traffic sources. This understanding can inform decisions on marketing budget allocation and website optimisation.</p>



<p class="wp-block-paragraph"><strong>User segmentation</strong> based on online behaviour and demographic data is another area where PCA proves valuable. User data typically span many variables (purchase history, browsing behaviour, demographic information). PCA condenses them into a few components in which natural groupings of users with similar characteristics become easier to spot. A clustering algorithm applied to the component scores can then turn those groupings into explicit segments. This makes it possible to create more clearly defined customer segments and to target marketing activities more effectively.</p>
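


<p class="wp-block-paragraph">Here is a minimal sketch of this PCA-then-clustering workflow on simulated data. The variables, the three user types and the choice of <em>k</em>-means with three clusters are all hypothetical assumptions, not a recipe.</p>

```r
# PCA followed by k-means on the component scores.
# All variables and the three simulated user types are hypothetical.
set.seed(42)
n_per_group <- 100
make_group <- function(orders, pages, minutes, age) {
  data.frame(
    orders  = rpois(n_per_group, orders),     # purchases in the period
    pages   = rpois(n_per_group, pages),      # pages viewed
    minutes = rnorm(n_per_group, minutes, 2), # time on site
    age     = rnorm(n_per_group, age, 5)      # demographic variable
  )
}
users <- rbind(
  make_group(orders = 1, pages = 5,  minutes = 3,  age = 25),  # casual browsers
  make_group(orders = 6, pages = 15, minutes = 10, age = 40),  # loyal buyers
  make_group(orders = 2, pages = 30, minutes = 20, age = 55)   # researchers
)

# Standardise, project onto the principal components, keep the first two.
pca <- prcomp(users, scale. = TRUE)
scores <- pca$x[, 1:2]

# Cluster in the reduced space.
km <- kmeans(scores, centers = 3, nstart = 25)
table(km$cluster)

# Average profile of each segment, back on the original scale.
aggregate(users, by = list(segment = km$cluster), FUN = mean)
```

Reading the segment profiles on the original variables, rather than on the components, is what makes them usable for targeting.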



<p class="wp-block-paragraph">Finally, PCA can help improve the <strong>analysis of advertising campaign performance</strong>. Campaign metrics such as impressions, clicks, conversions and cost per acquisition can be analysed to identify the few underlying factors that account for most of the variation between campaigns. For example, PCA might reveal that a specific combination of ad creative and targeting parameters accounts for much of that variation. This would be a valuable lead for optimising campaign strategies and improving the return on investment, though it needs further analysis before it can be treated as the driver of conversions.</p>



<h2 class="wp-block-heading">Implementing PCA with R: Practical Examples</h2>



<p class="wp-block-paragraph">To implement PCA in R, we first need to set up the environment and load the necessary libraries. The fundamental ones include <code>stats</code> for the base PCA functions such as <code>prcomp()</code> and <code>princomp()</code>, <code>factoextra</code> for visualising the results, and potentially <code>dplyr</code> and <code>ggplot2</code> for data manipulation and visualisation.</p>



<p class="wp-block-paragraph">To illustrate how PCA applies in a web marketing context, we can create synthetic datasets that simulate real-world scenarios.</p>



<p class="wp-block-paragraph"><strong>Example 1: Keyword ranking data</strong></p>



<p class="wp-block-paragraph">Suppose we have a dataset with information on several keywords, including monthly search volume, a competition score (from 0 to 1), the average cost per click (CPC) and the average position on Google&#8217;s and Bing&#8217;s search results pages. We can create a synthetic data frame in R as follows:</p>



<pre class="wp-block-code"><code># Synthetic data for keyword ranking
set.seed(123)
n_keywords &lt;- 100
keywords &lt;- paste0("keyword_", 1:n_keywords)
search_volume &lt;- round(runif(n_keywords, min = 100, max = 10000))
competition &lt;- runif(n_keywords, min = 0.1, max = 0.9)
cpc &lt;- round(rnorm(n_keywords, mean = 2.5, sd = 1), 2)
ranking_google &lt;- round(rnorm(n_keywords, mean = 15, sd = 10), 0)
ranking_bing &lt;- round(rnorm(n_keywords, mean = 12, sd = 8), 0)

keyword_data &lt;- data.frame(
  Keyword = keywords,
  Search_Volume = search_volume,
  Competition = competition,
  CPC = cpc,
  Ranking_Google = ranking_google,
  Ranking_Bing = ranking_bing
)

head(keyword_data)
#     Keyword Search_Volume Competition  CPC Ranking_Google Ranking_Bing
# 1 keyword_1          2947   0.5799912 1.79             37            6
# 2 keyword_2          7904   0.3662588 2.76             28            6
# 3 keyword_3          4149   0.4908904 2.25             12            4
# 4 keyword_4          8842   0.8635791 2.15             20            4
# 5 keyword_5          9411   0.4863219 1.55             11            9
# 6 keyword_6           551   0.8122802 2.45             10           15</code></pre>



<p class="wp-block-paragraph"><strong>Example 2: Advertising campaign performance data</strong></p>



<p class="wp-block-paragraph">Similarly, we can create synthetic data for advertising campaign performance, including metrics such as impressions, clicks, conversions, total cost, click-through rate (CTR) and cost per acquisition (CPA).</p>



<pre class="wp-block-code"><code># Synthetic data for advertising campaign performance
set.seed(456)
n_campaigns &lt;- 50
campaign_ids &lt;- paste0("campaign_", 1:n_campaigns)
impressions &lt;- round(runif(n_campaigns, min = 1000, max = 100000))
clicks &lt;- round(impressions * runif(n_campaigns, min = 0.01, max = 0.1))
conversions &lt;- round(clicks * runif(n_campaigns, min = 0.005, max = 0.05))
cost &lt;- round(clicks * runif(n_campaigns, min = 0.1, max = 2), 2)
ctr &lt;- round((clicks / impressions) * 100, 2)
cpa &lt;- round(cost / conversions, 2)
cpa[!is.finite(cpa)] &lt;- 0  # No conversions: cost / 0 gives Inf (NaN if cost is also 0)

campaign_data &lt;- data.frame(
  Campaign_ID = campaign_ids,
  Impressions = impressions,
  Clicks = clicks,
  Conversions = conversions,
  Cost = cost,
  CTR = ctr,
  CPA = cpa
)

head(campaign_data)
#   Campaign_ID Impressions Clicks Conversions    Cost  CTR    CPA
# 1  campaign_1        9866    873          14 1093.32 8.85  78.09
# 2  campaign_2       21841   1788          20 3360.17 8.19 168.01
# 3  campaign_3       73563   2866          66 2764.48 3.90  41.89
# 4  campaign_4       85361   4121          73 1422.12 4.83  19.48
# 5  campaign_5       79051   3432         133 1623.28 4.34  12.21
# 6  campaign_6       33864   3064         126 6047.70 9.05  48.00</code></pre>



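<p class="wp-block-paragraph">Once the datasets are ready, we can run PCA using the <code>prcomp()</code> function. The variables are measured on very different scales (search volumes in the thousands, competition scores between 0 and 1), so it is essential to standardise them first with the argument <code>scale. = TRUE</code>. Otherwise, the variables with the largest variances would dominate the analysis.</p>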
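


<p class="wp-block-paragraph">To see what goes wrong without scaling, here is a small hypothetical dataset with just two of these variables: search volume (in the thousands) and CPC (a few units). It is analysed with and without standardisation.</p>

```r
# Hypothetical two-variable data: one variable on a huge scale, one small.
set.seed(1)
toy <- data.frame(
  Search_Volume = runif(200, min = 100, max = 10000),
  CPC           = rnorm(200, mean = 2.5, sd = 1)
)

unscaled <- prcomp(toy, scale. = FALSE)  # works on the covariance matrix
scaled   <- prcomp(toy, scale. = TRUE)   # works on the correlation matrix

round(unscaled$rotation, 3)  # PC1 is essentially Search_Volume alone
round(scaled$rotation, 3)    # both variables weigh in

# Share of variance captured by each component in the two analyses.
round(rbind(unscaled = unscaled$sdev^2 / sum(unscaled$sdev^2),
            scaled   = scaled$sdev^2 / sum(scaled$sdev^2)), 3)
```

Without scaling, the first component is essentially search volume, simply because its variance is millions of times larger. With scaling, both variables contribute. With exactly two standardised variables, the loadings are always ±0.707, whatever their (non-zero) correlation.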



<pre class="wp-block-code"><code># PCA on the keyword ranking data (5 variables -&gt; 5 components)
pca_keywords &lt;- prcomp(keyword_data[, 2:6], scale. = TRUE)
summary(pca_keywords)
#                           PC1    PC2    PC3    PC4    PC5
# Standard deviation     1.1381 1.0298 0.9894 0.9305 0.8941
# Proportion of Variance 0.2591 0.2121 0.1958 0.1732 0.1599
# Cumulative Proportion  0.2591 0.4712 0.6670 0.8401 1.0000

# PCA on the advertising campaign data (6 variables -&gt; 6 components)
pca_campaigns &lt;- prcomp(campaign_data[, 2:7], scale. = TRUE)
summary(pca_campaigns)
#                           PC1    PC2    PC3     PC4    PC5     PC6
# Standard deviation     1.7837 1.2229 0.9303 0.49392 0.4250 0.18138
# Proportion of Variance 0.5303 0.2492 0.1442 0.04066 0.0301 0.00548
# Cumulative Proportion  0.5303 0.7795 0.9238 0.96442 0.9945 1.00000</code></pre>



<p class="wp-block-paragraph">The two summaries already tell a story. For the keyword data the variance is spread fairly evenly across the five components (the first captures only 26%): a sign that those metrics are largely uncorrelated, and that PCA cannot compress them much without losing information. For the campaign data, instead, the first two components together account for almost 78% of the variance — the metrics are strongly correlated (more impressions, more clicks, more conversions, more cost), and two dimensions are enough to describe most of what is going on.</p>
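


<p class="wp-block-paragraph">The claim about correlation can be checked directly. This sketch regenerates both synthetic datasets with the same seeds and compares the largest correlation between two distinct variables in each. The CPA guard uses <code>ifelse()</code>, a small variation on the code above.</p>

```r
# Keyword data (same code and seed as above, numeric columns only).
set.seed(123)
n_keywords <- 100
keyword_num <- data.frame(
  Search_Volume  = round(runif(n_keywords, min = 100, max = 10000)),
  Competition    = runif(n_keywords, min = 0.1, max = 0.9),
  CPC            = round(rnorm(n_keywords, mean = 2.5, sd = 1), 2),
  Ranking_Google = round(rnorm(n_keywords, mean = 15, sd = 10), 0),
  Ranking_Bing   = round(rnorm(n_keywords, mean = 12, sd = 8), 0)
)

# Campaign data (same code and seed as above, numeric columns only).
set.seed(456)
n_campaigns <- 50
impressions <- round(runif(n_campaigns, min = 1000, max = 100000))
clicks      <- round(impressions * runif(n_campaigns, min = 0.01, max = 0.1))
conversions <- round(clicks * runif(n_campaigns, min = 0.005, max = 0.05))
cost        <- round(clicks * runif(n_campaigns, min = 0.1, max = 2), 2)
campaign_num <- data.frame(
  Impressions = impressions, Clicks = clicks, Conversions = conversions,
  Cost = cost, CTR = round((clicks / impressions) * 100, 2),
  CPA = ifelse(conversions > 0, round(cost / conversions, 2), 0)
)

# Largest absolute correlation between two different variables.
max_abs_r <- function(df) {
  r <- cor(df)
  max(abs(r[upper.tri(r)]))
}
round(c(keywords = max_abs_r(keyword_num), campaigns = max_abs_r(campaign_num)), 2)
round(cor(campaign_num), 2)
```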



<p class="wp-block-paragraph">The output of <code>summary()</code> provides crucial information: the standard deviation of each principal component, the proportion of variance it explains and the cumulative proportion. The <strong>loadings</strong> (or rotation matrix), accessible via <code>pca_keywords$rotation</code> and <code>pca_campaigns$rotation</code>, contain the weights with which the standardised original variables are combined into each principal component. Multiplied by the standard deviation of the component, they give the correlation between each variable and that component, which helps to interpret its meaning. The <strong>scores</strong> (or component coordinates), accessible via <code>pca_keywords$x</code> and <code>pca_campaigns$x</code>, represent the projection of the original data onto the new space defined by the principal components.</p>
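


<p class="wp-block-paragraph">Reproducing these quantities by hand is a good way to understand them. The sketch below runs on the keyword data, regenerated with the same seed so the block runs on its own. It shows three things: the summary proportions come from the squared standard deviations; the scores are the standardised data multiplied by the rotation matrix; and the variable-component correlations are the loadings multiplied by each component&#8217;s standard deviation.</p>

```r
# Keyword data (same code and seed as above, numeric columns only).
set.seed(123)
n_keywords <- 100
X <- data.frame(
  Search_Volume  = round(runif(n_keywords, min = 100, max = 10000)),
  Competition    = runif(n_keywords, min = 0.1, max = 0.9),
  CPC            = round(rnorm(n_keywords, mean = 2.5, sd = 1), 2),
  Ranking_Google = round(rnorm(n_keywords, mean = 15, sd = 10), 0),
  Ranking_Bing   = round(rnorm(n_keywords, mean = 12, sd = 8), 0)
)
pca_keywords <- prcomp(X, scale. = TRUE)

# summary(): eigenvalues are the squared component standard deviations.
eigenvalues <- pca_keywords$sdev^2
prop_var <- eigenvalues / sum(eigenvalues)   # "Proportion of Variance"
cum_var  <- cumsum(prop_var)                 # "Cumulative Proportion"

# Loadings: one unit-length weight vector per component.
W <- pca_keywords$rotation

# Scores: standardised data times the loadings.
Z <- scale(X)
scores_by_hand <- Z %*% W
max(abs(scores_by_hand - pca_keywords$x))    # numerically zero

# Variable-component correlations = loadings times the component sd.
round(cor(X, pca_keywords$x), 2)
round(sweep(W, 2, pca_keywords$sdev, "*"), 2)
```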



<p class="wp-block-paragraph">To visualise the results, we can use the <strong>scree plot</strong> and the <strong>biplot</strong>. The scree plot (obtained with <code>plot(pca_keywords)</code> and <code>plot(pca_campaigns)</code>) shows the eigenvalues in decreasing order and helps identify the optimal number of components to keep. The biplot (obtained with <code>biplot(pca_keywords)</code> and <code>biplot(pca_campaigns)</code>) displays both the scores of the observations and the loadings of the variables in the plane defined by the first two principal components, providing a visual representation of the relationships between observations and variables.</p>
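


<p class="wp-block-paragraph">For reference, here is a sketch of both plots with base R only, written to a temporary PDF file. <code>USArrests</code> is used for brevity (an assumption); <code>pca_keywords</code> or <code>pca_campaigns</code> can be swapped in. <code>screeplot()</code> and <code>biplot()</code> are the <code>stats</code> functions that handle <code>prcomp</code> objects.</p>

```r
# Base-R scree plot and biplot, sketched on the built-in USArrests data.
pca <- prcomp(USArrests, scale. = TRUE)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 10, height = 5)
par(mfrow = c(1, 2))

# Scree plot: component variances (eigenvalues) joined by a line.
screeplot(pca, type = "lines", main = "Scree plot")
abline(h = 1, lty = 2)  # Kaiser threshold, for reference

# Biplot: observations (scores) and variables (loading arrows) on PC1-PC2.
# scale = 0 draws the raw scores and the raw loadings.
biplot(pca, scale = 0, cex = 0.6)

invisible(dev.off())
out  # path of the PDF with both charts
```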



<h2 class="wp-block-heading">Checking and Interpreting the PCA Results</h2>



<p class="wp-block-paragraph">To check the accuracy of the R code and of the interpretations, it is advisable to consult the official documentation of the <code>prcomp()</code> and <code>princomp()</code> functions in R&#8217;s <code>stats</code> package, as well as the documentation of the <code>factoextra</code> library for the visualisations. If needed, the results can be compared with those obtained from other statistical software or online resources. It is important to keep in mind the assumptions underlying PCA, such as the linearity of the relationships between the variables and the sensitivity to the scale of the data, as well as the potential impact of outliers.</p>
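


<p class="wp-block-paragraph">One simple cross-check uses the two functions already mentioned. <code>prcomp()</code> works from a singular value decomposition of the standardised data, while <code>princomp(cor = TRUE)</code> works from an eigen-decomposition of the correlation matrix, so on the same data they should agree. Here is a sketch on <code>USArrests</code> (an assumption, chosen for brevity):</p>

```r
# Two implementations from the stats package, same data.
pc <- prcomp(USArrests, scale. = TRUE)   # SVD of the standardised data
pr <- princomp(USArrests, cor = TRUE)    # eigen() on the correlation matrix

# Same component standard deviations...
round(cbind(prcomp = pc$sdev, princomp = unname(pr$sdev)), 4)

# ...and the same loadings up to sign: each eigenvector is only defined
# up to a factor of -1, so compare absolute values.
max(abs(abs(pc$rotation) - abs(unclass(pr$loadings))))
```

If the two disagree by more than rounding noise, something in the preprocessing (centring, scaling, missing values) differs between the two calls.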



<p class="wp-block-paragraph">Making sense of the principal components in the context of web marketing data requires an understanding of what the original variables mean and of how they contribute to each component, as indicated by the loadings. For example, if in the PCA on the keyword ranking data the first principal component has high, positive loadings for search volume and CPC, it might be interpreted as a measure of &#8220;high-potential keywords&#8221;. The interpretation requires solid domain knowledge of web marketing.</p>
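


<p class="wp-block-paragraph">A small helper can make this reading systematic by listing, for each component, the variables with the largest absolute loadings. The helper <code>top_loadings()</code> is hypothetical, and the campaign data are regenerated with the same seed. Keep in mind that the sign of a component is arbitrary: what matters is which variables load together and whether their signs agree.</p>

```r
# Campaign data (same code and seed as above, numeric columns only).
set.seed(456)
n_campaigns <- 50
impressions <- round(runif(n_campaigns, min = 1000, max = 100000))
clicks      <- round(impressions * runif(n_campaigns, min = 0.01, max = 0.1))
conversions <- round(clicks * runif(n_campaigns, min = 0.005, max = 0.05))
cost        <- round(clicks * runif(n_campaigns, min = 0.1, max = 2), 2)
campaign_num <- data.frame(
  Impressions = impressions, Clicks = clicks, Conversions = conversions,
  Cost = cost, CTR = round((clicks / impressions) * 100, 2),
  CPA = ifelse(conversions > 0, round(cost / conversions, 2), 0)
)
pca_campaigns <- prcomp(campaign_num, scale. = TRUE)

# Hypothetical helper: the n variables with the largest absolute loadings
# on one component, sorted by strength, with their signs.
top_loadings <- function(pca, component, n = 3) {
  w <- pca$rotation[, component]
  round(w[order(abs(w), decreasing = TRUE)][seq_len(n)], 3)
}

top_loadings(pca_campaigns, "PC1")
top_loadings(pca_campaigns, "PC2")
```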



<p class="wp-block-paragraph"><strong>It is important to consider the limitations of PCA. It assumes linear relationships between the variables and can entail a loss of information when reducing dimensionality.</strong> For data with non-linear relationships, alternative techniques such as t-SNE and UMAP may be more appropriate.</p>
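


<p class="wp-block-paragraph">Here is a minimal illustration of the linearity limit, using hypothetical points spread evenly around a circle. A single number (the angle) locates every point, yet PCA cannot find one direction that captures most of the variance.</p>

```r
# 200 hypothetical points evenly spaced on a circle: intrinsically
# one-dimensional (the angle), but curved.
theta  <- seq(0, 2 * pi, length.out = 201)[-201]
circle <- cbind(x = cos(theta), y = sin(theta))

pca <- prcomp(circle)
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)  # variance split evenly: no single linear direction dominates
```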



<h2 class="wp-block-heading">Conclusion: Leveraging PCA to Optimise Web Marketing Strategies</h2>



<p class="wp-block-paragraph">Principal Component Analysis stands out as a powerful and versatile analytical tool for optimising web marketing strategies. The benefits of using PCA in this domain are manifold. First, its ability to <strong>reduce the dimensionality</strong> of complex datasets makes it possible to simplify the analysis and focus on the most relevant information. Second, PCA lets us <strong>identify underlying patterns</strong> in the data that might not be evident from a surface-level analysis, revealing meaningful relationships between different web marketing metrics. Furthermore, using PCA as a pre-processing step can <strong>improve the performance of predictive models</strong>, reducing noise and multicollinearity in the data. Finally, the ability to <strong>visualise high-dimensional data</strong> in a reduced space makes it easier to understand and communicate the insights drawn from the analysis.</p>



<p class="wp-block-paragraph">For further exploration and more advanced applications, one could consider using PCA as a preliminary step for clustering algorithms, in order to segment keywords, users or advertising campaigns more effectively. Integrating PCA into predictive modelling pipelines could lead to more robust and interpretable models. Finally, techniques such as <em>sparse PCA</em> force many loadings to exactly zero, so that each component depends on only a handful of variables. This could help single out the most important variables in the web marketing context.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h3 class="wp-block-heading">Further Reading</h3>



<p class="wp-block-paragraph">Principal component analysis is covered with exemplary clarity in <a href="https://www.amazon.it/dp/1461471370?tag=consulenzeinf-21&amp;ascsubtag=principal-component-analysis-pca" rel="nofollow sponsored noopener" target="_blank"><em>An Introduction to Statistical Learning</em></a> by James, Witten, Hastie and Tibshirani, alongside the other unsupervised learning techniques.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/principal-component-analysis-pca/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Correlation: Pearson, Spearman and Kendall (and Why It Isn&#8217;t Causation)</title>
		<link>https://www.gironi.it/blog/en/correlation/</link>
					<comments>https://www.gironi.it/blog/en/correlation/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Fri, 19 Jun 2026 07:10:34 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3821</guid>

					<description><![CDATA[Anyone who looks at a website&#8217;s data does it constantly, often without noticing: they spot that two things seem to move together. Pages that sit higher in the SERP get more clicks; the ones where users linger longer convert more; longer articles appear to rank better. These are valuable hunches, but they stay vague until &#8230; <a href="https://www.gironi.it/blog/en/correlation/" class="more-link">Continue reading<span class="screen-reader-text"> "Correlation: Pearson, Spearman and Kendall (and Why It Isn&#8217;t Causation)"</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Anyone who looks at a website&#8217;s data does it constantly, often without noticing: they spot that two things seem to move together. Pages that sit higher in the SERP get more clicks; the ones where users linger longer convert more; longer articles appear to rank better. These are valuable hunches, but they stay vague until we answer a precise question: <em>how much</em> do these pairs of numbers move together? And in what sense? We need an index that turns the impression &#8220;they go hand in hand&#8221; into a comparable measure. That index is <strong>correlation</strong>, and it is one of the most used — and most misunderstood — tools in all of applied statistics.</p>



<span id="more-3821"></span>



<p class="wp-block-paragraph">Let&#8217;s say right away what correlation is <em>not</em>, because this is where the trouble starts. Correlation measures whether and how much two variables are associated; it does not say that one causes the other, and it does not build a model to predict one from the other. That second step — prediction — is the job of regression, which we&#8217;ll cover separately. Here we stay on the previous rung: understanding, with a single number, whether two metrics travel together.</p>



<h2 class="wp-block-heading">From Covariance to Correlation</h2>



<p class="wp-block-paragraph">The starting idea is simple. If two variables move together, when one sits above its own mean the other tends to sit above its own too; when one drops below, the other follows. We can measure this tendency by multiplying, for each observation, the deviation of <em>x</em> from its mean by the deviation of <em>y</em> from its own mean, and averaging the result. This is the <strong>covariance</strong>:</p>



\( \text{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \)



<p class="wp-block-paragraph">where <em>x̄</em> and <em>ȳ</em> are the means of the two variables and <em>n</em> the number of observations. When the deviations share the same sign (both above or both below the mean) the product is positive; when they have opposite signs it is negative. A positive covariance thus signals that the two variables tend to grow together, a negative one that when one rises the other falls.</p>
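


<p class="wp-block-paragraph">The formula translates directly into R. This sketch uses the SERP-position data from the Pearson example below. Note that R&#8217;s <code>cov()</code> divides by <em>n</em> − 1 (the sample covariance) rather than by <em>n</em> as in the formula above, so the two differ by a factor of (<em>n</em> − 1)/<em>n</em>.</p>

```r
pos <- 1:10
ctr <- c(28.5, 15.7, 11.0, 7.2, 8.0, 5.1, 4.0, 3.2, 2.8, 2.6)  # CTR % by position
n   <- length(pos)

# The formula above: average product of deviations, dividing by n.
cov_n <- sum((pos - mean(pos)) * (ctr - mean(ctr))) / n

# R's cov() is the sample covariance, dividing by n - 1.
cov_sample <- cov(pos, ctr)

c(formula = cov_n, cov_r = cov_sample, rescaled = cov_sample * (n - 1) / n)
```

The sign is negative, as expected: positions above their mean go with CTRs below theirs.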



<p class="wp-block-paragraph">Covariance, however, has a flaw that makes it useless as a yardstick: <strong>it depends on the units of measurement</strong>. The covariance between sessions and seconds-on-page is one number, the one between sessions and conversion rate another, and the two can&#8217;t be compared because they speak different languages. To get a clean measure we divide it by the two standard deviations, stripping it of units and forcing it into a fixed range. The result is the <strong>Pearson correlation coefficient</strong>:</p>



\( r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \)



<p class="wp-block-paragraph">The numerator is nothing but the covariance (up to the factor <em>n</em>); the denominator is the product of the two spreads, and serves precisely to normalise. The result is a pure number between <strong>−1 and +1</strong>: it equals +1 when the points lie exactly on a rising line, −1 when they lie on a falling line, 0 when there is no linear association at all. The closer <em>r</em> gets to the extremes, the tighter the linear relationship.</p>
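


<p class="wp-block-paragraph">A quick check that the formula and R&#8217;s <code>cor()</code> agree, and that changing units leaves <em>r</em> unchanged, again on the SERP data from the next section. The rescaling is an arbitrary choice for illustration.</p>

```r
pos <- 1:10
ctr <- c(28.5, 15.7, 11.0, 7.2, 8.0, 5.1, 4.0, 3.2, 2.8, 2.6)

# Pearson's r straight from the formula.
dx <- pos - mean(pos)
dy <- ctr - mean(ctr)
r_by_hand <- sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))

# Units do not matter: shift and stretch the position, express CTR as a
# proportion instead of a percentage; r stays the same.
r_rescaled <- cor(10 * pos + 3, ctr / 100)

c(by_hand = r_by_hand, cor = cor(pos, ctr), rescaled = r_rescaled)
```

Only a positive rescaling preserves <em>r</em> exactly. Multiplying one variable by a negative number flips its sign.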



<h2 class="wp-block-heading">Pearson: Linear Association (and Its Trap)</h2>



<p class="wp-block-paragraph">Let&#8217;s put it straight to work on a case every SEO knows by heart: the link between <strong>SERP position</strong> and <strong>CTR</strong>, the click-through rate. We all know that the further down the results page you go, the fewer clicks you get. Let&#8217;s take ten positions with their observed CTRs and compute Pearson&#8217;s coefficient in R:</p>



<pre class="wp-block-code"><code>pos &lt;- 1:10
ctr &lt;- c(28.5, 15.7, 11.0, 7.2, 8.0, 5.1, 4.0, 3.2, 2.8, 2.6)  # CTR % by position

cor(pos, ctr)
# [1] -0.852</code></pre>



<p class="wp-block-paragraph">The coefficient is <strong>−0.852</strong>: strong, negative, exactly as we expected. And yet something doesn&#8217;t add up. The link between position and CTR is iron-clad — it almost never happens that a lower position yields more clicks — and we&#8217;d expect a value even closer to −1. Why does Pearson stop at −0.85?</p>



<p class="wp-block-paragraph">The answer is the most important point in the whole article. <strong>Pearson measures only the linear association</strong>, that is, how well the points line up along a <em>straight line</em>. But the CTR curve is not a straight line: it plummets from the first to the third position and then flattens out. The relationship is very strong, it&#8217;s just <em>curved</em>. Pearson, which looks for straight lines, reads that curvature as &#8220;imperfection&#8221; and lowers the grade. It isn&#8217;t wrong: it&#8217;s answering a question — &#8220;how linear is this?&#8221; — that in this case isn&#8217;t the right one.</p>
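


<p class="wp-block-paragraph">The effect is easy to reproduce with hypothetical numbers. Below, <em>y</em> = 1/<em>x</em> falls at every step, so the relationship is perfectly monotonic, but it is curved. The rank-based coefficient introduced in the next section is included for comparison.</p>

```r
# Hypothetical, perfectly monotonic but curved relationship.
x <- 1:10
y <- 1 / x

r_pearson  <- cor(x, y)                       # penalises the curvature
r_spearman <- cor(x, y, method = "spearman")  # looks only at the order
c(pearson = r_pearson, spearman = r_spearman)
```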



<h2 class="wp-block-heading">Spearman and Kendall: Monotonic Association</h2>



<p class="wp-block-paragraph">For many SEO relationships we care about something weaker than linearity: it&#8217;s enough to know whether, as one variable grows, the other grows <em>systematically</em> (or falls systematically), without insisting it does so at a constant pace. A relationship like this is called <strong>monotonic</strong>, and to measure it there&#8217;s <strong>Spearman&#8217;s</strong> rank correlation coefficient, denoted ρ (rho).</p>



<p class="wp-block-paragraph">Spearman&#8217;s trick is elegant: instead of working on the values, it works on their <strong>ranks</strong>. It replaces each number with its place in the standings (the smallest becomes 1, the next 2, and so on) and then computes an ordinary Pearson on these ranks. This way the exact shape of the curve disappears — only the order matters — and what remains is how faithfully the order of <em>x</em> reproduces that of <em>y</em>. We compute it on the same data as before:</p>



<pre class="wp-block-code"><code>cor(pos, ctr, method = "spearman")
# [1] -0.988</code></pre>



<p class="wp-block-paragraph">Now the coefficient is <strong>−0.988</strong>, pressed up against −1. It&#8217;s the correct picture of the situation: as the position worsens, the CTR falls almost without exception. (That &#8220;almost&#8221; is no accident: in the data I left a small, realistic inversion, position 5 yielding more than position 4, as happens when a rich snippet inflates a result&#8217;s CTR; it&#8217;s exactly the kind of ripple that keeps ρ from reaching an exact −1.) Where Pearson saw a &#8220;good but not great&#8221; association, Spearman recognises the near-perfect monotonic relationship that is actually there.</p>



<p class="wp-block-paragraph">There&#8217;s a third measure worth knowing, <strong>Kendall&#8217;s tau</strong> (τ). It too works on order, but with a different logic: across all pairs of observations, it counts how many are <em>concordant</em> (if <em>x</em> rises, <em>y</em> rises too) and how many <em>discordant</em>, then takes the balance. I compute it in R, again on the same data:</p>



<pre class="wp-block-code"><code>cor(pos, ctr, method = "kendall")
# [1] -0.956</code></pre>



<p class="wp-block-paragraph">Kendall returns <strong>−0.956</strong>: also close to the extreme and, as is typical, slightly smaller in absolute value than Spearman&#8217;s ρ, because the two coefficients measure agreement in ordering on different scales. In everyday practice the choice is less complicated than it seems. Use <strong>Pearson</strong> when we care about a linear relationship and the data have no heavy tails or outliers. Use <strong>Spearman</strong> when the relationship is monotonic but curved, when the data are already ranks (positions, standings), or when a couple of outliers might throw Pearson off. Use <strong>Kendall</strong> when the observations are few or there are many ties, situations in which its statistical properties hold up better.</p>



<h2 class="wp-block-heading">The Correlation Matrix</h2>



<p class="wp-block-paragraph">We rarely have only two metrics to compare. More often we have a handful — sessions, average duration, conversions, bounce rate — and we&#8217;d like to see <em>all</em> the associations at a glance. R&#8217;s <code>cor()</code> function, applied to an entire data frame, returns the <strong>correlation matrix</strong>: the coefficient of each variable with every other. I build it on twelve example pages:</p>



<pre class="wp-block-code"><code>ga4 &lt;- data.frame(
  sessions      = c(120, 340, 210, 560, 430, 780, 650, 290, 510, 880, 360, 720),
  avg_duration  = c(31,  55,  48,  44,  58,  63,  71,  52,  46,  68,  60,  64),
  conversions   = c(3,   8,   4,   21,  11,  24,  19,  9,   17,  29,  7,   22),
  bounce_rate   = c(70,  61,  66,  44,  57,  41,  46,  59,  52,  38,  63,  45)
)

round(cor(ga4), 2)
#              sessions avg_duration conversions bounce_rate
# sessions         1.00         0.73        0.98       -0.97
# avg_duration     0.73         1.00        0.58       -0.62
# conversions      0.98         0.58        1.00       -0.99
# bounce_rate     -0.97        -0.62       -0.99        1.00</code></pre>



<p class="wp-block-paragraph">It reads like a two-way table: the diagonal is all 1s (every variable is perfectly correlated with itself), and the matrix is symmetric because the correlation of <em>x</em> with <em>y</em> is the same as <em>y</em> with <em>x</em>. As we can see, sessions and conversions travel almost in unison (0.98: more traffic, more conversions — no surprise), bounce rate is negatively correlated with everything else, while average duration associates with conversions far less than intuition would suggest (0.58). A matrix like this is a precious starting map for deciding where to look. It helps to visualise it as a <strong>heatmap</strong> (with packages such as <code>corrplot</code>), where colour intensity makes the strong links jump out.</p>
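


<p class="wp-block-paragraph">When the matrix grows, it can help to list the pairs sorted by strength rather than scanning the grid. Here is a sketch in base R; the <code>ga4</code> data are repeated so the block runs on its own.</p>

```r
ga4 <- data.frame(
  sessions     = c(120, 340, 210, 560, 430, 780, 650, 290, 510, 880, 360, 720),
  avg_duration = c(31,  55,  48,  44,  58,  63,  71,  52,  46,  68,  60,  64),
  conversions  = c(3,   8,   4,   21,  11,  24,  19,  9,   17,  29,  7,   22),
  bounce_rate  = c(70,  61,  66,  44,  57,  41,  46,  59,  52,  38,  63,  45)
)
cm <- cor(ga4)

# Each pair appears once: take the upper triangle of the symmetric matrix.
idx <- which(upper.tri(cm), arr.ind = TRUE)
pairs_tbl <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = round(cm[idx], 2)
)

# Strongest associations first, whatever their sign.
pairs_tbl[order(-abs(pairs_tbl$r)), ]
```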



<p class="wp-block-paragraph">One warning, though, belongs here in bold, because it&#8217;s the heart of the matter: <strong>a correlation matrix is not a causal map</strong>. It tells us which numbers move together, not which moves which, nor whether what moves them is a third factor we don&#8217;t even have in the table.</p>



<h2 class="wp-block-heading">Correlation Is Not Causation</h2>



<p class="wp-block-paragraph">It&#8217;s the most repeated phrase in statistics, and the most ignored in practice. It&#8217;s worth seeing where it trips us up, because in SEO the stumble is a daily one. Take the classic observation: longer articles rank better. Let&#8217;s measure the association between content length and a ranking score (higher = better placed):</p>



<pre class="wp-block-code"><code>length     &lt;- c(620, 850, 1100, 1300, 1500, 1800, 2100, 2400, 2800, 3200)
rank_score &lt;- c(3,   8,   6,    11,   9,    7,    14,   10,   16,   15)

cor(length, rank_score)
# [1] 0.842</code></pre>



<p class="wp-block-paragraph">A fine <strong>0.842</strong>: the correlation is there, and it&#8217;s robust. The temptation to conclude &#8220;I&#8217;ll lengthen my articles and climb the rankings&#8221; is overwhelming — and almost always wrong. Faced with a correlation, before talking about cause we must put at least three alternative explanations on the table. It could be a <strong>direct cause</strong> (length genuinely helps ranking). It could be <strong>reverse causation</strong> (pages that already rank well get more care and are expanded over time). Or — the most frequent and most insidious case — there could be a <strong>confounding factor</strong> moving both: the site&#8217;s authority. An authoritative domain tends both to produce deeper (hence longer) content and to rank better (for reasons that have nothing to do with length). Length and ranking rise together not because one causes the other, but because a third element drags them both.</p>
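


<p class="wp-block-paragraph">The confounding scenario can be simulated to see its signature. In this hypothetical sketch, authority drives both length and ranking, while length has no direct effect at all. Removing from each variable the part explained by authority (a partial correlation, computed from the residuals of two <code>lm()</code> fits) makes the association disappear. In real data the confounder is rarely measured this cleanly, which is exactly why the trap is so common.</p>

```r
# Hypothetical simulation: authority is the hidden common cause.
set.seed(1)
n <- 1000
authority  <- rnorm(n)                                     # confounder
length_w   <- 1500 + 500 * authority + rnorm(n, sd = 200)  # article length (words)
rank_score <- 10 + 3 * authority + rnorm(n, sd = 1)        # higher = better placed
# Note: length_w does not appear in the formula for rank_score.

cor(length_w, rank_score)  # strong, despite no causal link

# Partial correlation: correlate what is left after removing authority.
res_length <- resid(lm(length_w ~ authority))
res_rank   <- resid(lm(rank_score ~ authority))
cor(res_length, res_rank)  # close to zero
```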



<p class="wp-block-paragraph">This hidden third element is the root of some of the most spectacular errors in data analysis: it can even flip the sign of a relationship when the data are aggregated the wrong way, the phenomenon known as <a href="https://www.gironi.it/blog/en/simpsons-paradox-in-seo-when-aggregate-data-can-lie/">Simpson&#8217;s paradox</a>. Establishing a causal link is a craft of its own, requiring controlled experiments or dedicated techniques; correlation, on its own, will never get there. Its job is a different one, and a valuable one: flagging the pairs of metrics worth investigating more deeply.</p>



<h2 class="wp-block-heading">Try It Yourself</h2>



<p class="wp-block-paragraph">To lock in the mechanism, here&#8217;s an exercise with realistic data. For ten pages we have the number of referring domains linking to them and their monthly organic traffic, and we want to understand how strongly the two are associated:</p>



<pre class="wp-block-code"><code>bl  &lt;- c(5, 12, 8, 25, 18, 40, 33, 60, 52, 95)        # referring domains
org &lt;- c(180, 240, 420, 510, 760, 690, 1250, 1100, 1900, 1650)  # organic sessions/month</code></pre>



<p class="wp-block-paragraph">The task: compute both Pearson&#8217;s coefficient with <code>cor(bl, org)</code> and Spearman&#8217;s with <code>cor(bl, org, method = "spearman")</code>, and reflect on why they differ.</p>



<p class="wp-block-paragraph">To check your work: Pearson is <strong>0.815</strong> and Spearman <strong>0.855</strong>. Both are high and tell the same underlying story (more referring domains, more traffic), but the fact that Spearman is a bit higher than Pearson tells us something. The relationship is more <em>monotonic</em> than <em>linear</em>, a sign that beyond a certain threshold each extra link brings less marginal traffic than a straight line would predict. And, of course, neither number entitles us to say that buying backlinks <em>will</em> raise traffic: here too the site&#8217;s authority might be moving both things together.</p>
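


<p class="wp-block-paragraph">The solution in R prints the two rank vectors side by side so you can see where the order breaks:</p>

```r
bl  <- c(5, 12, 8, 25, 18, 40, 33, 60, 52, 95)                  # referring domains
org <- c(180, 240, 420, 510, 760, 690, 1250, 1100, 1900, 1650)  # organic sessions/month

round(cor(bl, org), 3)                       # Pearson: 0.815
round(cor(bl, org, method = "spearman"), 3)  # Spearman: 0.855

# Neighbouring ranks that swap places are what keep Spearman below 1.
cbind(rank_bl = rank(bl), rank_org = rank(org))
```

The page with the most referring domains (95) does not have the most traffic. It falls below the line that the other points suggest, which is what pulls Pearson down more than Spearman.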



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="wp-block-paragraph">With correlation we&#8217;ve learned to answer the question of <em>whether, and how much, two metrics are associated</em> — choosing Pearson, Spearman or Kendall each time depending on the shape of the link. It&#8217;s the indispensable rung before the next question, the one anyone analysing data eventually asks: given an association, can I use one variable to <em>predict</em> the other, and draw the line that ties them together? From here on we no longer just measure the strength of a link, we model it: this is the territory of <a href="https://www.gironi.it/blog/en/correlation-and-regression-analysis-linear-regression/">linear regression</a>, where the very coefficient <em>r</em> we&#8217;ve just met returns to the stage, this time in the service of prediction.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h3 class="wp-block-heading">Further Reading</h3>



<p class="wp-block-paragraph">On correlation, causation and the art of not confusing the two, the book I recommend most often is <a href="https://www.amazon.it/dp/0241258766?tag=consulenzeinf-21" rel="nofollow sponsored noopener" target="_blank"><em>The Art of Statistics</em></a> by David Spiegelhalter: it walks through real cases where an association does — and does not — imply a cause, with exactly the clarity that anyone coming from applications needs.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/correlation/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Effect Size and Power Analysis: How Big Is the Effect (and How Much Data You Need)</title>
		<link>https://www.gironi.it/blog/en/effect-size-and-power-analysis/</link>
					<comments>https://www.gironi.it/blog/en/effect-size-and-power-analysis/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Wed, 17 Jun 2026 07:45:42 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3693</guid>

					<description><![CDATA[We closed the article on the A/B test significance calculator with a promise. We said that the p-value answers a single question — does the effect exist? — and that, on its own, it adds nothing else. It does not tell us how large the effect is, nor whether it is worth the effort of &#8230; <a href="https://www.gironi.it/blog/en/effect-size-and-power-analysis/" class="more-link">Continue reading<span class="screen-reader-text"> "Effect Size and Power Analysis: How Big Is the Effect (and How Much Data You Need)"</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">We closed the article on the <a href="https://www.gironi.it/blog/en/ab-test-significance-calculator/">A/B test significance calculator</a> with a promise. We said that the p-value answers a single question — <em>does the effect exist?</em> — and that, on its own, it adds nothing else. It does not tell us how large the effect is, nor whether it is worth the effort of shipping it. It is time to keep that promise, because the two questions the p-value leaves hanging are exactly what separates reading data with method from stopping at the first threshold that glitters.</p>



<p class="wp-block-paragraph">The two questions have precise names. The first — <em>how big is it?</em> — is the <strong>effect size</strong>. The second — <em>with the data I have, could I even have seen an effect like this?</em> — is the <strong>power</strong> of the test, and the reasoning that gets us to an answer is called <strong>power analysis</strong>. We examine them one at a time, as always with an example at hand.</p>



<span id="more-3693"></span>



<h2 class="wp-block-heading">Significant Doesn&#8217;t Mean Large</h2>



<p class="wp-block-paragraph">Let&#8217;s start with a situation that comes up more often than people running online tests would like. Suppose we tried two title tags on a very high-traffic page and collected one million sessions per variant. Variant A has a CTR of 3.00%, variant B of 3.05%: five hundredths of a percentage point of difference. Let&#8217;s check in R whether the gap is statistically significant:</p>



<pre class="wp-block-code"><code># one million sessions per variant, CTR 3.00% vs 3.05%
prop.test(c(30000, 30500), c(1000000, 1000000), correct = FALSE)$p.value
# [1] 0.03899</code></pre>



<p class="wp-block-paragraph">The p-value is 0.039, below the 0.05 threshold. By the book, we should celebrate: the difference is &#8220;significant&#8221;. But let&#8217;s pause. Are we really about to rewrite the titles across the whole site to gain five hundredths of a point of CTR? That significant result hides an effect of laughable size, made detectable only by the sheer mass of data.</p>



<p class="wp-block-paragraph"><strong>This is the crux</strong>: with a large enough sample, <em>any</em> nonzero difference becomes statistically significant, even the most trivial one. The p-value measures how surprising the data would be if the effect were zero; it does not measure how large the effect is. They are two different things, and conflating them is the mistake that leads to chasing wins that leave no trace on revenue. Effect size exists precisely to put magnitude back at the centre.</p>
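<p class="wp-block-paragraph">To watch the sample size do all the work, here is a minimal JavaScript sketch of the pooled two-proportion z-test, the test behind <code>prop.test(..., correct = FALSE)</code>. The CTRs stay fixed at 3.00% and 3.05%, and only the number of sessions per variant changes. The normal CDF uses the Abramowitz-Stegun approximation of the error function. That choice is an assumption made only to keep the code free of dependencies.</p>

```javascript
// Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation
// (absolute error below about 1.5e-7).
function normCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Two-sided p-value of the pooled z-test for two proportions.
function twoPropPValue(cA, nA, cB, nB) {
  const pooled = (cA + cB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  if (se === 0) return 1; // no variability at all: nothing to test
  const z = (cB / nB - cA / nA) / se;
  return 2 * (1 - normCdf(Math.abs(z)));
}

// Same CTRs (3.00% vs 3.05%), growing number of sessions per variant.
for (const n of [10000, 100000, 1000000, 5000000]) {
  const p = twoPropPValue(Math.round(0.03 * n), n, Math.round(0.0305 * n), n);
  console.log(`${n} sessions per variant: p-value = ${p.toFixed(4)}`);
}
```

<p class="wp-block-paragraph">The effect never changes. Even so, the p-value falls from well above 0.5 with ten thousand sessions to below 0.001 with five million.</p>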



<h2 class="wp-block-heading">Effect Size: Measuring the &#8220;How Much&#8221;</h2>



<p class="wp-block-paragraph">The idea behind effect size is simple and, once seen, hard to forget: instead of asking only <em>whether</em> two groups differ, we measure <em>by how much</em> they differ, on a scale that does not depend on sample size. It is the difference between saying &#8220;B beats A&#8221; and saying &#8220;B beats A by half a standard deviation&#8221;. The first is news; the second is information you can decide on.</p>



<p class="wp-block-paragraph">There are several effect-size measures, each tailored to a type of comparison. We look closely at two — one for means, one for proportions — because they cover most of the everyday work; the others we mention briefly at the end, with the right pointers.</p>



<h2 class="wp-block-heading">Cohen&#8217;s d: the Effect Between Two Means</h2>



<p class="wp-block-paragraph">When we compare two means (the average time on page of two variants, the average session duration of two segments), the reference measure is <strong>Cohen&#8217;s d</strong>. The intuition is this: we take the difference between the two means and express it in &#8220;standard-deviation units&#8221;, so it becomes comparable across different contexts. A three-second difference weighs a lot if session durations vary by only a few seconds, and almost nothing if they vary by minutes.</p>



<p class="wp-block-paragraph">In formula, Cohen&#8217;s d is the ratio between the difference of the means and the combined standard deviation of the two groups:</p>



\( d = \frac{\bar{x}_B - \bar{x}_A}{s_p} \\ \)



<p class="wp-block-paragraph">where <em>x̄</em><sub>A</sub> and <em>x̄</em><sub>B</sub> are the group means and <em>s</em><sub>p</sub> is the <strong>pooled standard deviation</strong>, the square root of a weighted average of the two variances, which brings together the internal variability of both groups:</p>



\( s_p = \sqrt{\frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2}} \\ \)



<p class="wp-block-paragraph">with <em>n</em><sub>A</sub>, <em>n</em><sub>B</sub> the sample sizes and <em>s</em><sub>A</sub>, <em>s</em><sub>B</sub> the standard deviations of the two groups. Each variance is weighted by its degrees of freedom (<em>n</em> − 1), so the larger group counts for more. In short, <em>s</em><sub>p</sub> is the correct way to fuse two variabilities into a single reference measure.</p>



<p class="wp-block-paragraph">Let&#8217;s do an example. We measured session duration (in seconds) on two versions of a page, twelve sessions per version. I compute Cohen&#8217;s d in R using the <code>effsize</code> package, which does the maths and also returns the qualitative label:</p>



<pre class="wp-block-code"><code>A &lt;- c(48, 55, 52, 60, 46, 58, 51, 57, 49, 54, 53, 50)  # version A
B &lt;- c(50, 58, 52, 62, 49, 57, 60, 53, 61, 51, 59, 54)  # version B

library(effsize)
cohen.d(B, A)

# Cohen's d
#
# d estimate: 0.6254922 (medium)
# 95 percent confidence interval:
#      lower      upper
# -0.2416187  1.4926030</code></pre>
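<p class="wp-block-paragraph">There is nothing magic inside <code>cohen.d</code>. Its point estimate is simply the two formulas above. Here is a minimal JavaScript transcription applied to the same twelve sessions per version. The function names are ours, not the package&#8217;s, and the confidence interval is left to <code>effsize</code>:</p>

```javascript
// Session durations in seconds: twelve sessions per version, as above.
const A = [48, 55, 52, 60, 46, 58, 51, 57, 49, 54, 53, 50];
const B = [50, 58, 52, 62, 49, 57, 60, 53, 61, 51, 59, 54];

const mean = xs => xs.reduce((s, x) => s + x, 0) / xs.length;

// Sample variance with the n - 1 denominator (as R's var()).
function variance(xs) {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

// Pooled standard deviation s_p: each variance weighted by its degrees of freedom.
function pooledSd(a, b) {
  const na = a.length, nb = b.length;
  return Math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2));
}

// Cohen's d for "b minus a", the same order as cohen.d(B, A).
function cohenD(b, a) {
  return (mean(b) - mean(a)) / pooledSd(a, b);
}

console.log(cohenD(B, A).toFixed(4)); // 0.6255
```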



<p class="wp-block-paragraph">The estimated d is <strong>0.63</strong>, which <code>effsize</code> classifies as a <strong>medium</strong> effect. The conventional thresholds, proposed by Jacob Cohen, are 0.2 for a small effect, 0.5 for a medium one, 0.8 for a large one — but they should be taken for what they are: useful conventions to get oriented, not laws of nature. Cohen himself recommended interpreting them in light of one&#8217;s own field, not applying them blindly. <em>In everyday SEO practice</em>, a d of 0.63 on session duration is a change worth taking seriously.</p>



<p class="wp-block-paragraph">There is, however, a detail worth the whole rest of the article, and it is already visible above: the confidence interval of d runs from −0.24 to 1.49. It crosses zero. In other words, with just twelve sessions per group, the <em>estimated</em> effect is medium, but the data are not enough to rule out that the <em>true</em> one is null. And indeed, if we feed the same numbers to a t-test, we find anything but a reassuring p-value:</p>



<pre class="wp-block-code"><code>t.test(B, A)
#
# 	Welch Two Sample t-test
# t = 1.5321, df = 21.9, p-value = 0.1398</code></pre>
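<p class="wp-block-paragraph">The Welch statistic takes the same 2.75-second gap between the means and divides it by the standard error of the difference, without assuming equal variances. Below is a minimal JavaScript sketch of the statistic and of the Welch-Satterthwaite degrees of freedom. Turning <em>t</em> into a p-value requires the CDF of the t distribution, so here we leave that step to R:</p>

```javascript
const A = [48, 55, 52, 60, 46, 58, 51, 57, 49, 54, 53, 50]; // version A
const B = [50, 58, 52, 62, 49, 57, 60, 53, 61, 51, 59, 54]; // version B

// Size, mean and sample variance (n - 1 denominator) of an array.
function describe(xs) {
  const n = xs.length;
  const m = xs.reduce((s, x) => s + x, 0) / n;
  const v = xs.reduce((s, x) => s + (x - m) ** 2, 0) / (n - 1);
  return { n, m, v };
}

// Welch t statistic and Welch-Satterthwaite degrees of freedom for "b minus a".
function welch(b, a) {
  const gb = describe(b), ga = describe(a);
  const wb = gb.v / gb.n, wa = ga.v / ga.n; // squared standard errors of each mean
  const t = (gb.m - ga.m) / Math.sqrt(wb + wa);
  const df = (wb + wa) ** 2 / (wb ** 2 / (gb.n - 1) + wa ** 2 / (ga.n - 1));
  return { t, df };
}

const { t, df } = welch(B, A);
console.log(`t = ${t.toFixed(4)}, df = ${df.toFixed(1)}`); // t = 1.5321, df = 21.9
```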



<p class="wp-block-paragraph">A medium effect that the test declares <em>not</em> significant. This is not a contradiction: it is exactly the phenomenon that the power of a test exists to explain. Let&#8217;s hold that thought; we will come back to it shortly.</p>



<h2 class="wp-block-heading">Effect Size for Proportions (CTR and Conversions)</h2>



<p class="wp-block-paragraph">Time on page is a mean, but the daily bread of anyone doing SEO is proportions: CTR, conversion rate, bounce rate. Here Cohen&#8217;s d does not apply directly, and the natural effect-size measure is <strong>Cohen&#8217;s h</strong>, built specifically for the difference between two proportions.</p>



<p class="wp-block-paragraph">The technical detail that makes it reliable is a transformation, the arcsine of the square root of the proportion, which stabilises the variance. In a proportion, variability depends on the value itself and is greatest around 50%. The formula is:</p>



\( h = 2\arcsin\sqrt{p_2} - 2\arcsin\sqrt{p_1} \\ \)



<p class="wp-block-paragraph">where <em>p</em><sub>1</sub> and <em>p</em><sub>2</sub> are the two proportions compared. There is no need to compute it by hand: the <code>ES.h</code> function of the <code>pwr</code> package gives it to us. But before seeing it at work it is worth introducing the other half of the story, because that is where Cohen&#8217;s h shines.</p>
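<p class="wp-block-paragraph">To sketch what <code>ES.h</code> computes, we assume it follows the pwr documentation: <em>h</em> = 2 asin(√p1) − 2 asin(√p2), with the first argument first. In JavaScript this is a one-liner. The sketch also checks the variance-stabilising effect: the same one-point gap yields a much larger <em>h</em> near 1% than near 50%.</p>

```javascript
// Cohen's h between two proportions, written like pwr's ES.h(p1, p2).
function cohenH(p1, p2) {
  return 2 * Math.asin(Math.sqrt(p1)) - 2 * Math.asin(Math.sqrt(p2));
}

console.log(cohenH(0.052, 0.040).toFixed(4)); // 0.0574

// The same one-point gap, near the floor and near the middle of the scale.
console.log(cohenH(0.02, 0.01).toFixed(4));
console.log(cohenH(0.51, 0.50).toFixed(4));
```

<p class="wp-block-paragraph">Going from 1% to 2% gives <em>h</em> ≈ 0.083, while going from 50% to 51% gives only <em>h</em> ≈ 0.020. Doubling a rare event is a much bigger change than nudging a coin flip.</p>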



<p class="wp-block-paragraph">First, though, let&#8217;s close the effect-size chapter with an honest mention of the other measures. When the groups compared are more than two — the classic ANOVA scenario — the typical measure is <strong>eta squared</strong> (η²), which tells what fraction of the total variability is explained by the factor under study; we laid its foundations when discussing the <a href="https://www.gironi.it/blog/en/analysis-of-variance-anova-explained-simply/">analysis of variance</a>. When instead the outcome is binary — converts / does not convert — effect size is often expressed as an <strong>odds ratio</strong>, the ratio between the odds of success, the same object that governs <a href="https://www.gironi.it/blog/en/logistic-regression-predicting-the-outcome-of-an-event/">logistic regression</a>. Different tools for different questions, but the underlying idea does not change: put a number on the magnitude, not just on the existence.</p>



<h2 class="wp-block-heading">The Power of a Test: Could We Have Seen It?</h2>



<p class="wp-block-paragraph">Let&#8217;s go back to our medium effect declared not significant. How can a d of 0.63 produce a p-value of 0.14? The answer lies in a concept that closes the inferential circle: the <strong>power</strong> of a test.</p>



<p class="wp-block-paragraph">When we run a hypothesis test we risk two kinds of error. The first, the type I error, is crying wolf over an effect that isn&#8217;t there: we keep it under control with the threshold α (usually 0.05). The second, the type II error, is its opposite and far more insidious: <em>failing to see</em> an effect that is in fact there. The probability of committing it is denoted by β, and <strong>power</strong> is its complement:</p>



\( \text{power} = 1 - \beta \\ \)



<p class="wp-block-paragraph">Put more plainly, power is the probability of noticing a real effect when it truly exists. A power of 0.80 — the standard people aim for — means that, if the effect exists at the hypothesised size, our test detects it four times out of five.</p>
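<p class="wp-block-paragraph">Power as a long-run frequency can be checked by brute force. The sketch below simulates many A/B tests in which B truly converts better, at 5.2% against 4.0%, the rates of the practical case later on. It runs the pooled z-test on each simulated test and counts how often the difference comes out significant. The seeded generator (mulberry32), the 2,000 runs and the sample sizes are assumptions of the illustration. The <code>pwr</code> formulas rely on an approximation, so the simulated shares should only land close to the theoretical 0.35 and 0.80.</p>

```javascript
// mulberry32: a tiny seeded PRNG, used only so the run is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal CDF (Abramowitz-Stegun approximation of erf).
function normCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Two-sided p-value of the pooled two-proportion z-test.
function pValue(cA, nA, cB, nB) {
  const pooled = (cA + cB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  if (se === 0) return 1;
  const z = (cB / nB - cA / nA) / se;
  return 2 * (1 - normCdf(Math.abs(z)));
}

// Conversions among n visitors who each convert with probability p.
function conversions(n, p, rand) {
  let c = 0;
  for (let i = 0; i < n; i++) if (rand() < p) c++;
  return c;
}

// Fraction of simulated A/B tests in which the z-test rejects at level alpha.
function simulatedPower(pA, pB, n, alpha, runs, seed) {
  const rand = mulberry32(seed);
  let detected = 0;
  for (let r = 0; r < runs; r++) {
    const cA = conversions(n, pA, rand);
    const cB = conversions(n, pB, rand);
    if (pValue(cA, n, cB, n) < alpha) detected++;
  }
  return detected / runs;
}

// A real effect exists in every simulated test: 4.0% vs 5.2%.
const powerSmall = simulatedPower(0.040, 0.052, 1500, 0.05, 2000, 1);
const powerLarge = simulatedPower(0.040, 0.052, 4764, 0.05, 2000, 2);
console.log(powerSmall.toFixed(2), powerLarge.toFixed(2)); // expected near 0.35 and 0.80
```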



<p class="wp-block-paragraph">The crucial point is that power, the threshold α, effect size and sample size are not four independent knobs: they are <strong>bound by a constraint</strong>. Fix three of these values, and the fourth is determined. This is the entire idea of power analysis, and it is what makes it so useful: depending on which unknown we leave free, it answers two different operational questions.</p>



<p class="wp-block-paragraph">And here is why our medium effect stayed invisible. With twelve sessions per group the power of the test was low, around 0.3, far short of the usual 0.80: the test was, quite simply, close to <em>blind</em>. A non-significant result, under these conditions, does not say &#8220;the effect isn&#8217;t there&#8221;; it says &#8220;I didn&#8217;t have good enough eyes to see it&#8221;. Confusing the two is one of the most expensive mistakes you can make when reading an A/B test.</p>



<h2 class="wp-block-heading">Power Analysis in R: How Much Data You Need</h2>



<p class="wp-block-paragraph">The first question power analysis can settle is the one every test should face <em>before</em> starting: how much data do I need? Let&#8217;s pick up our medium effect again. Suppose we want to design a test able to detect a d of 0.63 with power 0.80 and threshold 0.05. Here is how I compute it in R with the <code>pwr</code> package:</p>



<pre class="wp-block-code"><code>library(pwr)
pwr.t.test(d = 0.63, sig.level = 0.05, power = 0.80, type = "two.sample")
#
#      Two-sample t test power calculation
#               n = 40.53396
#               d = 0.63
#       sig.level = 0.05
#           power = 0.8
#     alternative = two.sided
# NOTE: n is number in *each* group</code></pre>



<p class="wp-block-paragraph">We would need about <strong>41 sessions per group</strong>, not twelve. That is why our test was mute: it was looking for a medium effect with a third of the data required. Power analysis, done <em>upstream</em>, would have spared us an inconclusive test — and it is exactly the reasoning behind the <a href="https://www.gironi.it/blog/en/ab-test-sample-size-calculator/">sample size calculator</a>: sample size and power are two sides of the same coin.</p>
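<p class="wp-block-paragraph"><code>pwr.t.test</code> solves the power equation exactly, using the noncentral t distribution. As a cross-check, we can combine the classic normal approximation, n = 2((z<sub>1−α/2</sub> + z<sub>1−β</sub>)/d)², with a commonly used small-sample correction that adds z<sub>1−α/2</sub>²/4 per group. The result lands within a few hundredths of R&#8217;s 40.53. Below is a minimal JavaScript sketch. The bisection-based quantile function is our own helper, not a library call:</p>

```javascript
// Standard normal CDF (Abramowitz-Stegun approximation of erf).
function normCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Quantile of the standard normal by bisection on normCdf.
function normQuantile(p) {
  let lo = -10, hi = 10;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (normCdf(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// Per-group n for a two-sided, two-sample t-test: normal approximation,
// then the z^2/4 correction for estimating the variance from the data.
function nPerGroup(d, alpha = 0.05, power = 0.80) {
  const zA = normQuantile(1 - alpha / 2);
  const zB = normQuantile(power);
  const normalApprox = 2 * ((zA + zB) / d) ** 2;
  return { normalApprox, corrected: normalApprox + zA ** 2 / 4 };
}

console.log(nPerGroup(0.63));
```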



<p class="wp-block-paragraph">The second question is the mirror image and comes up <em>after the fact</em>, once the test is done: with the data I had, how much power did I really have? We see it better on a concrete case.</p>



<h2 class="wp-block-heading">A Practical Case: the A/B Test That &#8220;Didn&#8217;t Work&#8221;</h2>



<p class="wp-block-paragraph">Suppose we tested two landing pages. A converted 60 visitors out of 1,500 (4.0%), B converted 78 out of 1,500 (5.2%). At a glance B looks clearly better: 1.2 percentage points more conversion is not nothing. Let&#8217;s check in R whether the difference holds:</p>



<pre class="wp-block-code"><code>prop.test(c(60, 78), c(1500, 1500), correct = FALSE)
#
# 	2-sample test for equality of proportions
# X-squared = 2.461, df = 1, p-value = 0.1167</code></pre>



<p class="wp-block-paragraph">The p-value is 0.117: above 0.05. By-the-book verdict: difference not significant, test failed, file it away. But now we know better than to stop here. Let&#8217;s compute the power that test actually had, starting from the observed effect size:</p>



<pre class="wp-block-code"><code>library(pwr)
h &lt;- ES.h(0.052, 0.040)   # Cohen's h between the two proportions
h
# [1] 0.0574024

pwr.2p.test(h = h, n = 1500, sig.level = 0.05)
#               power = 0.3492384</code></pre>



<p class="wp-block-paragraph">Power was <strong>0.35</strong>. In other words: even if B had genuinely been better by that much, we had a little over one chance in three of noticing it. The test did not &#8220;prove the two pages are equal&#8221;: it was simply too weak to decide either way. And how much data would have been needed to reach decent power?</p>



<pre class="wp-block-code"><code>pwr.2p.test(h = h, power = 0.80, sig.level = 0.05)
#               n = 4764.053</code></pre>



<p class="wp-block-paragraph">Almost <strong>4,800 visitors per variant</strong>, against the 1,500 we had. The difference between a test that &#8220;didn&#8217;t work&#8221; and a test never really in a position to work is all here, and you only see it if you pair power with effect size. <strong>Beware</strong>, then, of downgrading a non-significant result to &#8220;no effect&#8221;: very often we are merely looking at an underpowered test.</p>
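<p class="wp-block-paragraph">Both <code>pwr.2p.test</code> calls boil down to one formula. With <em>n</em> visitors per group and a two-sided test, power = Φ(|h|·√(n/2) − z<sub>1−α/2</sub>) + Φ(−|h|·√(n/2) − z<sub>1−α/2</sub>), where Φ is the standard normal CDF. Asking for <em>n</em> instead of power just means solving that equation for <em>n</em>. Below is a minimal JavaScript sketch. It assumes this is the formula <code>pwr</code> uses, and it does reproduce the 0.349 and the 4,764 above. The helpers are our own:</p>

```javascript
// Standard normal CDF (Abramowitz-Stegun approximation of erf).
function normCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Quantile of the standard normal by bisection on normCdf.
function normQuantile(p) {
  let lo = -10, hi = 10;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (normCdf(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// Cohen's h, as ES.h(p1, p2).
function cohenH(p1, p2) {
  return 2 * Math.asin(Math.sqrt(p1)) - 2 * Math.asin(Math.sqrt(p2));
}

// Two-sided power with n visitors per group (both tails counted).
function power2p(h, n, alpha = 0.05) {
  const zCrit = normQuantile(1 - alpha / 2);
  const shift = Math.abs(h) * Math.sqrt(n / 2);
  return normCdf(shift - zCrit) + normCdf(-shift - zCrit);
}

// Per-group n that reaches the target power: bisection, since power grows with n.
function n2p(h, power = 0.8, alpha = 0.05) {
  let lo = 2, hi = 1e9;
  for (let i = 0; i < 200; i++) {
    const mid = (lo + hi) / 2;
    if (power2p(h, mid, alpha) < power) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

const h = cohenH(0.052, 0.040);
console.log(power2p(h, 1500).toFixed(3)); // 0.349
console.log(n2p(h).toFixed(0)); // 4764
```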



<h2 class="wp-block-heading">Try It Yourself</h2>



<p class="wp-block-paragraph">To make the mechanism stick, here is an exercise with realistic data. We are designing an A/B test on a contact form. The current conversion rate (baseline) is <strong>2.5%</strong>, and we would count it a success to bring it to <strong>3.0%</strong>: half a point of improvement. We want a test with power 0.80 and threshold 0.05.</p>



<p class="wp-block-paragraph">The task: compute the effect size with <code>ES.h(0.030, 0.025)</code>, pass it to <code>pwr.2p.test</code> setting <code>power = 0.80</code>, and read off how many visitors per variant are needed. Then, as a cross-check, compute the power we would have if we stopped at 3,000 visitors per variant with <code>pwr.2p.test(h = ..., n = 3000, ...)</code>.</p>



<p class="wp-block-paragraph">To check your work: the effect size is <em>h</em> = 0.031, about <strong>16,759 visitors per variant</strong> are needed for a power of 0.80, and with only 3,000 the power would collapse to <strong>0.22</strong>. The moral is the one we now know: the smaller the effect we are chasing, the more data we need to see it. Halving the minimum detectable difference does not double the sample required; it roughly quadruples it, because the sample size grows with 1/<em>h</em>².</p>
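<p class="wp-block-paragraph">To see why the moral holds, the sketch below uses the common closed-form approximation n ≈ 2((z<sub>1−α/2</sub> + z<sub>1−β</sub>)/h)², with the two z values hard-coded for α = 0.05 and power 0.80. It computes the visitors per variant needed to lift the 2.5% baseline by 1, 0.5 and 0.25 percentage points. The formula ignores the tiny second tail that <code>pwr.2p.test</code> includes, so its results differ from R&#8217;s only marginally.</p>

```javascript
// Cohen's h, as ES.h(p1, p2) in pwr.
function cohenH(p1, p2) {
  return 2 * Math.asin(Math.sqrt(p1)) - 2 * Math.asin(Math.sqrt(p2));
}

// z quantiles hard-coded for alpha = 0.05 (two-sided) and power = 0.80.
const Z_ALPHA = 1.959964; // qnorm(0.975)
const Z_BETA = 0.841621;  // qnorm(0.80)

// Approximate visitors per variant: n = 2 * ((z_alpha + z_beta) / h)^2.
function visitorsPerVariant(baseline, target) {
  const h = cohenH(target, baseline);
  return 2 * ((Z_ALPHA + Z_BETA) / h) ** 2;
}

const baseline = 0.025;
const targets = [0.035, 0.030, 0.0275]; // +1, +0.5 and +0.25 percentage points
const sizes = targets.map(t => visitorsPerVariant(baseline, t));
targets.forEach((t, i) =>
  console.log(`${(100 * t).toFixed(2)}%: about ${Math.round(sizes[i])} visitors per variant`));
console.log("ratios:", (sizes[1] / sizes[0]).toFixed(2), (sizes[2] / sizes[1]).toFixed(2));
```

<p class="wp-block-paragraph">Each halving multiplies the required sample by a factor a little under 4. The shortfall comes from the arcsine transformation, which makes <em>h</em> not exactly proportional to the difference in percentage points.</p>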



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="wp-block-paragraph">Effect size and power complete the triad that the p-value, on its own, left unfinished: no longer just <em>does the effect exist?</em>, but also <em>how big is it?</em> and <em>could I have seen it?</em>. These are the three questions that turn a test from a propitiatory rite into a decision tool. And all three, on closer inspection, depend on a choice that comes <em>before</em> the test: how much data to collect, and how. That is the terrain of experimental design and <a href="https://www.gironi.it/blog/en/sampling-and-sample-size-how-much-data-do-you-really-need/">sampling</a> — the point where statistics stops merely judging the numbers we put in front of it and begins to tell us which numbers to go and look for.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h3 class="wp-block-heading">Further Reading</h3>



<p class="wp-block-paragraph">On the rigorous use of effect size, power and sizing in the context of online experiments, the most complete reference remains <a href="https://www.amazon.it/dp/1108724264?tag=consulenzeinf-21&#038;ascsubtag=effect-size-and-power-analysis" rel="nofollow sponsored noopener" target="_blank"><em>Trustworthy Online Controlled Experiments</em></a> by Ron Kohavi, Diane Tang and Ya Xu: the chapters on how to size a test and interpret its results are worth the purchase on their own. For an accessible take on the statistical reasoning behind these ideas — uncertainty, error, inference — <a href="https://www.amazon.it/dp/0241258766?tag=consulenzeinf-21&#038;ascsubtag=effect-size-and-power-analysis" rel="nofollow sponsored noopener" target="_blank"><em>The Art of Statistics</em></a> by David Spiegelhalter is hard to beat.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/effect-size-and-power-analysis/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>A/B Test Significance Calculator</title>
		<link>https://www.gironi.it/blog/en/ab-test-significance-calculator/</link>
					<comments>https://www.gironi.it/blog/en/ab-test-significance-calculator/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Fri, 12 Jun 2026 19:47:32 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3679</guid>

					<description><![CDATA[Our A/B test has run its course: variant B shows a higher conversion rate than variant A. The temptation to declare a winner and ship the change is strong. But first there is a question to answer, the same one that runs through this whole series: is the difference we observe a real signal, or &#8230; <a href="https://www.gironi.it/blog/en/ab-test-significance-calculator/" class="more-link">Continue reading<span class="screen-reader-text"> "A/B Test Significance Calculator"</span></a>]]></description>
										<content:encoded><![CDATA[<p>Our A/B test has run its course: variant B shows a higher conversion rate than variant A. The temptation to declare a winner and ship the change is strong. But first there is a question to answer, the same one that runs through this whole series: <strong>is the difference we observe a real signal, or just statistical noise?</strong></p>
<p>This calculator is the natural complement of the <a href="https://www.gironi.it/blog/en/ab-test-sample-size-calculator/">sample size calculator</a>: that one works <em>before</em> the test and tells us how many users we need; this one works <em>after</em> and tells us whether the result we obtained is statistically significant. If you have read the article on <a href="https://www.gironi.it/blog/en/hypothesis-testing-a-step-by-step-guide/">hypothesis testing</a>, you will recognise the machinery at once: behind the scenes sits a z-test for comparing two proportions.</p>
<p><span id="more-3679"></span></p>
<p>It is straightforward to use: we enter visitors and conversions for the two variants, choose a significance level, and the calculator returns the p-value, a verdict, and the confidence interval of the difference.</p>
<div style="border: 1px solid #ccc;padding: 1.2em 1.5em;margin: 1.5em 0;border-radius: 6px">
<h3 style="margin-top: 0">Contents</h3>
<ul>
<li><a href="#calculator">The calculator</a></li>
<li><a href="#formula">The formula: how the calculation works</a></li>
<li><a href="#verify-r">Let&#8217;s verify it in R</a></li>
<li><a href="#interpret">How to read the result (without being fooled)</a></li>
<li><a href="#further">Further reading</a></li>
</ul>
</div>
<hr />
<h2 id="calculator">The calculator</h2>
<p>The preloaded values are the ones we will work through step by step below: replace them with the numbers from your own test.</p>
<style>
.sg-calc{max-width:620px;margin:2em auto;padding:1.5em 2em;background:#f8f8f8;border:1px solid #ddd;border-radius:8px;font-family:inherit}
.sg-calc h3{margin:0 0 1em;color:#333;font-size:1.2em}
.sg-calc fieldset{border:1px solid #ddd;border-radius:6px;margin:0 0 1em;padding:0.6em 1em 1em;background:#fff}
.sg-calc legend{font-weight:700;font-size:0.95em;color:#333;padding:0 0.4em}
.sg-calc label{display:block;margin:0.6em 0 0.3em;font-weight:600;color:#333;font-size:0.9em}
.sg-calc input[type=number],.sg-calc select{width:100%;padding:8px 10px;border:1px solid #ccc;border-radius:4px;font-size:1em;box-sizing:border-box;background:#fff}
.sg-calc input[type=number]:focus,.sg-calc select:focus{outline:none;border-color:#0073aa;box-shadow:0 0 0 2px rgba(0,115,170,0.15)}
.sg-calc .sg-row{display:flex;gap:1.2em}
.sg-calc .sg-col{flex:1}
.sg-calc .sg-result{margin-top:1.5em;padding:1.2em;background:#fff;border:2px solid #ccc;border-radius:6px;text-align:center}
.sg-calc .sg-result.sg-si{border-color:#2ecc71}
.sg-calc .sg-result.sg-no{border-color:#e67e22}
.sg-calc .sg-verdict{font-size:1.25em;font-weight:700;display:block;margin:0.2em 0;color:#333}
.sg-calc .sg-si .sg-verdict{color:#2ecc71}
.sg-calc .sg-no .sg-verdict{color:#e67e22}
.sg-calc .sg-pvalue{font-size:1.05em;color:#333;margin-top:0.4em}
.sg-calc .sg-detail{font-size:0.9em;color:#666;margin-top:0.6em;line-height:1.6}
.sg-calc .sg-warn{color:#e74c3c;font-size:0.85em;margin-top:0.5em;display:none}
@media(max-width:520px){.sg-calc .sg-row{flex-direction:column;gap:0}.sg-calc{padding:1em 1.2em}}
</style>
<div class="sg-calc" id="sgCalc">
<h3>Significance calculator</h3>
<div class="sg-row">
<div class="sg-col">
<fieldset>
<legend>Variant A (control)</legend>
<label for="sgNA">Visitors</label>
<input type="number" id="sgNA" value="8500" min="1" step="1">
<label for="sgCA">Conversions</label>
<input type="number" id="sgCA" value="204" min="0" step="1">
</fieldset>
</div>
<div class="sg-col">
<fieldset>
<legend>Variant B</legend>
<label for="sgNB">Visitors</label>
<input type="number" id="sgNB" value="8300" min="1" step="1">
<label for="sgCB">Conversions</label>
<input type="number" id="sgCB" value="251" min="0" step="1">
</fieldset>
</div>
</div>
<p><label for="sgAlpha">Significance (α)</label><br />
<select id="sgAlpha"><option value="0.01">0.01 (99%)</option><option value="0.05" selected>0.05 (95%)</option><option value="0.10">0.10 (90%)</option></select></p>
<div class="sg-result" id="sgResult">
<span class="sg-verdict" id="sgVerdict">—</span>
<div class="sg-pvalue" id="sgPvalue"></div>
<div class="sg-detail" id="sgDetail"></div>
</div>
<div class="sg-warn" id="sgWarn"></div>
</div>
<p><script>
(function(){
function normCdf(z) {
	const x = Math.abs(z) / Math.SQRT2;
	const t = 1 / (1 + 0.3275911 * x);
	const erf = 1 - (((((1.061405429 * t - 1.453152027) * t) + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t * Math.exp(-x * x);
	const phi = 0.5 * (1 + erf);
	return z >= 0 ? phi : 1 - phi;
}
const Z975 = 1.959964; // qnorm(0.975), for the 95% CI, as in prop.test
function testSignificativita(nA, cA, nB, cB) {
	const interi = [nA, cA, nB, cB];
	if (interi.some(v => !Number.isFinite(v) || v < 0) || nA === 0 || nB === 0 || cA > nA || cB > nB) {
		return { valido: false, avvisi: [] };
	}
	const pA = cA / nA;
	const pB = cB / nB;
	const diff = pB - pA;
	const pooled = (cA + cB) / (nA + nB);
	const sePooled = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
	const z = sePooled > 0 ? diff / sePooled : 0;
	const pValue = sePooled > 0 ? 2 * (1 - normCdf(Math.abs(z))) : 1;
	const seDiff = Math.sqrt(pA * (1 - pA) / nA + pB * (1 - pB) / nB);
	const avvisi = [];
	for (const [c, n, nome] of [[cA, nA, 'A'], [cB, nB, 'B']]) {
		if (c < 5 || n - c < 5) {
			avvisi.push(`Variant ${nome} has fewer than 5 conversions (or non-conversions): the normal approximation is unreliable with numbers this small.`);
		}
	}
	return {
		valido: true,
		pA, pB, diff,
		lift: pA > 0 ? diff / pA : null,
		z, pValue,
		ciLow: diff - Z975 * seDiff,
		ciHigh: diff + Z975 * seDiff,
		significativo: (alpha) => pValue < alpha,
		avvisi,
	};
}
  var SG_LABELS = {
    sigYes: 'Statistically significant difference at %CONF%',
    sigNo: 'Difference not significant at %CONF%',
    lift: 'relative lift',
    ci: '95% CI of the difference:',
    pp: 'percentage points',
    invalid: 'Let\u2019s check the data: visitors must be at least 1, and conversions cannot be negative or exceed visitors.',
    warnSmall: 'Warning: one variant has fewer than 5 conversions (or non-conversions): with numbers this small the normal approximation is unreliable.'
  };
  var L = SG_LABELS;
  function fmtP(p){ return p < 0.0001 ? '&lt; 0.0001' : p.toFixed(4); }
  function calc(){
    var nA=parseInt(document.getElementById('sgNA').value,10);
    var cA=parseInt(document.getElementById('sgCA').value,10);
    var nB=parseInt(document.getElementById('sgNB').value,10);
    var cB=parseInt(document.getElementById('sgCB').value,10);
    var alpha=parseFloat(document.getElementById('sgAlpha').value);
    var conf=Math.round((1-alpha)*100)+'%';
    var box=document.getElementById('sgResult');
    var warn=document.getElementById('sgWarn');
    warn.style.display='none'; warn.textContent='';
    var r=testSignificativita(nA,cA,nB,cB);
    if(!r.valido){
      box.className='sg-result';
      document.getElementById('sgVerdict').innerHTML='&mdash;';
      document.getElementById('sgPvalue').textContent='';
      document.getElementById('sgDetail').innerHTML=L.invalid;
      return;
    }
    var sig=r.significativo(alpha);
    box.className='sg-result '+(sig?'sg-si':'sg-no');
    document.getElementById('sgVerdict').textContent=(sig?L.sigYes:L.sigNo).replace('%CONF%',conf);
    document.getElementById('sgPvalue').innerHTML='p-value: <strong>'+fmtP(r.pValue)+'</strong> &nbsp;&middot;&nbsp; z = '+r.z.toFixed(3);
    var liftTxt=r.lift===null?'&mdash;':(r.lift>=0?'+':'')+(r.lift*100).toFixed(1)+'%';
    document.getElementById('sgDetail').innerHTML=
      'CR A: <strong>'+(r.pA*100).toFixed(2)+'%</strong> &nbsp;&middot;&nbsp; CR B: <strong>'+(r.pB*100).toFixed(2)+'%</strong> &nbsp;&middot;&nbsp; '+L.lift+': <strong>'+liftTxt+'</strong><br />'+
      L.ci+' [' + (r.ciLow*100).toFixed(2) + '; ' + (r.ciHigh*100).toFixed(2) + '] ' + L.pp;
    if(r.avvisi.length){
      warn.textContent=L.warnSmall;
      warn.style.display='block';
    }
  }
  ['sgNA','sgCA','sgNB','sgCB','sgAlpha'].forEach(function(id){
    document.getElementById(id).addEventListener('input',calc);
    document.getElementById(id).addEventListener('change',calc);
  });
  calc();
})();
</script></p>
<hr />
<h2 id="formula">The formula: how the calculation works</h2>
<p>The reasoning is the classic hypothesis-testing one. We start from the <strong>null hypothesis</strong>: the two variants convert at the same rate, and the observed difference is due to chance. Then we measure how &#8220;surprising&#8221; that difference would be if the null hypothesis were true: if it is too surprising, the null hypothesis does not hold.</p>
<p>There are three protagonists:</p>
<ul>
<li><strong>p&#770;<sub>A</sub></strong> and <strong>p&#770;<sub>B</sub></strong>: the observed conversion rates of the two variants (conversions divided by visitors).</li>
<li><strong>p&#770;</strong>: the <em>pooled</em> proportion, i.e. the overall conversion rate computed by combining the data from both variants. Why pooled? Under the null hypothesis the two proportions coincide, and the best estimate of that single proportion uses all the available data.</li>
<li><strong>n<sub>A</sub></strong> and <strong>n<sub>B</sub></strong>: the visitors of the two variants.</li>
</ul>
<p>The test statistic measures the observed difference in standard-error units:</p>
<p>\( z = \frac{\hat{p}_B - \hat{p}_A}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}} \)</p>
<p>The denominator is the <strong>standard error of the difference</strong>: it tells us how much the gap between the two rates would fluctuate if we repeated the test many times in a world where the variants are identical. The resulting z is read on the standard normal distribution: the <strong>p-value</strong> is the probability of observing a difference at least this extreme, in either direction, if the null hypothesis were true (that is, by chance alone). &#8220;In either direction&#8221; is not a footnote: the test is <strong>two-tailed</strong>, because before looking at the data we do not know whether B will do better or worse than A.</p>
<p>The reference values are always the same:</p>
<ul>
<li>|z| &gt; 1.645 &rarr; significant at 90%</li>
<li>|z| &gt; 1.96 &rarr; significant at 95%</li>
<li>|z| &gt; 2.576 &rarr; significant at 99%</li>
</ul>
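<p>These thresholds are just quantiles of the standard normal distribution. For a two-tailed test at level α, the critical value is the point that leaves α/2 in each tail. The minimal JavaScript sketch below recovers them by inverting the normal CDF with bisection, using the same Abramowitz-Stegun approximation as the calculator. <code>normQuantile</code> is our own helper; in R, one would simply call <code>qnorm(1 - alpha / 2)</code>:</p>

```javascript
// Standard normal CDF (Abramowitz-Stegun approximation of erf, error < ~1.5e-7).
function normCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Quantile of the standard normal by bisection on normCdf.
function normQuantile(p) {
  let lo = -10, hi = 10;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (normCdf(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// Two-tailed critical values: alpha / 2 in each tail.
for (const alpha of [0.10, 0.05, 0.01]) {
  const zCrit = normQuantile(1 - alpha / 2);
  console.log(`alpha = ${alpha}: |z| > ${zCrit.toFixed(3)}`);
}
```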
<p><strong>Let&#8217;s work through an example</strong>, with the numbers preloaded in the calculator. Variant A received 8,500 visitors and 204 conversions; variant B 8,300 visitors and 251 conversions:</p>
<ul>
<li>p&#770;<sub>A</sub> = 204 / 8,500 = 0.0240 (2.40%)</li>
<li>p&#770;<sub>B</sub> = 251 / 8,300 = 0.0302 (3.02%) &mdash; a +26% relative lift</li>
<li>pooled p&#770; = (204 + 251) / (8,500 + 8,300) = 455 / 16,800 = 0.0271</li>
<li>standard error = &radic;[0.0271 &times; 0.9729 &times; (1/8,500 + 1/8,300)] = 0.00250</li>
<li>z = (0.0302 &minus; 0.0240) / 0.00250 = <strong>2.49</strong></li>
</ul>
<p>So: z = 2.49 clears the 1.96 threshold and the p-value is 0.0127. The difference is <strong>significant at 95%</strong> &mdash; but, as you can see, not at 99% (0.0127 &gt; 0.01). Same result, two different verdicts depending on how strict we chose to be: the significance level must be decided <em>before</em> looking at the data, not after.</p>
<hr />
<h2 id="verify-r">Let&#8217;s verify it in R</h2>
<p>I check the calculation in R with <code>prop.test</code>, switching off the continuity correction to stay aligned with the manual computation:</p>
<pre>prop.test(c(251, 204), c(8300, 8500), correct = FALSE)

	2-sample test for equality of proportions
	without continuity correction

data:  c(251, 204) out of c(8300, 8500)
X-squared = 6.2075, df = 1, p-value = 0.01272
alternative hypothesis: two.sided
95 percent confidence interval:
 0.001325762 0.011156166
sample estimates:
    prop 1     prop 2
0.03024096 0.02400000</pre>
<p>The numbers match: the p-value is the same as the manual calculation, and the X-squared statistic is simply our z squared (2.49&sup2; &asymp; 6.21 &mdash; the chi-square test on a 2&times;2 table and the z-test on two proportions are the same test). As a bonus, R hands us the <strong>confidence interval of the difference</strong>: between 0.13 and 1.12 percentage points. That is the most valuable piece of information of all, and here is why.</p>
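<p>The identity between the two statistics is easy to check numerically. The minimal JavaScript sketch below uses the same data to compute two values: Pearson&#8217;s chi-square, without continuity correction, on the 2&times;2 table of converted / not converted, and the pooled z of the formula above:</p>

```javascript
// Pearson chi-square on a 2x2 table [[a, b], [c, d]], no continuity correction.
function chiSquare2x2(a, b, c, d) {
  const n = a + b + c + d;
  return (n * (a * d - b * c) ** 2) / ((a + b) * (c + d) * (a + c) * (b + d));
}

// Pooled two-proportion z statistic, as in the formula above.
function zTwoProp(cA, nA, cB, nB) {
  const pooled = (cA + cB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return (cB / nB - cA / nA) / se;
}

// Rows: variant B, variant A; columns: converted, did not convert.
const chi2 = chiSquare2x2(251, 8300 - 251, 204, 8500 - 204);
const z = zTwoProp(204, 8500, 251, 8300);
console.log(chi2.toFixed(4), (z * z).toFixed(4), z.toFixed(3));
```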
<hr />
<h2 id="interpret">How to read the result (without being fooled)</h2>
<p><strong>Significant does not mean important.</strong> This must always be kept firmly in mind: with very large samples, even tiny, commercially irrelevant differences become statistically significant. Significance tells us the difference is unlikely to be due to chance alone, not that it is <em>big</em>. To understand how big it is, we look at the confidence interval of the difference: in our example it runs from +0.13 to +1.12 percentage points. If even the lower bound justifies the effort of shipping the change, we can proceed with confidence; if the interval includes negligible values, the &#8220;significant&#8221; verdict alone is not enough.</p>
<p><strong>The p-value holds if the test stops when planned.</strong> The calculation assumes the sample size was fixed in advance (with the <a href="https://www.gironi.it/blog/en/ab-test-sample-size-calculator/">sample size calculator</a>) and that the test stops there. Checking the results every day and stopping at the first p-value below 0.05 &mdash; the infamous <em>peeking</em> &mdash; dramatically inflates false positives: it is like flipping a coin until three heads come up in a row and declaring the coin rigged. We covered this in the <a href="https://www.gironi.it/blog/en/guide-to-statistical-tests-for-a-b-analysis/">guide to statistical tests for A/B analysis</a>.</p>
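<p>How much does peeking inflate false positives? A quick simulation gives an idea. The sketch below runs A/A tests, in which both variants convert at the same 3% rate, so every &#8220;winner&#8221; is a false positive. It compares two analysts working on the same data. The first checks the p-value after every batch of 500 visitors per variant and stops at the first value below 0.05. The second looks only once, at the planned end. The conversion rate, the 20 looks, the 1,000 runs and the seeded generator are all assumptions of the illustration, and the exact percentages depend on them.</p>

```javascript
// mulberry32: a tiny seeded PRNG, used only so the run is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal CDF (Abramowitz-Stegun approximation of erf).
function normCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Two-sided p-value of the pooled two-proportion z-test.
function pValue(cA, nA, cB, nB) {
  const pooled = (cA + cB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  if (se === 0) return 1;
  const z = (cB / nB - cA / nA) / se;
  return 2 * (1 - normCdf(Math.abs(z)));
}

// A/A experiments: both arms convert at `rate`, so every rejection is a false positive.
function falsePositiveRates(rate, batch, looks, runs, alpha, seed) {
  const rand = mulberry32(seed);
  let peeking = 0, fixedHorizon = 0;
  for (let r = 0; r < runs; r++) {
    let cA = 0, cB = 0, n = 0, stoppedEarly = false;
    for (let k = 0; k < looks; k++) {
      for (let i = 0; i < batch; i++) {
        if (rand() < rate) cA++;
        if (rand() < rate) cB++;
      }
      n += batch;
      // The "peeker" declares a winner at the first look with p < alpha.
      if (!stoppedEarly && pValue(cA, n, cB, n) < alpha) stoppedEarly = true;
    }
    if (stoppedEarly) peeking++;
    // The disciplined analyst tests once, at the planned sample size.
    if (pValue(cA, n, cB, n) < alpha) fixedHorizon++;
  }
  return { peeking: peeking / runs, fixedHorizon: fixedHorizon / runs };
}

const rates = falsePositiveRates(0.03, 500, 20, 1000, 0.05, 7);
console.log(rates);
```

<p>With these settings, the fixed-horizon analyst stays close to the nominal 5%, while the one who peeks declares a winner several times as often.</p>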
<p><strong>N.B.</strong>: the calculator uses a two-tailed test, the standard, prudent choice. One-tailed versions exist and &#8220;reward&#8221; a directional hypothesis with halved p-values, but they should be used only when the direction of the effect is genuinely known a priori &mdash; which, in everyday A/B testing practice, is almost never.</p>
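<p>The &#8220;halving&#8221; is easy to verify. Here is a minimal sketch with hypothetical counts (not the example&#8217;s data). When the observed effect goes in the hypothesised direction, the one-tailed p-value from <code>prop.test()</code> with <code>alternative = "greater"</code> is exactly half of the two-tailed one.</p>

```r
# Hypothetical counts: version A converts 300/10000, version B 240/10000
x <- c(300, 240)
n <- c(10000, 10000)

# Two-tailed: H1 says the two rates differ, in either direction
p_two <- prop.test(x, n, correct = FALSE)$p.value
# One-tailed: H1 says A converts MORE than B
p_one <- prop.test(x, n, alternative = "greater", correct = FALSE)$p.value

cat("two-tailed:", p_two, " one-tailed:", p_one, " ratio:", p_two / p_one, "\n")
```

<p>The one-tailed test pays for that halving by giving up any ability to detect an effect in the opposite direction. If B had been the better version, the &#8220;greater&#8221; test would simply return a p-value close to 1.</p>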
<hr />
<h3 id="further">You might also like</h3>
<ul>
<li><a href="https://www.gironi.it/blog/en/ab-test-sample-size-calculator/">A/B Test Sample Size Calculator</a></li>
<li><a href="https://www.gironi.it/blog/en/guide-to-statistical-tests-for-a-b-analysis/">Guide to Statistical Tests for A/B Analysis</a></li>
<li><a href="https://www.gironi.it/blog/en/hypothesis-testing-a-step-by-step-guide/">Hypothesis Testing: a Step-by-Step Guide</a></li>
</ul>
<hr />
<p>The p-value answers a single question: <em>does the effect exist?</em> It does not tell us how large it is, nor whether it is worth shipping. For that we need two more tools &mdash; <strong>effect size</strong> and <strong>power analysis</strong> &mdash; and that is exactly where <a href="https://www.gironi.it/blog/en/effect-size-and-power-analysis/">this series is headed next</a>.</p>
<hr />
<h3>Further reading</h3>
<p>The most complete reference on running online experiments rigorously remains <a href="https://www.amazon.it/dp/1108724264?tag=consulenzeinf-21&#038;ascsubtag=ab-test-significance-calculator" rel="nofollow sponsored noopener" target="_blank"><em>Trustworthy Online Controlled Experiments</em></a> by Ron Kohavi, Diane Tang and Ya Xu: the chapter on the pitfalls of interpreting results (peeking included) is worth the price of the book on its own.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/ab-test-significance-calculator/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The Statistics and SEO Library: the Books I Recommend (and Why)</title>
		<link>https://www.gironi.it/blog/en/statistics-seo-library/</link>
					<comments>https://www.gironi.it/blog/en/statistics-seo-library/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Thu, 11 Jun 2026 08:11:56 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[seo]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3652</guid>

					<description><![CDATA[There is a question that comes back, reliably, every time I publish an article along this path: &#8220;so, which book should I read to study these things?&#8221;. Until now I have answered one piece at a time, in the &#8220;Further Reading&#8221; section that closes each article. Here I do the reverse: I gather the whole &#8230; <a href="https://www.gironi.it/blog/en/statistics-seo-library/" class="more-link">Continue reading<span class="screen-reader-text"> "The Statistics and SEO Library: the Books I Recommend (and Why)"</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">There is a question that comes back, reliably, every time I publish an article along this path: <em>&#8220;so, which book should I read to study these things?&#8221;</em>. Until now I have answered one piece at a time, in the &#8220;Further Reading&#8221; section that closes each article. Here I do the reverse: I gather the whole library on a single page, with the reason each title earned its place on the shelf.</p>


<p class="wp-block-paragraph">This is not a ranking and not a catalogue: these are <strong>the books I actually use</strong>, the ones many of the examples and explanations in the articles come from. There are only a few of them, chosen by a simple criterion: each book must let anyone working with data in SEO and marketing take one concrete step forward, without requiring a degree in mathematics.</p>



<span id="more-3652"></span>



<p class="wp-block-paragraph">A note on transparency before we start: the links below are Amazon affiliate links. If you buy a book through them, the blog receives a small commission at no extra cost to you: it is the most painless way I have found to cover the server bills.</p>


<h2 class="wp-block-heading">Where to Start</h2>


<h3 class="wp-block-heading">The Art of Statistics — David Spiegelhalter</h3>


<p class="wp-block-paragraph">If I could keep only one, it would be this. <a href="https://www.amazon.it/dp/0241258766?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>The Art of Statistics</em></a> does not teach formulas: it teaches <strong>how to reason about data before trusting it</strong>, which is exactly the skill missing when someone reads a Search Console report and jumps to conclusions. Spiegelhalter — a Cambridge professor and a science communicator of rare clarity — builds every chapter around a real case: botched polls, misread medical statistics, the famous Berkeley admissions case (the same case I described when discussing <a href="https://www.gironi.it/blog/en/simpsons-paradox-in-seo-when-aggregate-data-can-lie/">Simpson&#8217;s Paradox</a>).</p>


<p class="wp-block-paragraph">I cite it practically everywhere on this blog: from <a href="https://www.gironi.it/blog/en/sampling-and-sample-size-how-much-data-do-you-really-need/">sampling</a> to <a href="https://www.gironi.it/blog/en/confidence-intervals-what-they-are-how-to-calculate-them-and-what-they-do-not-mean/">confidence intervals</a>, by way of the <a href="https://www.gironi.it/blog/en/central-limit-theorem/">Central Limit Theorem</a>. You can read it without pen and paper, and re-read it with profit. (For Italian readers there is also an excellent Italian edition, <a href="https://www.amazon.it/dp/8806246623?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>L&#8217;arte della statistica</em></a>.)</p>


<h3 class="wp-block-heading">Finalmente ho capito la statistica — Maurizio De Pra</h3>


<p class="wp-block-paragraph">The title says it all (&#8220;statistics, I finally got it&#8221;). <a href="https://www.amazon.it/dp/8867319396?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Finalmente ho capito la statistica</em></a> (Italian edition) is the book for absolute beginners who want a gradual path, plenty of examples and a modest price. It covers the territory of <strong>probability distributions</strong> well — the ones this blog&#8217;s path takes from the <a href="https://www.gironi.it/blog/en/the-poisson-distribution/">Poisson</a> to the <a href="https://www.gironi.it/blog/en/the-beta-distribution-explained-simply/">Beta</a> — together with the foundations of <a href="https://www.gironi.it/blog/en/first-steps-into-the-world-of-probability-sample-space-events-permutations-and-combinations/">probabilistic reasoning</a>. It does not replace a textbook, but it does what a textbook cannot: it takes the fear away.</p>


<h2 class="wp-block-heading">When Data Lies</h2>


<h3 class="wp-block-heading">How to Lie with Statistics — Darrell Huff</h3>


<p class="wp-block-paragraph">Written in 1954, and it has not aged a day. <a href="https://www.amazon.it/dp/0140213007?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>How to Lie with Statistics</em></a> is the short, venomous catalogue of the tricks numbers can be made to play: biased samples, conveniently chosen averages, truncated chart axes, percentages stripped of their context. Huff wrote for newspaper readers; I recommend it to anyone reading SEO tool reports and vendor slide decks, where those very tricks are alive and well. If you have been through <a href="https://www.gironi.it/blog/en/simpsons-paradox-in-seo-when-aggregate-data-can-lie/">Simpson&#8217;s Paradox</a> you already know that aggregate data can lie: Huff completes the picture with all the other ways.</p>


<p class="wp-block-paragraph">You can read it in an afternoon, and from that afternoon on you never look at a chart the same way again. (Italian readers can find it as <a href="https://www.amazon.it/dp/8889479094?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Mentire con le statistiche</em></a>.)</p>


<h2 class="wp-block-heading">The Textbook for Getting Serious: Inference</h2>


<h3 class="wp-block-heading">Statistica — Newbold, Carlson, Thorne</h3>


<p class="wp-block-paragraph">Sooner or later the moment comes when popular science is not enough: you want the applicability conditions of a test, the complete formulas, the exercises to check you understood. <a href="https://www.amazon.it/dp/8891910651?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Statistica</em></a> by Newbold, Carlson and Thorne (Italian edition) is the reference university textbook for the whole of inference: <a href="https://www.gironi.it/blog/en/hypothesis-testing-a-step-by-step-guide/">hypothesis testing</a>, confidence intervals, chi-square, ANOVA — in practice, the theoretical backbone of my <a href="https://www.gironi.it/blog/en/guide-to-statistical-tests-for-a-b-analysis/">guide to statistical tests for A/B analysis</a>.</p>


<p class="wp-block-paragraph">Let me be frank: it is a university textbook, and it costs like one. But it is one of those books you buy once and consult for years.</p>


<h2 class="wp-block-heading">Regression, Time Series, Models</h2>


<h3 class="wp-block-heading">Introduzione all&#8217;econometria — Stock, Watson</h3>


<p class="wp-block-paragraph">The name may be intimidating (econometrics?), but the content is exactly what anyone needs to go beyond basic <a href="https://www.gironi.it/blog/en/correlation-and-regression-analysis-linear-regression/">linear regression</a>: multiple regression, omitted variables, diagnostics, <a href="https://www.gironi.it/blog/en/time-series-analysis-and-forecasting-in-r/">time series</a>. <a href="https://www.amazon.it/dp/8891906190?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Introduzione all&#8217;econometria</em></a> by Stock and Watson (Italian edition; the English original is <em>Introduction to Econometrics</em>) has a quality that is rare in textbooks: a constant focus on the <strong>interpretation</strong> of results, not just their computation. Which is, after all, where the difference between a useful analysis and an exercise in style is decided.</p>


<h2 class="wp-block-heading">The (Fallible) Art of Prediction</h2>


<h3 class="wp-block-heading">The Signal and the Noise — Nate Silver</h3>


<p class="wp-block-paragraph">Anyone working with data sooner or later has to make a forecast — and an estimate of next quarter&#8217;s organic traffic is a forecast in every respect. <a href="https://www.amazon.it/dp/0141975652?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>The Signal and the Noise</em></a> tells the story of why predictions fail so often: too much faith in models, the temptation to mistake noise for signal, the inability to reason in probabilities. Silver — the man who called the 2012 US presidential election right in all fifty states — moves through poker, earthquakes, weather and finance, and along the way delivers the best narrative introduction to <a href="https://www.gironi.it/blog/en/bayesian-statistics-how-to-learn-from-data-one-step-at-a-time/">Bayesian reasoning</a> I know of. It is the popular companion to the <a href="https://www.gironi.it/blog/en/time-series-analysis-and-forecasting-in-r/">time series</a> chapter: first you learn to build a forecast, then you learn to distrust it. (There is also an Italian edition: <a href="https://www.amazon.it/dp/8860443865?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Il segnale e il rumore</em></a>.)</p>


<h2 class="wp-block-heading">Online Experimentation</h2>


<h3 class="wp-block-heading">Trustworthy Online Controlled Experiments — Kohavi, Tang, Xu</h3>


<p class="wp-block-paragraph">On A/B testing there is simply no equivalent: <a href="https://www.amazon.it/dp/1108724264?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Trustworthy Online Controlled Experiments</em></a> is <strong>the</strong> book on the subject, written by the people who led experimentation at Microsoft, Google and LinkedIn. Inside is everything I have touched in these articles — sample size, test power, mistakes to avoid — plus ten years of real-world cases about what goes wrong in actual experiments. I also used it to build my <a href="https://www.gironi.it/blog/en/ab-test-sample-size-calculator/">sample size calculator</a>. Very readable.</p>


<h2 class="wp-block-heading">The Bayesian Path</h2>


<h3 class="wp-block-heading">Bayesian Statistics the Fun Way — Will Kurt</h3>


<p class="wp-block-paragraph"><a href="https://www.gironi.it/blog/en/bayesian-statistics-how-to-learn-from-data-one-step-at-a-time/">Bayesian statistics</a> has a reputation for being hard, and its textbooks do their best to confirm it. <a href="https://www.amazon.it/dp/1593279566?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Bayesian Statistics the Fun Way</em></a> does the opposite: Will Kurt explains priors, posteriors and Bayesian updating with examples taken from Star Wars and Lego bricks, and — something I particularly appreciate — uses R for the computational side, exactly as I do here. It is the right book for grasping the Bayesian logic (and the reason behind the <a href="https://www.gironi.it/blog/en/the-beta-distribution-explained-simply/">Beta distribution</a>) before tackling the formal theory.</p>


<h2 class="wp-block-heading">Towards Machine Learning</h2>


<h3 class="wp-block-heading">An Introduction to Statistical Learning — James, Witten, Hastie, Tibshirani</h3>


<p class="wp-block-paragraph">The contemporary classic of statistical learning, known to everyone as &#8220;ISL&#8221;. <a href="https://www.amazon.it/dp/1461471370?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>An Introduction to Statistical Learning</em></a> covers, with the right balance of intuition and formalism, the topics of the more advanced part of this path: logistic regression, <a href="https://www.gironi.it/blog/en/how-to-use-decision-trees-to-classify-data/">decision trees</a>, PCA, with hands-on labs in R. N.B.: the authors distribute the PDF for free from their website — the printed edition remains for those who, like me, prefer to annotate study books in pencil.</p>


<h3 class="wp-block-heading">Introduction to Machine Learning — Ethem Alpaydın</h3>


<p class="wp-block-paragraph">For those who want the theoretical foundations of machine learning — the ones that in a university course would come before the labs — <a href="https://www.amazon.it/dp/0262028182?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Introduction to Machine Learning</em></a> by Alpaydın is the reference I cited in my <a href="https://www.gironi.it/blog/en/understanding-the-basics-of-machine-learning-a-beginners-guide/">introductory guide to ML</a>. More formal than ISL: one to pick up after it, not instead of it.</p>


<h2 class="wp-block-heading">The Working Language: R</h2>


<h3 class="wp-block-heading">R for Data Science — Wickham, Çetinkaya-Rundel, Grolemund</h3>


<p class="wp-block-paragraph">There was an obvious gap on this shelf: R code shows up in nearly every article of this blog — from the <a href="https://www.gironi.it/blog/en/the-chi-square-test-goodness-of-fit-and-test-of-independence/">chi-square test</a> to <a href="https://www.gironi.it/blog/en/time-series-analysis-and-forecasting-in-r/">time series</a> — but the book to learn the language from was missing. <a href="https://www.amazon.it/dp/1492097403?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>R for Data Science</em></a> (second edition) fills the gap: Hadley Wickham is the author of the tidyverse, the package ecosystem that made R modern, and the book teaches the whole workflow — import, tidy, transform, visualise, communicate — on real data, with no superfluous theory. Like ISL, it can be read for free on the authors&#8217; website: one more reason to have no excuses.</p>


<h2 class="wp-block-heading">Communicating Data</h2>


<h3 class="wp-block-heading">Storytelling with Data — Cole Nussbaumer Knaflic</h3>


<p class="wp-block-paragraph">The most rigorous analysis in the world is worth little if the person receiving it does not understand it — and in marketing an analysis almost always has to be told to someone: a client, a manager, a meeting. <a href="https://www.amazon.it/dp/1119002257?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Storytelling with Data</em></a> teaches how to turn the default charts of Excel and Looker Studio into clear messages: choosing the right chart, removing the ink that carries no information, directing attention where it matters, building a narrative around the number. Of the whole shelf it is probably the book that pays for itself fastest: you can apply it to your very next report. (There is also an Italian edition, <a href="https://www.amazon.it/dp/8850333846?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Data storytelling</em></a>, published by Apogeo.)</p>


<h2 class="wp-block-heading">A Niche Read</h2>


<h3 class="wp-block-heading">Monte Carlo Methods in Financial Engineering — Paul Glasserman</h3>


<p class="wp-block-paragraph">This is the most specialised book on the shelf, and I list it out of honesty towards anyone who has reached the <a href="https://www.gironi.it/blog/en/the-monte-carlo-method-explained-simply-with-real-world-applications/">Monte Carlo method</a> and wants to go all the way: <a href="https://www.amazon.it/dp/1441915753?tag=consulenzeinf-21&#038;ascsubtag=statistics-seo-library" rel="nofollow sponsored noopener" target="_blank"><em>Monte Carlo Methods in Financial Engineering</em></a> by Glasserman is the complete reference on simulation applied to finance. Not a beach read: it is the text you reach for when the others are no longer enough.</p>


<h2 class="wp-block-heading">The Library at a Glance</h2>


<p class="wp-block-paragraph">To get your bearings quickly, here is the complete shelf in table form:</p>


<figure class="wp-block-table"><table><thead><tr><th>Book</th><th>Who it&#8217;s for</th><th>Language</th></tr></thead><tbody><tr><td><em>The Art of Statistics</em> — Spiegelhalter</td><td>Everyone: the starting point</td><td>EN (also IT)</td></tr><tr><td><em>Finalmente ho capito la statistica</em> — De Pra</td><td>Absolute beginners, distributions</td><td>IT</td></tr><tr><td><em>How to Lie with Statistics</em> — Huff</td><td>Defending yourself from doctored numbers</td><td>EN (also IT)</td></tr><tr><td><em>Statistica</em> — Newbold, Carlson, Thorne</td><td>For rigour: inference and tests</td><td>IT</td></tr><tr><td><em>Introduzione all&#8217;econometria</em> — Stock, Watson</td><td>Regression and time series</td><td>IT (orig. EN)</td></tr><tr><td><em>The Signal and the Noise</em> — Silver</td><td>Why predictions fail</td><td>EN (also IT)</td></tr><tr><td><em>Trustworthy Online Controlled Experiments</em> — Kohavi et al.</td><td>A/B testing and experimentation</td><td>EN</td></tr><tr><td><em>Bayesian Statistics the Fun Way</em> — Kurt</td><td>The Bayesian approach, with R</td><td>EN</td></tr><tr><td><em>An Introduction to Statistical Learning</em> — James et al.</td><td>Practical machine learning, with R</td><td>EN</td></tr><tr><td><em>Introduction to Machine Learning</em> — Alpaydın</td><td>Theoretical foundations of ML</td><td>EN</td></tr><tr><td><em>R for Data Science</em> — Wickham et al.</td><td>Learning R, from raw data to charts</td><td>EN</td></tr><tr><td><em>Storytelling with Data</em> — Knaflic</td><td>Communicating data and reports</td><td>EN (also IT)</td></tr><tr><td><em>Monte Carlo Methods in Financial Engineering</em> — Glasserman</td><td>Advanced simulation</td><td>EN</td></tr></tbody></table></figure>


<p class="wp-block-paragraph">This shelf is not closed. As the blog&#8217;s path widens — the statistical paradoxes I have started to explore, the bootstrap, text analysis — the library will widen too, and this page will be updated accordingly. In the meantime, if one single recommendation had to suffice: start with Spiegelhalter, and let the articles on this blog be your gym.</p>

]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/statistics-seo-library/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Simpson&#8217;s Paradox in SEO: When Aggregate Data Can Lie</title>
		<link>https://www.gironi.it/blog/en/simpsons-paradox-in-seo-when-aggregate-data-can-lie/</link>
					<comments>https://www.gironi.it/blog/en/simpsons-paradox-in-seo-when-aggregate-data-can-lie/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Wed, 27 May 2026 12:47:27 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[seo]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3579</guid>

					<description><![CDATA[It&#8217;s the last day of the month. We&#8217;re putting together the SEO report for our main client. We open Google Search Console, set the month-over-month comparison, and a chill runs down our spine: the site&#8217;s overall organic CTR has collapsed from 4.5% to 3.5%. Before writing the bad-news email and bracing ourselves to justify the &#8230; <a href="https://www.gironi.it/blog/en/simpsons-paradox-in-seo-when-aggregate-data-can-lie/" class="more-link">Continue reading<span class="screen-reader-text"> "Simpson&#8217;s Paradox in SEO: When Aggregate Data Can Lie"</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">It&#8217;s the last day of the month. We&#8217;re putting together the SEO report for our main client. We open Google Search Console, set the month-over-month comparison, and a chill runs down our spine: <strong>the site&#8217;s overall organic CTR has collapsed from 4.5% to 3.5%</strong>.</p>



<p class="wp-block-paragraph">Before writing the bad-news email and bracing ourselves to justify the drop, let&#8217;s do the right thing: disaggregate the data to understand <strong>where</strong> we&#8217;re losing ground. We look at performance by device and discover something seemingly impossible:</p>



<ul class="wp-block-list">
<li>CTR on <strong>Desktop</strong> rose from 5.0% to 5.5%.</li>



<li>CTR on <strong>Mobile</strong> rose from 2.0% to 2.5%.</li>
</ul>



<p class="wp-block-paragraph">We stare at the screen. How is it mathematically possible that performance improved everywhere, yet the overall total dropped by a full percentage point?</p>



<span id="more-3579"></span>



<p class="wp-block-paragraph">We haven&#8217;t broken Google Search Console, and we haven&#8217;t forgotten elementary-school arithmetic. We&#8217;ve simply just fallen victim to <strong>Simpson&#8217;s Paradox</strong>.</p>



<h2 class="wp-block-heading">What Is Simpson&#8217;s Paradox</h2>



<p class="wp-block-paragraph">Simpson&#8217;s Paradox is a statistical phenomenon in which a trend that appears clearly within several groups of data disappears — or even reverses — when the groups are combined into a single total.</p>



<p class="wp-block-paragraph">In the everyday practice of SEO and marketing, this almost always happens because of a hidden <strong>confounding variable</strong>: in our case, <strong>the relative weight of the segments we&#8217;re analyzing</strong>. It&#8217;s the same reasoning we meet when discussing <a href="https://www.gironi.it/blog/en/contingency-tables-and-conditional-probability/">conditional probability</a>, where what matters is not the marginal figure but the one conditioned on a subgroup.</p>



<p class="wp-block-paragraph">When we work with rates and percentages (CTR, conversion rate, bounce rate), looking at the aggregate figure without considering the underlying volumes is one of the most insidious traps for anyone analyzing data.</p>



<h2 class="wp-block-heading">The Proof: Anatomy of a Fake Collapse</h2>



<p class="wp-block-paragraph">Let&#8217;s go back to our monthly report and put the absolute numbers behind those percentages. Only then can we understand what really happened between Month 1 and Month 2.</p>



<figure class="wp-block-table"><table><thead><tr><th>Segment</th><th>Month 1 (impr. · clicks · CTR)</th><th>Month 2 (impr. · clicks · CTR)</th><th>Trend</th></tr></thead><tbody><tr><td><strong>Desktop</strong></td><td>10,000 · 500 · 5.0%</td><td>10,000 · 550 · 5.5%</td><td>rising</td></tr><tr><td><strong>Mobile</strong></td><td>2,000 · 40 · 2.0%</td><td>20,000 · 500 · 2.5%</td><td>rising</td></tr><tr><td><em>Aggregate total</em></td><td>12,000 · 540 · 4.5%</td><td>30,000 · 1,050 · 3.5%</td><td>falling</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">Here&#8217;s the point: we don&#8217;t have an SEO problem — on the contrary, we&#8217;ve had a remarkable success. Our Mobile rankings have exploded, bringing in 18,000 more impressions than the previous month.</p>



<p class="wp-block-paragraph">Mobile traffic, however, has historically had a structurally lower CTR than Desktop (more noise in the SERP, faster scrolling, distractions). That huge influx of low-CTR impressions &#8220;watered down&#8221; the global average, dragging it downward. The aggregate figure told us <em>&#8220;we&#8217;re getting worse&#8221;</em>; the disaggregated data tells us <em>&#8220;we&#8217;re improving across the board, but our traffic mix has changed&#8221;</em>.</p>



<p class="wp-block-paragraph">The mathematical reason is simple, and it&#8217;s worth keeping firmly in mind: <strong>the aggregate CTR is not the average of the segments&#8217; CTRs, but a <em>weighted</em> average of them</strong>, where the weights are each segment&#8217;s share of impressions. As a formula:</p>



\(
\text{CTR}_{\text{agg}} = \frac{\sum_i \text{clicks}_i}{\sum_i \text{impressions}_i} = \sum_i w_i \cdot \text{CTR}_i, \qquad w_i = \frac{\text{impressions}_i}{\sum_j \text{impressions}_j} \\
\)



<p class="wp-block-paragraph">where \(\text{CTR}_i\) is the CTR of segment <em>i</em> and \(w_i\) is its weight, that is, the fraction of impressions it owns. In Month 2 the weight of Mobile went from 1/6 to 2/3 of the total: even though every individual CTR rose, the average shifted toward the (low) value of the segment that had become dominant. It&#8217;s not the math that has gone crazy: it&#8217;s the <em>mix</em> that has changed.</p>



<p class="wp-block-paragraph">Let&#8217;s reconstruct the whole thing in R, so we can see the mechanism at work instead of taking it on faith:</p>



<pre class="wp-block-code"><code># Reconstruct the two months' data
df &lt;- data.frame(
  segment     = c("Desktop", "Mobile", "Desktop", "Mobile"),
  month       = c("Month 1",  "Month 1", "Month 2",  "Month 2"),
  impressions = c(10000,      2000,      10000,      20000),
  clicks      = c(500,        40,        550,        500)
)

# CTR of each segment
df$ctr &lt;- df$clicks / df$impressions

# Aggregate CTR per month: a WEIGHTED average over impressions,
# NOT the arithmetic mean of the CTRs
agg &lt;- aggregate(cbind(clicks, impressions) ~ month, data = df, FUN = sum)
agg$aggregate_ctr &lt;- agg$clicks / agg$impressions
print(agg)</code></pre>



<p class="wp-block-paragraph">As the output shows, the aggregate drops from 4.5% to 3.5% while both segments rise. N.B.: the arithmetic mean of Month 2&#8217;s two CTRs would be 4% (the simple average of 5.5% and 2.5%), quite different from the real 3.5%. The entire difference is in the weights.</p>
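<p class="wp-block-paragraph">To watch the weighted-average formula at work on the same numbers, we can recompute Month 2&#8217;s aggregate CTR as the sum of each segment&#8217;s CTR times its weight \(w_i\), and set it beside the naive arithmetic mean. This is a minimal sketch that rebuilds only the Month 2 figures, so that it runs on its own:</p>

```r
# Month 2 only: Desktop and Mobile
impressions <- c(Desktop = 10000, Mobile = 20000)
clicks      <- c(Desktop = 550,   Mobile = 500)

ctr <- clicks / impressions               # 5.5% and 2.5%
w   <- impressions / sum(impressions)     # weights: 1/3 and 2/3

weighted_ctr   <- sum(w * ctr)            # the real aggregate CTR: 3.5%
arithmetic_ctr <- mean(ctr)               # the misleading simple mean: 4%

print(round(w, 3))
cat("weighted:", weighted_ctr, " arithmetic:", arithmetic_ctr,
    " clicks/impressions:", sum(clicks) / sum(impressions), "\n")
```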



<h2 class="wp-block-heading">Two More SEO Scenarios Where the Paradox Strikes</h2>



<p class="wp-block-paragraph">CTR by device is the textbook example, but Simpson&#8217;s Paradox lurks just about everywhere in our dashboards.</p>



<h3 class="wp-block-heading">1. The Conversion Rate Collapse (Informational vs. Transactional Intent)</h3>



<p class="wp-block-paragraph">We&#8217;re working on an e-commerce site and the organic conversion rate goes from 3% to 1.5%. A disaster? Not necessarily. If we&#8217;ve just launched a corporate blog that has started ranking well for hundreds of informational, top-of-the-funnel keywords, we&#8217;ve brought thousands of users to the site who are far from the purchase stage (with a physiological CR close to 0.1%). The CR of our product pages may be stable or growing, but the sheer volume of blog traffic has distorted the aggregate average.</p>



<h3 class="wp-block-heading">2. Cannibalization or Ranking Expansion?</h3>



<p class="wp-block-paragraph">One of our long-standing product pages used to rank only for 5 exact transactional keywords: 100 impressions, 10 clicks, 10% CTR. We decide to optimize its content, and the next month Google rewards its semantics, ranking it for 80 new long-tail and related keywords. Now the page gets 5,000 impressions and 100 clicks: 2% CTR. If we look only at the page&#8217;s average CTR in Search Console, it seems our optimization destroyed it; if we look at the absolute clicks, we&#8217;ve multiplied them tenfold.</p>



<h2 class="wp-block-heading">How to Defend Yourself (Takeaways for the Analyst)</h2>



<p class="wp-block-paragraph">How do we survive Simpson&#8217;s Paradox when presenting data to a client or stakeholder? Four precautions.</p>



<ol class="wp-block-list">

<li><strong>Never trust the aggregate figure alone.</strong> When analyzing relative metrics (conversion rates, click rates, averages), the global total is often the least useful number of all.</li>


<li><strong>Segment until you find homogeneity.</strong> Always split the data along logical dimensions before drawing conclusions: by device (Desktop/Mobile), by query type (brand/non-brand), and by page type (blog/product).</li>


<li><strong>Look for the shift in weights.</strong> If a global rate collapses but the subgroups hold steady, ask: <em>&#8220;has the traffic mix changed?&#8221;</em>. Almost always, a low-performing segment has suddenly increased its volumes.</li>


<li><strong>Educate the client.</strong> In a report, don&#8217;t just show the CTR drop: show the disaggregated table. Explaining the mechanism doesn&#8217;t just save the monthly report — it positions us as analysts who reason about data rather than being at its mercy.</li>

</ol>
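<p class="wp-block-paragraph">The third point can be made quantitative. One common way to do it is a simple shift-share decomposition (our choice here, not the only one). We ask what the aggregate CTR would have been with Month 2&#8217;s CTRs but Month 1&#8217;s traffic mix. The gap from Month 1 is the <em>rate effect</em>, and what remains is the <em>mix effect</em>. A minimal sketch on the numbers from the table:</p>

```r
# Impressions and CTRs per segment (Desktop, Mobile), from the table
impr_1 <- c(10000, 2000);  ctr_1 <- c(0.050, 0.020)   # Month 1
impr_2 <- c(10000, 20000); ctr_2 <- c(0.055, 0.025)   # Month 2

w_1 <- impr_1 / sum(impr_1)   # Month 1 mix: 5/6, 1/6
w_2 <- impr_2 / sum(impr_2)   # Month 2 mix: 1/3, 2/3

agg_1 <- sum(w_1 * ctr_1)              # 4.5%
agg_2 <- sum(w_2 * ctr_2)              # 3.5%
counterfactual <- sum(w_1 * ctr_2)     # Month 2 CTRs with Month 1 mix: 5.0%

rate_effect <- counterfactual - agg_1  # +0.5 points: every segment improved
mix_effect  <- agg_2 - counterfactual  # -1.5 points: traffic shifted to Mobile

cat("total change:", agg_2 - agg_1, " rate effect:", rate_effect,
    " mix effect:", mix_effect, "\n")
```

<p class="wp-block-paragraph">In general the split depends on which month&#8217;s mix is taken as the reference. Here both segments rose by the same 0.5 points, so the other choice gives the same result, but with uneven changes the two versions can differ.</p>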



<p class="wp-block-paragraph">Data doesn&#8217;t lie, but aggregate data makes for an excellent magician. The most solid defense, however, isn&#8217;t statistical but experimental: when we get to decide <em>how</em> to assign traffic — randomizing users between two versions of a page — the mix stops being a variable beyond our control. That&#8217;s exactly what we do with a rigorously run <a href="https://www.gironi.it/blog/en/guide-to-statistical-tests-for-a-b-analysis/">A/B test</a>, the next step on our path: seeing how a controlled experiment neutralizes at the root the confounding variables that here we&#8217;ve merely unmasked.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h3 class="wp-block-heading" id="further-reading">Further Reading</h3>



<p class="wp-block-paragraph">If we want to dig deeper into Simpson&#8217;s Paradox and the art of reading data without being fooled, <a href="https://www.amazon.it/dp/8806246623?tag=consulenzeinf-21&#038;ascsubtag=simpsons-paradox-in-seo-when-aggregate-data-can-lie" rel="nofollow sponsored noopener" target="_blank"><em>The Art of Statistics</em></a> by David Spiegelhalter is the right read: it devotes lucid pages to this very paradox — including the famous Berkeley admissions case — showing how an aggregate number can tell the exact opposite of what happened in the data.</p>



<p class="wp-block-paragraph">And to discover how many other ways there are to be fooled by numbers — biased samples, conveniently chosen averages, doctored charts — <a href="https://www.amazon.it/dp/0140213007?tag=consulenzeinf-21&#038;ascsubtag=simpsons-paradox-in-seo-when-aggregate-data-can-lie" rel="nofollow sponsored noopener" target="_blank"><em>How to Lie with Statistics</em></a> by Darrell Huff is the little classic of the genre: written in 1954, it can be read in an afternoon and works as a permanent vaccine.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/simpsons-paradox-in-seo-when-aggregate-data-can-lie/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Sampling and Sample Size: How Much Data Do You Really Need?</title>
		<link>https://www.gironi.it/blog/en/sampling-and-sample-size-how-much-data-do-you-really-need/</link>
					<comments>https://www.gironi.it/blog/en/sampling-and-sample-size-how-much-data-do-you-really-need/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Wed, 06 May 2026 14:46:09 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3559</guid>

					<description><![CDATA[In this article: How to Choose Who to Measure: Types of Sampling Sample Size: The Math Behind the Estimation Let&#8217;s Calculate It in R and Python From Estimation to A/B Testing Sampling Error vs Bias Try It Yourself In everyday life, as in web analytics, we often have to make decisions based on incomplete information. &#8230; <a href="https://www.gironi.it/blog/en/sampling-and-sample-size-how-much-data-do-you-really-need/" class="more-link">Continue reading<span class="screen-reader-text"> "Sampling and Sample Size: How Much Data Do You Really Need?"</span></a>]]></description>
										<content:encoded><![CDATA[
<div style="background-color: #f8f9fa;padding: 20px;border-radius: 8px;margin-bottom: 30px;border-left: 4px solid #4a90e2">
<h3 style="margin-top: 0">In this article:</h3>
<ul style="margin-bottom: 0">
<li><a href="#tipi-campionamento">How to Choose Who to Measure: Types of Sampling</a></li>
<li><a href="#formula-dimensione">Sample Size: The Math Behind the Estimation</a></li>
<li><a href="#esempio-codice">Let&#8217;s Calculate It in R and Python</a></li>
<li><a href="#collegamento-ab-test">From Estimation to A/B Testing</a></li>
<li><a href="#bias-errore">Sampling Error vs Bias</a></li>
<li><a href="#prova-tu">Try It Yourself</a></li>
</ul>
</div>



<p class="wp-block-paragraph">In everyday life, as in web analytics, we often have to make decisions based on incomplete information. How much data do I need to understand if this modification to the landing page worked? Are a thousand visits enough? Are ten thousand too many?</p>



<span id="more-3559"></span>



<p class="wp-block-paragraph">We can almost never measure the entire population (for example, all future visitors to a site). We have to work on a <strong>sample</strong>. And here lies the delicate balance: a sample that is too small leads to wrong conclusions, while one that is unnecessarily large wastes time and resources. So the question becomes: <strong>how much data do we really need?</strong></p>



<h2 class="wp-block-heading" id="tipi-campionamento">How to Choose Who to Measure: Types of Sampling</h2>



<p class="wp-block-paragraph">Before figuring out <em>how much</em> data we need, we must understand <em>how</em> to collect it. The three main methods are:</p>



<ul class="wp-block-list">
<li><strong>Simple random sampling:</strong> Every user has exactly the same probability of being chosen. It&#8217;s the gold standard, what we try to achieve when we randomize users in an A/B test.</li>
<li><strong>Stratified sampling:</strong> We divide users into groups (e.g., Mobile and Desktop traffic) and randomly sample within each group, respecting the original proportions. It ensures that no important minority is ignored.</li>
<li><strong>Systematic sampling:</strong> We choose one user every <em>k</em> (e.g., one user every 10). It is easy to implement, but tricky when the data hide a cyclicity. Imagine a weekly cycle: if the interval lines up with it, for example taking one day every 7, we could end up with only Mondays, and the estimate would be skewed from the start.</li>
</ul>
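


<p class="wp-block-paragraph">The three methods can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical population of 1,000 users (70% mobile, 30% desktop). The function names are ours, not a library API. The stratified version uses proportional allocation with simple rounding, which in general is not guaranteed to add up to exactly <em>n</em>.</p>



<pre><code class="language-python">import random

# Hypothetical population: 700 mobile and 300 desktop users
users = [("mobile", i) for i in range(700)] + [("desktop", i) for i in range(700, 1000)]


def simple_random_sample(population, n, rng):
    # Every user has the same probability of being selected
    return rng.sample(population, n)


def stratified_sample(population, n, stratum_of, rng):
    # Group users by stratum, then sample each group in proportion to its size
    strata = {}
    for user in population:
        strata.setdefault(stratum_of(user), []).append(user)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


def systematic_sample(population, k, rng):
    # Random starting point, then one user every k
    start = rng.randrange(k)
    return population[start::k]


rng = random.Random(42)
strat = stratified_sample(users, 100, lambda user: user[0], rng)
print(sum(1 for segment, _ in strat if segment == "mobile"))  # 70: the 70/30 mix is preserved
print(len(systematic_sample(users, 10, rng)))                 # 100</code></pre>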



<h2 class="wp-block-heading" id="formula-dimensione">Sample Size: The Math Behind the Estimation</h2>



<p class="wp-block-paragraph">The intuition is straightforward: the smaller the effect we are looking for (or the more variable the data), the more observations we need to distinguish it from background noise. Sounds hard to formalise? It is more linear than it seems.</p>



<p class="wp-block-paragraph">To calculate the exact number, we need three ingredients:</p>



<ul class="wp-block-list">
<li><strong>Confidence level:</strong> How sure do we want to be? We usually use 95% (which corresponds to a Z-score of 1.96).</li>
<li><strong>Margin of error (E):</strong> The maximum error we are willing to accept (e.g., 1% or 0.01).</li>
<li><strong>Expected proportion (p):</strong> The estimated conversion rate. If we have no idea, we use 0.5 (50%): it represents maximum uncertainty and yields the largest possible sample, so it is the most conservative choice.</li>
</ul>
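


<p class="wp-block-paragraph">Where does 1.96 come from? It is the quantile of the standard normal distribution that leaves 2.5% of probability in each tail. A short Python sketch using the standard library&#8217;s <code>statistics.NormalDist</code> recovers it for any confidence level. The helper name <code>z_for_confidence</code> is ours.</p>



<pre><code class="language-python">from statistics import NormalDist


def z_for_confidence(level):
    # Two-sided critical value: leaves (1 - level) / 2 of probability in each tail
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_for_confidence(level):.3f}")</code></pre>



<p class="wp-block-paragraph">It prints 1.645, 1.960 and 2.576 respectively. The more confident we want to be, the larger Z becomes, and the sample size grows with its square.</p>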



<p class="wp-block-paragraph">The formula to estimate a proportion (like the Conversion Rate) is:</p>



<p class="wp-block-paragraph" style="text-align: center;font-size: 1.2em"><strong>n = (Z&sup2; &times; p(1 &#8211; p)) / E&sup2;</strong></p>



<h2 class="wp-block-heading" id="esempio-codice">Let&#8217;s Calculate It in R and Python</h2>



<p class="wp-block-paragraph">Let&#8217;s run a quick example. We want to estimate the Conversion Rate of a new page with a margin of error of 1% (0.01) and a confidence level of 95% (Z = 1.96). To stay on the safe side, we set p = 0.5.</p>



<p class="wp-block-paragraph">The examples below are in both R and Python — pick whichever language feels more familiar.</p>



<p class="wp-block-paragraph">Let&#8217;s calculate it in R:</p>



<pre><code class="language-r"># Sample size calculation for a proportion
Z &lt;- 1.96
p &lt;- 0.5
E &lt;- 0.01

n &lt;- (Z^2 * p * (1-p)) / E^2
print(paste("Required size:", round(n)))
# Output: Required size: 9604</code></pre>



<p class="wp-block-paragraph">Let&#8217;s verify it in Python:</p>



<pre><code class="language-python"># Sample size calculation for a proportion
Z = 1.96
p = 0.5
E = 0.01

n = (Z**2 * p * (1-p)) / E**2
print(f"Required size: {round(n)}")
# Output: Required size: 9604</code></pre>



<p class="wp-block-paragraph">As we can see, around 9,604 users are needed to reach that precision. N.B.: if we accepted a margin of error of 2% (E=0.02), the number would collapse to about 2,401. That is the effect of <em>E</em> squared in the denominator: halving the precision requirement divides the required sample by four. Worth keeping in mind whenever we decide which margin to accept.</p>



<h2 class="wp-block-heading" id="collegamento-ab-test">From Estimation to A/B Testing</h2>



<p class="wp-block-paragraph">The formula seen so far estimates a single proportion. But in everyday CRO (Conversion Rate Optimization) work the actual problem is almost always a different one: <em>comparing</em> two proportions, as in an A/B test.</p>



<p class="wp-block-paragraph">In that case the logic is the same, but the formula gets more complex because two new concepts come into play: the <strong>Effect Size</strong> (the minimum difference we want to detect) and the <strong>Statistical Power</strong>.</p>



<p class="wp-block-paragraph">To skip the manual calculation, I built an <a href="/blog/en/ab-test-sample-size-calculator/">interactive A/B test sample size calculator</a>: it does the dirty work and also indicates how many days the test should run, given the page&#8217;s average traffic.</p>



<h2 class="wp-block-heading" id="bias-errore">Sampling Error vs Bias</h2>



<p class="wp-block-paragraph">One point worth keeping firmly in mind before closing. Sampling error (the one the formula handles) is inevitable and shrinks as the data grow. But there is a far more insidious enemy, and no formula captures it: <strong>bias</strong>.</p>



<p class="wp-block-paragraph">If we test a page only during the weekend, we might collect a million visits (sampling error practically zero), but the sample will not be representative of weekday users. So: no formula can save a sample that is biased at the source. A thousand observations gathered well beat a million gathered badly.</p>



<h2 class="wp-block-heading" id="prova-tu">Try It Yourself</h2>



<p class="wp-block-paragraph">A product page receives roughly 10,000 impressions per month on Google, with an observed CTR of 3.5%. We want to estimate the true CTR with a margin of error of 1 percentage point (E = 0.01) and 95% confidence.</p>



<ol class="wp-block-list">
<li>Compute the required sample size with the formula above, first using p = 0.5 (conservative case) and then p = 0.035 (observed CTR).</li>
<li>Compare the two results: how much does the data requirement change once we have a reasonable estimate of p?</li>
<li>Given 10,000 impressions per month, how many months are needed to satisfy the conservative estimate?</li>
<li>If we accepted a 2% margin (E = 0.02), how would the collection time change?</li>
</ol>



<p class="wp-block-paragraph">Hint: in R, a minimal function is enough — <code>sample_size &lt;- function(Z, p, E) ceiling((Z^2 * p * (1-p)) / E^2)</code> — to be called twice with the two values of <em>p</em>.</p>



<hr class="wp-block-separator has-alpha-channel-opacity" />



<p class="wp-block-paragraph">Now we know how to collect an adequate sample and how much data we need. One question remains: how do we use that sample to rigorously compare two versions of the same page? This is where <a href="/blog/en/guide-to-statistical-tests-for-a-b-analysis/">actual A/B testing</a> comes in, and it is the next step of the path.</p>



<hr class="wp-block-separator has-alpha-channel-opacity" />



<h3 class="wp-block-heading" id="further-reading">Further Reading</h3>



<p class="wp-block-paragraph">To dig deeper into sampling, the biases that can distort it, and the logic of statistical inference, <a href="https://www.amazon.it/dp/8806246623?tag=consulenzeinf-21&#038;ascsubtag=sampling-and-sample-size-how-much-data-do-you-really-need" rel="nofollow sponsored noopener" target="_blank"><em>The Art of Statistics</em></a> by David Spiegelhalter is the most suitable companion. Spiegelhalter devotes illuminating pages to real cases — flawed polls, convenience samples, misleading figures — showing how the mathematics of sampling means little without careful thought on <em>how</em> the data are collected.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/sampling-and-sample-size-how-much-data-do-you-really-need/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The Monte Carlo Method Explained Simply with Real-World Applications</title>
		<link>https://www.gironi.it/blog/en/the-monte-carlo-method-explained-simply-with-real-world-applications/</link>
					<comments>https://www.gironi.it/blog/en/the-monte-carlo-method-explained-simply-with-real-world-applications/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Wed, 11 Mar 2026 14:49:05 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3512</guid>

					<description><![CDATA[What is the Monte Carlo method The story of the Monte Carlo method begins in the most unlikely way: with a mathematician in bed playing cards. In 1946, Stanisław Ulam, a Polish mathematician recovering from surgery, found himself playing solitaire to pass the time. Being a mathematician, he wondered: what are the chances of winning &#8230; <a href="https://www.gironi.it/blog/en/the-monte-carlo-method-explained-simply-with-real-world-applications/" class="more-link">Continue reading<span class="screen-reader-text"> "The Monte Carlo Method Explained Simply with Real-World Applications"</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph"><!--
  The Monte Carlo Method - Enriched blog content
  gironi.it/blog/en/the-monte-carlo-method/ (EN version)

  Instructions:
  1. Publish EN post via publish_montecarlo_en.py
  2. Add iframe to simulator manually in Gutenberg (Custom HTML block)
  3. Upload test_en.html via FTP to /utility/montecarlo-simulator-en/index.html
--></p>



<p class="wp-block-paragraph"><!-- ============================================================ --><br><!-- SECTION 1: WHAT IS THE MONTE CARLO METHOD (~500 words) --><br><!-- ============================================================ --></p>



<h2 class="wp-block-heading">What is the Monte Carlo method</h2>



<p class="wp-block-paragraph">The story of the Monte Carlo method begins in the most unlikely way: with a mathematician in bed playing cards. In 1946, <strong>Stanisław Ulam</strong>, a Polish mathematician recovering from surgery, found himself playing solitaire to pass the time. Being a mathematician, he wondered: what are the chances of winning a game?</p>



<p class="wp-block-paragraph">The problem was theoretically solvable: just enumerate every possible combination of cards and count the favorable ones. In practice, however, the number of combinations was so enormous that an exact calculation was completely impractical. Ulam then had an insight as simple as it was powerful: <strong>instead of computing the exact probability, why not simulate hundreds of games and count how many times you win?</strong></p>



<span id="more-3512"></span>



<p class="wp-block-paragraph">The idea is disarmingly simple. If we play 1,000 games and win 230 of them, we can estimate the probability of winning at about 23%. The more games we simulate, the closer our estimate gets to the true value. This is, in essence, the <strong>Monte Carlo method</strong>: using random simulation to solve problems that would be too complex to tackle analytically.</p>



<p class="wp-block-paragraph">Ulam shared the idea with his colleague <strong>John von Neumann</strong>, arguably the most brilliant mathematician of the 20th century, who immediately saw its potential. Von Neumann realized that <strong>ENIAC</strong> — one of the very first electronic computers, which filled an entire room — could run thousands of simulations in reasonable time. Together, they developed the method for a problem far more serious than solitaire: the <strong>diffusion of neutrons</strong> in atomic weapons, as part of the Manhattan Project at Los Alamos.</p>



<p class="wp-block-paragraph">The name “Monte Carlo” was chosen as a code name, a reference to the famous <strong>Monte Carlo Casino</strong> in Monaco. Legend has it that the inspiration came from Ulam’s uncle, a notorious gambler. After all, the heart of the method is chance itself: generating random numbers to explore spaces of possibility too vast to traverse systematically.</p>



<p class="wp-block-paragraph">From those early nuclear experiments of the 1940s, the Monte Carlo method has spread to every field of science and engineering. Today it is one of the most widely used computational tools in the world, from particle physics to finance, from cinematic rendering to drug discovery. Let’s see how it works.</p>



<p class="wp-block-paragraph"><!-- ============================================================ --><br><!-- SECTION 2: FUNDAMENTAL CONCEPTS (~300 words) --><br><!-- ============================================================ --></p>



<h2 class="wp-block-heading">Fundamental concepts</h2>



<p class="wp-block-paragraph">The Monte Carlo method rests on a statistical principle we’ve encountered before: the <strong>law of large numbers</strong>. In simple terms, this law tells us that the average of a random sample approaches the population average as the sample grows. Translated into Monte Carlo language: <strong>the more simulations we run, the more accurate our result will be</strong>.</p>



<p class="wp-block-paragraph">To run a Monte Carlo simulation, we need <strong>random numbers</strong>. In practice, computers don’t generate truly random numbers: they use deterministic algorithms that produce sequences of <strong>pseudo-random numbers</strong> with statistical properties indistinguishable from real randomness. In R, for example, the <code>runif()</code> function generates uniformly distributed numbers between 0 and 1.</p>



<p class="wp-block-paragraph">A crucial aspect is the <strong>rate of convergence</strong>. The Monte Carlo estimation error decreases as <strong>1/√n</strong>, where n is the number of simulations. This means that to halve the error, we need to quadruple our simulations; to gain one more decimal digit of precision, we need 100 times more iterations. It’s not particularly efficient, but the beauty of the method lies in the fact that <strong>it works regardless of the problem’s complexity</strong>: whether the problem has 2 or 2,000 variables, the convergence rate remains the same.</p>



<p class="wp-block-paragraph">In practice, we must always balance <strong>desired precision</strong> with <strong>available computational resources</strong>. Increasing the number of simulations comes at a cost in computation time. Fortunately, modern computers make this trade-off much more favorable than in the days of ENIAC.</p>



<p class="wp-block-paragraph"><!-- ============================================================ --><br><!-- SECTION 3: THE METHOD IN ACTION (~400 words) --><br><!-- ============================================================ --></p>



<h2 class="wp-block-heading">The Monte Carlo method in action</h2>



<p class="wp-block-paragraph">Let’s see concretely how the Monte Carlo method is applied. The procedure follows four fundamental steps:</p>



<p class="wp-block-paragraph"><strong>1. Define the model.</strong> First, we identify the problem’s variables and the probability distributions that govern them. For instance, if we want to simulate an investment’s return, the model will include the expected return (mean) and volatility (standard deviation), typically assuming normally distributed returns.</p>



<p class="wp-block-paragraph"><strong>2. Generate random scenarios.</strong> Using a pseudo-random number generator, we produce thousands of possible scenarios. Each scenario represents an “alternative history”: one way things could play out.</p>



<p class="wp-block-paragraph"><strong>3. Compute the result for each scenario.</strong> For each scenario, we apply the model and obtain a result. If we’re simulating an investment, the result is the final portfolio value.</p>



<p class="wp-block-paragraph"><strong>4. Aggregate the results.</strong> Finally, we analyze the set of results: we compute the mean, the median, the percentiles. This gives us not just an estimate of the expected outcome, but an entire <strong>distribution of possibilities</strong>. And this is where Monte Carlo truly shines: it tells us not only “how much we’re likely to earn” but also “how much we could lose in the worst case.”</p>



<p class="wp-block-paragraph">Let’s use a quick example to illustrate convergence. Imagine flipping a coin and trying to estimate the probability of heads. After 10 flips, we might get 7 heads (70%), an estimate far from the true 50%. After 100 flips, we’ll be closer, perhaps 53%. After 10,000 flips, our estimate will be very close to 50%. This is Monte Carlo in action: replacing a theoretical calculation with an experiment repeated thousands of times.</p>



<p class="wp-block-paragraph">The power of the method lies in its <strong>flexibility</strong>. While analytical methods require closed-form solutions (which often don’t exist for complex problems), Monte Carlo only requires the ability to simulate the process. If we can write a program that generates one scenario, Monte Carlo gives us the distribution of outcomes.</p>



<p class="wp-block-paragraph"><!-- ============================================================ --><br><!-- SECTION 4: PRACTICAL EXAMPLES (~600 words) --><br><!-- ============================================================ --></p>



<h2 class="wp-block-heading">Practical examples: estimating π and portfolio returns</h2>



<h3 class="wp-block-heading">Example 1: estimating the value of π</h3>



<p class="wp-block-paragraph">The most classic and pedagogically effective example of the Monte Carlo method is <strong>estimating the number π</strong>. The idea is elegant: consider a square of side 2 with a circle of radius 1 inscribed inside it. The area of the square is 4, the area of the circle is π. If we generate random points inside the square, the proportion falling inside the circle will be approximately π/4.</p>



<p class="wp-block-paragraph">We compute this in R with 100,000 points:</p>



<pre class="wp-block-code"><code>set.seed(123)
n &lt;- 100000
x &lt;- runif(n, -1, 1)
y &lt;- runif(n, -1, 1)
inside &lt;- (x^2 + y^2) &lt;= 1
pi_estimate &lt;- 4 * sum(inside) / n
pi_estimate
# &#91;1] 3.13956</code></pre>



<p class="wp-block-paragraph">The same in Python:</p>



<pre class="wp-block-code"><code>import random
random.seed(123)
n = 100000
inside = sum(1 for _ in range(n)
             if random.uniform(-1, 1)**2 + random.uniform(-1, 1)**2 &lt;= 1)
pi_estimate = 4 * inside / n
print(pi_estimate)
# 3.14268</code></pre>



<p class="wp-block-paragraph">With 100,000 points we already get a reasonable estimate, though not extremely precise: we’re accurate to about two decimal places. As we mentioned, gaining another digit of precision would require roughly 100 times more points. The computer does all the heavy lifting.</p>



<h3 class="wp-block-heading">Example 2: estimating portfolio returns</h3>



<p class="wp-block-paragraph">Let’s move to an example closer to real-world applications. Suppose we have a portfolio of three stocks with the following characteristics:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Stock</th><th>Expected Return</th><th>Standard Deviation</th><th>Portfolio Weight</th></tr></thead><tbody><tr><td>A</td><td>8%</td><td>12%</td><td>40%</td></tr><tr><td>B</td><td>10%</td><td>15%</td><td>30%</td></tr><tr><td>C</td><td>12%</td><td>18%</td><td>30%</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">We want to estimate the probability that the portfolio return exceeds 10%. We simulate in R with 10,000 scenarios:</p>



<pre class="wp-block-code"><code>set.seed(42)
sim_A &lt;- rnorm(10000, mean = 0.08, sd = 0.12)
sim_B &lt;- rnorm(10000, mean = 0.10, sd = 0.15)
sim_C &lt;- rnorm(10000, mean = 0.12, sd = 0.18)
sim_portfolio &lt;- 0.4 * sim_A + 0.3 * sim_B + 0.3 * sim_C
prob_result &lt;- mean(sim_portfolio &gt;= 0.10)
prob_result
# approximately 0.49 (the exact digits depend on the random draws)</code></pre>



<p class="wp-block-paragraph">The same in Python:</p>



<pre class="wp-block-code"><code>import random
random.seed(42)
n = 10000
count = 0
for _ in range(n):
    a = random.gauss(0.08, 0.12)
    b = random.gauss(0.10, 0.15)
    c = random.gauss(0.12, 0.18)
    ptf = 0.4 * a + 0.3 * b + 0.3 * c
    if ptf &gt;= 0.10:
        count += 1
print(count / n)
# approximately 0.49 (the exact digits depend on the random draws)</code></pre>



<p class="wp-block-paragraph">The result tells us there’s roughly a 49% chance of reaching a 10% return. That is just under even odds, because the expected portfolio return (0.4 × 8% + 0.3 × 10% + 0.3 × 12% = 9.8%) sits just below the threshold. Notice how Monte Carlo gives us not a single number, but an entire distribution: we could easily compute the median return, the 5th percentile (a bad but plausible scenario), the probability of loss, and so on.</p>



<p class="wp-block-paragraph"><!-- ============================================================ --><br><!-- SECTION 5: INTERACTIVE SIMULATOR (~200 words) --><br><!-- ============================================================ --></p>



<h2 class="wp-block-heading">Monte Carlo Simulator</h2>



<p class="wp-block-paragraph">To make the concept even more tangible, we’ve built an <strong>interactive simulator</strong> that applies the Monte Carlo method to predict the future value of an investment. The underlying model is the <strong>Geometric Brownian Motion</strong> (GBM), the same model used in the famous Black-Scholes framework for options pricing.</p>



<p class="wp-block-paragraph">Intuitively, an asset’s future price is computed as the current price multiplied by a random growth factor. The formula is:</p>



<p class="has-text-align-center wp-block-paragraph"><strong>S(t+1) = S(t) × exp((μ − σ²/2) + σ × Z)</strong></p>



<p class="wp-block-paragraph">where <strong>μ</strong> is the expected annual return (the “average growth”), <strong>σ</strong> is the volatility (how much the price fluctuates — our measure of uncertainty), and <strong>Z</strong> is a random number drawn from a normal distribution. Each simulation generates a different path: some scenarios see the portfolio grow substantially, others see it decline. The histogram shows the distribution of all possible outcomes.</p>



<iframe src="https://www.gironi.it/utility/montecarlo-simulator-en/" width="100%" height="600" style="border:none;border-radius:12px;" loading="lazy" title="Monte Carlo Simulator"></iframe>



<p class="wp-block-paragraph"><!-- ============================================================ --><br><!-- SECTION 6: MODERN APPLICATIONS (~400 words) --><br><!-- ============================================================ --></p>



<h2 class="wp-block-heading">Modern applications of the Monte Carlo method</h2>



<p class="wp-block-paragraph">From the nuclear physics of the 1940s, the Monte Carlo method has spread to domains that Ulam and von Neumann could never have imagined. Let’s look at some of the most fascinating applications.</p>



<p class="wp-block-paragraph"><strong>3D rendering and cinema.</strong> Every time we watch a Pixar film or a blockbuster with visual effects, we’re admiring Monte Carlo at work. The technique is called <strong>path tracing</strong>: to compute the color of each pixel, the software simulates millions of light rays bouncing between surfaces in the scene. Each ray follows a random path, and the average of thousands of paths produces the photorealistic image we see on screen.</p>



<p class="wp-block-paragraph"><strong>Finance and risk management.</strong> In the financial world, Monte Carlo is ubiquitous. Banks use it to calculate <strong>Value at Risk</strong> (VaR) — the maximum probable loss of a portfolio over a given time horizon. It’s the same principle as our simulator, applied to portfolios with hundreds of assets and complex correlations. Pricing exotic options that lack closed-form solutions also relies on Monte Carlo simulations.</p>



<p class="wp-block-paragraph"><strong>Drug discovery.</strong> In pharmaceutical research, Monte Carlo is used to simulate <strong>molecular docking</strong>: how a candidate molecule binds to a target protein. By simulating millions of possible spatial configurations, researchers identify the most promising compounds before synthesizing them in the lab, saving years of experimentation.</p>



<p class="wp-block-paragraph"><strong>Climate models.</strong> Models predicting climate change are inherently uncertain: they depend on emission scenarios, atmospheric feedback, ocean dynamics. Monte Carlo allows exploration of thousands of parameter combinations and generates the <strong>uncertainty bands</strong> we see in IPCC reports. Not a single prediction, but a distribution of possible futures.</p>



<p class="wp-block-paragraph"><strong>Artificial intelligence.</strong> In machine learning, a technique called <strong>Monte Carlo dropout</strong> uses simulation to estimate the uncertainty of a neural network’s predictions. And the famous <strong>AlphaGo</strong> by DeepMind, which in 2016 defeated the world Go champion, used <strong>Monte Carlo Tree Search</strong> (MCTS) to explore possible moves in a game with more configurations than atoms in the universe.</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Field</th><th>Example</th><th>What is simulated</th></tr></thead><tbody><tr><td>Cinema/3D</td><td>Path tracing (Pixar)</td><td>Light ray paths</td></tr><tr><td>Finance</td><td>Value at Risk</td><td>Market scenarios</td></tr><tr><td>Pharmaceuticals</td><td>Molecular docking</td><td>Spatial configurations</td></tr><tr><td>Climate</td><td>IPCC models</td><td>Parameter combinations</td></tr><tr><td>AI</td><td>AlphaGo (MCTS)</td><td>Possible moves</td></tr></tbody></table></figure>



<p class="wp-block-paragraph"><!-- ============================================================ --><br><!-- SECTION 7: ADVANTAGES AND LIMITATIONS (~300 words) --><br><!-- ============================================================ --></p>



<h2 class="wp-block-heading">Advantages and limitations of the Monte Carlo method</h2>



<p class="wp-block-paragraph">Like any statistical tool, the Monte Carlo method has its strengths and limitations. Let’s examine them honestly.</p>



<p class="wp-block-paragraph"><strong>Flexibility.</strong> The greatest advantage is versatility: Monte Carlo applies to complex problems of any size and in any field, from finance to engineering, physics to biology. It doesn’t require closed-form solutions, only the ability to simulate the process.</p>



<p class="wp-block-paragraph"><strong>Accuracy.</strong> With a sufficient number of simulations, the estimate can be made arbitrarily precise. The more we run the method, the closer the result converges to the true value.</p>



<p class="wp-block-paragraph"><strong>Scalability.</strong> Unlike grid-based methods, which suffer from the “curse of dimensionality” (cost explodes with the number of variables), Monte Carlo maintains the same convergence rate regardless of the number of dimensions. This makes it the only practical tool for high-dimensional problems.</p>



<p class="wp-block-paragraph">However, the method also presents <strong>significant limitations</strong>:</p>



<p class="wp-block-paragraph"><strong>Slow convergence.</strong> The 1/√n rate means that gaining one digit of precision requires 100 times more simulations. For problems demanding very high precision, this can be prohibitive.</p>



<p class="wp-block-paragraph"><strong>Computational cost.</strong> For complex problems (many variables, heavy models), each individual simulation may require significant time. Multiplied by thousands or millions of iterations, the cost becomes considerable.</p>



<p class="wp-block-paragraph">To mitigate these limitations, <strong>variance reduction techniques</strong> have been developed over the years, enabling more precise results with fewer simulations:</p>



<ul class="wp-block-list">
<li><strong>Importance sampling</strong>: sampling from an alternative distribution that “concentrates” simulations in the most informative regions.</li>



<li><strong>Control variates</strong>: using a correlated variable with known expected value to reduce the estimate’s variance.</li>



<li><strong>Stratified sampling</strong>: dividing the space into homogeneous subgroups and sampling from each.</li>



<li><strong>Antithetic variates</strong>: exploiting pairs of negatively correlated random numbers to reduce variance.</li>
</ul>
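<p class="wp-block-paragraph">Two of these techniques can be sketched in a few lines of Python. As an illustrative assumption, the example estimates the integral of e<sup>u</sup> over [0, 1], whose exact value e - 1 is known, in three ways: crude Monte Carlo, antithetic variates (pairing each uniform draw U with 1 - U), and a control variate (U itself, whose mean 1/2 is known). Each estimator uses the same number of function evaluations, and the sketch compares how much the estimates scatter across repeated runs. Function names and parameters are hypothetical.</p>

```python
import math
import random
import statistics

TRUE_VALUE = math.e - 1  # exact value of the integral of exp(u) over [0, 1]


def crude(n: int, rng: random.Random) -> float:
    """Plain Monte Carlo: average exp(U) over n uniform draws."""
    return statistics.fmean([math.exp(rng.random()) for _ in range(n)])


def antithetic(n: int, rng: random.Random) -> float:
    """Antithetic variates: pair each draw U with 1 - U (negatively correlated
    with it) and average exp over both; n // 2 pairs give n evaluations."""
    pair_means = []
    for _ in range(n // 2):
        u = rng.random()
        pair_means.append(0.5 * (math.exp(u) + math.exp(1.0 - u)))
    return statistics.fmean(pair_means)


def control_variate(n: int, rng: random.Random) -> float:
    """Control variate: U has known mean 1/2 and is correlated with exp(U).
    The coefficient c = Cov(Y, U) / Var(U) is estimated from the same sample,
    which adds a small bias that vanishes as n grows."""
    us = [rng.random() for _ in range(n)]
    ys = [math.exp(u) for u in us]
    mean_u, mean_y = statistics.fmean(us), statistics.fmean(ys)
    cov = sum((y - mean_y) * (u - mean_u) for y, u in zip(ys, us)) / (n - 1)
    c = cov / statistics.variance(us)
    # Correct the plain average by how far the sample mean of U is from 1/2
    return mean_y - c * (mean_u - 0.5)


def spread(estimator, n: int = 1000, repeats: int = 200, seed: int = 7) -> float:
    """Standard deviation of an estimator across independent repetitions."""
    rng = random.Random(seed)
    return statistics.stdev([estimator(n, rng) for _ in range(repeats)])


if __name__ == "__main__":
    for name, est in [("crude", crude), ("antithetic", antithetic),
                      ("control variate", control_variate)]:
        print(f"{name:16s} std ~ {spread(est):.5f}")
```

<p class="wp-block-paragraph">In this setup both variance-reduced estimators should scatter several times less than the crude one at the same cost. How much is gained in practice depends on how strongly the paired or control quantity is correlated with the target.</p>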






<p class="wp-block-paragraph">The Monte Carlo method is one of the most powerful tools in computational statistics. In future articles, we’ll explore how a close relative of Monte Carlo, the <strong>bootstrap</strong>, applies to concrete problems in statistical inference.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>






<h3 class="wp-block-heading">Further reading</h3>



<p class="wp-block-paragraph">For a deeper dive into the Monte Carlo method and its applications in finance, <a href="https://www.amazon.it/dp/1441915753?tag=consulenzeinf-21&#038;ascsubtag=the-monte-carlo-method-explained-simply-with-real-world-applications" target="_blank" rel="nofollow noopener sponsored"><em>Monte Carlo Methods in Financial Engineering</em></a> by Paul Glasserman is the most comprehensive reference: it covers theory and practice with detailed examples in derivative pricing and risk management.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/the-monte-carlo-method-explained-simply-with-real-world-applications/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The French Revolutionary Calendar</title>
		<link>https://www.gironi.it/blog/en/the-french-revolutionary-calendar/</link>
					<comments>https://www.gironi.it/blog/en/the-french-revolutionary-calendar/#respond</comments>
		
		<dc:creator><![CDATA[Paolo Gironi]]></dc:creator>
		<pubDate>Sun, 08 Mar 2026 17:23:02 +0000</pubDate>
				<category><![CDATA[history]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/the-french-revolutionary-calendar/</guid>

					<description><![CDATA[Date Converter Use the converter to transform any Gregorian date into the corresponding French Revolutionary calendar date, and vice versa. The algorithm uses historically verified equinox dates for years I-XIV (1792-1805) and the Romme method for later dates. History of the Revolutionary Calendar The French Revolutionary calendar, also known as the Republican calendar, was one &#8230; <a href="https://www.gironi.it/blog/en/the-french-revolutionary-calendar/" class="more-link">Continue reading<span class="screen-reader-text"> "The French Revolutionary Calendar"</span></a>]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Date Converter</h2>



<p class="wp-block-paragraph">Use the converter to transform any Gregorian date into the corresponding French Revolutionary calendar date, and vice versa. The algorithm uses historically verified equinox dates for years I-XIV (1792-1805) and the Romme method for later dates.</p>



<span id="more-3508"></span>



<iframe src="https://www.gironi.it/utility/calendario-rivoluzionario-en/" width="100%" height="500" frameborder="0" style="border:none; border-radius:8px;"></iframe>



<h2 class="wp-block-heading">History of the Revolutionary Calendar</h2>



<p class="wp-block-paragraph">The French Revolutionary calendar, also known as the <strong>Republican calendar</strong>, was one of the most ambitious projects of the French Revolution: redesigning time itself. It was not a simple renaming exercise, but a radical restructuring of how the French measured days, weeks, and years.</p>


<div class="wp-block-image">
<figure class="aligncenter"><img decoding="async" width="260" height="191" src="https://www.gironi.it/blog/wp-content/uploads/2017/09/calendario-rivoluzione.jpg" alt="French Revolutionary Calendar" class="wp-image-518"/></figure>
</div>





<h3 class="wp-block-heading">Why a New Calendar?</h3>



<p class="wp-block-paragraph">The revolutionaries saw the Gregorian calendar as a symbol of the Ancien Regime and the power of the Catholic Church. Every day was dedicated to a saint, holidays marked the liturgical year, and the seven-day week had biblical origins. To build a society based on reason and nature, a new calendar was needed.</p>



<p class="wp-block-paragraph">The idea was part of a broader <strong>decimalization</strong> project that included the metric system (still in use) and decimal time (abandoned). The calendar was meant to reflect nature, seasons, and agricultural work instead of saints and religious holidays.</p>



<h3 class="wp-block-heading">Who Created It</h3>



<p class="wp-block-paragraph">The calendar was the result of the work of two key figures:</p>



<ul class="wp-block-list">
<li><strong>Gilbert Romme</strong> (1750-1795), mathematician and deputy, designed the mathematical structure: 12 months of 30 days, decades of 10 days, complementary days.</li>



<li><strong>Fabre d&#8217;Eglantine</strong> (1750-1794), poet and playwright, devised the evocative month names and created the rural calendar, assigning each day a name connected to nature.</li>
</ul>



<p class="wp-block-paragraph">The decree was approved on 24 October 1793 (3 Brumaire An II) by the National Convention.</p>



<h3 class="wp-block-heading">Period of Use</h3>



<p class="wp-block-paragraph">The calendar counted its years from <strong>22 September 1792</strong> (it was adopted only in late 1793 and applied retroactively) and remained in use until <strong>1 January 1806</strong>, when Napoleon abolished it. It was briefly revived during the <strong>Paris Commune</strong> in 1871, for just 18 days.</p>



<h2 class="wp-block-heading">How the Revolutionary Calendar Works</h2>



<h3 class="wp-block-heading">The 12 Months and Their Meaning</h3>



<p class="wp-block-paragraph">Fabre d&#8217;Eglantine chose names that evoked the weather and agricultural conditions of each period. The months are grouped into four seasons, recognizable by their suffix:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Season</th><th>Month</th><th>Meaning</th><th>Approx. Gregorian Period</th></tr></thead><tbody><tr><td rowspan="3"><strong>Autumn</strong> (-aire)</td><td>Vendemiaire</td><td>Grape harvest</td><td>22 Sep &#8211; 21 Oct</td></tr><tr><td>Brumaire</td><td>Fog, mist</td><td>22 Oct &#8211; 20 Nov</td></tr><tr><td>Frimaire</td><td>Frost</td><td>21 Nov &#8211; 20 Dec</td></tr><tr><td rowspan="3"><strong>Winter</strong> (-ose)</td><td>Nivose</td><td>Snow</td><td>21 Dec &#8211; 19 Jan</td></tr><tr><td>Pluviose</td><td>Rain</td><td>20 Jan &#8211; 18 Feb</td></tr><tr><td>Ventose</td><td>Wind</td><td>19 Feb &#8211; 20 Mar</td></tr><tr><td rowspan="3"><strong>Spring</strong> (-al)</td><td>Germinal</td><td>Germination</td><td>21 Mar &#8211; 19 Apr</td></tr><tr><td>Floreal</td><td>Flower</td><td>20 Apr &#8211; 19 May</td></tr><tr><td>Prairial</td><td>Meadow</td><td>20 May &#8211; 18 Jun</td></tr><tr><td rowspan="3"><strong>Summer</strong> (-idor)</td><td>Messidor</td><td>Harvest</td><td>19 Jun &#8211; 18 Jul</td></tr><tr><td>Thermidor</td><td>Heat</td><td>19 Jul &#8211; 17 Aug</td></tr><tr><td>Fructidor</td><td>Fruit</td><td>18 Aug &#8211; 16 Sep</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">The suffixes make it easy to identify the season: <strong>-aire</strong> for autumn, <strong>-ose</strong> for winter, <strong>-al</strong> for spring, <strong>-idor</strong> for summer.</p>



<h3 class="wp-block-heading">The Decade: 10-Day Weeks</h3>



<p class="wp-block-paragraph">The week was replaced by the <strong>decade</strong>, a 10-day period. The days were called: Primidi, Duodi, Tridi, Quartidi, Quintidi, Sextidi, Septidi, Octidi, Nonidi, Decadi (rest day).</p>



<p class="wp-block-paragraph">The decade was one of the most unpopular aspects of the calendar: workers went from one rest day every 7 to one every 10.</p>
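<p class="wp-block-paragraph">Because every month consists of exactly three decades, the name of any day follows from simple integer arithmetic on its day of the month. A minimal Python sketch, using the day names listed above; the function names are hypothetical.</p>

```python
DAY_NAMES = ["Primidi", "Duodi", "Tridi", "Quartidi", "Quintidi",
             "Sextidi", "Septidi", "Octidi", "Nonidi", "Decadi"]


def decade_day(day_of_month: int) -> tuple[int, str]:
    """Return (decade number 1-3, day name) for a day of a 30-day republican month."""
    if not 1 <= day_of_month <= 30:
        raise ValueError("republican months have exactly 30 days")
    decade, offset = divmod(day_of_month - 1, 10)
    return decade + 1, DAY_NAMES[offset]


def is_rest_day(day_of_month: int) -> bool:
    """Decadi, the last day of each decade, was the rest day."""
    return decade_day(day_of_month)[1] == "Decadi"


if __name__ == "__main__":
    print(decade_day(9))    # 9 Thermidor  -> (1, 'Nonidi')
    print(decade_day(18))   # 18 Brumaire  -> (2, 'Octidi')
    print([d for d in range(1, 31) if is_rest_day(d)])  # -> [10, 20, 30]
```

<p class="wp-block-paragraph">Three rest days per 30-day month, against roughly four Sundays in a Gregorian month, is exactly the loss workers complained about.</p>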



<h3 class="wp-block-heading">The Sansculottides: Complementary Days</h3>



<p class="wp-block-paragraph">Twelve months of 30 days add up to only 360 days, 5 short of a normal year (6 short in leap years). These complementary days, called <strong>Sansculottides</strong>, came after the last month, Fructidor, and were dedicated to republican values: Virtue, Genius, Labour, Opinion, Rewards, and Revolution (leap years only).</p>
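<p class="wp-block-paragraph">The structure means that a day of the republican year (numbered from 1) maps to a month and day by integer division, with anything past day 360 falling among the complementary days. A minimal Python sketch; the function name and the "Day of ..." label format are illustrative assumptions.</p>

```python
MONTHS = ["Vendemiaire", "Brumaire", "Frimaire", "Nivose", "Pluviose", "Ventose",
          "Germinal", "Floreal", "Prairial", "Messidor", "Thermidor", "Fructidor"]
COMPLEMENTARY = ["Virtue", "Genius", "Labour", "Opinion", "Rewards", "Revolution"]


def label_day_of_year(day_of_year: int, leap: bool) -> str:
    """Turn a 1-based day of the republican year into a readable label.

    Days 1-360 fall in the twelve 30-day months; days 361-365 (361-366 in a
    leap year) are the Sansculottides at the end of the year."""
    length = 366 if leap else 365
    if not 1 <= day_of_year <= length:
        raise ValueError(f"day_of_year must be between 1 and {length}")
    if day_of_year <= 360:
        month, day = divmod(day_of_year - 1, 30)
        return f"{day + 1} {MONTHS[month]}"
    return f"Day of {COMPLEMENTARY[day_of_year - 361]}"


if __name__ == "__main__":
    print(label_day_of_year(1, leap=False))    # 1 Vendemiaire
    print(label_day_of_year(309, leap=False))  # 9 Thermidor
    print(label_day_of_year(366, leap=True))   # Day of Revolution
```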



<h3 class="wp-block-heading">Leap Years and the Autumnal Equinox</h3>



<p class="wp-block-paragraph">The republican year began on the day of the <strong>autumnal equinox</strong>. During the historical period (years I-XIV), the leap years were <strong>III, VII, and XI</strong>. For later dates, the <strong>Romme method</strong> is used: a year is a leap year if its number is divisible by 4, except century years, which are leap years only if divisible by 400 (the Gregorian rule applied to the republican year number).</p>
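<p class="wp-block-paragraph">Putting the pieces together, a Gregorian date can be converted by counting the days elapsed since 22 September 1792 and walking forward one republican year at a time. The Python sketch below assumes exactly the leap rule just described (III, VII and XI for years I-XIV, Romme's rule from year XV onwards, the switch point being an assumption); it is not necessarily how the converter at the top of this page is implemented, and the function names are hypothetical. Results for years after XIV are purely arithmetic, since the calendar had by then been abolished.</p>

```python
from datetime import date

EPOCH = date(1792, 9, 22)            # 1 Vendemiaire An I
HISTORICAL_LEAP_YEARS = {3, 7, 11}   # leap years among years I-XIV
MONTHS = ["Vendemiaire", "Brumaire", "Frimaire", "Nivose", "Pluviose", "Ventose",
          "Germinal", "Floreal", "Prairial", "Messidor", "Thermidor", "Fructidor"]


def is_leap(year: int) -> bool:
    """Historical list for years I-XIV; Romme's arithmetic rule afterwards
    (assumption: the switch happens at year XV)."""
    if year <= 14:
        return year in HISTORICAL_LEAP_YEARS
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def to_republican(d: date) -> tuple[int, int, str]:
    """Convert a Gregorian date to (year, day, month name or 'Sansculottides')."""
    days = (d - EPOCH).days  # 0-based offset from 1 Vendemiaire An I
    if days < 0:
        raise ValueError("date precedes 1 Vendemiaire An I (22 September 1792)")
    year = 1
    # Walk forward one republican year at a time, subtracting its length
    while True:
        length = 366 if is_leap(year) else 365
        if days < length:
            break
        days -= length
        year += 1
    if days < 360:
        month, day = divmod(days, 30)
        return year, day + 1, MONTHS[month]
    # Offsets 360-365 are the complementary days, numbered 1-6
    return year, days - 359, "Sansculottides"


if __name__ == "__main__":
    print(to_republican(date(1799, 11, 9)))   # (8, 18, 'Brumaire')
    print(to_republican(date(1794, 7, 27)))   # (2, 9, 'Thermidor')
    print(to_republican(date(1805, 12, 31)))  # (14, 10, 'Nivose')
```

<p class="wp-block-paragraph">Checked against the table of famous dates below, this reproduces, for example, 18 Brumaire An VIII for 9 November 1799 and 10 Nivose An XIV for 31 December 1805.</p>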



<h2 class="wp-block-heading">Famous Dates in the Revolutionary Calendar</h2>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Republican Date</th><th>Gregorian Date</th><th>Event</th></tr></thead><tbody><tr><td>1 Vendemiaire An I</td><td>22 September 1792</td><td>Proclamation of the Republic</td></tr><tr><td>2 Pluviose An I</td><td>21 January 1793</td><td>Execution of Louis XVI</td></tr><tr><td>16 Pluviose An II</td><td>4 February 1794</td><td>Abolition of slavery in French colonies</td></tr><tr><td><strong>9 Thermidor An II</strong></td><td>27 July 1794</td><td>Fall of Robespierre, end of the Terror</td></tr><tr><td>13 Vendemiaire An IV</td><td>5 October 1795</td><td>Napoleon suppresses the royalist uprising</td></tr><tr><td>18 Fructidor An V</td><td>4 September 1797</td><td>Directory coup against monarchists</td></tr><tr><td><strong>18 Brumaire An VIII</strong></td><td>9 November 1799</td><td>Napoleon&#8217;s coup d&#8217;etat</td></tr><tr><td>11 Frimaire An XIII</td><td>2 December 1804</td><td>Napoleon&#8217;s coronation as Emperor</td></tr><tr><td>10 Nivose An XIV</td><td>31 December 1805</td><td>Last day of the calendar</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">The expressions <strong>&#8220;9 Thermidor&#8221;</strong> and <strong>&#8220;18 Brumaire&#8221;</strong> have become shorthand for &#8220;counter-revolutionary reaction&#8221; and &#8220;military coup d&#8217;etat&#8221; respectively. In <em>The Eighteenth Brumaire of Louis Bonaparte</em> (1852), Karl Marx made the famous remark usually paraphrased as &#8220;history repeats itself, first as tragedy, then as farce.&#8221;</p>



<h2 class="wp-block-heading">The Rural Calendar: A Name for Every Day</h2>



<p class="wp-block-paragraph">Every day of the year bore the name of a natural element: Quintidi days were named after an animal, Decadi days after an agricultural tool, and all other days after plants, flowers, or minerals. Examples: 1 Vendemiaire = Grape, 5 Floreal = Nightingale, 27 Messidor = Strawberry.</p>



<h2 class="wp-block-heading">Trivia and Pop Culture</h2>



<ul class="wp-block-list">
<li><strong>Lobster Thermidor</strong>: the famous dish is named after the summer month</li>



<li><strong>Germinal by Zola</strong>: the 1885 novel takes its title from the spring month</li>



<li><strong>The Paris Commune</strong> (1871) briefly restored the calendar</li>



<li>Of the Revolution&#8217;s decimal reforms of time and measurement, the <strong>metric system</strong> is the only one that survived</li>
</ul>



<h2 class="wp-block-heading">Frequently Asked Questions</h2>



<div>
<div>
<h3>When does the year begin in the Revolutionary calendar?</h3>
<div>
<p>The republican year begins on the day of the autumnal equinox, usually 22 or 23 September. Year I began on 22 September 1792.</p>
</div>
</div>
<div>
<h3>What is 18 Brumaire?</h3>
<div>
<p>18 Brumaire An VIII (9 November 1799) is the date of Napoleon&#8217;s coup d&#8217;etat. The expression has become synonymous with military coup.</p>
</div>
</div>
<div>
<h3>How long was the Revolutionary calendar in use?</h3>
<div>
<p>It was in effect for about 13 years, from 22 September 1792 to 1 January 1806. It was briefly revived during the Paris Commune in 1871.</p>
</div>
</div>
<div>
<h3>What are the Sansculottides?</h3>
<div>
<p>They are the 5 or 6 complementary days at the end of the year, dedicated to republican values: Virtue, Genius, Labour, Opinion, Rewards, and Revolution (leap years only).</p>
</div>
</div>
</div>



<script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
      {
        "@type": "Question",
        "name": "When does the year begin in the Revolutionary calendar?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The republican year begins on the day of the autumnal equinox, usually 22 or 23 September. Year I began on 22 September 1792."
        }
      },
      {
        "@type": "Question",
        "name": "What is 18 Brumaire?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "18 Brumaire An VIII (9 November 1799) is the date of Napoleon's coup d'etat. The expression has become synonymous with military coup."
        }
      },
      {
        "@type": "Question",
        "name": "How long was the Revolutionary calendar in use?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "It was in effect for about 13 years, from 22 September 1792 to 1 January 1806. It was briefly revived during the Paris Commune in 1871."
        }
      },
      {
        "@type": "Question",
        "name": "What are the Sansculottides?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "They are the 5 or 6 complementary days at the end of the year, dedicated to republican values: Virtue, Genius, Labour, Opinion, Rewards, and Revolution (leap years only)."
        }
      }
    ]
  }
  </script>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/the-french-revolutionary-calendar/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
