  {"id":3579,"date":"2026-05-27T13:47:27","date_gmt":"2026-05-27T12:47:27","guid":{"rendered":"https:\/\/www.gironi.it\/blog\/?p=3579"},"modified":"2026-05-27T13:49:12","modified_gmt":"2026-05-27T12:49:12","slug":"simpsons-paradox-in-seo-when-aggregate-data-can-lie","status":"publish","type":"post","link":"https:\/\/www.gironi.it\/blog\/en\/simpsons-paradox-in-seo-when-aggregate-data-can-lie\/","title":{"rendered":"Simpson&#8217;s Paradox in SEO: When Aggregate Data Can Lie"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">It&#8217;s the last day of the month. We&#8217;re putting together the SEO report for our main client. We open Google Search Console, set the month-over-month comparison, and a chill runs down our spine: <strong>the site&#8217;s overall organic CTR has collapsed from 4.5% to 3.5%<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Before writing the bad-news email and bracing ourselves to justify the drop, let&#8217;s do the right thing: disaggregate the data to understand <strong>where<\/strong> we&#8217;re losing ground. We look at performance by device and discover something seemingly impossible:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CTR on <strong>Desktop<\/strong> rose from 5.0% to 5.5%.<\/li>\n\n\n\n<li>CTR on <strong>Mobile<\/strong> rose from 2.0% to 2.5%.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">We stare at the screen. How is it mathematically possible that performance improved everywhere, yet the overall total dropped by a full percentage point?<\/p>\n\n\n\n<!--more-->\n\n\n\n<p class=\"wp-block-paragraph\">We haven&#8217;t broken Google Search Console, and we haven&#8217;t forgotten elementary-school arithmetic. 
We&#8217;ve simply fallen victim to <strong>Simpson&#8217;s Paradox<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Is Simpson&#8217;s Paradox<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Simpson&#8217;s Paradox is a statistical phenomenon in which a trend that appears clearly within several groups of data disappears \u2014 or even reverses \u2014 when the groups are combined into a single total.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the everyday practice of SEO and marketing, this almost always happens because of a hidden <strong>confounding variable<\/strong>: in our case, <strong>the relative weight of the segments we&#8217;re analyzing<\/strong>. It&#8217;s the same reasoning we encounter when discussing <a href=\"https:\/\/www.gironi.it\/blog\/en\/contingency-tables-and-conditional-probability\/\">conditional probability<\/a>, where what matters is not the marginal figure but the one conditioned on a subgroup.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When we work with rates and percentages (CTR, conversion rate, bounce rate), looking at the aggregate figure without considering the underlying volumes is one of the most insidious traps for anyone analyzing data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Proof: Anatomy of a Fake Collapse<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s go back to our monthly report and put the absolute numbers behind those percentages. Only then can we understand what really happened between Month 1 and Month 2.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Segment<\/th><th>Month 1 (impr. \u00b7 clicks \u00b7 CTR)<\/th><th>Month 2 (impr. 
\u00b7 clicks \u00b7 CTR)<\/th><th>Trend<\/th><\/tr><\/thead><tbody><tr><td><strong>Desktop<\/strong><\/td><td>10,000 \u00b7 500 \u00b7 5.0%<\/td><td>10,000 \u00b7 550 \u00b7 5.5%<\/td><td>rising<\/td><\/tr><tr><td><strong>Mobile<\/strong><\/td><td>2,000 \u00b7 40 \u00b7 2.0%<\/td><td>20,000 \u00b7 500 \u00b7 2.5%<\/td><td>rising<\/td><\/tr><tr><td><em>Aggregate total<\/em><\/td><td>12,000 \u00b7 540 \u00b7 4.5%<\/td><td>30,000 \u00b7 1,050 \u00b7 3.5%<\/td><td>falling<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s the point: we don&#8217;t have an SEO problem \u2014 on the contrary, we&#8217;ve had a remarkable success. Our Mobile rankings have exploded, bringing in 18,000 more impressions than the previous month.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Mobile traffic, however, has historically had a structurally lower CTR than Desktop (more noise in the SERP, faster scrolling, distractions). That huge influx of low-CTR impressions &#8220;watered down&#8221; the global average, dragging it downward. The aggregate figure told us <em>&#8220;we&#8217;re getting worse&#8221;<\/em>; the disaggregated data tells us <em>&#8220;we&#8217;re improving across the board, but our traffic mix has changed&#8221;<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The mathematical reason is simple, and it&#8217;s worth keeping firmly in mind: <strong>the aggregate CTR is not the average of the segments&#8217; CTRs, but a <em>weighted<\/em> average of them<\/strong>, where the weights are each segment&#8217;s share of impressions. 
As a formula:<\/p>\n\n\n\n\\(\n\\text{CTR}_{\\text{agg}} = \\frac{\\sum_i \\text{clicks}_i}{\\sum_i \\text{impressions}_i} = \\sum_i w_i \\cdot \\text{CTR}_i, \\qquad w_i = \\frac{\\text{impressions}_i}{\\sum_j \\text{impressions}_j} \\\\\n\\)\n\n\n\n<p class=\"wp-block-paragraph\">where \\(\\text{CTR}_i\\) is the CTR of segment <em>i<\/em> and \\(w_i\\) is its weight, that is, the fraction of impressions it owns. In Month 2 the weight of Mobile went from 1\/6 to 2\/3 of the total: even though every individual CTR rose, the average shifted toward the (low) value of the segment that had become dominant. It&#8217;s not the math that has gone crazy: it&#8217;s the <em>mix<\/em> that has changed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s reconstruct the whole thing in R, so we can see the mechanism at work instead of taking it on faith:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Reconstruct the two months' data\ndf &lt;- data.frame(\n  segment     = c(\"Desktop\", \"Mobile\", \"Desktop\", \"Mobile\"),\n  month       = c(\"Month 1\",  \"Month 1\", \"Month 2\",  \"Month 2\"),\n  impressions = c(10000,      2000,      10000,      20000),\n  clicks      = c(500,        40,        550,        500)\n)\n\n# CTR of each segment\ndf$ctr &lt;- df$clicks \/ df$impressions\n\n# Aggregate CTR per month: a WEIGHTED average over impressions,\n# NOT the arithmetic mean of the CTRs\nagg &lt;- aggregate(cbind(clicks, impressions) ~ month, data = df, FUN = sum)\nagg$aggregate_ctr &lt;- agg$clicks \/ agg$impressions\nprint(agg)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">As the output shows, the aggregate drops from 4.5% to 3.5% while both segments rise. N.B.: the arithmetic mean of Month 2&#8217;s two CTRs would be 4% (the simple average of 5.5% and 2.5%), quite different from the real 3.5%. 
The entire difference is in the weights.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Two More SEO Scenarios Where the Paradox Strikes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">CTR by device is the textbook example, but Simpson&#8217;s Paradox lurks just about everywhere in our dashboards.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. The Conversion Rate Collapse (Informational vs. Transactional Intent)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We&#8217;re working on an e-commerce site and the organic conversion rate goes from 3% to 1.5%. A disaster? Not necessarily. If we&#8217;ve just launched a corporate blog that has started ranking well for hundreds of informational, top-of-the-funnel keywords, we&#8217;ve brought thousands of users to the site who are far from the purchase stage (with a naturally low CR, close to 0.1%). The CR of our product pages may be stable or growing, but the sheer volume of blog traffic has distorted the aggregate average.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Cannibalization or Ranking Expansion?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">One of our long-standing product pages used to rank for only 5 exact transactional keywords: 100 impressions, 10 clicks, 10% CTR. We decide to optimize its content, and the next month Google rewards the richer content, ranking it for 80 new long-tail and related keywords. Now the page gets 5,000 impressions and 100 clicks: 2% CTR. If we look only at the page&#8217;s average CTR in Search Console, it seems our optimization destroyed it; if we look at the absolute clicks, we&#8217;ve multiplied them tenfold.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How to Defend Yourself (Takeaways for the Analyst)<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">How do we survive Simpson&#8217;s Paradox when presenting data to a client or stakeholder? 
Four precautions.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n\n<li><strong>Never trust the aggregate figure alone.<\/strong> When analyzing relative metrics (conversion rates, click rates, averages), the global total is often the least useful number of all.<\/li>\n\n\n<li><strong>Segment until you find homogeneity.<\/strong> Always split the data along logical dimensions before drawing conclusions: by device (Desktop\/Mobile), by query type (brand\/non-brand), and by page type (blog\/product).<\/li>\n\n\n<li><strong>Look for the shift in weights.<\/strong> If a global rate collapses but the subgroups hold steady, ask: <em>&#8220;has the traffic mix changed?&#8221;<\/em>. Almost always, a low-performing segment has suddenly increased its volumes.<\/li>\n\n\n<li><strong>Educate the client.<\/strong> In a report, don&#8217;t just show the CTR drop: show the disaggregated table. Explaining the mechanism doesn&#8217;t just save the monthly report \u2014 it positions us as analysts who reason about data rather than being at its mercy.<\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Data doesn&#8217;t lie, but aggregate data makes for an excellent magician. The most solid defense, however, isn&#8217;t statistical but experimental: when we get to decide <em>how<\/em> to assign traffic \u2014 randomizing users between two versions of a page \u2014 the mix stops being a variable beyond our control. 
That&#8217;s exactly what we do with a rigorously run <a href=\"https:\/\/www.gironi.it\/blog\/en\/guide-to-statistical-tests-for-a-b-analysis\/\">A\/B test<\/a>, the next step on our path: seeing how a controlled experiment neutralizes at the root the confounding variables that here we&#8217;ve merely unmasked.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"further-reading\">Further Reading<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">If we want to dig deeper into Simpson&#8217;s Paradox and the art of reading data without being fooled, <a href=\"https:\/\/www.amazon.it\/dp\/8806246623?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>The Art of Statistics<\/em><\/a> by David Spiegelhalter is the right read: it devotes lucid pages to this very paradox \u2014 including the famous Berkeley admissions case \u2014 showing how an aggregate number can tell the exact opposite of what happened in the data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s the last day of the month. We&#8217;re putting together the SEO report for our main client. We open Google Search Console, set the month-over-month comparison, and a chill runs down our spine: the site&#8217;s overall organic CTR has collapsed from 4.5% to 3.5%. 
Before writing the bad-news email and bracing ourselves to justify the &hellip; <a href=\"https:\/\/www.gironi.it\/blog\/en\/simpsons-paradox-in-seo-when-aggregate-data-can-lie\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;Simpson&#8217;s Paradox in SEO: When Aggregate Data Can Lie&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[161,41],"tags":[],"class_list":["post-3579","post","type-post","status-publish","format-standard","hentry","category-statistics","category-seo"],"lang":"en","translations":{"en":3579,"it":3569},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false},"uagb_author_info":{"display_name":"autore-articoli","author_link":"https:\/\/www.gironi.it\/blog\/author\/autore-articoli\/"},"uagb_comment_info":0,"uagb_excerpt":"It&#8217;s the last day of the month. We&#8217;re putting together the SEO report for our main client. We open Google Search Console, set the month-over-month comparison, and a chill runs down our spine: the site&#8217;s overall organic CTR has collapsed from 4.5% to 3.5%. 
Before writing the bad-news email and bracing ourselves to justify the&hellip;","_links":{"self":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3579","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/comments?post=3579"}],"version-history":[{"count":1,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3579\/revisions"}],"predecessor-version":[{"id":3580,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3579\/revisions\/3580"}],"wp:attachment":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/media?parent=3579"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/categories?post=3579"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/tags?post=3579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}