Simpson's Paradox in SEO: When Aggregate Data Can Lie

It’s the last day of the month. We’re putting together the SEO report for our main client. We open Google Search Console, set the month-over-month comparison, and a chill runs down our spine: the site’s overall organic CTR has collapsed from 4.5% to 3.5%.

Before writing the bad-news email and bracing ourselves to justify the drop, let’s do the right thing: disaggregate the data to understand where we’re losing ground. We look at performance by device and discover something seemingly impossible:

CTR on Desktop rose from 5.0% to 5.5%.
CTR on Mobile rose from 2.0% to 2.5%.

We stare at the screen. How is it mathematically possible that performance improved everywhere, yet the overall total dropped by a full percentage point?

We haven’t broken Google Search Console, and we haven’t forgotten elementary-school arithmetic. We’ve simply fallen victim to Simpson’s Paradox.

What we’ll cover:

What Is Simpson’s Paradox
The Proof: Anatomy of a Fake Collapse
Two More SEO Scenarios Where the Paradox Strikes
How to Defend Yourself
Further Reading

What Is Simpson’s Paradox

Simpson’s Paradox is a statistical phenomenon in which a trend that appears clearly within several groups of data disappears — or even reverses — when the groups are combined into a single total.

In the everyday practice of SEO and marketing, this almost always happens because of a hidden confounding variable: in our case, the relative weight of the segments we’re analyzing. It’s the same reasoning we meet when discussing conditional probability, where what matters is not the marginal figure but the one conditioned on a subgroup. At bottom, the paradox doesn’t arise because the averages are wrong, but because we’re comparing two snapshots of a population whose composition has changed in the meantime.

When we work with rates and percentages (CTR, conversion rate, bounce rate), looking at the aggregate figure without considering the underlying volumes is one of the most insidious traps for anyone analyzing data.

The Proof: Anatomy of a Fake Collapse

Let’s go back to our monthly report and put the absolute numbers behind those percentages. Only then can we understand what really happened between Month 1 and Month 2.

Segment	Month 1 (impr. · clicks · CTR)	Month 2 (impr. · clicks · CTR)	Trend
Desktop	10,000 · 500 · 5.0%	10,000 · 550 · 5.5%	rising
Mobile	2,000 · 40 · 2.0%	20,000 · 500 · 2.5%	rising
Aggregate total	12,000 · 540 · 4.5%	30,000 · 1,050 · 3.5%	falling

Here’s the point: we don’t have an SEO problem — on the contrary, we’ve had a remarkable success. Our Mobile rankings have exploded, bringing in 18,000 more impressions than the previous month.

Mobile traffic, however, has historically had a structurally lower CTR than Desktop (more noise in the SERP, faster scrolling, distractions). That huge influx of low-CTR impressions “watered down” the global average, dragging it downward. The aggregate figure told us “we’re getting worse”; the disaggregated data tells us “we’re improving across the board, but our traffic mix has changed”.

Slopegraph of the two months’ CTRs: Desktop and Mobile both rise, while the aggregate line falls. The paradox at a glance: no line contradicts another, only their weight changes.

The mathematical reason is simple, and it’s worth keeping firmly in mind: the aggregate CTR is not the average of the segments’ CTRs, but a weighted average of them, where the weights are each segment’s share of impressions. As a formula:

\( \text{CTR}_{\text{agg}} = \frac{\sum_i \text{clicks}_i}{\sum_i \text{impressions}_i} = \sum_i w_i \cdot \text{CTR}_i, \qquad w_i = \frac{\text{impressions}_i}{\sum_j \text{impressions}_j} \)

where \(\text{CTR}_i\) is the CTR of segment \(i\) and \(w_i\) is its weight, that is, its share of total impressions. Between Month 1 and Month 2 the weight of Mobile went from 1/6 to 2/3 of the total: even though every individual CTR rose, the average shifted toward the (low) value of the segment that had become dominant. It’s not the math that has gone crazy: it’s the mix that has changed.
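Plugging the table’s figures into the formula makes the shift explicit (intermediate terms are rounded to two decimals):

\( \text{CTR}_{\text{agg}}^{\text{Month 1}} = \tfrac{10{,}000}{12{,}000} \cdot 5.0\% + \tfrac{2{,}000}{12{,}000} \cdot 2.0\% = \tfrac{5}{6} \cdot 5.0\% + \tfrac{1}{6} \cdot 2.0\% \approx 4.17\% + 0.33\% = 4.5\% \)

\( \text{CTR}_{\text{agg}}^{\text{Month 2}} = \tfrac{10{,}000}{30{,}000} \cdot 5.5\% + \tfrac{20{,}000}{30{,}000} \cdot 2.5\% = \tfrac{1}{3} \cdot 5.5\% + \tfrac{2}{3} \cdot 2.5\% \approx 1.83\% + 1.67\% = 3.5\% \)

Both CTRs on the right-hand side are higher in Month 2; only the fractions multiplying them have moved.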

The composition of impressions across the two months: Desktop stays at 10,000 while Mobile explodes from 2,000 to 20,000. Its weight goes from 1/6 (16.7%) to 2/3 (66.7%) of the total, pulling the weighted average toward its own low CTR.

Let’s reconstruct the whole thing in R, so we can see the mechanism at work instead of taking it on faith:

# Reconstruct the two months' data
df <- data.frame(
  segment     = c("Desktop", "Mobile", "Desktop", "Mobile"),
  month       = c("Month 1",  "Month 1", "Month 2",  "Month 2"),
  impressions = c(10000,      2000,      10000,      20000),
  clicks      = c(500,        40,        550,        500)
)

# CTR of each segment
df$ctr <- df$clicks / df$impressions

# Aggregate CTR per month: a WEIGHTED average over impressions,
# NOT the arithmetic mean of the CTRs
agg <- aggregate(cbind(clicks, impressions) ~ month, data = df, FUN = sum)
agg$aggregate_ctr <- agg$clicks / agg$impressions
print(agg)

As the output shows, the aggregate drops from 4.5% to 3.5% while both segments rise. N.B.: the arithmetic mean of Month 2’s two CTRs would be 4% (the simple average of 5.5% and 2.5%), quite different from the real 3.5%. The entire difference is in the weights.
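The gap between the two averages can be checked directly. Below is a minimal sketch using base R’s `weighted.mean()`, with Month 2’s figures re-entered by hand so that the snippet runs on its own; the variable names are illustrative.

```r
# Month 2 figures from the table above
ctr         <- c(Desktop = 0.055, Mobile = 0.025)
impressions <- c(Desktop = 10000, Mobile = 20000)

# Simple (unweighted) mean: treats both segments as equally important
simple_mean <- mean(ctr)

# Weighted mean: each CTR counts in proportion to its impressions
weighted_ctr <- weighted.mean(ctr, w = impressions)

# The same figure the "long way": total clicks over total impressions
long_way <- sum(ctr * impressions) / sum(impressions)

print(c(simple = simple_mean, weighted = weighted_ctr, long_way = long_way))
```

The weighted mean and the clicks-over-impressions ratio coincide, which is exactly what the formula says: Search Console’s aggregate CTR is the weighted version, never the simple one.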

Desktop CTR ↑ and Mobile CTR ↑ → Mobile’s share of the mix ↑↑↑ → aggregate CTR ↓

Two More SEO Scenarios Where the Paradox Strikes

CTR by device is the textbook example, but Simpson’s Paradox lurks just about everywhere in our dashboards.

1. The Conversion Rate Collapse (Informational vs. Transactional Intent)

We’re working on an e-commerce site and the organic conversion rate goes from 3% to 1.5%. A disaster? Not necessarily. If we’ve just launched a corporate blog that has started ranking well for hundreds of informational, top-of-the-funnel keywords, we’ve brought thousands of users to the site who are far from the purchase stage and whose CR is naturally very low, close to 0.1%. The CR of our product pages may be stable or growing, but the sheer volume of blog traffic has distorted the aggregate average.
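To get a feel for how much blog traffic this takes, here is a small sketch. The 3%, 1.5% and 0.1% figures come from the scenario above; treating the product-page CR as exactly unchanged at 3% is a simplifying assumption, and the variable names are illustrative.

```r
cr_product <- 0.03    # CR of product-page traffic (assumed unchanged)
cr_blog    <- 0.001   # CR of informational blog traffic
cr_target  <- 0.015   # aggregate CR observed after the blog launch

# aggregate = (1 - w) * cr_product + w * cr_blog, solved for the blog share w
w_blog <- (cr_product - cr_target) / (cr_product - cr_blog)

# Sanity check: rebuild the aggregate from the two segments
cr_check <- (1 - w_blog) * cr_product + w_blog * cr_blog

cat(sprintf("Blog share of organic traffic: %.1f%%\n", 100 * w_blog))
cat(sprintf("Rebuilt aggregate CR:          %.2f%%\n", 100 * cr_check))
```

Under these assumptions, the blog only needs to account for a little over half of organic traffic (about 51.7%) to halve the aggregate CR, without a single product page converting any worse.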

2. Cannibalization or Ranking Expansion?

One of our long-standing product pages used to rank only for 5 exact transactional keywords: 100 impressions, 10 clicks, 10% CTR. We decide to optimize its content, and the next month Google rewards the richer content by ranking it for 80 new long-tail and related keywords. Now the page gets 5,000 impressions and 100 clicks: 2% CTR. If we look only at the page’s average CTR in Search Console, it seems our optimization destroyed it; if we look at the absolute clicks, we’ve multiplied them tenfold.
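Splitting the page’s queries into “original” and “new” keywords shows the same mechanism at work. In the sketch below, the monthly totals (100 and 5,000 impressions, 10 and 100 clicks) come from the scenario; the way the second month is divided between the two keyword sets is hypothetical, chosen only to illustrate the point.

```r
# Totals per month come from the text; the split of the "After" month
# between original and new keywords is HYPOTHETICAL.
page <- data.frame(
  month       = c("Before", "After", "After"),
  keyword_set = c("original 5", "original 5", "new long-tail"),
  impressions = c(100, 100, 4900),
  clicks      = c(10, 12, 88)
)
page$ctr <- page$clicks / page$impressions

# Page-level CTR per month: total clicks over total impressions
totals <- aggregate(cbind(clicks, impressions) ~ month, data = page, FUN = sum)
totals$ctr <- totals$clicks / totals$impressions

print(page)
print(totals)
```

In this illustrative split, the original keywords actually improve (from 10% to 12% CTR), yet the page-level CTR falls to 2% because the new long-tail queries carry 98% of the impressions at a much lower CTR.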

How to Defend Yourself (Takeaways for the Analyst)

How do we survive Simpson’s Paradox when presenting data to a client or stakeholder? Four precautions.

Never trust the aggregate figure alone. When analyzing relative metrics (conversion rates, click rates, averages), the global total is often the least useful number of all.
Segment until you find homogeneity. Always split the data along logical dimensions before drawing conclusions: by device (Desktop/Mobile), by query type (brand/non-brand), and by page type (blog/product).
Look for the shift in weights. If a global rate collapses but the subgroups hold steady, ask: “has the traffic mix changed?”. Almost always, a low-performing segment has suddenly increased its volumes.
Educate the client. In a report, don’t just show the CTR drop: show the disaggregated table. Explaining the mechanism doesn’t just save the monthly report — it positions us as analysts who reason about data rather than being at its mercy.
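The third precaution, looking for the shift in weights, can be made quantitative. The sketch below splits the change in aggregate CTR from the device example into a “rate effect” (CTRs change, mix frozen at Month 1) and a “mix effect” (weights change, CTRs held at Month 2 values). This is one common way to split the change, not the only one: freezing the other month’s weights or rates gives slightly different parts that still add up to the same total. The variable names are illustrative.

```r
# Impressions and CTRs from the device table
impr_m1 <- c(Desktop = 10000, Mobile = 2000)
impr_m2 <- c(Desktop = 10000, Mobile = 20000)
ctr_m1  <- c(Desktop = 0.050, Mobile = 0.020)
ctr_m2  <- c(Desktop = 0.055, Mobile = 0.025)

# Weights: each segment's share of impressions in its month
w_m1 <- impr_m1 / sum(impr_m1)
w_m2 <- impr_m2 / sum(impr_m2)

agg_m1 <- sum(w_m1 * ctr_m1)
agg_m2 <- sum(w_m2 * ctr_m2)

# Rate effect: CTRs change, mix frozen at Month 1
rate_effect <- sum(w_m1 * (ctr_m2 - ctr_m1))
# Mix effect: weights change, CTRs held at their Month 2 values
mix_effect <- sum((w_m2 - w_m1) * ctr_m2)

# Mix-adjusted CTR: Month 2 rates applied to the Month 1 mix
adjusted_m2 <- sum(w_m1 * ctr_m2)

# All figures in percentage points
print(round(100 * c(total_change = agg_m2 - agg_m1,
                    rate_effect  = rate_effect,
                    mix_effect   = mix_effect,
                    adjusted_m2  = adjusted_m2), 2))
```

With the table’s numbers, the rate effect is +0.5 points and the mix effect is -1.5 points, which together give the observed -1.0. Had the traffic mix stayed as it was in Month 1, the aggregate CTR would have been 5.0%: a figure worth placing next to the raw 3.5% in the client report.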

There’s a single rule worth taking home, and it reaches well beyond this paradox: before interpreting an average, always ask which groups it’s made of and whether their weights have shifted. An average without its composition is a number that can say anything and its opposite.

Data doesn’t lie, but aggregate data makes for an excellent magician. The most solid defense, however, isn’t statistical but experimental: when we get to decide how to assign traffic — randomizing users between two versions of a page — the mix stops being a variable beyond our control. That’s exactly what we do with a rigorously run A/B test, the next step on our path: seeing how a controlled experiment neutralizes at the root the confounding variables that here we’ve merely unmasked.
