It’s the last day of the month. We’re putting together the SEO report for our main client. We open Google Search Console, set the month-over-month comparison, and a chill runs down our spine: the site’s overall organic CTR has collapsed from 4.5% to 3.5%.
Before writing the bad-news email and bracing ourselves to justify the drop, let’s do the right thing: disaggregate the data to understand where we’re losing ground. We look at performance by device and discover something seemingly impossible:
- CTR on Desktop rose from 5.0% to 5.5%.
- CTR on Mobile rose from 2.0% to 2.5%.
We stare at the screen. How is it mathematically possible that performance improved everywhere, yet the overall total dropped by a full percentage point?
We haven’t broken Google Search Console, and we haven’t forgotten elementary-school arithmetic. We’ve simply fallen victim to Simpson’s Paradox.
What Is Simpson’s Paradox
Simpson’s Paradox is a statistical phenomenon in which a trend that appears clearly within several groups of data disappears — or even reverses — when the groups are combined into a single total.
In the everyday practice of SEO and marketing, this almost always happens because of a hidden confounding variable: in our case, the relative weight of the segments we’re analyzing. It’s the same reasoning we meet when discussing conditional probability, where what matters is not the marginal figure but the one conditioned on a subgroup.
When we work with rates and percentages (CTR, conversion rate, bounce rate), looking at the aggregate figure without considering the underlying volumes is one of the most insidious traps for anyone analyzing data.
The Proof: Anatomy of a Fake Collapse
Let’s go back to our monthly report and put the absolute numbers behind those percentages. Only then can we understand what really happened between Month 1 and Month 2.
| Segment | Month 1 (impr. · clicks · CTR) | Month 2 (impr. · clicks · CTR) | Trend |
|---|---|---|---|
| Desktop | 10,000 · 500 · 5.0% | 10,000 · 550 · 5.5% | rising |
| Mobile | 2,000 · 40 · 2.0% | 20,000 · 500 · 2.5% | rising |
| Aggregate total | 12,000 · 540 · 4.5% | 30,000 · 1,050 · 3.5% | falling |
Here’s the point: we don’t have an SEO problem — on the contrary, we’ve had a remarkable success. Our Mobile rankings have exploded, bringing in 18,000 more impressions than the previous month.
Mobile traffic, however, has historically had a structurally lower CTR than Desktop (more noise in the SERP, faster scrolling, distractions). That huge influx of low-CTR impressions “watered down” the global average, dragging it downward. The aggregate figure told us “we’re getting worse”; the disaggregated data tells us “we’re improving across the board, but our traffic mix has changed”.
The mathematical reason is simple, and it’s worth keeping firmly in mind: the aggregate CTR is not the average of the segments’ CTRs, but a weighted average of them, where the weights are each segment’s share of impressions. As a formula:
\( \text{CTR}_{\text{agg}} = \frac{\sum_i \text{clicks}_i}{\sum_i \text{impressions}_i} = \sum_i w_i \cdot \text{CTR}_i, \qquad w_i = \frac{\text{impressions}_i}{\sum_j \text{impressions}_j} \)
where \(\text{CTR}_i\) is the CTR of segment \(i\) and \(w_i\) is its weight, that is, the fraction of impressions it owns. In Month 2 the weight of Mobile went from 1/6 to 2/3 of the total: even though every individual CTR rose, the average shifted toward the (low) value of the segment that had become dominant. It’s not the math that has gone crazy: it’s the mix that has changed.
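Plugging the table’s numbers into the weighted-average formula makes the shift explicit. The superscripts (1) and (2) are notation introduced here only to label Month 1 and Month 2; the Desktop weights 5/6 and 1/3 follow from the same impression counts that give Mobile 1/6 and 2/3:

\[
\text{CTR}_{\text{agg}}^{(1)} = \tfrac{5}{6} \cdot 5.0\% + \tfrac{1}{6} \cdot 2.0\% = 4.5\%,
\qquad
\text{CTR}_{\text{agg}}^{(2)} = \tfrac{1}{3} \cdot 5.5\% + \tfrac{2}{3} \cdot 2.5\% = 3.5\%
\]

Both CTRs on the right-hand side are higher in Month 2, yet the larger weight now sits on the smaller one.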
Let’s reconstruct the whole thing in R, so we can see the mechanism at work instead of taking it on faith:
```r
# Reconstruct the two months' data
df <- data.frame(
  segment = c("Desktop", "Mobile", "Desktop", "Mobile"),
  month = c("Month 1", "Month 1", "Month 2", "Month 2"),
  impressions = c(10000, 2000, 10000, 20000),
  clicks = c(500, 40, 550, 500)
)

# CTR of each segment
df$ctr <- df$clicks / df$impressions

# Aggregate CTR per month: a WEIGHTED average over impressions,
# NOT the arithmetic mean of the CTRs
agg <- aggregate(cbind(clicks, impressions) ~ month, data = df, FUN = sum)
agg$aggregate_ctr <- agg$clicks / agg$impressions
print(agg)
```
As the output shows, the aggregate drops from 4.5% to 3.5% while both segments rise. N.B.: the arithmetic mean of Month 2’s two CTRs would be 4% (the simple average of 5.5% and 2.5%), quite different from the real 3.5%. The entire difference is in the weights.
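To put a number on “the entire difference is in the weights”, the month-over-month change can be split into a rate effect (CTRs changing at the old traffic mix) and a mix effect (weights changing at the new CTRs). This is a minimal sketch of one common way to do the split, not the only one: choosing Month 1 weights as the baseline is an assumption, and the names `rate_effect` and `mix_effect` are introduced here for illustration.

```r
# Month-over-month decomposition of the aggregate CTR change,
# using the impressions and clicks from the table above
m1 <- data.frame(segment = c("Desktop", "Mobile"),
                 impressions = c(10000, 2000),
                 clicks = c(500, 40))
m2 <- data.frame(segment = c("Desktop", "Mobile"),
                 impressions = c(10000, 20000),
                 clicks = c(550, 500))

# Segment CTRs and impression shares (the weights w_i) for each month
ctr1 <- m1$clicks / m1$impressions
ctr2 <- m2$clicks / m2$impressions
w1 <- m1$impressions / sum(m1$impressions)
w2 <- m2$impressions / sum(m2$impressions)

# Rate effect: how much the aggregate would have moved if the mix had stayed fixed
rate_effect <- sum(w1 * (ctr2 - ctr1))
# Mix effect: how much the change in weights moved the aggregate
mix_effect <- sum((w2 - w1) * ctr2)
# The two effects add up exactly to the observed change
total_change <- sum(w2 * ctr2) - sum(w1 * ctr1)

# Express the three quantities in percentage points
print(round(100 * c(rate = rate_effect, mix = mix_effect, total = total_change), 2))
```

With these numbers the rate effect is +0.5 percentage points and the mix effect is -1.5, which together give the observed drop of one point.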
Two More SEO Scenarios Where the Paradox Strikes
CTR by device is the textbook example, but Simpson’s Paradox lurks just about everywhere in our dashboards.
1. The Conversion Rate Collapse (Informational vs. Transactional Intent)
We’re working on an e-commerce site and the organic conversion rate goes from 3% to 1.5%. A disaster? Not necessarily. If we’ve just launched a corporate blog that has started ranking well for hundreds of informational, top-of-the-funnel keywords, we’ve brought thousands of users to the site who are far from the purchase stage and whose conversion rate is naturally low (close to 0.1%). The CR of our product pages may be stable or growing, but the sheer volume of blog traffic has dragged the aggregate average down.
2. Cannibalization or Ranking Expansion?
One of our long-standing product pages used to rank for only 5 exact-match transactional keywords: 100 impressions, 10 clicks, 10% CTR. We decide to optimize its content, and the next month Google rewards the richer content by ranking it for 80 new long-tail and related keywords. Now the page gets 5,000 impressions and 100 clicks: 2% CTR. If we look only at the page’s average CTR in Search Console, it seems our optimization destroyed it; if we look at the absolute clicks, we’ve multiplied them tenfold.
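Both scenarios can be reproduced with a few lines of R. This is a sketch under stated assumptions. For scenario 1, the session and conversion counts are hypothetical, chosen only so that product pages stay at 3% and the blog converts at 0.1%. For scenario 2, the Month 2 totals (5,000 impressions, 100 clicks) come from the text, while the split between the original keywords and the new long-tail ones is invented for illustration. The helper `aggregate_rate()` is likewise a name introduced here.

```r
# Aggregate rate = total successes / total opportunities
# (conversions / sessions, or clicks / impressions)
aggregate_rate <- function(successes, opportunities) {
  sum(successes) / sum(opportunities)
}

# Scenario 1: hypothetical sessions and conversions.
# Before the blog launch: product pages only, converting at 3%.
cr_before <- aggregate_rate(successes = 300, opportunities = 10000)
# After the launch: product pages unchanged, plus blog traffic converting at 0.1%.
cr_after <- aggregate_rate(
  successes = c(product = 300, blog = 10),
  opportunities = c(product = 10000, blog = 10000)
)
print(c(cr_before = cr_before, cr_after = cr_after))

# Scenario 2: Month 2 totals from the text; the per-group split is hypothetical.
page <- data.frame(
  queries = c("original 5 keywords", "new long-tail keywords"),
  impressions = c(100, 4900),
  clicks = c(11, 89)
)
page$ctr <- page$clicks / page$impressions
page_ctr_after <- aggregate_rate(page$clicks, page$impressions)
print(page)
print(page_ctr_after)
```

In the first case the aggregate conversion rate roughly halves although the product pages have not changed at all. In the second, under the assumed split, the original keywords perform slightly better than before (11% versus 10%), while the page-level CTR falls to 2%.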
How to Defend Yourself (Takeaways for the Analyst)
How do we survive Simpson’s Paradox when presenting data to a client or stakeholder? Four precautions.
- Never trust the aggregate figure alone. When analyzing relative metrics (conversion rates, click rates, averages), the global total is often the least useful number of all.
- Segment until you find homogeneity. Always split the data along logical dimensions before drawing conclusions: by device (Desktop/Mobile), by query type (brand/non-brand), and by page type (blog/product).
- Look for the shift in weights. If a global rate collapses but the subgroups hold steady, ask: “has the traffic mix changed?”. Almost always, a low-performing segment has suddenly grown in volume.
- Educate the client. In a report, don’t just show the CTR drop: show the disaggregated table. Explaining the mechanism doesn’t just save the monthly report — it positions us as analysts who reason about data rather than being at its mercy.
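The second and third precautions can be turned into a small routine check before a report goes out. The sketch below is one possible approach, not a standard tool: the function name `check_mix_shift()` and the column names `events` and `trials` are assumptions made for this example. It compares two periods segment by segment and flags the case where every segment moves one way while the aggregate moves the other.

```r
# `before` and `after` are data frames with columns:
#   segment, events (e.g. clicks), trials (e.g. impressions)
# These column names are hypothetical, chosen for this sketch.
check_mix_shift <- function(before, after) {
  # Inner join: segments present in only one period are dropped
  m <- merge(before, after, by = "segment", suffixes = c("_before", "_after"))

  # Per-segment rates and shares of volume in each period
  m$rate_before <- m$events_before / m$trials_before
  m$rate_after <- m$events_after / m$trials_after
  m$share_before <- m$trials_before / sum(m$trials_before)
  m$share_after <- m$trials_after / sum(m$trials_after)

  # Change in the aggregate rate and in each segment's rate
  agg_delta <- sum(m$events_after) / sum(m$trials_after) -
    sum(m$events_before) / sum(m$trials_before)
  seg_delta <- m$rate_after - m$rate_before

  # Reversal: all segments move in one direction, the aggregate in the other
  reversal <- (all(seg_delta > 0) && agg_delta < 0) ||
    (all(seg_delta < 0) && agg_delta > 0)

  list(
    segments = m[, c("segment", "rate_before", "rate_after",
                     "share_before", "share_after")],
    aggregate_delta = agg_delta,
    reversal = reversal
  )
}

# Usage with the device data from the monthly report
before <- data.frame(segment = c("Desktop", "Mobile"),
                     events = c(500, 40), trials = c(10000, 2000))
after <- data.frame(segment = c("Desktop", "Mobile"),
                    events = c(550, 500), trials = c(10000, 20000))
result <- check_mix_shift(before, after)
print(result$segments)
print(result$reversal)
```

The `segments` table is also a reasonable starting point for the disaggregated view to show the client, since it puts rates and shares side by side.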
Data doesn’t lie, but aggregate data makes for an excellent magician. The most solid defense, however, isn’t statistical but experimental: when we get to decide how traffic is assigned, randomizing users between two versions of a page, the mix stops being a variable beyond our control. That’s exactly what we do with a rigorously run A/B test, the next step on our path: we’ll see how a controlled experiment neutralizes at the root the confounding variables that we’ve merely unmasked here.
Further Reading
If we want to dig deeper into Simpson’s Paradox and the art of reading data without being fooled, The Art of Statistics by David Spiegelhalter is the right read: it devotes lucid pages to this very paradox — including the famous Berkeley admissions case — showing how an aggregate number can tell the exact opposite of what happened in the data.