The Statistics and SEO Library: the Books I Recommend (and Why)

One question comes back reliably every time I publish an article in this series: “So, which book should I read to study these things?” Until now I have answered it one piece at a time, in the “Further Reading” section that closes each article. Here I do the reverse: I gather the whole library on a single page, with the reason each title earned its place on the shelf.

This is not a ranking and not a catalogue: these are the books I actually use, the source of many of the examples and explanations in the articles. There are only a few, chosen by a simple criterion: each book must help anyone who works with data in SEO and marketing take one concrete step forward, without requiring a degree in mathematics.


Simpson’s Paradox in SEO: When Aggregate Data Can Lie

It’s the last day of the month. We’re putting together the SEO report for our main client. We open Google Search Console, set the month-over-month comparison, and a chill runs down our spine: the site’s overall organic CTR has collapsed from 4.5% to 3.5%.

Before writing the bad-news email and bracing ourselves to justify the drop, let’s do the right thing: disaggregate the data to understand where we’re losing ground. We look at performance by device and discover something seemingly impossible:

  • CTR on Desktop rose from 5.0% to 5.5%.
  • CTR on Mobile rose from 2.0% to 2.5%.

We stare at the screen. How is it mathematically possible that CTR improved on every device, yet the overall CTR dropped by a full percentage point?
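The arithmetic can be checked with a few lines of Python. The impression and click counts below are invented for illustration (they are not real Search Console data); they are simply one device mix that reproduces the CTRs above. The key point is that overall CTR is an impression-weighted average of the device CTRs, so a shift of impressions toward the lower-CTR device can drag the total down even while every segment improves.

```python
# Hypothetical (impressions, clicks) per device; numbers are illustrative only
DATA = {
    "previous": {"desktop": (100_000, 5_000), "mobile": (20_000, 400)},
    "current": {"desktop": (40_000, 2_200), "mobile": (80_000, 2_000)},
}


def summarize(period):
    """Return per-device CTR, overall CTR, and each device's share of impressions."""
    rows = DATA[period]
    total_impr = sum(impr for impr, _ in rows.values())
    total_clicks = sum(clicks for _, clicks in rows.values())
    per_device = {dev: clicks / impr for dev, (impr, clicks) in rows.items()}
    share = {dev: impr / total_impr for dev, (impr, _) in rows.items()}
    overall = total_clicks / total_impr  # equals sum(share * CTR) across devices
    return per_device, overall, share


for period in ("previous", "current"):
    per_device, overall, share = summarize(period)
    devices = ", ".join(
        f"{dev} {ctr:.1%} ({share[dev]:.0%} of impressions)"
        for dev, ctr in per_device.items()
    )
    print(f"{period}: {devices}; overall {overall:.1%}")
```

In this hypothetical mix, desktop's share of impressions falls from about 83% to about 33%. Both segments improve, but the weights move toward mobile, which has the lower CTR, and the weighted average falls from 4.5% to 3.5%.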


Time Series Analysis and Forecasting in R

What is meant by a time series?

A time series is a set of values observed over sequentially ordered periods. For anyone working in SEO, that alone makes it a topic of great interest.

Website traffic recorded over time (for example, daily sessions) is itself a time series.

Time series analysis is a set of methods for extracting meaningful patterns or statistics from data that carry temporal information.
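As a small example of such a statistic, the sketch below computes the sample autocorrelation of a daily traffic series at a few lags. It is written in Python rather than the R of the full article, the sessions are synthetic (a repeating weekday/weekend pattern, not real data), and `autocorrelation` is a plain, hypothetical implementation of the standard estimator. A weekly cycle shows up as a high correlation at lag 7.

```python
from statistics import mean


def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag (standard ACF estimator)."""
    y_bar = mean(y)
    denom = sum((v - y_bar) ** 2 for v in y)
    num = sum((y[t] - y_bar) * (y[t - lag] - y_bar) for t in range(lag, len(y)))
    return num / denom


# Hypothetical daily sessions for 8 weeks: weekday peaks, weekend dips, no trend
PATTERN = [120, 100, 80, 60, 20, -180, -200]
sessions = [1000 + PATTERN[t % 7] for t in range(8 * 7)]

for lag in (1, 3, 7):
    print(f"lag {lag}: {autocorrelation(sessions, lag):.3f}")
```

With real traffic, noise and trend lower these values. In R the same statistic is available through the built-in `acf()` function.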

In very general terms, we can say that a time series is a sequence of random variables indexed in time.

The purpose of analyzing a time series can be descriptive (for example, decomposing the series to remove seasonal components or to highlight underlying trends) or inferential, which includes forecasting values for future periods that have not yet occurred.
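To make both purposes concrete, here is a minimal Python sketch (not the article's R code) of a classical additive decomposition, y = trend + seasonal + remainder, using a 7-day centered moving average as the trend, followed by a simple forecast that extends the trend with a least-squares line and adds the weekly seasonal index back. The series is synthetic (a linear trend plus a fixed weekly pattern, no noise), and all function names are hypothetical. The forecasting step is one simple approach among many, not necessarily the method the article uses.

```python
from statistics import mean

PERIOD = 7  # assumed weekly seasonality in daily data
PATTERN = [120, 100, 80, 60, 20, -180, -200]  # hypothetical weekday effects, sum to 0


def make_series(weeks=8):
    """Synthetic daily sessions: linear trend plus a fixed weekly pattern, no noise."""
    return [1000 + 5 * t + PATTERN[t % PERIOD] for t in range(weeks * PERIOD)]


def decompose(y, period=PERIOD):
    """Classical additive decomposition y = trend + seasonal + remainder.

    Uses a simple centered moving average, which assumes an odd period.
    Trend and remainder are None at the edges where the window does not fit.
    """
    half = period // 2
    n = len(y)
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = mean(y[t - half:t + half + 1])

    # Average the detrended values for each position in the cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    seasonal_idx = [s - offset for s in raw]  # center the indices on zero

    remainder = [
        y[t] - trend[t] - seasonal_idx[t % period] if trend[t] is not None else None
        for t in range(n)
    ]
    return trend, seasonal_idx, remainder


def forecast(y, horizon, period=PERIOD):
    """Extend the trend with a least-squares line and add back the seasonal index."""
    trend, seasonal_idx, _ = decompose(y, period)
    pts = [(t, v) for t, v in enumerate(trend) if v is not None]
    x_bar = mean(t for t, _ in pts)
    v_bar = mean(v for _, v in pts)
    slope = sum((t - x_bar) * (v - v_bar) for t, v in pts) / sum(
        (t - x_bar) ** 2 for t, _ in pts
    )
    intercept = v_bar - slope * x_bar
    n = len(y)
    return [intercept + slope * t + seasonal_idx[t % period] for t in range(n, n + horizon)]


sessions = make_series()
_, weekly_effects, _ = decompose(sessions)
print("weekly effects:", [round(s, 1) for s in weekly_effects])
print("next 7 days:", [round(f) for f in forecast(sessions, 7)])
```

Because the synthetic series has no noise, the decomposition recovers the weekly pattern exactly and the remainder is zero; on real traffic the remainder carries the irregular variation. In R, `decompose()` from the stats package performs a comparable moving-average decomposition.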
