The Statistics and SEO Library: the Books I Recommend (and Why)

There is one question that comes back, reliably, every time I publish an article in this series: “So, which book should I read to study these things?” Until now I have answered it one piece at a time, in the “Further Reading” section that closes each article. Here I do the opposite: I gather the whole library on a single page, with the reason each title earned its place on the shelf.

This is not a ranking and not a catalogue: these are the books I actually use, the ones many of the examples and explanations in the articles come from. There are only a few of them, chosen by one simple criterion: each book must help anyone working with data in SEO and marketing take a concrete step forward, without requiring a degree in mathematics.

A note on transparency before we start: the links below are Amazon affiliate links. If you buy a book through them, the blog receives a small commission at no extra cost to you: it is the most painless way I have found to cover the server bills.

Where to Start

The Art of Statistics — David Spiegelhalter

If I could keep only one, it would be this. The Art of Statistics does not teach formulas: it teaches how to reason about data before trusting it, which is exactly the skill that is missing when someone reads a Search Console report and jumps to conclusions. Spiegelhalter, a Cambridge professor and a science communicator of rare clarity, builds every chapter around a real case: botched polls, misread medical statistics, the famous Berkeley admissions case (the same one I described when discussing Simpson’s Paradox).

I cite it practically everywhere on this blog, from sampling to confidence intervals by way of the Central Limit Theorem. You can read it without pen and paper, and it rewards re-reading. (For Italian readers there is also an excellent Italian edition, L’arte della statistica.)

Finalmente ho capito la statistica — Maurizio De Pra

The title says it all (“statistics, I finally got it”). Finalmente ho capito la statistica (Italian edition) is the book for absolute beginners who want a gradual path, plenty of examples and a modest price. It covers probability distributions well, including the ones this blog’s series walks through from the Poisson to the Beta, together with the foundations of probabilistic reasoning. It does not replace a textbook, but it does what a textbook cannot: it takes the fear away.

When Data Lies

How to Lie with Statistics — Darrell Huff

Written in 1954, it has never aged. How to Lie with Statistics is the short, venomous catalogue of the tricks numbers can be made to play: biased samples, conveniently chosen averages, truncated chart axes, percentages stripped of their context. Huff wrote for newspaper readers; I recommend it to anyone reading SEO tool reports and vendor slide decks, where those very tricks are alive and well. If you have been through Simpson’s Paradox you already know that aggregate data can lie: Huff completes the picture with all the other ways numbers can mislead.

You can read it in an afternoon, and from that afternoon on you never look at a chart the same way again. (Italian readers can find it as Mentire con le statistiche.)

The Textbook for Getting Serious: Inference

Statistica — Newbold, Carlson, Thorne

Sooner or later the moment comes when popular science is not enough: you want the conditions under which a test applies, the complete formulas, the exercises to check that you have understood. Statistica by Newbold, Carlson and Thorne (Italian edition) is the reference university textbook for the whole of inference: hypothesis testing, confidence intervals, chi-square, ANOVA. In practice, it is the theoretical backbone of my guide to statistical tests for A/B analysis.

Let me be frank: it is a university textbook, and it costs like one. But it is one of those books you buy once and consult for years.

Regression, Time Series, Models

Introduzione all’econometria — Stock, Watson

The name may be intimidating (econometrics?), but the content is exactly what anyone needs to go beyond basic linear regression: multiple regression, omitted variables, diagnostics, time series. Introduzione all’econometria by Stock and Watson (Italian edition; the English original is Introduction to Econometrics) has a quality that is rare in textbooks: a constant focus on interpreting results, not just computing them. That, after all, is what separates a useful analysis from a mere academic exercise.

The (Fallible) Art of Prediction

The Signal and the Noise — Nate Silver

Anyone working with data sooner or later has to make a forecast — and an estimate of next quarter’s organic traffic is a forecast in every respect. The Signal and the Noise tells the story of why predictions fail so often: too much faith in models, the temptation to mistake noise for signal, the inability to reason in probabilities. Silver — the man who called the 2012 US presidential election right in all fifty states — moves through poker, earthquakes, weather and finance, and along the way delivers the best narrative introduction to Bayesian reasoning I know of. It is the popular companion to the time series chapter: first you learn to build a forecast, then you learn to distrust it. (There is also an Italian edition: Il segnale e il rumore.)

Online Experimentation

Trustworthy Online Controlled Experiments — Kohavi, Tang, Xu

On A/B testing there is simply no equivalent: Trustworthy Online Controlled Experiments is the book on the subject, written by the people who led experimentation at Microsoft, Google and LinkedIn. It contains everything I have touched on in these articles (sample size, statistical power, mistakes to avoid), plus ten years of real-world cases about what goes wrong in actual experiments. I also used it to build my sample size calculator. Very readable.

The Bayesian Path

Bayesian Statistics the Fun Way — Will Kurt

Bayesian statistics has a reputation for being hard, and its textbooks do their best to confirm it. Bayesian Statistics the Fun Way does the opposite: Will Kurt explains priors, posteriors and Bayesian updating with examples taken from Star Wars and Lego bricks, and — something I particularly appreciate — uses R for the computational side, exactly as I do here. It is the right book for grasping the Bayesian logic (and the reason behind the Beta distribution) before tackling the formal theory.

Towards Machine Learning

An Introduction to Statistical Learning — James, Witten, Hastie, Tibshirani

The contemporary classic of statistical learning, known to everyone as “ISL”. An Introduction to Statistical Learning covers, with the right balance of intuition and formalism, the topics of the more advanced part of this series: logistic regression, decision trees, PCA, with hands-on labs in R. Note: the authors distribute the PDF free of charge on their website; the printed edition remains for those who, like me, prefer to annotate study books in pencil.

Introduction to Machine Learning — Ethem Alpaydın

For those who want the theoretical foundations of machine learning — the ones that in a university course would come before the labs — Introduction to Machine Learning by Alpaydın is the reference I cited in my introductory guide to ML. More formal than ISL: one to pick up after it, not instead of it.

The Working Language: R

R for Data Science — Wickham, Çetinkaya-Rundel, Grolemund

There was an obvious gap on this shelf: R code shows up in nearly every article of this blog, from the chi-square test to time series, but a book to learn the language from was missing. R for Data Science (second edition) fills the gap: Hadley Wickham is the author of the tidyverse, the package ecosystem that made R modern, and the book teaches the whole workflow (import, tidy, transform, visualise, communicate) on real data, with no superfluous theory. Like ISL, it can be read for free on the authors’ website, so there is no excuse not to start.

Communicating Data

Storytelling with Data — Cole Nussbaumer Knaflic

The most rigorous analysis in the world is worth little if the person receiving it does not understand it — and in marketing an analysis almost always has to be told to someone: a client, a manager, a meeting. Storytelling with Data teaches how to turn the default charts of Excel and Looker Studio into clear messages: choosing the right chart, removing the ink that carries no information, directing attention where it matters, building a narrative around the number. Of the whole shelf it is probably the book that pays for itself fastest: you can apply it to your very next report. (There is also an Italian edition, Data storytelling, published by Apogeo.)

A Niche Read

Monte Carlo Methods in Financial Engineering — Paul Glasserman

This is the most specialised book on the shelf, and I include it for anyone who has reached the Monte Carlo method and wants to go all the way: Monte Carlo Methods in Financial Engineering by Glasserman is the complete reference on simulation applied to finance. It is not a beach read: it is the text you reach for when the others are no longer enough.

The Library at a Glance

To get your bearings quickly, here is the complete shelf in table form:

| Book | Who it’s for | Language |
| --- | --- | --- |
| The Art of Statistics (Spiegelhalter) | Everyone: the starting point | EN (also IT) |
| Finalmente ho capito la statistica (De Pra) | Absolute beginners, distributions | IT |
| How to Lie with Statistics (Huff) | Defending yourself from doctored numbers | EN (also IT) |
| Statistica (Newbold, Carlson, Thorne) | For rigour: inference and tests | IT |
| Introduzione all’econometria (Stock, Watson) | Regression and time series | IT (orig. EN) |
| The Signal and the Noise (Silver) | Why predictions fail | EN (also IT) |
| Trustworthy Online Controlled Experiments (Kohavi et al.) | A/B testing and experimentation | EN |
| Bayesian Statistics the Fun Way (Kurt) | The Bayesian approach, with R | EN |
| An Introduction to Statistical Learning (James et al.) | Practical machine learning, with R | EN |
| Introduction to Machine Learning (Alpaydın) | Theoretical foundations of ML | EN |
| R for Data Science (Wickham et al.) | Learning R, from raw data to charts | EN |
| Storytelling with Data (Knaflic) | Communicating data and reports | EN (also IT) |
| Monte Carlo Methods in Financial Engineering (Glasserman) | Advanced simulation | EN |

This shelf is not closed. As the blog’s series grows (the statistical paradoxes I have started to explore, the bootstrap, text analysis), the library will grow too, and this page will be updated accordingly. In the meantime, if a single recommendation had to suffice: start with Spiegelhalter, and let the articles on this blog be your gym.

Paolo Gironi
