
The Statistics and SEO Library: the Books I Recommend (and Why)

There is a question that comes back, reliably, every time I publish an article along this path: “So, which book should I read to study these things?” Until now I have answered one piece at a time, in the “Further Reading” section that closes each article. Here I do the reverse: I gather the whole library on a single page, with the reason each title earned its place on the shelf.

This is neither a ranking nor a catalogue: these are the books I actually use, the ones many of the examples and explanations in the articles come from. There are only a few, chosen by one simple criterion: each book must let anyone working with data in SEO and marketing take one concrete step forward, without requiring a degree in mathematics.

A note on transparency before we start: the links below are Amazon affiliate links. If you buy a book through them, the blog receives a small commission at no extra cost to you: it is the most painless way I have found to cover the server bills.

Where to Start

The Art of Statistics — David Spiegelhalter

If I could keep only one, it would be this. The Art of Statistics does not teach formulas: it teaches how to reason about data before trusting it, which is exactly the skill missing when someone reads a Search Console report and jumps to conclusions. Spiegelhalter, a Cambridge professor and a science communicator of rare clarity, builds every chapter around a real case: botched polls, misread medical statistics, the famous Berkeley admissions case (the same one I recounted when discussing Simpson’s Paradox).

I cite it practically everywhere on this blog: from sampling to confidence intervals, by way of the Central Limit Theorem. You can read it without pen and paper, and re-read it with profit. (For Italian readers there is also an excellent Italian edition, L’arte della statistica.)

Finalmente ho capito la statistica — Maurizio De Pra

The title says it all (“statistics, I finally got it”). Finalmente ho capito la statistica (Italian edition) is the book for absolute beginners who want a gradual path, plenty of examples and a modest price. It covers the territory of probability distributions well — the ones this blog’s path takes from the Poisson to the Beta — together with the foundations of probabilistic reasoning. It does not replace a textbook, but it does what a textbook cannot: it takes the fear away.

When Data Lies

How to Lie with Statistics — Darrell Huff

Written in 1954, it has never aged. How to Lie with Statistics is the short, venomous catalogue of the tricks numbers can be made to play: biased samples, conveniently chosen averages, truncated chart axes, percentages stripped of their context. Huff wrote for newspaper readers; I recommend it to anyone reading SEO tool reports and vendor slide decks, where those very tricks are alive and well. If you have been through Simpson’s Paradox you already know that aggregate data can lie: Huff completes the picture with all the other ways.

You can read it in an afternoon, and from that afternoon on you never look at a chart the same way again. (Italian readers can find it as Mentire con le statistiche.)

The Textbook for Getting Serious: Inference

Statistica — Newbold, Carlson, Thorne

Sooner or later the moment comes when popular science is not enough: you want the conditions under which a test applies, the complete formulas, the exercises to check that you have understood. Statistica by Newbold, Carlson and Thorne (Italian edition; the English original is Statistics for Business and Economics) is the reference university textbook for the whole of inference: hypothesis testing, confidence intervals, chi-square, ANOVA. In practice, it is the theoretical backbone of my guide to statistical tests for A/B analysis.

Let me be frank: it is a university textbook, and it costs like one. But it is one of those books you buy once and consult for years.

Regression, Time Series, Models

Introduzione all’econometria — Stock, Watson

The name may be intimidating (econometrics?), but the content is exactly what anyone needs to go beyond basic linear regression: multiple regression, omitted variables, diagnostics, time series. Introduzione all’econometria by Stock and Watson (Italian edition; the English original is Introduction to Econometrics) has a quality that is rare in textbooks: a constant focus on the interpretation of results, not just their computation. Which is, after all, where the difference between a useful analysis and an exercise in style is decided.

The (Fallible) Art of Prediction

The Signal and the Noise — Nate Silver

Anyone working with data sooner or later has to make a forecast — and an estimate of next quarter’s organic traffic is a forecast in every respect. The Signal and the Noise tells the story of why predictions fail so often: too much faith in models, the temptation to mistake noise for signal, the inability to reason in probabilities. Silver — the man who called the 2012 US presidential election right in all fifty states — moves through poker, earthquakes, weather and finance, and along the way delivers the best narrative introduction to Bayesian reasoning I know of. It is the popular companion to the time series chapter: first you learn to build a forecast, then you learn to distrust it. (There is also an Italian edition: Il segnale e il rumore.)

Online Experimentation

Trustworthy Online Controlled Experiments — Kohavi, Tang, Xu

On A/B testing there is simply no equivalent: Trustworthy Online Controlled Experiments is the book on the subject, written by the people who led experimentation at Microsoft, Google and LinkedIn. Inside is everything I have touched on in these articles (sample size, statistical power, mistakes to avoid), plus ten years of real-world cases about what goes wrong in actual experiments. I also used it to build my sample size calculator. And it is very readable.

The Bayesian Path

Bayesian Statistics the Fun Way — Will Kurt

Bayesian statistics has a reputation for being hard, and its textbooks do their best to confirm it. Bayesian Statistics the Fun Way does the opposite: Will Kurt explains priors, posteriors and Bayesian updating with examples taken from Star Wars and Lego bricks, and — something I particularly appreciate — uses R for the computational side, exactly as I do here. It is the right book for grasping the Bayesian logic (and the reason behind the Beta distribution) before tackling the formal theory.

Towards Machine Learning

An Introduction to Statistical Learning — James, Witten, Hastie, Tibshirani

The contemporary classic of statistical learning, known to everyone as “ISL”. An Introduction to Statistical Learning covers, with the right balance of intuition and formalism, the topics of the more advanced part of this path: logistic regression, decision trees, PCA, with hands-on labs in R. Note: the authors distribute the PDF for free from their website; the printed edition remains for those who, like me, prefer to annotate study books in pencil.

Introduction to Machine Learning — Ethem Alpaydın

For those who want the theoretical foundations of machine learning — the ones that in a university course would come before the labs — Introduction to Machine Learning by Alpaydın is the reference I cited in my introductory guide to ML. More formal than ISL: one to pick up after it, not instead of it.

The Working Language: R

R for Data Science — Wickham, Çetinkaya-Rundel, Grolemund

There was an obvious gap on this shelf: R code shows up in nearly every article of this blog, from the chi-square test to time series, but the book to learn the language from was missing. R for Data Science (second edition) fills the gap: Hadley Wickham is the lead developer of the tidyverse, the package ecosystem that made R modern, and the book teaches the whole workflow (import, tidy, transform, visualise, communicate) on real data, with no superfluous theory. Like ISL, it can be read for free on the authors’ website: one less excuse for not starting.

Communicating Data

Storytelling with Data — Cole Nussbaumer Knaflic

The most rigorous analysis in the world is worth little if the person receiving it does not understand it — and in marketing an analysis almost always has to be told to someone: a client, a manager, a meeting. Storytelling with Data teaches how to turn the default charts of Excel and Looker Studio into clear messages: choosing the right chart, removing the ink that carries no information, directing attention where it matters, building a narrative around the number. Of the whole shelf it is probably the book that pays for itself fastest: you can apply it to your very next report. (There is also an Italian edition, Data storytelling, published by Apogeo.)

A Niche Read

Monte Carlo Methods in Financial Engineering — Paul Glasserman

This is the most specialised book on the shelf, and I list it out of honesty towards anyone who has reached the Monte Carlo method and wants to go all the way: Monte Carlo Methods in Financial Engineering by Glasserman is the complete reference on simulation applied to finance. Not a beach read: it is the text you reach for when the others are no longer enough.

The Library at a Glance

To get your bearings quickly, here is the complete shelf in table form:

Book	Who it’s for	Language
The Art of Statistics — Spiegelhalter	Everyone: the starting point	EN (also IT)
Finalmente ho capito la statistica — De Pra	Absolute beginners, distributions	IT
How to Lie with Statistics — Huff	Defending yourself from doctored numbers	EN (also IT)
Statistica — Newbold, Carlson, Thorne	For rigour: inference and tests	IT
Introduzione all’econometria — Stock, Watson	Regression and time series	IT (orig. EN)
The Signal and the Noise — Silver	Why predictions fail	EN (also IT)
Trustworthy Online Controlled Experiments — Kohavi et al.	A/B testing and experimentation	EN
Bayesian Statistics the Fun Way — Kurt	The Bayesian approach, with R	EN
An Introduction to Statistical Learning — James et al.	Practical machine learning, with R	EN
Introduction to Machine Learning — Alpaydın	Theoretical foundations of ML	EN
R for Data Science — Wickham et al.	Learning R, from raw data to charts	EN
Storytelling with Data — Knaflic	Communicating data and reports	EN (also IT)
Monte Carlo Methods in Financial Engineering — Glasserman	Advanced simulation	EN

This shelf is not closed. As the blog’s path widens — the statistical paradoxes I have started to explore, the bootstrap, text analysis — the library will widen too, and this page will be updated accordingly. In the meantime, if one single recommendation had to suffice: start with Spiegelhalter, and let the articles on this blog be your gym.

Paolo Gironi