There is a question that comes back, reliably, every time I publish an article along this path: “so, which book should I read to study these things?” Until now I have answered one piece at a time, in the “Further Reading” section that closes each article. Here I do the reverse: I gather the whole library on a single page, with the reason each title earned its place on the shelf.
This is not a ranking and not a catalogue: these are the books I actually use, the ones many of the examples and explanations in the articles come from. There are only a few, chosen by one simple criterion: each book must let anyone working with data in SEO and marketing take one concrete step forward, without requiring a degree in mathematics.
A note on transparency before we start: the links below are Amazon affiliate links. If you buy a book through them, the blog receives a small commission at no extra cost to you: it is the most painless way I have found to cover the server bills.
If I could keep only one, it would be this. The Art of Statistics does not teach formulas: it teaches how to reason about data before trusting it, which is exactly the skill missing when someone reads a Search Console report and jumps to conclusions. Spiegelhalter, a Cambridge professor and a science communicator of rare clarity, builds every chapter around a real case: botched polls, misread medical statistics, the famous Berkeley admissions case (the same one I recounted when discussing Simpson’s Paradox).
I cite it practically everywhere on this blog: from sampling to confidence intervals, by way of the Central Limit Theorem. You can read it without pen and paper, and re-read it with profit. (For Italian readers there is also an excellent Italian edition, L’arte della statistica.)
The title says it all (“statistics, I finally got it”). Finalmente ho capito la statistica (Italian edition) is the book for absolute beginners who want a gradual path, plenty of examples and a modest price. It covers the territory of probability distributions well — the ones this blog’s path takes from the Poisson to the Beta — together with the foundations of probabilistic reasoning. It does not replace a textbook, but it does what a textbook cannot: it takes the fear away.
Written in 1954, it has not aged. How to Lie with Statistics is the short, venomous catalogue of the tricks numbers can be made to play: biased samples, conveniently chosen averages, truncated chart axes, percentages stripped of their context. Huff wrote for newspaper readers; I recommend it to anyone reading SEO tool reports and vendor slide decks, where those very tricks are alive and well. If you have been through Simpson’s Paradox you already know that aggregate data can lie: Huff completes the picture with all the other ways.
You can read it in an afternoon, and from that afternoon on you never look at a chart the same way again. (Italian readers can find it as Mentire con le statistiche.)
Sooner or later the moment comes when popular science is not enough: you want the applicability conditions of a test, the complete formulas, the exercises to check you understood. Statistica by Newbold, Carlson and Thorne (Italian edition) is the reference university textbook for the whole of inference: hypothesis testing, confidence intervals, chi-square, ANOVA — in practice, the theoretical backbone of my guide to statistical tests for A/B analysis.
Let me be frank: it is a university textbook, and it is priced like one. But it is one of those books you buy once and consult for years.
The name may be intimidating (econometrics?), but the content is exactly what anyone needs to go beyond basic linear regression: multiple regression, omitted variables, diagnostics, time series. Introduzione all’econometria by Stock and Watson (Italian edition; the English original is Introduction to Econometrics) has a quality that is rare in textbooks: a constant focus on the interpretation of results, not just their computation. Which is, after all, where the difference between a useful analysis and an exercise in style is decided.
Anyone working with data sooner or later has to make a forecast — and an estimate of next quarter’s organic traffic is a forecast in every respect. The Signal and the Noise tells the story of why predictions fail so often: too much faith in models, the temptation to mistake noise for signal, the inability to reason in probabilities. Silver — the man who called the 2012 US presidential election right in all fifty states — moves through poker, earthquakes, weather and finance, and along the way delivers the best narrative introduction to Bayesian reasoning I know of. It is the popular companion to the time series chapter: first you learn to build a forecast, then you learn to distrust it. (There is also an Italian edition: Il segnale e il rumore.)
On A/B testing there is simply no equivalent: Trustworthy Online Controlled Experiments is the book on the subject, written by the people who led experimentation at Microsoft, Google and LinkedIn. Inside is everything I have touched in these articles — sample size, test power, mistakes to avoid — plus ten years of real-world cases about what goes wrong in actual experiments. I also used it to build my sample size calculator. Very readable.
Bayesian statistics has a reputation for being hard, and its textbooks do their best to confirm it. Bayesian Statistics the Fun Way does the opposite: Will Kurt explains priors, posteriors and Bayesian updating with examples taken from Star Wars and Lego bricks, and — something I particularly appreciate — uses R for the computational side, exactly as I do here. It is the right book for grasping the Bayesian logic (and the reason behind the Beta distribution) before tackling the formal theory.
The contemporary classic of statistical learning, known to everyone as “ISL”. An Introduction to Statistical Learning covers, with the right balance of intuition and formalism, the topics of the more advanced part of this path: logistic regression, decision trees, PCA, with hands-on labs in R. Note: the authors distribute the PDF for free from their website; the printed edition remains for those who, like me, prefer to annotate study books in pencil.
For those who want the theoretical foundations of machine learning — the ones that in a university course would come before the labs — Introduction to Machine Learning by Alpaydın is the reference I cited in my introductory guide to ML. More formal than ISL: one to pick up after it, not instead of it.
There was an obvious gap on this shelf: R code shows up in nearly every article of this blog — from the chi-square test to time series — but the book to learn the language from was missing. R for Data Science (second edition) fills the gap: Hadley Wickham is the author of the tidyverse, the package ecosystem that made R modern, and the book teaches the whole workflow — import, tidy, transform, visualise, communicate — on real data, with no superfluous theory. Like ISL, it can be read for free on the authors’ website: one more reason to have no excuses.
The most rigorous analysis in the world is worth little if the person receiving it does not understand it — and in marketing an analysis almost always has to be told to someone: a client, a manager, a meeting. Storytelling with Data teaches how to turn the default charts of Excel and Looker Studio into clear messages: choosing the right chart, removing the ink that carries no information, directing attention where it matters, building a narrative around the number. Of the whole shelf it is probably the book that pays for itself fastest: you can apply it to your very next report. (There is also an Italian edition, Data storytelling, published by Apogeo.)
This is the most specialised book on the shelf, and I list it out of honesty towards anyone who has reached the Monte Carlo method and wants to go all the way: Monte Carlo Methods in Financial Engineering by Glasserman is the complete reference on simulation applied to finance. Not a beach read: it is the text you reach for when the others are no longer enough.
To get your bearings quickly, here is the complete shelf in table form:
| Book | Who it’s for | Language |
|---|---|---|
| The Art of Statistics — Spiegelhalter | Everyone: the starting point | EN (also IT) |
| Finalmente ho capito la statistica — De Pra | Absolute beginners, distributions | IT |
| How to Lie with Statistics — Huff | Defending yourself from doctored numbers | EN (also IT) |
| Statistica — Newbold, Carlson, Thorne | For rigour: inference and tests | IT |
| Introduzione all’econometria — Stock, Watson | Regression and time series | IT (orig. EN) |
| The Signal and the Noise — Silver | Why predictions fail | EN (also IT) |
| Trustworthy Online Controlled Experiments — Kohavi et al. | A/B testing and experimentation | EN |
| Bayesian Statistics the Fun Way — Kurt | The Bayesian approach, with R | EN |
| An Introduction to Statistical Learning — James et al. | Practical machine learning, with R | EN |
| Introduction to Machine Learning — Alpaydın | Theoretical foundations of ML | EN |
| R for Data Science — Wickham et al. | Learning R, from raw data to charts | EN |
| Storytelling with Data — Knaflic | Communicating data and reports | EN (also IT) |
| Monte Carlo Methods in Financial Engineering — Glasserman | Advanced simulation | EN |
This shelf is not closed. As the blog’s path widens — the statistical paradoxes I have started to explore, the bootstrap, text analysis — the library will widen too, and this page will be updated accordingly. In the meantime, if one single recommendation had to suffice: start with Spiegelhalter, and let the articles on this blog be your gym.