  {"id":3652,"date":"2026-06-11T09:11:56","date_gmt":"2026-06-11T08:11:56","guid":{"rendered":"https:\/\/www.gironi.it\/blog\/?p=3652"},"modified":"2026-06-11T16:18:01","modified_gmt":"2026-06-11T15:18:01","slug":"statistics-seo-library","status":"publish","type":"post","link":"https:\/\/www.gironi.it\/blog\/en\/statistics-seo-library\/","title":{"rendered":"The Statistics and SEO Library: the Books I Recommend (and Why)"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">There is a question that comes back, reliably, every time I publish an article along this path: <em>&#8220;so, which book should I read to study these things?&#8221;<\/em>. Until now I have answered one piece at a time, in the &#8220;Further Reading&#8221; section that closes each article. Here I do the reverse: I gather the whole library on a single page, with the reason each title earned its place on the shelf.<\/p>\n\n\n<p class=\"wp-block-paragraph\">This is not a ranking and not a catalogue: these are <strong>the books I actually use<\/strong>, the ones many of the examples and explanations in the articles come from. Few of them, chosen with a simple criterion: each book must let anyone working with data in SEO and marketing take one concrete step forward, without requiring a degree in mathematics.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p class=\"wp-block-paragraph\">A note on transparency before we start: the links below are Amazon affiliate links. If you buy a book through them, the blog receives a small commission at no extra cost to you: it is the most painless way I have found to cover the server bills.<\/p>\n\n\n<h2 class=\"wp-block-heading\">Where to Start<\/h2>\n\n\n<h3 class=\"wp-block-heading\">The Art of Statistics \u2014 David Spiegelhalter<\/h3>\n\n\n<p class=\"wp-block-paragraph\">If I could keep only one, it would be this. 
<a href=\"https:\/\/www.amazon.it\/dp\/0241258766?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>The Art of Statistics<\/em><\/a> does not teach formulas: it teaches <strong>how to reason about data before trusting it<\/strong>, which is exactly the skill missing when someone reads a Search Console report and jumps to conclusions. Spiegelhalter \u2014 a Cambridge professor and a science communicator of rare clarity \u2014 builds every chapter around a real case: botched polls, misread medical statistics, the famous Berkeley admissions case (the same one I recounted when discussing <a href=\"https:\/\/www.gironi.it\/blog\/en\/simpsons-paradox-in-seo-when-aggregate-data-can-lie\/\">Simpson&#8217;s Paradox<\/a>).<\/p>\n\n\n<p class=\"wp-block-paragraph\">I cite it practically everywhere on this blog: from <a href=\"https:\/\/www.gironi.it\/blog\/en\/sampling-and-sample-size-how-much-data-do-you-really-need\/\">sampling<\/a> to <a href=\"https:\/\/www.gironi.it\/blog\/en\/confidence-intervals-what-they-are-how-to-calculate-them-and-what-they-do-not-mean\/\">confidence intervals<\/a>, by way of the <a href=\"https:\/\/www.gironi.it\/blog\/en\/central-limit-theorem\/\">Central Limit Theorem<\/a>. You can read it without pen and paper, and re-read it with profit. (There is also an excellent Italian edition, <a href=\"https:\/\/www.amazon.it\/dp\/8806246623?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>L&#8217;arte della statistica<\/em><\/a>.)<\/p>\n\n\n<h3 class=\"wp-block-heading\">Finalmente ho capito la statistica \u2014 Maurizio De Pra<\/h3>\n\n\n<p class=\"wp-block-paragraph\">The title says it all (&#8220;statistics, I finally got it&#8221;). 
<a href=\"https:\/\/www.amazon.it\/dp\/8867319396?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Finalmente ho capito la statistica<\/em><\/a> (Italian edition) is the book for absolute beginners who want a gradual path, plenty of examples and a modest price. It covers the territory of <strong>probability distributions<\/strong> well \u2014 the ones this blog&#8217;s path takes from the <a href=\"https:\/\/www.gironi.it\/blog\/en\/the-poisson-distribution\/\">Poisson<\/a> to the <a href=\"https:\/\/www.gironi.it\/blog\/en\/the-beta-distribution-explained-simply\/\">Beta<\/a> \u2014 together with the foundations of <a href=\"https:\/\/www.gironi.it\/blog\/en\/first-steps-into-the-world-of-probability-sample-space-events-permutations-and-combinations\/\">probabilistic reasoning<\/a>. It does not replace a textbook, but it does what a textbook cannot: it takes the fear away.<\/p>\n\n\n<h2 class=\"wp-block-heading\">When Data Lies<\/h2>\n\n\n<h3 class=\"wp-block-heading\">How to Lie with Statistics \u2014 Darrell Huff<\/h3>\n\n\n<p class=\"wp-block-paragraph\">Written in 1954, it has never aged. <a href=\"https:\/\/www.amazon.it\/dp\/0140213007?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>How to Lie with Statistics<\/em><\/a> is the short, venomous catalogue of the tricks numbers can be made to play: biased samples, conveniently chosen averages, truncated chart axes, percentages stripped of their context. Huff wrote for newspaper readers; I recommend it to anyone reading SEO tool reports and vendor slide decks, where those very tricks are alive and well. 
If you have been through <a href=\"https:\/\/www.gironi.it\/blog\/en\/simpsons-paradox-in-seo-when-aggregate-data-can-lie\/\">Simpson&#8217;s Paradox<\/a> you already know that aggregate data can lie: Huff completes the picture with all the other ways.<\/p>\n\n\n<p class=\"wp-block-paragraph\">You can read it in an afternoon, and from that afternoon on you never look at a chart the same way again. (Italian readers can find it as <a href=\"https:\/\/www.amazon.it\/dp\/8889479094?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Mentire con le statistiche<\/em><\/a>.)<\/p>\n\n\n<h2 class=\"wp-block-heading\">The Textbook for Getting Serious: Inference<\/h2>\n\n\n<h3 class=\"wp-block-heading\">Statistica \u2014 Newbold, Carlson, Thorne<\/h3>\n\n\n<p class=\"wp-block-paragraph\">Sooner or later the moment comes when popular science is not enough: you want the applicability conditions of a test, the complete formulas, the exercises to check you understood. <a href=\"https:\/\/www.amazon.it\/dp\/8891910651?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Statistica<\/em><\/a> by Newbold, Carlson and Thorne (Italian edition) is the reference university textbook for the whole of inference: <a href=\"https:\/\/www.gironi.it\/blog\/en\/hypothesis-testing-a-step-by-step-guide\/\">hypothesis testing<\/a>, confidence intervals, chi-square, ANOVA \u2014 in practice, the theoretical backbone of my <a href=\"https:\/\/www.gironi.it\/blog\/en\/guide-to-statistical-tests-for-a-b-analysis\/\">guide to statistical tests for A\/B analysis<\/a>.<\/p>\n\n\n<p class=\"wp-block-paragraph\">Let me be frank: it is a university textbook, and it costs like one. 
But it is one of those books you buy once and consult for years.<\/p>\n\n\n<h2 class=\"wp-block-heading\">Regression, Time Series, Models<\/h2>\n\n\n<h3 class=\"wp-block-heading\">Introduzione all&#8217;econometria \u2014 Stock, Watson<\/h3>\n\n\n<p class=\"wp-block-paragraph\">The name may be intimidating (econometrics?), but the content is exactly what anyone needs to go beyond basic <a href=\"https:\/\/www.gironi.it\/blog\/en\/correlation-and-regression-analysis-linear-regression\/\">linear regression<\/a>: multiple regression, omitted variables, diagnostics, <a href=\"https:\/\/www.gironi.it\/blog\/en\/time-series-analysis-and-forecasting-in-r\/\">time series<\/a>. <a href=\"https:\/\/www.amazon.it\/dp\/8891906190?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Introduzione all&#8217;econometria<\/em><\/a> by Stock and Watson (Italian edition; the English original is <em>Introduction to Econometrics<\/em>) has a quality that is rare in textbooks: a constant focus on the <strong>interpretation<\/strong> of results, not just their computation. Which is, after all, where the difference between a useful analysis and an exercise in style is decided.<\/p>\n\n\n<h2 class=\"wp-block-heading\">The (Fallible) Art of Prediction<\/h2>\n\n\n<h3 class=\"wp-block-heading\">The Signal and the Noise \u2014 Nate Silver<\/h3>\n\n\n<p class=\"wp-block-paragraph\">Anyone working with data sooner or later has to make a forecast \u2014 and an estimate of next quarter&#8217;s organic traffic is a forecast in every respect. <a href=\"https:\/\/www.amazon.it\/dp\/0141975652?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>The Signal and the Noise<\/em><\/a> tells the story of why predictions fail so often: too much faith in models, the temptation to mistake noise for signal, the inability to reason in probabilities. 
Silver \u2014 the man who called the 2012 US presidential election right in all fifty states \u2014 moves through poker, earthquakes, weather and finance, and along the way delivers the best narrative introduction to <a href=\"https:\/\/www.gironi.it\/blog\/en\/bayesian-statistics-how-to-learn-from-data-one-step-at-a-time\/\">Bayesian reasoning<\/a> I know of. It is the popular companion to the <a href=\"https:\/\/www.gironi.it\/blog\/en\/time-series-analysis-and-forecasting-in-r\/\">time series<\/a> chapter: first you learn to build a forecast, then you learn to distrust it. (There is also an Italian edition: <a href=\"https:\/\/www.amazon.it\/dp\/8860443865?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Il segnale e il rumore<\/em><\/a>.)<\/p>\n\n\n<h2 class=\"wp-block-heading\">Online Experimentation<\/h2>\n\n\n<h3 class=\"wp-block-heading\">Trustworthy Online Controlled Experiments \u2014 Kohavi, Tang, Xu<\/h3>\n\n\n<p class=\"wp-block-paragraph\">On A\/B testing there is simply no equivalent: <a href=\"https:\/\/www.amazon.it\/dp\/1108724264?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Trustworthy Online Controlled Experiments<\/em><\/a> is <strong>the<\/strong> book on the subject, written by the people who led experimentation at Microsoft, Google and LinkedIn. Inside is everything I have touched on in these articles \u2014 sample size, statistical power, mistakes to avoid \u2014 plus ten years of real-world cases about what goes wrong in actual experiments. I also used it to build my <a href=\"https:\/\/www.gironi.it\/blog\/en\/ab-test-sample-size-calculator\/\">sample size calculator<\/a>. 
Very readable.<\/p>\n\n\n<h2 class=\"wp-block-heading\">The Bayesian Path<\/h2>\n\n\n<h3 class=\"wp-block-heading\">Bayesian Statistics the Fun Way \u2014 Will Kurt<\/h3>\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.gironi.it\/blog\/en\/bayesian-statistics-how-to-learn-from-data-one-step-at-a-time\/\">Bayesian statistics<\/a> has a reputation for being hard, and its textbooks do their best to confirm it. <a href=\"https:\/\/www.amazon.it\/dp\/1593279566?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Bayesian Statistics the Fun Way<\/em><\/a> does the opposite: Will Kurt explains priors, posteriors and Bayesian updating with examples taken from Star Wars and Lego bricks, and \u2014 something I particularly appreciate \u2014 uses R for the computational side, exactly as I do here. It is the right book for grasping the Bayesian logic (and the reason behind the <a href=\"https:\/\/www.gironi.it\/blog\/en\/the-beta-distribution-explained-simply\/\">Beta distribution<\/a>) before tackling the formal theory.<\/p>\n\n\n<h2 class=\"wp-block-heading\">Towards Machine Learning<\/h2>\n\n\n<h3 class=\"wp-block-heading\">An Introduction to Statistical Learning \u2014 James, Witten, Hastie, Tibshirani<\/h3>\n\n\n<p class=\"wp-block-paragraph\">The contemporary classic of statistical learning, known to everyone as &#8220;ISL&#8221;. <a href=\"https:\/\/www.amazon.it\/dp\/1461471370?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>An Introduction to Statistical Learning<\/em><\/a> covers, with the right balance of intuition and formalism, the topics of the more advanced part of this path: logistic regression, <a href=\"https:\/\/www.gironi.it\/blog\/en\/how-to-use-decision-trees-to-classify-data\/\">decision trees<\/a>, PCA, with hands-on labs in R. 
Note: the authors distribute the PDF for free from their website; the printed edition remains for those who, like me, prefer to annotate study books in pencil.<\/p>\n\n\n<h3 class=\"wp-block-heading\">Introduction to Machine Learning \u2014 Ethem Alpayd\u0131n<\/h3>\n\n\n<p class=\"wp-block-paragraph\">For those who want the theoretical foundations of machine learning \u2014 the ones that in a university course would come before the labs \u2014 <a href=\"https:\/\/www.amazon.it\/dp\/0262028182?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Introduction to Machine Learning<\/em><\/a> by Alpayd\u0131n is the reference I cited in my <a href=\"https:\/\/www.gironi.it\/blog\/en\/understanding-the-basics-of-machine-learning-a-beginners-guide\/\">introductory guide to ML<\/a>. More formal than ISL: one to pick up after it, not instead of it.<\/p>\n\n\n<h2 class=\"wp-block-heading\">The Working Language: R<\/h2>\n\n\n<h3 class=\"wp-block-heading\">R for Data Science \u2014 Wickham, \u00c7etinkaya-Rundel, Grolemund<\/h3>\n\n\n<p class=\"wp-block-paragraph\">There was an obvious gap on this shelf: R code shows up in nearly every article on this blog \u2014 from the <a href=\"https:\/\/www.gironi.it\/blog\/en\/the-chi-square-test-goodness-of-fit-and-test-of-independence\/\">chi-square test<\/a> to <a href=\"https:\/\/www.gironi.it\/blog\/en\/time-series-analysis-and-forecasting-in-r\/\">time series<\/a> \u2014 but the book to learn the language from was missing. <a href=\"https:\/\/www.amazon.it\/dp\/1492097403?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>R for Data Science<\/em><\/a> (second edition) fills the gap: Hadley Wickham is the lead author of the tidyverse, the package ecosystem that made R modern, and the book teaches the whole workflow \u2014 import, tidy, transform, visualise, communicate \u2014 on real data, with no superfluous theory. 
Like ISL, it can be read for free on the authors&#8217; website: one more reason to have no excuses.<\/p>\n\n\n<h2 class=\"wp-block-heading\">Communicating Data<\/h2>\n\n\n<h3 class=\"wp-block-heading\">Storytelling with Data \u2014 Cole Nussbaumer Knaflic<\/h3>\n\n\n<p class=\"wp-block-paragraph\">The most rigorous analysis in the world is worth little if the person receiving it does not understand it \u2014 and in marketing an analysis almost always has to be told to someone: a client, a manager, a meeting. <a href=\"https:\/\/www.amazon.it\/dp\/1119002257?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Storytelling with Data<\/em><\/a> teaches how to turn the default charts of Excel and Looker Studio into clear messages: choosing the right chart, removing the ink that carries no information, directing attention where it matters, building a narrative around the number. Of the whole shelf it is probably the book that pays for itself fastest: you can apply it to your very next report. 
(There is also an Italian edition, <a href=\"https:\/\/www.amazon.it\/dp\/8850333846?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Data storytelling<\/em><\/a>, published by Apogeo.)<\/p>\n\n\n<h2 class=\"wp-block-heading\">A Niche Read<\/h2>\n\n\n<h3 class=\"wp-block-heading\">Monte Carlo Methods in Financial Engineering \u2014 Paul Glasserman<\/h3>\n\n\n<p class=\"wp-block-paragraph\">This is the most specialised book on the shelf, and I list it out of honesty towards anyone who has reached the <a href=\"https:\/\/www.gironi.it\/blog\/en\/the-monte-carlo-method-explained-simply-with-real-world-applications\/\">Monte Carlo method<\/a> and wants to go all the way: <a href=\"https:\/\/www.amazon.it\/dp\/1441915753?tag=consulenzeinf-21\" rel=\"nofollow sponsored noopener\" target=\"_blank\"><em>Monte Carlo Methods in Financial Engineering<\/em><\/a> by Glasserman is the complete reference on simulation applied to finance. Not a beach read: it is the text you reach for when the others are no longer enough.<\/p>\n\n\n<h2 class=\"wp-block-heading\">The Library at a Glance<\/h2>\n\n\n<p class=\"wp-block-paragraph\">To get your bearings quickly, here is the complete shelf in table form:<\/p>\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Book<\/th><th>Who it&#8217;s for<\/th><th>Language<\/th><\/tr><\/thead><tbody><tr><td><em>The Art of Statistics<\/em> \u2014 Spiegelhalter<\/td><td>Everyone: the starting point<\/td><td>EN (also IT)<\/td><\/tr><tr><td><em>Finalmente ho capito la statistica<\/em> \u2014 De Pra<\/td><td>Absolute beginners, distributions<\/td><td>IT<\/td><\/tr><tr><td><em>How to Lie with Statistics<\/em> \u2014 Huff<\/td><td>Defending yourself from doctored numbers<\/td><td>EN (also IT)<\/td><\/tr><tr><td><em>Statistica<\/em> \u2014 Newbold, Carlson, Thorne<\/td><td>For rigour: inference and tests<\/td><td>IT<\/td><\/tr><tr><td><em>Introduzione all&#8217;econometria<\/em> \u2014 Stock, 
Watson<\/td><td>Regression and time series<\/td><td>IT (orig. EN)<\/td><\/tr><tr><td><em>The Signal and the Noise<\/em> \u2014 Silver<\/td><td>Why predictions fail<\/td><td>EN (also IT)<\/td><\/tr><tr><td><em>Trustworthy Online Controlled Experiments<\/em> \u2014 Kohavi et al.<\/td><td>A\/B testing and experimentation<\/td><td>EN<\/td><\/tr><tr><td><em>Bayesian Statistics the Fun Way<\/em> \u2014 Kurt<\/td><td>The Bayesian approach, with R<\/td><td>EN<\/td><\/tr><tr><td><em>An Introduction to Statistical Learning<\/em> \u2014 James et al.<\/td><td>Practical machine learning, with R<\/td><td>EN<\/td><\/tr><tr><td><em>Introduction to Machine Learning<\/em> \u2014 Alpayd\u0131n<\/td><td>Theoretical foundations of ML<\/td><td>EN<\/td><\/tr><tr><td><em>R for Data Science<\/em> \u2014 Wickham et al.<\/td><td>Learning R, from raw data to charts<\/td><td>EN<\/td><\/tr><tr><td><em>Storytelling with Data<\/em> \u2014 Knaflic<\/td><td>Communicating data and reports<\/td><td>EN (also IT)<\/td><\/tr><tr><td><em>Monte Carlo Methods in Financial Engineering<\/em> \u2014 Glasserman<\/td><td>Advanced simulation<\/td><td>EN<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n<p class=\"wp-block-paragraph\">This shelf is not closed. As the blog&#8217;s path widens \u2014 the statistical paradoxes I have started to explore, the bootstrap, text analysis \u2014 the library will widen too, and this page will be updated accordingly. In the meantime, if one single recommendation had to suffice: start with Spiegelhalter, and let the articles on this blog be your gym.<\/p>\n\n","protected":false},"excerpt":{"rendered":"<p>There is a question that comes back, reliably, every time I publish an article along this path: &#8220;so, which book should I read to study these things?&#8221;. Until now I have answered one piece at a time, in the &#8220;Further Reading&#8221; section that closes each article. 
Here I do the reverse: I gather the whole &hellip; <a href=\"https:\/\/www.gironi.it\/blog\/en\/statistics-seo-library\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;The Statistics and SEO Library: the Books I Recommend (and Why)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[41,161],"tags":[],"class_list":["post-3652","post","type-post","status-publish","format-standard","hentry","category-seo","category-statistics"],"lang":"en","translations":{"en":3652,"it":3638},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false},"uagb_author_info":{"display_name":"Paolo Gironi","author_link":"https:\/\/www.gironi.it\/blog\/author\/autore-articoli\/"},"uagb_comment_info":0,"uagb_excerpt":"There is a question that comes back, reliably, every time I publish an article along this path: &#8220;so, which book should I read to study these things?&#8221;. Until now I have answered one piece at a time, in the &#8220;Further Reading&#8221; section that closes each article. 
Here I do the reverse: I gather the whole&hellip;","_links":{"self":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3652","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/comments?post=3652"}],"version-history":[{"count":3,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3652\/revisions"}],"predecessor-version":[{"id":3663,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3652\/revisions\/3663"}],"wp:attachment":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/media?parent=3652"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/categories?post=3652"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/tags?post=3652"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}