A/B Testing: How to Run Statistically Valid Experiments (and the Mistakes to Avoid)

Over the previous articles we have looked at how hypothesis testing works and how the two-sample t-test lets us compare two groups rigorously. We have also built confidence intervals, learned to quantify the uncertainty of our estimates, and seen with the Central Limit Theorem why all this works even when the data are not normal.

But there is one question that, in the day-to-day reality of anyone doing SEO and marketing, comes up almost daily: which variant performs better? Which title tag brings more clicks? Which landing page converts more? Which meta description draws attention? It is not an academic question: it is the question that separates data-driven decisions from opinions disguised as strategies.

The good news is that we already have all the tools to answer it. A/B testing is nothing more than the direct application of the statistical inference concepts we have built step by step: hypothesis testing, comparison between groups, significance. In this article we put it all together.

An Introduction to Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a widely used statistical technique for reducing the complexity of large datasets. It aims to cut down the number of variables, transforming potentially correlated ones into a smaller set of uncorrelated variables called principal components.
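
To make the idea concrete, here is a minimal sketch of PCA restricted to two variables, where the eigenvalues of the covariance matrix have a closed form and no linear algebra library is needed. The function name `pca_2d` and the data points are illustrative assumptions, not taken from the article; for real datasets with many variables a library implementation would be the usual choice.

```python
from math import atan2, cos, sin, sqrt

def pca_2d(xs, ys):
    """PCA for two variables using the closed-form eigen-decomposition
    of the 2x2 sample covariance matrix."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]  # centred data
    dy = [y - mean_y for y in ys]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(a * a for a in dx) / (n - 1)
    syy = sum(b * b for b in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

    # Eigenvalues: the variances along the two principal components
    centre = (sxx + syy) / 2
    half_gap = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    var_pc1, var_pc2 = centre + half_gap, centre - half_gap

    # Direction of the first principal component (unit vector)
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    axis1 = (cos(theta), sin(theta))

    # Scores: each centred point projected onto the first component
    scores = [a * axis1[0] + b * axis1[1] for a, b in zip(dx, dy)]
    return var_pc1, var_pc2, axis1, scores

# Hypothetical, strongly correlated data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

var_pc1, var_pc2, axis1, scores = pca_2d(xs, ys)
explained = var_pc1 / (var_pc1 + var_pc2)
print(f"Share of variance kept by the first component: {explained:.4f}")
```

With two variables this correlated, a single component keeps almost all of the variance, which is the sense in which PCA reduces the number of variables.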

Correlation: Pearson, Spearman and Kendall (and Why It Isn’t Causation)

Anyone who looks at a website’s data constantly notices, often without realising it, that two things seem to move together. Pages that sit higher in the SERP get more clicks; the ones where users linger longer convert more; longer articles appear to rank better. These are valuable hunches, but they stay vague until we answer a precise question: how much do these pairs of numbers move together? And in what sense? We need an index that turns the impression “they go hand in hand” into a comparable measure. That index is correlation, and it is one of the most used, and most misunderstood, tools in all of applied statistics.
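
As a rough illustration of the three indices named in the title, the sketch below computes Pearson, Spearman and Kendall coefficients in plain Python. The position and click figures are invented for the example, and the Kendall function is the simple tau-a variant, which does not correct for ties; statistical libraries often default to a tie-corrected variant instead.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson r: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical data: SERP position and monthly clicks for six pages
position = [1, 2, 3, 4, 5, 6]
clicks = [900, 400, 250, 150, 120, 100]

r = pearson(position, clicks)
rho = spearman(position, clicks)
tau = kendall_tau_a(position, clicks)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```

In this made-up example clicks always fall as position worsens, so the two rank-based indices reach their minimum of -1, while Pearson stays weaker because the decline is not a straight line.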

Effect Size and Power Analysis: How Big Is the Effect (and How Much Data You Need)

We closed the article on the A/B test significance calculator with a promise. We said that the p-value answers a single question (is the observed difference larger than chance alone would plausibly produce?) and that, on its own, it tells us nothing more. It does not tell us how large the effect is, nor whether it is worth the effort of shipping it. It is time to keep that promise, because the two questions the p-value leaves hanging are exactly what separates reading data with method from stopping at the first threshold that glitters.

The two questions have precise names. The first — how big is it? — is the effect size. The second — with the data I have, could I even have seen an effect like this? — is the power of the test, and the reasoning that gets us to an answer is called power analysis. We examine them one at a time, as always with an example at hand.
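
Both questions can be sketched in a few lines for the case of two conversion rates. The example below uses Cohen's h as the effect size and a normal approximation for power; these are common choices, assumed here for illustration rather than taken from the article, and the rates and sample sizes are hypothetical.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

def cohens_h(p1, p2):
    """Cohen's h: effect size for the difference between two proportions."""
    return 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test with equal group sizes.
    Normal approximation; the tiny chance of rejecting in the wrong
    direction is ignored."""
    h = abs(cohens_h(p1, p2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(h * sqrt(n_per_group / 2) - z_crit)

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Users needed in each group to reach the requested power."""
    h = abs(cohens_h(p1, p2))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / h) ** 2)

# Hypothetical scenario: baseline 4% conversion, variant hoped to reach 5%
h = cohens_h(0.04, 0.05)
power_now = power_two_proportions(0.04, 0.05, n_per_group=5000)
needed = sample_size_per_group(0.04, 0.05)

print(f"Effect size (Cohen's h): {h:.4f}")
print(f"Power with 5000 users per group: {power_now:.2f}")
print(f"Users per group for 80% power: {needed}")
```

The first number answers "how big is it?"; the other two answer "could I even have seen it?", once for the sample already collected and once for a test still to be planned.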

A/B Test Significance Calculator

Our A/B test has run its course: variant B shows a higher conversion rate than variant A. The temptation to declare a winner and ship the change is strong. But first there is a question to answer, the same one that runs through this whole series: is the difference we observe a real signal, or just statistical noise?

This calculator is the natural complement of the sample size calculator: that one works before the test and tells us how many users we need; this one works after and tells us whether the result we obtained is statistically significant. If you have read the article on hypothesis testing, you will recognise the machinery at once: behind the scenes sits a z-test for comparing two proportions.
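
A minimal sketch of that z-test, using only the Python standard library, might look like the following. The function name and the visitor and conversion counts are hypothetical, and the pooled standard error is the usual textbook form; the calculator itself may differ in details.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both variants share one true rate,
    # so the standard error is built from the pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: 200 of 5000 users convert on A, 250 of 5000 on B
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Significant at the 5% level" if p_value < 0.05 else "Not significant at the 5% level")
```

The normal approximation behind this test is reasonable when each group has a comfortable number of conversions and non-conversions; with very small counts an exact test is the safer option.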
