The Student’s t-Distribution and Hypothesis Testing

In a previous article we presented the concept of hypothesis testing—a statistical method widely used to determine the validity of a claim based on a sample of data.

In the examples we proposed, however, we knew the value of the population standard deviation, sigma, which allowed us to use the normal distribution and compute the Z-score. In practice, this is a rather rare case.

If instead we do not know the population sigma, or if we are working with small samples, we must turn to a different type of distribution, called the t distribution or Student’s distribution.

Put more simply and clearly:

Student’s t distribution is a probability distribution used to assess the statistical significance of results when dealing with small sample sizes and uncertainty about the variance.
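
As a rough illustration of the idea, the sketch below computes the t statistic for a one-sample test when sigma is unknown and must be estimated from the sample. The function name `one_sample_t`, the data, and the reference value `mu0` are all invented for this example; only Python's standard library is used.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)           # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)      # estimated because sigma is unknown
    return (mean - mu0) / standard_error, n - 1

# Made-up measurements, for illustration only.
sample = [52, 48, 55, 51, 49, 53, 50, 54]
t_stat, df = one_sample_t(sample, mu0=50)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic is then compared against the t-distribution with n - 1 degrees of freedom rather than against the normal distribution.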

The Two-Sample t-Test: How to Test a Hypothesis for Dependent or Independent Samples

In a previous article we discussed hypothesis testing as it relates to a single measurement: the sample mean.

There are, however, numerous situations in which we need to carry out statistical analysis involving two samples. Think, for example, of the case where we want to study the difference between men and women with respect to the results of a given examination.
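
To make the distinction concrete, here is a minimal sketch of the two statistics, assuming Welch's form (no equal-variance assumption) for independent samples and a test on pairwise differences for dependent samples. The function names and all the numbers are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """t statistic for two independent samples, without assuming equal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(before, after):
    """t statistic for dependent samples: a one-sample test on the pairwise differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Invented exam scores for two independent groups.
scores_group_a = [24, 27, 22, 26, 25]
scores_group_b = [21, 23, 20, 24, 22]
t_independent = welch_t(scores_group_a, scores_group_b)

# Invented scores for the same four people measured twice.
before = [10, 12, 9, 11]
after = [12, 13, 11, 14]
t_dependent = paired_t(before, after)

print(f"independent samples: t = {t_independent:.3f}")
print(f"dependent samples:   t = {t_dependent:.3f}")
```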

Confidence Intervals: What They Are, How to Calculate Them (and What They Do NOT Mean)

In previous articles, we examined how hypothesis testing works and how the t-distribution allows us to work even when we don’t know the population standard deviation. In both cases, we focused on a specific question: “can I reject the null hypothesis, yes or no?”

But there’s another question, equally important, that we ask ourselves constantly in daily practice: what is the approximate value of the parameter I’m estimating? It’s not enough to know whether the mean differs from a certain value; we want to know where it lies, and with what margin of uncertainty.

This is where confidence intervals (often abbreviated as CI) come into play—one of the most useful and, at the same time, most misunderstood tools in all of inferential statistics.
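
As a first taste, the sketch below builds an interval around a sample mean. It is a simplification: it uses the normal critical value, which is a reasonable approximation only for fairly large samples, whereas small samples call for the wider t critical value. The function name and the data are invented for illustration.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * standard_error
    return mean - margin, mean + margin

# Deterministic made-up "daily sessions" values, 40 observations.
daily_sessions = [100 + (i * 7) % 23 for i in range(40)]
low, high = mean_confidence_interval(daily_sessions)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```

Raising `confidence` to 0.99 widens the interval: more confidence costs precision.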

Anomaly Detection: How to Identify Outliers in Your Data

Throughout this journey, we’ve examined tools to describe data, test hypotheses, and build models. But there’s a question that comes before all others—one that’s too often ignored: are these data reliable?

In any dataset—daily sessions, organic clicks, conversion rates—values that don’t behave like the others can hide. Values that deviate abnormally from the rest of the distribution. In statistics, we call them outliers, or anomalous values.

Let’s make one point clear immediately: an anomalous value isn’t necessarily an error. It can be a measurement error, certainly (a broken tracking tag, a bot inflating sessions). But it can also be the most important signal in the entire dataset: a Google algorithm update, content going viral, a technical issue crushing traffic. The issue isn’t eliminating anomalies—it’s recognizing them and then deciding what to do about them.

In this article, we’ll examine three statistical methods for identifying outliers, from the most intuitive to the most formal. For each, we’ll look at the logic, the limitations, and practical application with R.
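
To give a flavor of what such a method looks like, here is a sketch of one widely used rule, the interquartile range (IQR) fences. This is an assumption chosen for illustration: it is not necessarily one of the three methods the article covers, and it is written in Python rather than R. The function name, the multiplier `k`, and the data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Invented daily values with one obvious spike.
daily_values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]
flagged = iqr_outliers(daily_values)
print(flagged)
```

The rule only flags candidates; whether a flagged value is an error or a signal still has to be decided by looking at the context.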

Bayesian Statistics: How to Learn from Data, One Step at a Time

In previous articles, we’ve examined statistical inference from a precise and coherent perspective: formulate a hypothesis, collect data, calculate a p-value, construct a confidence interval. We’ve conducted hypothesis tests, compared variants with A/B testing, and seen with the Central Limit Theorem why all of this works even when data isn’t normal.

This approach—called frequentist—has a clear logic: the parameter we want to estimate is a fixed value (even if unknown), and we “chase” it with data. But there’s another way to think about uncertainty, one that allows us to update our beliefs as new data arrives. It’s called the Bayesian approach, and in this article we’ll build its foundations.

Let’s start with a concrete example. Imagine we’ve just launched an advertising campaign and we don’t know the true click rate. We have an initial opinion based on experience (“click rates usually fall between 0% and 20%”), and then data starts coming in. The Bayesian approach lets us combine our initial opinion with the observed data to get an updated estimate—and repeat this process every time new information arrives.
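
The update loop described above can be sketched with a Beta prior on the click rate, which is a common modeling choice for a proportion and an assumption here, not something stated in the text. The prior Beta(2, 18), with mean 10%, is a hypothetical way of encoding "usually between 0% and 20%"; the batches of clicks and impressions are invented.

```python
def update_beta(prior, clicks, impressions):
    """Posterior Beta parameters after observing clicks out of impressions."""
    alpha, beta = prior
    return alpha + clicks, beta + (impressions - clicks)

def beta_mean(params):
    """Mean of a Beta(alpha, beta) distribution."""
    alpha, beta = params
    return alpha / (alpha + beta)

prior = (2, 18)                      # hypothetical prior, mean 0.10
batches = [(30, 1000), (20, 500)]    # (clicks, impressions), made-up data

posterior = prior
for clicks, impressions in batches:
    # Yesterday's posterior becomes today's prior.
    posterior = update_beta(posterior, clicks, impressions)
    print(f"after {impressions} more impressions: estimated click rate = {beta_mean(posterior):.4f}")
```

Updating batch by batch gives the same result as updating once with all the data pooled, which is what makes the step-by-step approach coherent.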
