Hypothesis Testing: A Step-by-Step Guide

In everyday life, we often have to make decisions based on incomplete information.

We may need to decide, for instance, whether a certain educational procedure is more effective than another, whether a new drug has genuinely positive effects on the course of a disease, and so on.

Hypothesis testing is a statistical procedure that lets us answer a question about a population using only the information in a sample, and decide whether the evidence is strong enough to reject a default assumption (the null hypothesis).

In clearer and more direct terms: is my experimental finding due to chance? Hypothesis testing is precisely a statistical procedure for verifying whether chance is a plausible explanation of an experimental result.
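As a rough illustration of the "is it due to chance?" question, here is a minimal Python sketch of a one-sample z-test. It assumes the population standard deviation is known; the function name and all the numbers are hypothetical and chosen only for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, pop_mean, pop_sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test.

    Assumes the population standard deviation (pop_sigma) is known.
    """
    standard_error = pop_sigma / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a result at least this extreme if chance alone is at work
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical numbers: 100 observations averaging 103 against a claimed mean of 100
z, p = z_test_p_value(sample_mean=103.0, pop_mean=100.0, pop_sigma=15.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests that chance alone is an unlikely explanation for the observed difference. The threshold for "small" (often 0.05) is a convention chosen before looking at the data.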


The t Distribution and Hypothesis Testing

In a previous article we presented the concept of hypothesis testing—a statistical method widely used to determine the validity of a claim based on a sample of data.

In the examples we proposed, however, we knew the value of the population standard deviation, sigma, which allowed us to use the normal distribution and compute the Z-score. In practice, this is a rather rare case.

If instead we do not know the population sigma, or if we are working with small samples, we must turn to a different type of distribution, called the t distribution or Student’s distribution.

Put more simply and clearly:

Student’s t distribution is a probability distribution used to assess the statistical significance of results when dealing with small sample sizes and uncertainty about the variance.
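To make the difference from the Z-score concrete, the sketch below computes a one-sample t statistic in Python, replacing the unknown sigma with the sample standard deviation. The function name and the data are hypothetical, and this is an illustration rather than the procedure used in the full article.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, pop_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    t = (mean(sample) - pop_mean) / (s / sqrt(n))
    return t, n - 1


# Hypothetical small sample tested against a reference mean of 5.0
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t, df = one_sample_t(sample, pop_mean=5.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Turning the statistic into a p-value requires the t distribution with the reported degrees of freedom, which is available in statistical tables or in a library such as SciPy (`scipy.stats.t`).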


The Two-Sample t-Test: How to Test a Hypothesis for Dependent or Independent Samples

In a previous article we discussed hypothesis testing as it relates to a single measurement: the sample mean.

There are, however, numerous situations in which the analysis involves two samples. Think, for example, of a study asking whether men and women score differently on a given examination.
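For two independent groups like these, one common choice is Welch's t statistic, which does not assume equal variances. The Python sketch below is illustrative only: the excerpt does not say which variant the full article uses, and the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, approximate degrees of freedom) for two independent samples."""
    # Squared standard error contributed by each group
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical exam scores for two independent groups
group_a = [72, 85, 78, 90, 66, 81]
group_b = [70, 75, 68, 82, 64, 73]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.1f}")
```

For dependent (paired) samples, the usual approach is different: take the difference within each pair and run a one-sample test on those differences.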


Confidence Intervals: What They Are, How to Calculate Them (and What They Do NOT Mean)

In previous articles, we examined how hypothesis testing works and how the t-distribution allows us to work even when we don’t know the population standard deviation. In both cases, we focused on a specific question: “can I reject the null hypothesis, yes or no?”

But there’s another question, equally important, that we ask ourselves constantly in daily practice: what is the approximate value of the parameter I’m estimating? It’s not enough to know whether the mean differs from a certain value; we want to know where it lies, and with what margin of uncertainty.
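As a sketch of the idea, the Python snippet below builds an approximate confidence interval for a mean. It assumes a sample large enough for the normal approximation to be reasonable; with small samples, a critical value from the t distribution would replace the normal one. The function name and the simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def approx_ci(sample, level=0.95):
    """Approximate CI for the mean using a normal critical value.

    Assumption: the sample is large enough for the normal approximation.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    m = mean(sample)
    half_width = z * stdev(sample) / sqrt(len(sample))
    return m - half_width, m + half_width


# Simulated data for illustration only (seeded so the run is repeatable)
rng = random.Random(42)
sample = [rng.gauss(50, 10) for _ in range(200)]
low, high = approx_ci(sample)
print(f"mean = {mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval is centered on the sample mean, and its width is the "margin of uncertainty" mentioned above: it shrinks as the sample grows and widens as the confidence level rises.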

This is where confidence intervals (often abbreviated as CI) come into play—one of the most useful and, at the same time, most misunderstood tools in all of inferential statistics.


Anomaly Detection: How to Identify Outliers in Your Data

Throughout this journey, we’ve examined tools to describe data, test hypotheses, and build models. But there’s a question that comes before all others—one that’s too often ignored: are these data reliable?

Any dataset (daily sessions, organic clicks, conversion rates) can hide values that don’t behave like the others: values that deviate abnormally from the rest of the distribution. In statistics, we call them outliers, or anomalous values.

Let’s make one point clear immediately: an anomalous value isn’t necessarily an error. It can be a measurement error, certainly (a broken tracking tag, a bot inflating sessions). But it can also be the most important signal in the entire dataset: a Google algorithm update, content going viral, a technical issue crushing traffic. The issue isn’t eliminating anomalies—it’s recognizing them and then deciding what to do about them.
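To show what "recognizing" an anomaly can look like in practice, here is a minimal Python sketch of one widely used rule, Tukey's IQR fences. This excerpt does not name the methods covered in the full article, so the rule is an assumption made for illustration; the function name, the multiplier of 1.5, and the data are also illustrative.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily sessions with one suspicious spike
daily_sessions = [120, 132, 128, 125, 130, 127, 410, 126, 129, 131]
print(iqr_outliers(daily_sessions))  # prints [410]
```

The function only flags candidates. Whether a flagged value is a tracking error or a genuine signal remains a judgment call for the analyst.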

In this article, we’ll examine three statistical methods for identifying outliers, from the most intuitive to the most formal. For each, we’ll look at the logic, the limitations, and practical application with R.
