In a previous article we discussed hypothesis testing as it relates to a single measurement: the sample mean.
There are, however, numerous situations in which we need to carry out statistical analysis involving two samples. Think, for example, of the case where we want to study the difference between men and women with respect to the results of a given examination.
What We’ll Cover
We can test a hypothesis concerning two independent samples (in which case the samples do not influence each other) or two dependent samples, where the samples are interrelated.
The purpose of the two-sample t-test is to determine whether the means of two populations differ significantly.
Hypothesis Testing for Independent Samples
When we test a hypothesis about two independent samples, we actually follow a process very similar to what we have already seen when testing a single random sample. However, when we compute the test statistic, we must calculate the estimated standard error of the difference between the sample means.
For the independent-samples test to be valid, certain conditions must be met:
- A random sample is used for each population;
- The random samples are each composed of independent observations;
- Each sample is independent of every other;
- The population distribution must be approximately normal, or the sample size must be sufficiently large.
Let us consider the hypotheses for our t-test:
H0: μ1 = μ2
Ha: μ1 ≠ μ2
Note that we have two population means, as we are testing whether the means of two separate populations are equal. In other words, we could also have written:
H0: μ1 – μ2 = 0
Ha: μ1 – μ2 ≠ 0
It is time to see what the formula looks like for determining the value of t:
\(t=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{SE_{(\bar{x}_1-\bar{x}_2)}}\)
where:
\(\bar{x}_1-\bar{x}_2\) is the difference between the sample means;
\(\mu_1-\mu_2\) is the hypothesised difference between the population means;
\(SE_{(\bar{x}_1-\bar{x}_2)}\) is the standard error of the difference between the sample means.
The standard error of the difference between the sample means is calculated as:
\(SE_{(\bar{x}_1-\bar{x}_2)}=\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\)
I will spare both you and myself the Welch–Satterthwaite formula for determining the degrees of freedom. It is long and looks rather “intimidating.” In practice, being lazy, I let the calculator or R compute its value. Alternatively (and this is the shortcut I prefer) I adopt a conservative approach and use the smaller of the two sample sizes minus one:
df = min(n1, n2) – 1
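To make the formulas concrete, here is a short sketch in R that computes the t statistic and the conservative degrees of freedom by hand; the exam scores below are invented purely for illustration:

```r
# Invented exam scores for the two groups
women <- c(82, 75, 90, 68, 77, 85, 73, 80)
men   <- c(70, 78, 65, 74, 80, 69, 72)

# Difference between the sample means
mean_diff <- mean(women) - mean(men)

# Estimated standard error of the difference
se_diff <- sqrt(var(women) / length(women) + var(men) / length(men))

# t statistic under H0: mu1 - mu2 = 0
t_stat <- mean_diff / se_diff        # about 1.95

# Conservative degrees of freedom: smaller sample size minus one
df_conservative <- min(length(women), length(men)) - 1   # 6
```

Note that `var()` in R already uses the n − 1 denominator, so it gives the sample variance the formula requires.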
In R, the test is straightforward. Suppose we have our data in two vectors, “women” and “men”:
# Two-tailed test
t.test(women, men)
# One-tailed tests
t.test(women, men, alternative = "less")     # Ha: mean of women < mean of men
t.test(women, men, alternative = "greater")  # Ha: mean of women > mean of men
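As a quick sanity check, here is a self-contained sketch on simulated data (the group names and the five-point effect are made up for the example). By default `t.test` runs Welch's version of the test, which does not assume equal variances:

```r
set.seed(42)
# Simulated scores: the two populations genuinely differ by 5 points
women <- rnorm(30, mean = 75, sd = 8)
men   <- rnorm(30, mean = 70, sd = 8)

result <- t.test(women, men)   # Welch two-sample t-test (the default)
result$statistic               # the value of t
result$parameter               # Welch-approximated degrees of freedom
result$p.value                 # two-tailed p-value
```

The Welch degrees of freedom always fall between the conservative shortcut (min(n1, n2) − 1) and n1 + n2 − 2, which is a handy way to check your hand computation.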
Paired t-Test: Hypothesis Testing for Dependent Samples
The dependent-samples t-test differs in several respects from the independent-samples test, to the point of being called, quite fittingly, a test for paired data. A very common and very useful application is the pre-test / post-test design, in which the same subjects are measured before and after some event.
What are the conditions for conducting our test?
- The sample of differences is random;
- The paired observations are independent of one another;
- The distribution of population differences must be approximately normal, or the sample size of paired observations must be sufficiently large.
We start with our hypotheses:
H0: δ = 0
Ha: δ ≠ 0
The Greek letter delta denotes the mean of the population of differences. Our hypotheses therefore state that this mean difference is equal to zero (H0) or different from zero (Ha).
We now compute t:
\(t=\frac{\bar{d}-\delta}{SE_{\bar{d}}}\)
where \(\bar{d}\) is the mean of the differences between the paired observations, and \(SE_{\bar{d}}\) is the standard error of that mean difference.
The standard deviation of the differences is:
\(s_{d}=\sqrt{\frac{\Sigma(d-\bar{d})^2}{n-1}}\)
And the standard error formula is:
\(SE_{\bar{d}}=\frac{s_{d}}{\sqrt{n}}\)
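Putting the three formulas together, a hand computation in R might look like this (the pre- and post-test scores for eight subjects are invented for illustration):

```r
# Invented pre- and post-test scores for eight subjects
before <- c(12, 15, 11, 14, 13, 16, 10, 12)
after  <- c(14, 16, 13, 15, 15, 18, 11, 14)

d     <- after - before          # paired differences
d_bar <- mean(d)                 # mean of the differences
s_d   <- sd(d)                   # standard deviation of the differences
se_d  <- s_d / sqrt(length(d))   # standard error of the mean difference

t_stat <- d_bar / se_d           # t under H0: delta = 0, about 8.88
df     <- length(d) - 1          # 7 degrees of freedom
```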
A Worked Example
Suppose we want to test a hypothesis on the same subjects, measured before and after a certain event: the pre-test / post-test design. Since the two sets of values are dependent, we run the test on the differences using the R function:
t.test(before, after, paired = TRUE)
and we obtain the p-value. If it is less than our chosen significance level alpha, we reject the null hypothesis in favour of the alternative.
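For instance, with invented pre/post scores for eight subjects, the p-value can be pulled out of the object returned by `t.test` and compared with alpha:

```r
# Invented pre- and post-test scores for eight subjects
before <- c(12, 15, 11, 14, 13, 16, 10, 12)
after  <- c(14, 16, 13, 15, 15, 18, 11, 14)

res   <- t.test(before, after, paired = TRUE)
alpha <- 0.05
res$p.value < alpha   # TRUE here, so we reject H0
```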
In practice, in R:
# Compute and visualise the differences
diff <- data$test - data$post_test   # pre-test minus post-test score for each subject
hist(diff)
We check whether the distribution of the differences looks approximately normal. If it does, we proceed with the test:
# Two-tailed paired t-test
t.test(data$test, data$post_test, paired = TRUE)
The function returns the values of t, df, and p. If p < 0.05 (having chosen a significance level alpha = 0.05, i.e. a 95% confidence level), we reject the null hypothesis and accept the alternative.
For a one-tailed test:
t.test(data$test, data$post_test, paired = TRUE, alternative = "less")     # Ha: mean difference < 0
# or
t.test(data$test, data$post_test, paired = TRUE, alternative = "greater")  # Ha: mean difference > 0
The two-sample t-test, whether for independent or paired data, is one of the most widely used tools in applied statistics. Understanding when to use each variant—and checking the underlying assumptions—is what separates a reliable analysis from a misleading one. In the next article, we will look at contingency tables and conditional probability, which open the door to analysing categorical data.
Further Reading
For a detailed and systematic treatment of two-sample testing—including pooled and Welch variants, paired designs, and the assumptions that underpin them—Statistica by Newbold, Carlson and Thorne provides the full theoretical and practical framework needed to apply these tests correctly in real-world settings.