In previous articles, we examined how hypothesis testing works and how the t-distribution allows us to work even when we don’t know the population standard deviation. In both cases, we focused on a specific question: “can I reject the null hypothesis, yes or no?”
But there’s another question, equally important, that we ask ourselves constantly in daily practice: what is the approximate value of the parameter I’m estimating? It’s not enough to know whether the mean differs from a certain value; we want to know where it lies, and with what margin of uncertainty.
This is where confidence intervals (often abbreviated as CI) come into play—one of the most useful and, at the same time, most misunderstood tools in all of inferential statistics.
Let’s start with a concrete example. Suppose we want to know the average duration of organic sessions on our website. We can’t observe all the sessions that will ever occur (that would be the “population”); but we can observe a sample—say, the sessions from the past month.
From the sample we calculate a mean: for instance, 2 minutes and 45 seconds. But we know full well that this is a point estimate: if we took another sample (the following month, say), we’d get a slightly different value. The point estimate, on its own, tells us nothing about its precision.
The confidence interval solves exactly this problem. It is a range of values, constructed from sample data, that with a certain level of confidence contains the true population parameter.
In clearer, more direct terms: instead of saying “the average duration is 2:45,” we say “we are reasonably confident that the population mean duration falls between 2:30 and 3:00.” We have traded the illusory precision of a single number for the honesty of an interval.
It’s essential to keep in mind a fundamental point, because this is where one of the most widespread errors in statistics lurks.
When we say “95% confidence interval,” we are not saying there is a 95% probability that the population parameter falls inside that interval. The population parameter is a fixed value (even though unknown): it doesn’t “fall” anywhere, it is not a random variable.
What the 95% actually means is this: if we repeated the sampling many times, and for each sample calculated a confidence interval, 95% of those intervals would contain the true parameter. It is a property of the procedure, not of any single interval.
Sounds difficult? Let’s try a quick analogy. Imagine casting a fishing net 100 times. If our net is good (built to 95% standards), about 95 times out of 100 it will catch the fish. But once we’ve cast the net and pulled it back in, the fish is either inside or it isn’t: it makes no sense to say “there’s a 95% probability the fish is in the net.” The confidence interval is the net; the parameter is the fish.
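The "property of the procedure" claim is easy to verify with a quick simulation. The sketch below assumes a normal population with a known mean of 200 seconds and an SD of 12 (illustrative numbers): we cast the net 10,000 times and count how often it catches the fish.

```r
set.seed(42)
true_mean <- 200   # known only because we simulate the population ourselves
n <- 30            # sample size for each "cast of the net"
n_rep <- 10000     # number of repeated samples
covered <- 0
for (i in 1:n_rep) {
  x <- rnorm(n, mean = true_mean, sd = 12)
  margin <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  if (mean(x) - margin <= true_mean && true_mean <= mean(x) + margin) {
    covered <- covered + 1
  }
}
covered / n_rep  # close to 0.95: about 95% of the intervals contain the true mean
```

Any single interval either contains 200 or it doesn't; only the long-run proportion is 95%.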
Let’s see how to construct a confidence interval for a population mean. The formula is:
\(\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}\)
where:
- \(\bar{x}\) is the sample mean;
- \(s\) is the sample standard deviation;
- \(n\) is the sample size;
- \(t_{\alpha/2,\, n-1}\) is the critical value of the t-distribution with \(n-1\) degrees of freedom, for confidence level \(1-\alpha\).
The term \(t_{\alpha/2, \, n-1} \cdot \frac{s}{\sqrt{n}}\) is called the margin of error. It is the “arm’s length” of our interval: the larger it is, the more uncertain we are.
Suppose we measured the average duration of organic sessions over a sample of 30 days. The data:
- sample size: \(n = 30\) days;
- sample mean: \(\bar{x} = 200\) seconds;
- sample standard deviation: \(s = 12\) seconds.
Let’s compute step by step.
Step 1: Find the critical value \(t\). For a 95% confidence level, we look up \(t_{0.025, \, 29}\) (i.e., the value that leaves 2.5% in each tail). With 29 degrees of freedom, \(t \approx 2.045\).
Step 2: Calculate the standard error:
\(SE = \frac{s}{\sqrt{n}} = \frac{12}{\sqrt{30}} \approx 2.19\)
Step 3: Calculate the margin of error:
\(ME = t_{0.025,\, 29} \cdot SE = 2.045 \cdot 2.19 \approx 4.48\)
Step 4: Construct the interval:
\(\bar{x} \pm ME = 200 \pm 4.48 = [195.52,\ 204.48]\)
Therefore: we are reasonably confident (at the 95% level) that the population mean session duration lies between approximately 195.5 and 204.5 seconds.
Let’s compute the same interval in R:
n <- 30
xbar <- 200
s <- 12
margin <- qt(0.975, df = n - 1) * s / sqrt(n)
lower <- xbar - margin
upper <- xbar + margin
cat("95% CI:", round(lower, 2), "-", round(upper, 2), "\n")
cat("Margin of error:", round(margin, 2), "\n")
Result: 95% CI: 195.52 – 204.48, with a margin of error of 4.48 seconds.
As you can see, it really is a piece of cake: a few lines of code and we have our interval.
In the day-to-day reality of SEO and digital marketing, we often work not with means but with proportions: conversion rates, CTR, bounce rates. For proportions, the formula is slightly different.
The confidence interval for a proportion is:
\(\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
where:
- \(\hat{p}\) is the observed sample proportion;
- \(z_{\alpha/2}\) is the critical value of the standard normal distribution;
- \(n\) is the sample size.
N.B.: this formula (known as the Wald interval) works well when \(n\) is sufficiently large and \(\hat{p}\) is not too close to 0 or 1. As a rule of thumb, we need at least \(n \cdot \hat{p} \geq 5\) and \(n \cdot (1 - \hat{p}) \geq 5\).
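For the landing-page example that follows (500 visits, 18 conversions), the rule of thumb is comfortably satisfied, and the check takes one line:

```r
n <- 500
p_hat <- 18 / n
c(n * p_hat, n * (1 - p_hat))  # 18 and 482: both well above 5, so Wald is safe here
```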
A landing page received 500 visits last month, with 18 conversions. The observed conversion rate is:
\(\hat{p} = \frac{18}{500} = 0.036 = 3.6\%\)
Let’s calculate the 95% CI. The critical value \(z_{0.025} = 1.96\).
\(0.036 \pm 1.96 \cdot \sqrt{\frac{0.036 \cdot 0.964}{500}} = 0.036 \pm 0.0163 = [0.0197,\ 0.0523]\)
In other words: the true conversion rate of the page lies, with 95% confidence, between 1.97% and 5.23%.
This is extremely useful in practice. If someone asks “what’s the conversion rate of that page?”, answering “3.6%” is a half-truth. Answering “between 2% and 5.2%” is honest and informative.
Let’s build the same calculation in R:
n <- 500
successi <- 18
p_hat <- successi / n
se <- sqrt(p_hat * (1 - p_hat) / n)
z <- qnorm(0.975)
margin <- z * se
lower <- p_hat - margin
upper <- p_hat + margin
cat("Proportion:", round(p_hat, 4), "\n")
cat("95% CI:", round(lower, 4), "-", round(upper, 4), "\n")
# or, using the built-in function:
prop.test(successi, n, conf.level = 0.95)
R’s prop.test() function directly returns the confidence interval (using a continuity correction that makes it slightly more conservative).
There is a deep connection between confidence intervals and hypothesis testing, and understanding it clarifies both concepts.
If a hypothesized value falls outside the 95% confidence interval, then the hypothesis test at that value would be rejected with \(\alpha = 0.05\). And vice versa: if the value falls inside the CI, we cannot reject the null hypothesis.
Let’s work through an example. Returning to our mean session duration: 95% CI = [195.52, 204.48]. If someone hypothesizes that the mean duration is 190 seconds, we can respond: “190 falls outside our 95% CI, so we would reject the null hypothesis \(H_0: \mu = 190\) with \(\alpha = 0.05\).” If instead the hypothesis were \(\mu = 198\), 198 falls inside the interval, and we could not reject.
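This equivalence can be checked numerically from the summary statistics of the example (mean 200, SD 12, n = 30): the two-sided t-test p-value crosses 0.05 exactly where the hypothesized value crosses the CI boundary. A minimal sketch:

```r
xbar <- 200; s <- 12; n <- 30
# 95% CI for the mean, as computed earlier
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)
for (mu0 in c(190, 198)) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))       # test statistic for H0: mu = mu0
  p_val <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value
  cat("mu0 =", mu0,
      "| p-value =", round(p_val, 4),
      "| inside 95% CI:", mu0 >= ci[1] && mu0 <= ci[2], "\n")
}
```

For 190 the p-value is far below 0.05 and the value lies outside the interval; for 198 the p-value is well above 0.05 and the value lies inside it.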
In a sense, the confidence interval is more informative than the hypothesis test: the test tells us only “yes/no,” while the CI tells us where the parameter lies. It’s like the difference between asking “are you in Rome?” (test) and asking “where are you?” (CI).
The most commonly used confidence level is 95%, but it’s not the only one. Let’s see how the interval changes as we vary the level:
| Level | Critical value t (df = 29) | CI for our example (mean) |
|---|---|---|
| 90% | 1.699 | [196.28, 203.72] |
| 95% | 2.045 | [195.52, 204.48] |
| 99% | 2.756 | [193.96, 206.04] |
The rule is straightforward: the higher the confidence, the wider the interval. That’s the price of certainty: if we want to be more sure the interval contains the parameter, we have to cast a wider net.
It’s a trade-off. A 99% CI is almost certainly correct, but it’s so wide as to be of little practical use (“the mean is between roughly 194 and 206 seconds”—so what?). A 90% CI is narrower and operationally more useful, but it’s wrong more often.
In the everyday practice of SEO and marketing, 95% is the standard convention. There’s nothing magical about that number (just as there’s nothing magical about the famous \(\alpha = 0.05\) in hypothesis testing), but it is the compromise the scientific community has adopted, and there’s no point reinventing the wheel.
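The intervals at the three levels can be recomputed directly with qt(). Here we use the t critical values with 29 degrees of freedom, consistent with the worked example (with samples this large, the z values 1.645, 1.960 and 2.576 give very similar, slightly narrower intervals):

```r
xbar <- 200; s <- 12; n <- 30
for (conf in c(0.90, 0.95, 0.99)) {
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)  # two-tailed critical value
  margin <- t_crit * s / sqrt(n)
  cat(conf * 100, "% CI: [",
      round(xbar - margin, 2), ",", round(xbar + margin, 2), "]\n")
}
```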
Three factors determine how wide (or narrow) our interval will be:
1. Sample size (\(n\)): the more data we have, the narrower the CI. This is intuitive: the more observations we collect, the more precise our estimate becomes. The width shrinks in proportion to \(1/\sqrt{n}\), which means that to halve the width of the CI we must quadruple the sample.
2. Data variability (\(s\)): the more dispersed the data, the wider the CI. If site traffic varies wildly from day to day, our estimate of the mean will be less precise.
3. The confidence level: as we saw, higher confidence means a wider interval.
Of these three factors, the only one we have direct control over is the sample size. That’s why the question “how much data do I need?” is so important—and it’s a topic we’ll tackle in an upcoming article.
Let’s apply all of this to a real-world case. Suppose we have a page that shows the following data in Search Console for the past month:
- impressions: 2,000;
- clicks: 140 (observed CTR: 7%).
Let’s build the 95% CI for the CTR:
n <- 2000
click <- 140
ctr <- click / n
se <- sqrt(ctr * (1 - ctr) / n)
z <- qnorm(0.975)
lower <- ctr - z * se
upper <- ctr + z * se
cat("Observed CTR:", round(ctr * 100, 2), "%\n")
cat("95% CI:", round(lower * 100, 2), "% -", round(upper * 100, 2), "%\n")
Result: Observed CTR 7.00%, 95% CI: 5.88% – 8.12%.
This tells us something important: that 7% is a reasonably precise estimate (the margin is about one percentage point in each direction), thanks to the 2000 impressions. If we had only 200 impressions, the interval would have been much wider and the estimate far less reliable.
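To see how much the 2000 impressions matter, we can rerun the same calculation with a tenth of the data, hypothetically 200 impressions and 14 clicks (the same 7% CTR):

```r
n <- 200
ctr <- 14 / n  # same observed 7% CTR, much smaller sample
margin <- qnorm(0.975) * sqrt(ctr * (1 - ctr) / n)
cat("95% CI:", round((ctr - margin) * 100, 2), "% -",
    round((ctr + margin) * 100, 2), "%\n")
```

The interval stretches to roughly 3.5%–10.5%, more than three times wider than before: same point estimate, far less information.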
This is invaluable information when making comparisons. If another page has a CTR of 6.5% on a similar number of impressions, we can already intuit (and formally verify with a test) that the difference is not statistically significant: the two intervals overlap substantially.
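That formal verification is a single call to prop.test(). Comparing our 140 clicks on 2000 impressions against a hypothetical second page with 130 clicks on 2000 impressions (CTR 6.5%):

```r
# Two-sample proportion test: is the CTR difference significant?
test <- prop.test(c(140, 130), c(2000, 2000))
test$p.value  # well above 0.05: the difference is not significant
```

As the overlapping intervals suggested, we cannot reject the hypothesis that the two pages have the same true CTR.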
An email marketing campaign produced these results over the last quarter:
1. Calculate the 95% confidence interval for the open rate.
2. Calculate the 95% confidence interval for the click rate (based on total emails sent).
3. A colleague claims that “our open rate is 30%.” Based on your CI, is this claim compatible with the data?
Hint: use the CI formula for proportions. In R, prop.test(successes, total) does all the work.
We’ve seen how the confidence interval transforms a point estimate (a single, illusorily precise number) into honest information about our uncertainty. But one question remains open: if the width of the CI depends on sample size, how much data do we need to obtain an interval narrow enough to be useful? That’s the problem of sample size determination, and it’s a topic that deserves its own article.
If you want to deepen your understanding of uncertainty in estimation and the logic behind confidence intervals, The Art of Statistics by David Spiegelhalter is a highly recommended read. Spiegelhalter—a Cambridge professor and Fellow of the Royal Society—has the rare gift of making statistics accessible without oversimplifying it, and that is exactly what it takes to truly understand what a confidence interval tells us and, above all, what it does not.