Contingency tables are used to examine the relationship between two categorical (qualitative) variables. They are also called two-way tables or cross-tabulations.
Searching for relationships between two categorical variables is a very common goal for researchers. Think, for example, of the classic question that marketers ask: who is more likely to buy certain product categories, young or old people, men or women…
Two-Way Tables and Marginal Distributions
A two-way table is a table with rows and columns that helps organize data from categorical variables:
- Rows represent the possible categories for one qualitative variable, for example males and females.
- Columns represent the possible categories for a second qualitative variable, for example whether someone likes pizza or not…
A marginal distribution shows how many total responses there are for each category of the variable. The marginal distribution of a variable can be determined by looking at the “Total” column (or row).
Let’s look at an example.
Note: I couldn’t think of anything particularly clever, so I created a table (with fictitious data, of course) of rare silliness, imagining that the two categorical variables concern education level and favorite sci-fi series…
We build the table in R:
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29), ncol = 3, byrow = TRUE)
rownames(scifi_fans) <- c("degree", "diploma", "lower education")
colnames(scifi_fans) <- c("star trek", "star wars", "doctor who")
scifi_fans <- as.table(scifi_fans)
scifi_fans
and we get something like this:
                star trek star wars doctor who
degree                 44        38         26
diploma                53        35         30
lower education        58        22         29

Remember? A marginal distribution shows how many total responses there are for each category of the variable (at the margins, precisely, where the Total column or row is...).
We can compute row totals in R with:
margin.table(scifi_fans, 1)
and column totals with:
margin.table(scifi_fans, 2)
We can also find the "grand total" with:
margin.table(scifi_fans)
Here is the table with totals:
                star trek star wars doctor who TOTAL
degree                 44        38         26   108
diploma                53        35         30   118
lower education        58        22         29   109
TOTAL                 155        95         85   335
So the marginal totals by education level are 108 for degree holders, 118 for diploma holders, 109 for lower education.
Likewise, the marginal totals by sci-fi series type are 155 for Star Trek, 95 for Star Wars, 85 for Doctor Who.
The grand total must be the same in both directions, in this case 335.
We could also have displayed a complete table with totals using just a few lines of R code:
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29), ncol = 3, byrow = TRUE)
row_names <- c("degree", "diploma", "lower education")
col_names <- c("star trek", "star wars", "doctor who")
dimnames(scifi_fans) <- list(row_names, col_names)
# Compute column totals using apply
col_totals <- apply(scifi_fans, 2, sum)
# Add row with column totals using rbind
scifi_fans2 <- rbind(scifi_fans, col_totals)
# Compute row totals
row_totals <- apply(scifi_fans2, 1, sum)
# Add column with row totals
cont_table <- cbind(scifi_fans2, row_totals)
# Print the table
cont_table
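Incidentally, base R's `addmargins()` (not used in the snippet above) produces the same table of totals in a single call. A minimal sketch, reusing the same data:

```r
# Same contingency table as above, built with dimnames in one call
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(c("degree", "diploma", "lower education"),
                                     c("star trek", "star wars", "doctor who")))
# addmargins() appends row and column totals (labelled "Sum") automatically
addmargins(as.table(scifi_fans))
```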
We can then ask ourselves (and answer): what percentage of degree holders has a soft spot for Doctor Who?
Elementary, Watson (oh wait, that was a different series...):
26/108 ≈ 0.24, so about 24% of degree holders prefer Doctor Who
And how many Star Wars fans hold a diploma?
35/95 ≈ 0.37, so about 37% of Star Wars fans are diploma holders
In R, we can directly obtain row proportions with the function:
prop.table(scifi_fans, 1)
and the result will be:
                star trek star wars doctor who
degree          0.4074074 0.3518519  0.2407407
diploma         0.4491525 0.2966102  0.2542373
lower education 0.5321101 0.2018349  0.2660550
(as we can see, the row totals add up to 1, or 100%)
or column proportions with:
prop.table(scifi_fans, 2)
and the result will be:
                star trek star wars doctor who
degree          0.2838710 0.4000000  0.3058824
diploma         0.3419355 0.3684211  0.3529412
lower education 0.3741935 0.2315789  0.3411765
(as we can see, the column totals add up to 1, or 100%)
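For completeness, calling `prop.table()` without a margin argument divides each cell by the grand total, yielding the joint proportions. A quick sketch, reusing the same data:

```r
# Rebuild the table so the snippet is self-contained
scifi_fans <- as.table(matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29),
                              ncol = 3, byrow = TRUE,
                              dimnames = list(c("degree", "diploma", "lower education"),
                                              c("star trek", "star wars", "doctor who"))))
# Each cell divided by the grand total (335); all nine cells sum to 1
round(prop.table(scifi_fans), 3)
```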
As always, there is more than one way to get the result. We can also install the "gmodels" package and use the CrossTable function (we'll leave it to R's built-in help to show all the command options...):
install.packages("gmodels")
library(gmodels)
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29), ncol = 3, byrow = TRUE)
rownames(scifi_fans) <- c("degree", "diploma", "lower education")
colnames(scifi_fans) <- c("star trek", "star wars", "doctor who")
CrossTable(scifi_fans, prop.r = FALSE, prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)
So what is all this good for? The answer is: for example, to compute conditional probability.
Conditional Probability
Before we see what it is and why it is an extremely useful concept in everyday life, we need a few preliminary definitions about probability.
An event is something that occurs with one or more possible outcomes.
An experiment is the process of measuring or making an observation.
Key definition: the probability of an event is the ratio of the number of favorable cases to the number of possible cases:

\( P(A) = \frac{\text{number of favorable cases}}{\text{number of possible cases}} \)

Let us also recall that:
- The probability that two events both occur can never be greater than the probability that each event occurs separately.
- If two possible events, A and B, are independent, then the probability that both occur is the product of their individual probabilities.
- If an event can have a certain number of different and distinct possible outcomes (A, B, C, etc.), then the probability that A or B occurs equals the sum of the individual probabilities of A and B, and the sum of the probabilities of all possible outcomes (A, B, C, etc.) equals 1, i.e. 100%.
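These rules can be checked with a quick sketch in R; the die example and the helper `p()` are ours, purely illustrative:

```r
# Toy example (not from the text): a fair six-sided die
outcomes <- 1:6
# Probability as favorable cases / possible cases
p <- function(event) length(event) / length(outcomes)

p(c(2, 4, 6))                  # P(even) = 0.5
p(c(2, 4, 6)) * p(c(2, 4, 6))  # two independent rolls both even: 0.25
p(1) + p(2)                    # P(1 or 2), mutually exclusive outcomes: 1/3
sum(sapply(outcomes, p))       # probabilities of all outcomes sum to 1
```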
The conditional probability of an event A with respect to an event B is the probability that A occurs, given that B has occurred.
The formula is:
\( P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \)

If a probability is based on a single variable it is a marginal probability; if it involves two or more variables it is called a joint probability.
- The marginal probability of an event A is: \( P(A) = \frac{\text{marginal total of A}}{\text{grand total}} \)
- The joint probability of two events A and B is: \( P(A \text{ and } B) = \frac{\text{count of A and B}}{\text{grand total}} \)
- The conditional probability of outcome A given that condition B has occurred is: \( P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \)
In other words:
A joint probability is the probability that someone selected from the entire group has two particular characteristics at the same time. That is, both characteristics occur jointly. We find a joint probability by taking the value of the cell at the intersection of A and B and dividing by the grand total.
To find a conditional probability, we take the value of the cell at the intersection of A and B and divide it by the marginal total of B, i.e. the variable expressing the event that has occurred.
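Applied to the sci-fi table above, these two recipes translate directly into R (the variable names are ours, for illustration):

```r
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(c("degree", "diploma", "lower education"),
                                     c("star trek", "star wars", "doctor who")))
grand_total <- sum(scifi_fans)   # 335

# Joint probability: cell at the intersection / grand total
p_joint <- scifi_fans["degree", "doctor who"] / grand_total                 # 26/335

# Conditional probability: cell / marginal total of the given event
p_cond <- scifi_fans["degree", "doctor who"] / sum(scifi_fans["degree", ])  # 26/108
```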
It's time for a second example. We take the data from:
Ellis GJ and Stone LH. 1979. Marijuana Use in College: An Evaluation of a Modeling Explanation. Youth and Society 10:323-334.
The study asks whether a college student is more likely to smoke marijuana if their parents had used drugs in the past. Here is the table:
                      parents used   parents did not use   Total
student uses                   125                    94     219
student does not use            85                   141     226
Total                          210                   235     445
Let's apply our knowledge to answer these questions:
1. If the parents used soft drugs in the past, what is the probability that their child does the same in college?
This is a case of conditional probability.
We recall \( P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \), therefore:
P(student uses given that parents used) = 125 / 210 ≈ 0.595 = 59.5%
2. A student is selected at random and does not use marijuana. What is the probability that their parents used it?
Here again we face a question that asks for a conditional probability. Therefore:
P(parents used given that student does not use) = 85 / 226 = 0.376 = 37.6%
3. What is the probability of selecting a student who does not use marijuana and whose parents used it in the past?
In this case we need to find a joint probability, so we divide the cell count by the grand total:

\( P(\text{student does not use and parents used}) = \frac{85}{445} \approx 0.19 \)
The probability is approximately 19%.
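All three answers can be reproduced in R from the contingency table (a sketch, with our own object names):

```r
drug_use <- as.table(matrix(c(125, 94, 85, 141), ncol = 2, byrow = TRUE,
                            dimnames = list(c("student uses", "student does not use"),
                                            c("parents used", "parents did not use"))))

# 1. P(student uses | parents used) = cell / column total
q1 <- drug_use["student uses", "parents used"] /
  margin.table(drug_use, 2)["parents used"]                 # 125/210

# 2. P(parents used | student does not use) = cell / row total
q2 <- drug_use["student does not use", "parents used"] /
  margin.table(drug_use, 1)["student does not use"]         # 85/226

# 3. Joint: P(student does not use and parents used) = cell / grand total
q3 <- drug_use["student does not use", "parents used"] /
  margin.table(drug_use)                                    # 85/445

round(c(q1, q2, q3), 3)
```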
Dependence and Independence
If the outcomes of A and B influence each other, we say the two variables are dependent. If they do not, we say they are independent.
More rigorously: we can state that event B is independent of event A if:
P(B|A) = P(B)
or
P(A|B) = P(A)
If this is not the case, the events are dependent on each other.
Therefore:
- P(A and B) = P(A) P(B) if and only if A and B are independent events.
- P(A | B) = P(A) and P(B | A) = P(B) if and only if A and B are independent events.
Let's examine the independence of categorical variables...
Let's explain this better with an example.
Let A be the event that people enjoy cycling.
B expresses whether they enjoy roast lamb. (Makes perfect sense, right?)
We build our contingency table:
                 Likes cycling   Doesn't like cycling   Total
Likes roast lamb            95                     36     131
No roast lamb               15                     19      34
Total                      110                     55     165
Let's remember what it means for two events to be independent. It means this:
P(A | B) = P(A)
But in our case we see that:

P(A) = 110/165 ≈ 0.667 = 66.7%

P(A|B) = 95/131 ≈ 0.725 = 72.5%

(Recall that \( P(A|B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{95/165}{131/165} = \frac{95}{131} \approx 0.725 \).)
From the result it is clear that \( P(A) \neq P(A|B) \) -- the two events are NOT independent (therefore they are dependent).
After all, everyone knows that there is a clear dependence between loving cycling and loving roast lamb!
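The check can be reproduced in R (a sketch, with our own object names):

```r
# Cycling vs roast lamb table (fictitious data from the example)
prefs <- as.table(matrix(c(95, 36, 15, 19), ncol = 2, byrow = TRUE,
                         dimnames = list(c("likes roast lamb", "no roast lamb"),
                                         c("likes cycling", "doesn't like cycling"))))

# P(A): marginal total of "likes cycling" / grand total
p_a <- margin.table(prefs, 2)["likes cycling"] / margin.table(prefs)        # 110/165

# P(A|B): cell / marginal total of "likes roast lamb"
p_a_given_b <- prefs["likes roast lamb", "likes cycling"] /
  margin.table(prefs, 1)["likes roast lamb"]                                # 95/131

# P(A) != P(A|B)  =>  the events are not independent
round(c(P_A = unname(p_a), P_A_given_B = unname(p_a_given_b)), 3)
```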
Further Reading
For a comprehensive treatment of contingency tables, conditional probability, and the full machinery of categorical data analysis, Statistica by Newbold, Carlson and Thorne provides a rigorous yet accessible framework for applying these concepts in real-world settings.