Contingency tables are used to examine the relationship between two categorical (qualitative) variables. They are also called two-way tables or cross-tabulations.
Searching for relationships between two categorical variables is a very common goal for researchers. Think, for example, of the classic question that marketers ask: who is more likely to buy certain product categories, young or old people, men or women…
Two-Way Tables and Marginal Distributions
A two-way table is a table with rows and columns that helps organize data from categorical variables:
- Rows represent the possible categories for one qualitative variable, for example males and females.
- Columns represent the possible categories for a second qualitative variable, for example whether someone likes pizza or not…
A marginal distribution shows how many total responses there are for each category of the variable. The marginal distribution of a variable can be determined by looking at the “Total” column (or row).
Let’s look at an example.
Note: I couldn’t think of anything particularly clever, so I created a table (with fictitious data, of course) of rare silliness, imagining that the two categorical variables concern education level and favorite sci-fi series…
We build the table in R:
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29), ncol = 3, byrow = TRUE)
rownames(scifi_fans) <- c("degree", "diploma", "lower education")
colnames(scifi_fans) <- c("star trek", "star wars", "doctor who")
scifi_fans <- as.table(scifi_fans)
scifi_fans
and we get something like this:
                star trek star wars doctor who
degree                 44        38         26
diploma                53        35         30
lower education        58        22         29

Remember? A marginal distribution shows how many total responses there are for each category of the variable (at the margins, precisely, where the Total column or row is...).
We can compute row totals in R with:
margin.table(scifi_fans, 1)
and column totals with:
margin.table(scifi_fans, 2)
We can also find the "grand total" with:
margin.table(scifi_fans)
Here is the table with totals:
                star trek star wars doctor who TOTAL
degree                 44        38         26   108
diploma                53        35         30   118
lower education        58        22         29   109
TOTAL                 155        95         85   335
So the marginal totals by education level are 108 for degree holders, 118 for diploma holders, 109 for lower education.
Likewise, the marginal totals by sci-fi series type are 155 for Star Trek, 95 for Star Wars, 85 for Doctor Who.
The grand total must be the same in both directions, in this case 335.
We could also have displayed a complete table with totals using just a few lines of R code:
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29), ncol = 3, byrow = TRUE)
row_names <- c("degree", "diploma", "lower education")
col_names <- c("star trek", "star wars", "doctor who")
dimnames(scifi_fans) <- list(row_names, col_names)
# Compute column totals using apply
col_totals <- apply(scifi_fans, 2, sum)
# Add row with column totals using rbind
scifi_fans2 <- rbind(scifi_fans, col_totals)
# Compute row totals
row_totals <- apply(scifi_fans2, 1, sum)
# Add column with row totals
cont_table <- cbind(scifi_fans2, row_totals)
# Print the table
cont_table
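Incidentally, base R's `addmargins()` (not used in the snippet above) produces the same table of totals in a single call. A minimal sketch, reusing the same data:

```r
# Same contingency table as above, built with dimnames in one call
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(c("degree", "diploma", "lower education"),
                                     c("star trek", "star wars", "doctor who")))
# addmargins() appends row and column totals (labelled "Sum") automatically
addmargins(as.table(scifi_fans))
```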
We can then ask ourselves (and answer): what percentage of degree holders has a soft spot for Doctor Who?
Elementary, Watson (oh wait, that was a different series...):
26/108 ≈ 0.24, so about 24% of degree holders prefer Doctor Who
And how many Star Wars fans hold a diploma?
35/95 ≈ 0.37, so about 37% of Star Wars fans are diploma holders
In R, we can directly obtain row proportions with the function:
prop.table(scifi_fans, 1)
and the result will be:
                star trek star wars doctor who
degree          0.4074074 0.3518519  0.2407407
diploma         0.4491525 0.2966102  0.2542373
lower education 0.5321101 0.2018349  0.2660550
(as we can see, the row totals add up to 1, or 100%)
or column proportions with:
prop.table(scifi_fans, 2)
and the result will be:
                star trek star wars doctor who
degree          0.2838710 0.4000000  0.3058824
diploma         0.3419355 0.3684211  0.3529412
lower education 0.3741935 0.2315789  0.3411765
(as we can see, the column totals add up to 1, or 100%)
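For completeness, calling `prop.table()` without a margin argument divides each cell by the grand total, yielding the joint proportions. A quick sketch, reusing the same data:

```r
# Rebuild the table so the snippet is self-contained
scifi_fans <- as.table(matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29),
                              ncol = 3, byrow = TRUE,
                              dimnames = list(c("degree", "diploma", "lower education"),
                                              c("star trek", "star wars", "doctor who"))))
# Each cell divided by the grand total (335); all nine cells sum to 1
round(prop.table(scifi_fans), 3)
```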
As always, there is more than one way to get the result. We can also install the "gmodels" package and use the CrossTable function (we'll leave it to R's built-in help to show all the command options...):
install.packages("gmodels")
library(gmodels)
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29), ncol = 3, byrow = TRUE)
rownames(scifi_fans) <- c("degree", "diploma", "lower education")
colnames(scifi_fans) <- c("star trek", "star wars", "doctor who")
CrossTable(scifi_fans, prop.r = FALSE, prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)
So what is all this good for? The answer is: for example, to compute conditional probability.
Conditional Probability
Before we see what it is and why it is an extremely useful concept in everyday life, we need a few preliminary definitions about probability.
An event is something that occurs with one or more possible outcomes.
An experiment is the process of measuring or making an observation.
Key definition: the probability of an event is the ratio of the number of favorable cases to the number of possible cases:

\( P(A) = \frac{\text{number of favorable cases}}{\text{number of possible cases}} \)

Let us also recall that:
- The probability that two events both occur can never be greater than the probability that each event occurs separately.
- If two possible events, A and B, are independent, then the probability that both occur is the product of their individual probabilities.
- If an event can have a certain number of different and distinct possible outcomes (A, B, C, etc.), then the probability that A or B occurs equals the sum of the individual probabilities of A and B, and the sum of the probabilities of all possible outcomes (A, B, C, etc.) equals 1, i.e. 100%.
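These rules can be checked with a quick sketch in R; the die example and the helper `p()` are ours, purely illustrative:

```r
# Toy example (not from the text): a fair six-sided die
outcomes <- 1:6
# Probability as favorable cases / possible cases
p <- function(event) length(event) / length(outcomes)

p(c(2, 4, 6))                  # P(even) = 0.5
p(c(2, 4, 6)) * p(c(2, 4, 6))  # two independent rolls both even: 0.25
p(1) + p(2)                    # P(1 or 2), mutually exclusive outcomes: 1/3
sum(sapply(outcomes, p))       # probabilities of all outcomes sum to 1
```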
The conditional probability of an event A with respect to an event B is the probability that A occurs, given that B has occurred.
The formula is:
\( P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \)

If a probability is based on a single variable it is a marginal probability; if it involves two or more variables it is called a joint probability.
- The marginal probability of an event A is: \( P(A) = \frac{\text{marginal total of A}}{\text{grand total}} \)
- The joint probability of two events A and B is: \( P(A \text{ and } B) = \frac{\text{count of A and B}}{\text{grand total}} \)
- The conditional probability of outcome A given that condition B has occurred is: \( P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \)
In other words:
A joint probability is the probability that someone selected from the entire group has two particular characteristics at the same time. That is, both characteristics occur jointly. We find a joint probability by taking the value of the cell at the intersection of A and B and dividing by the grand total.
To find a conditional probability, we take the value of the cell at the intersection of A and B and divide it by the marginal total of B, i.e. the variable expressing the event that has occurred.
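Applied to the sci-fi table above, these two recipes translate directly into R (the variable names are ours, for illustration):

```r
scifi_fans <- matrix(c(44, 38, 26, 53, 35, 30, 58, 22, 29),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(c("degree", "diploma", "lower education"),
                                     c("star trek", "star wars", "doctor who")))
grand_total <- sum(scifi_fans)   # 335

# Joint probability: cell at the intersection / grand total
p_joint <- scifi_fans["degree", "doctor who"] / grand_total                 # 26/335

# Conditional probability: cell / marginal total of the given event
p_cond <- scifi_fans["degree", "doctor who"] / sum(scifi_fans["degree", ])  # 26/108
```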
It's time for a second example. We take the data from:
Ellis GJ and Stone LH. 1979. Marijuana Use in College: An Evaluation of a Modeling Explanation. Youth and Society 10:323-334.
The study asks whether a college student is more likely to smoke marijuana if their parents had used drugs in the past. Here is the table:
                      parents used   parents did not use   Total
student uses                   125                    94     219
student does not use            85                   141     226
Total                          210                   235     445
Let's apply our knowledge to answer these questions:
1. If the parents used soft drugs in the past, what is the probability that their child does the same in college?
This is a case of conditional probability.
We recall \( P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \), therefore:
P(student uses given that parents used) = 125 / 210 ≈ 0.595 = 59.5%
2. A student is selected at random and does not use marijuana. What is the probability that their parents used it?
Here again we face a question that asks for a conditional probability. Therefore:
P(parents used given that student does not use) = 85 / 226 = 0.376 = 37.6%
3. What is the probability of selecting a student who does not use marijuana and whose parents used it in the past?
In this case we need to find a joint probability, so we divide the cell count by the grand total:

\( P(\text{student does not use and parents used}) = \frac{85}{445} \approx 0.19 \)
The probability is approximately 19%.
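All three answers can be reproduced in R from the contingency table (a sketch, with our own object names):

```r
drug_use <- as.table(matrix(c(125, 94, 85, 141), ncol = 2, byrow = TRUE,
                            dimnames = list(c("student uses", "student does not use"),
                                            c("parents used", "parents did not use"))))

# 1. P(student uses | parents used) = cell / column total
q1 <- drug_use["student uses", "parents used"] /
  margin.table(drug_use, 2)["parents used"]                 # 125/210

# 2. P(parents used | student does not use) = cell / row total
q2 <- drug_use["student does not use", "parents used"] /
  margin.table(drug_use, 1)["student does not use"]         # 85/226

# 3. Joint: P(student does not use and parents used) = cell / grand total
q3 <- drug_use["student does not use", "parents used"] /
  margin.table(drug_use)                                    # 85/445

round(c(q1, q2, q3), 3)
```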
Dependence and Independence
If the outcomes of A and B influence each other, we say the two variables are dependent. If they do not, we say they are independent.
More rigorously: we can state that event B is independent of event A if:
P(B|A) = P(B)
or
P(A|B) = P(A)
If this is not the case, the events are dependent on each other.
Therefore:
- P(A and B) = P(A) P(B) if and only if A and B are independent events.
- P(A | B) = P(A) and P(B | A) = P(B) if and only if A and B are independent events.
Let's examine the independence of categorical variables...
Let's explain this better with an example.
Let A be the event that people enjoy cycling.
B expresses whether they enjoy roast lamb. (Makes perfect sense, right?)
We build our contingency table:
                 Likes cycling   Doesn't like cycling   Total
Likes roast lamb            95                     36     131
No roast lamb               15                     19      34
Total                      110                     55     165
Let's remember what it means for two events to be independent. It means this:
P(A | B) = P(A)
But in our case we see that:

P(A) = 110/165 ≈ 0.667 = 66.7%

P(A|B) = 95/131 ≈ 0.725 = 72.5%

(Recall that \( P(A|B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{95/165}{131/165} = \frac{95}{131} \approx 0.725 \).)
From the result it is clear that \( P(A) \neq P(A|B) \) -- the two events are NOT independent (therefore they are dependent).
After all, everyone knows that there is a clear dependence between loving cycling and loving roast lamb!
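The check can be reproduced in R (a sketch, with our own object names):

```r
# Cycling vs roast lamb table (fictitious data from the example)
prefs <- as.table(matrix(c(95, 36, 15, 19), ncol = 2, byrow = TRUE,
                         dimnames = list(c("likes roast lamb", "no roast lamb"),
                                         c("likes cycling", "doesn't like cycling"))))

# P(A): marginal total of "likes cycling" / grand total
p_a <- margin.table(prefs, 2)["likes cycling"] / margin.table(prefs)        # 110/165

# P(A|B): cell / marginal total of "likes roast lamb"
p_a_given_b <- prefs["likes roast lamb", "likes cycling"] /
  margin.table(prefs, 1)["likes roast lamb"]                                # 95/131

# P(A) != P(A|B)  =>  the events are not independent
round(c(P_A = unname(p_a), P_A_given_B = unname(p_a_given_b)), 3)
```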
Further Reading
For a comprehensive treatment of contingency tables, conditional probability, and the full machinery of categorical data analysis, Statistica by Newbold, Carlson and Thorne provides a rigorous yet accessible framework for applying these concepts in real-world settings.