  <?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>goodness of fit &#8211; paologironi blog</title>
	<atom:link href="https://www.gironi.it/blog/en/tag/goodness-of-fit/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.gironi.it/blog</link>
	<description>Scattered notes on (retro) computing, data analysis, statistics, SEO, and things that change</description>
	<lastBuildDate>Thu, 18 Jun 2026 13:21:33 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>The Chi-Square Test: Goodness of Fit and Test of Independence</title>
		<link>https://www.gironi.it/blog/en/the-chi-square-test-goodness-of-fit-and-test-of-independence/</link>
					<comments>https://www.gironi.it/blog/en/the-chi-square-test-goodness-of-fit-and-test-of-independence/#respond</comments>
		
		<dc:creator><![CDATA[paolo]]></dc:creator>
		<pubDate>Tue, 10 Dec 2019 09:16:24 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[chi-square]]></category>
		<category><![CDATA[goodness of fit]]></category>
		<category><![CDATA[test of independence]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3344</guid>

					<description><![CDATA[In previous posts, we have seen different types of tests that we can use to analyze our data and test hypotheses. The chi-square test was proposed by Karl Pearson in 1900, and it is widely used to estimate how effectively the distribution of a categorical variable represents an expected distribution (in this case, we talk &#8230; <a href="https://www.gironi.it/blog/en/the-chi-square-test-goodness-of-fit-and-test-of-independence/" class="more-link">Continue reading<span class="screen-reader-text"> "The Chi-Square Test: Goodness of Fit and Test of Independence"</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">In previous posts, we have seen different types of tests that we can use to analyze our data and test hypotheses.</p>


<p class="wp-block-paragraph">The chi-square test was proposed by <a aria-label="Karl Pearson (opens in a new tab)" href="https://en.wikipedia.org/wiki/Karl_Pearson" target="_blank" rel="noreferrer noopener">Karl Pearson</a> in 1900. It is widely used to assess how well the observed distribution of a categorical variable matches an expected distribution (in this case, we talk about the &#8220;Goodness of Fit Test&#8221;), or to test whether two categorical variables are independent of each other (and then we talk about the &#8220;Test of Independence&#8221;).</p>


<p class="wp-block-paragraph">Such is the importance and widespread use of this test that it was listed by the magazine <em>Scientific American</em> among the 20 most important scientific discoveries of the 20th century.</p>


<span id="more-3344"></span>


<div style="border:1px solid #ccc;padding:1.2em 1.5em;margin:1.5em 0;border-radius:6px">
<h3 style="margin-top:0">What we&#8217;ll cover</h3>
<ul>
<li><a href="#goodness-of-fit-test">The Goodness of Fit Test</a></li>
<li><a href="#a-simple-example">Understanding Through a Simple Example</a></li>
<li><a href="#casio-goodness-of-fit">Making Life Easier with a Casio Scientific Calculator</a></li>
<li><a href="#r-goodness-of-fit">Using R for the Goodness of Fit Test</a></li>
<li><a href="#test-of-independence">The Test of Independence</a></li>
<li><a href="#casio-independence">The Test of Independence with Casio</a></li>
<li><a href="#r-independence">The Test of Independence with R</a></li>
<li><a href="#seo-example-ctr">An SEO Example: Does CTR Depend on the Device?</a></li>
<li><a href="#try-it-yourself">Try It Yourself</a></li>
</ul>
</div>


<hr class="wp-block-separator has-css-opacity"/>


<h2 class="wp-block-heading" id="goodness-of-fit-test">The Goodness of Fit Test</h2>


<p class="wp-block-paragraph">This is a very useful test, concerning the distribution of a categorical variable. It allows us to verify if the observed frequencies differ significantly from the expected frequencies when there are more than two possible outcomes.</p>


<p class="wp-block-paragraph">The prerequisites for carrying out the test are very simple:</p>


<ol class="wp-block-list">
<li>The sample must be random;</li>


<li>Observations must be independent for the sample (one observation per subject);</li>


<li>No expected frequency in any class should be less than 5. <br>This last point sounds rather cryptic and deserves a few more words. The chi-square distribution is only an approximation of the true distribution of the test statistic, and the approximation becomes unreliable when the expected (theoretical) frequencies are small. When the variable is continuous, or individual sample observations are available and have to be grouped, this affects the choice of the number of classes (also called &#8220;cells&#8221;) into which the distribution is divided: the classes must be wide enough for every expected frequency to be at least 5, merging adjacent classes if necessary.</li>
</ol>


<h2 class="wp-block-heading" id="a-simple-example">Understanding Through a Simple Example</h2>


<p class="wp-block-paragraph">As usual, to better understand what we are talking about, we will explain it with a super-simplified (and, I apologize, quite ridiculous&#8230;) example.<br><br>Suppose a study was conducted on electronics hobbyists who use Arduino boards. It was found that 50% own only one Arduino board, 30% have 2 to 4 boards, and 20% own 5 or more.</p>


<p class="wp-block-paragraph">Let&#8217;s imagine that I conducted my own independent study and found these data: out of 150 hobbyists, I found that 90 owned only one Arduino, 30 had 2 to 4 boards, and 30 had 5 or more boards.<br><br>The null hypothesis is that the proportions I found are in line with those of the official study.<br>The alternative hypothesis is obviously that the collected data do not confirm the proportions of the official study.</p>


<p class="wp-block-paragraph">I prepare my table by entering the data:</p>


<figure class="wp-block-table"><table><tbody><tr><td></td><td class="has-text-align-center" data-align="center"><b>One Arduino</b></td><td class="has-text-align-center" data-align="center"><b>2 to 4 boards</b></td><td class="has-text-align-center" data-align="center"><b>5 or more boards</b></td><td class="has-text-align-center" data-align="center"><b>Total</b></td></tr><tr><td>Observed Data</td><td class="has-text-align-center" data-align="center">90<br></td><td class="has-text-align-center" data-align="center">30</td><td class="has-text-align-center" data-align="center">30</td><td class="has-text-align-center" data-align="center">150</td></tr><tr><td>Expected Data</td><td class="has-text-align-center" data-align="center">0.50 x 150 = 75</td><td class="has-text-align-center" data-align="center">0.30 x 150 = 45</td><td class="has-text-align-center" data-align="center"> 0.20 x 150 = 30 </td><td class="has-text-align-center" data-align="center">150</td></tr></tbody></table></figure>


<p class="wp-block-paragraph">For the null hypothesis to be retained (that is, not rejected), the difference between the expected and observed frequencies must be small enough to be attributable to sampling variability at the chosen level of significance.</p>


<p class="wp-block-paragraph">The χ<sup>2</sup> statistic calculated from the sample data is given by:</p>



\(
\chi^2=\sum\frac{(f_o-f_e)^2}{f_e}
\)
<p>
f<sub>o</sub>=observed frequencies <br>
f<sub>e</sub>=expected frequencies <br>
</p>


<p class="wp-block-paragraph">The degrees of freedom for the goodness of fit test are:</p>



\(
df=k-1
\)
<p>
k = number of categories (classes) of the variable <br>
With the total fixed, once k-1 frequencies are known the last one is determined.
</p>


<p class="wp-block-paragraph">Let&#8217;s use our example as a guide. We start from the hypotheses:</p>



\(
H_0=the\ proportions\ are\ 0.5,\ 0.3,\ 0.2\\
H_a=the\ proportions\ are\ not\ 0.5,\ 0.3,\ 0.2\\
\)


<p>We have:</p>
\(
n=150\\
df=3-1=2\\
\)
<p>We find the critical χ<sup>2</sup> value in the tables (df=2, α=0.05)<br>
The value is: <b>5.99</b>
</p>


<p class="wp-block-paragraph">Now I calculate the χ<sup>2</sup> value for my data:</p>



\(
\chi^2=\frac{(90-75)^2}{75}+\frac{(30-45)^2}{45}+\frac{(30-30)^2}{30}=\\
=\frac{225}{75}+\frac{225}{45}+\frac{0}{30}=\\
=3+5+0=\\
=8
\)


<p class="wp-block-paragraph">We conclude then (since the calculated value is <strong>higher than the critical value</strong>) that <strong>we can reject the null hypothesis at the 5% significance level</strong>. That is, we can reject the assertion that the frequencies are distributed according to the proportion 50%, 30%, 20%.</p>


<h2 class="wp-block-heading" id="casio-goodness-of-fit">Making Life Easier with a Casio Scientific Calculator</h2>


<p class="wp-block-paragraph">With my fx calculator, I just need to choose &#8220;STAT&#8221; from the menu and enter the observed values in list L1 and the expected values in L2 in my table editor.<br><br>Then I will choose:</p>


<pre class="wp-block-preformatted">[TEST]<br>[CHI]<br>[GoF]<br>Observed:List1<br>Expected:List2<br>df:2<br>[CALC]</pre>


<p class="wp-block-paragraph">and I will get both the chi-square value and the p-value (in this case, 0.01832, which is less than the alpha value of 0.05 I chose, confirming the conclusion that I can reject the null hypothesis and accept the alternative one).</p>


<h2 class="wp-block-heading" id="r-goodness-of-fit">Using R for the Goodness of Fit Test</h2>


<p class="wp-block-paragraph">In R, the example given is even easier to set up:</p>


<pre class="wp-block-preformatted"># observed counts and the proportions expected under the null hypothesis
observed&lt;-c(90,30,30)
expected_proportion&lt;-c(0.5,0.3,0.2)
chisq.test(observed,p=expected_proportion,correct=FALSE)

# the result will be:
#
# Chi-squared test for given probabilities
# data: observed
# X-squared = 8, df = 2, p-value = 0.01832</pre>
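
<p class="wp-block-paragraph">To see what <code>chisq.test</code> is doing here, the same figures can be rebuilt step by step. This is only a minimal sketch: the variable names <code>expected</code>, <code>chi2</code>, <code>dof</code>, <code>critical</code> and <code>p_value</code> are my own choices for illustration, while <code>qchisq</code> and <code>pchisq</code> are the base R quantile and distribution functions of the chi-square distribution.</p>


<pre class="wp-block-preformatted"># observed counts and the proportions claimed by the reference study
observed &lt;- c(90, 30, 30)
expected_proportion &lt;- c(0.5, 0.3, 0.2)

# expected counts under the null hypothesis: 75, 45, 30
expected &lt;- sum(observed) * expected_proportion

# prerequisite check: every expected count should be at least 5
all(expected &gt;= 5)

# chi-square statistic: sum of (observed - expected)^2 / expected
chi2 &lt;- sum((observed - expected)^2 / expected)

# degrees of freedom: number of categories minus one
dof &lt;- length(observed) - 1

# critical value at alpha = 0.05 and p-value (upper tail)
critical &lt;- qchisq(0.95, df = dof)
p_value &lt;- pchisq(chi2, df = dof, lower.tail = FALSE)

c(chi2 = chi2, critical = critical, p_value = p_value)</pre>


<p class="wp-block-paragraph">The statistic (8) exceeds the critical value (about 5.99) and the p-value (about 0.018) is below 0.05: the same conclusion reached by hand and with the calculator.</p>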


<h2 class="wp-block-heading" id="test-of-independence">The Test of Independence</h2>


<p class="wp-block-paragraph">It is commonly used to determine if two factors are related to each other.</p>


<p class="wp-block-paragraph">Generally, what we want to know is: &#8220;Is variable X independent of variable Y?&#8221;</p>


<p class="has-light-gray-background-color has-background wp-block-paragraph">Note: the answer we get from our test is <strong>only</strong> this, not <strong>how</strong> the variables are related.</p>


<p class="wp-block-paragraph">In the case of the goodness of fit test, there is <strong>only one variable</strong> at play: the observed frequencies can therefore be listed in a single row, or column, of values in a table.</p>


<p class="wp-block-paragraph">Tests of independence, on the other hand, involve <strong>two variables</strong>, and the <strong>object of the test</strong> is precisely the <strong>assumption that the two variables are statistically independent</strong>.</p>


<p class="wp-block-paragraph">Since two variables are involved in the test, the observed frequencies are entered into a <strong><a href="https://www.gironi.it/blog/en/contingency-tables-and-conditional-probability/" target="_blank" rel="noreferrer noopener" aria-label="contingency table (opens in a new tab)">contingency table</a></strong> of the <strong>row x column</strong> type. <br>For example, I represent the data relating to the age and gender of enthusiasts of a given commercial brand:</p>


<table style="font-size:11px;"><tbody><tr><td><b>Age</b></td><td><b>Male</b></td><td><b>Female</b></td><td><b>Total</b></td></tr><tr><td><b>&lt;35</b></td><td>66</td><td>54</td><td>120</td></tr><tr><td><b>&gt;=35</b></td><td>78</td><td>12</td><td>90</td></tr><tr><td><b>Total</b></td><td>144</td><td>66</td><td>210</td></tr></tbody></table>


<p class="wp-block-paragraph">We want to test the null hypothesis that the two <strong>qualitative</strong> variables, gender and age, are independent. Therefore, the alternative hypothesis predicts that there is a relationship between the two variables.</p>


<p class="wp-block-paragraph">If the hypothesis of independence is true, each cell should hold the same share of its row total as the corresponding column total holds of the whole sample. The expected frequency of a cell is therefore the product of its row total and its column total, divided by the sample size:</p>



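\(
f_e=\frac{\Sigma_{row}\ \Sigma_{column}}{n}\\
df=(r-1)(c-1)
\)
<p>
Σ<sub>row</sub>, Σ<sub>column</sub> = totals of the row and of the column containing the cell <br>
r = number of rows in the contingency table <br>
c = number of columns in the contingency table
</p>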


<p class="wp-block-paragraph">At this point, I proceed with my example:</p>



\(
f_e=\frac{\Sigma_{row}\ \Sigma_{column}}{n}=\frac{120\times 144}{210}\approx 82.3
\)


<div style="height:14px" aria-hidden="true" class="wp-block-spacer"></div>


<p class="has-light-gray-background-color has-background wp-block-paragraph">The 3 remaining frequencies can be easily obtained by subtraction from the row and column totals. In fact, <strong>a 2&#215;2 table has df=1</strong>, meaning that <strong>the frequency of only one cell is free to vary</strong>.</p>


<p class="wp-block-paragraph">I will get:</p>


<table style="font-size:11px;"><tbody><tr><td><b>Age</b></td><td><b>Male</b></td><td><b>Female</b></td><td><b>Total</b></td></tr><tr><td>&lt;35</td><td>82</td><td>38</td><td>120</td></tr><tr><td>&gt;=35</td><td>62</td><td>28</td><td>90</td></tr><tr><td>Total</td><td>144</td><td>66</td><td>210</td></tr></tbody></table>
<br>


<div style="height:16px" aria-hidden="true" class="wp-block-spacer"></div>



\(
H_0=gender\ and\ age\ are\ independent\\
H_a=there\ is\ a\ relationship\ between\ gender\ and\ age\\
df=(2-1)(2-1)=1
\)
<br>
<br>
<p>
I choose a significance level of α=0.01
</p>



\(
\chi^2_{critical}=6.63\
\)


<p class="wp-block-paragraph">I calculate the chi-square value and find:</p>



\(
\chi^2=23.9\
\)
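

<p class="wp-block-paragraph">For anyone who wants to see where that figure comes from, the calculation can be sketched as follows. I use the expected frequencies with two decimals (82.29, 37.71, 61.71 and 28.29) rather than the whole numbers shown in the table. In a 2&#215;2 table the difference between observed and expected is the same in absolute value in every cell: here it is 114/7, about 16.29, and its square is about 265.22. The individual terms are rounded, so the result is approximate:</p>



\(
\chi^2\approx\frac{265.22}{82.29}+\frac{265.22}{37.71}+\frac{265.22}{61.71}+\frac{265.22}{28.29}\\
\approx 3.22+7.03+4.30+9.38\\
\approx 23.93
\)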


<p class="wp-block-paragraph">Therefore, the null hypothesis of independence is rejected at the 1% significance level. The variables age and gender are dependent.</p>


<h2 class="wp-block-heading" id="casio-independence">The Test of Independence with Casio</h2>


<p class="wp-block-paragraph">To solve my example very easily with my Casio, I could have done this:</p>


<p>I load my table data into a matrix, which I call A:</p>


<pre class="wp-block-preformatted">[[66,54][78,12]]→[OPTN][MAT][MAT][ALPHA][A]</pre>


<p class="wp-block-paragraph" id="block-bf13504e-9671-41f0-bff4-2df66f19200a">At this point, I move to the statistical functions:</p>


<pre class="wp-block-preformatted">[MENU][STAT]

[TEST][CHI][2WAY]

Observed:Mat A

Expected:Mat B

[CALC]</pre>



<p>
The result will be:
<br><br>
χ<sup>2</sup>=23.9299242<br>
p=9.9907e-07<br>
df=1<br>
<br>
As can be seen from the very low p-value, I accept the alternative hypothesis and reject the null hypothesis.
</p>


<h2 class="wp-block-heading" id="r-independence">The Test of Independence with R</h2>


<p class="wp-block-paragraph">I build my contingency table</p>


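<pre class="wp-block-preformatted"># contingency table of observed counts (rows: age, columns: gender)
enthusiasts &lt;- matrix(c(66,54,78,12),ncol=2,byrow=TRUE)
rownames(enthusiasts) &lt;- c("less than 35","35 or more")
colnames(enthusiasts) &lt;- c("male","female")
enthusiasts &lt;- as.table(enthusiasts)
enthusiasts

# I can calculate the row totals:
margin.table(enthusiasts,1)

# and the column totals:
margin.table(enthusiasts,2)

# the grand total is:
margin.table(enthusiasts)

# I look at the expected values:
chisq.test(enthusiasts)$expected

# and test the hypothesis with (correct=FALSE switches off the Yates
# continuity correction that R applies to 2x2 tables by default, so the
# statistic matches the 23.93 obtained by hand and with the calculator):
chisq.test(enthusiasts,correct=FALSE)</pre>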
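

<p class="wp-block-paragraph">As a cross-check, the expected frequencies and the statistic can also be rebuilt from the row and column totals. This is a minimal sketch, not the only way to do it: the names <code>observed</code>, <code>expected</code>, <code>chi2</code>, <code>dof</code> and <code>p_value</code> are chosen here for illustration. Keep in mind that for 2&#215;2 tables <code>chisq.test</code> applies the Yates continuity correction unless <code>correct=FALSE</code> is passed, so a plain call reports a slightly smaller statistic than the manual formula.</p>


<pre class="wp-block-preformatted"># observed counts (rows: age group, columns: gender)
observed &lt;- matrix(c(66, 54, 78, 12), ncol = 2, byrow = TRUE)

# expected count of each cell: row total * column total / grand total
expected &lt;- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic, without any continuity correction
chi2 &lt;- sum((observed - expected)^2 / expected)

# degrees of freedom: (rows - 1) * (columns - 1)
dof &lt;- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper tail of the chi-square distribution
p_value &lt;- pchisq(chi2, df = dof, lower.tail = FALSE)

round(expected, 2)
c(chi2 = chi2, dof = dof, p_value = p_value)

# the built-in test agrees once the Yates correction is switched off
chisq.test(observed, correct = FALSE)$statistic</pre>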


<p class="wp-block-paragraph">The resulting very low p-value indicates that I can reject the null hypothesis of independence of the two variables.</p>




<h2 class="wp-block-heading" id="seo-example-ctr">An SEO Example: Does CTR Depend on the Device?</h2>



<p class="wp-block-paragraph">Arduino hobbyists and brand enthusiasts are fine for understanding the mechanics, but the test of independence is at its best in the daily practice of anyone working with Search Console data. Let&#8217;s pick up the numbers we already met when discussing <a href="https://www.gironi.it/blog/en/simpsons-paradox-in-seo-when-aggregate-data-can-lie/">Simpson&#8217;s Paradox</a>: in one month our site collected 10,000 impressions on Desktop with 550 clicks, and 20,000 impressions on Mobile with 500 clicks. The CTR is therefore 5.5% versus 2.5%: a difference that looks huge, but is it real, or could it be the product of chance?</p>



<p class="wp-block-paragraph">Phrased in the language of this article, the question becomes: <strong>is the click independent of the device?</strong> We build the contingency table, with one important caveat: the cells must contain <strong>counts</strong>, never percentages. For each device we therefore need the clicks and the &#8220;no clicks&#8221; (the impressions that did not generate a click).</p>



<figure class="wp-block-table"><table><thead><tr><th>Device</th><th>Clicks</th><th>No clicks</th><th>Total</th></tr></thead><tbody><tr><td><strong>Desktop</strong></td><td>550</td><td>9,450</td><td>10,000</td></tr><tr><td><strong>Mobile</strong></td><td>500</td><td>19,500</td><td>20,000</td></tr><tr><td><strong>Total</strong></td><td>1,050</td><td>28,950</td><td>30,000</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">The hypotheses are the usual ones:</p>



\(
H_0=click\ and\ device\ are\ independent\\
H_a=there\ is\ a\ relationship\ between\ click\ and\ device\\
\)



<p class="wp-block-paragraph">I check in R, building the matrix of counts (I use <code>correct=FALSE</code> so that the result can be compared with a manual calculation):</p>



<pre class="wp-block-preformatted">ctr &lt;- matrix(c(550, 9450, 500, 19500), ncol=2, byrow=TRUE)
rownames(ctr) &lt;- c("Desktop", "Mobile")
colnames(ctr) &lt;- c("click", "no click")
chisq.test(ctr, correct=FALSE)

# the result will be:
#
# Pearson's Chi-squared test
# data:  ctr
# X-squared = 177.65, df = 1, p-value &lt; 2.2e-16</pre>
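

<p class="wp-block-paragraph">The manual calculation goes roughly like this. The expected counts come from the row and column totals of the table (for example, 10,000 &#215; 1,050 / 30,000 = 350 expected clicks on Desktop), and every observed count turns out to differ from its expected count by exactly 200. The individual terms are rounded to two decimals:</p>



\(
f_e:\ 350,\ 9650,\ 700,\ 19300\\
\chi^2=\frac{200^2}{350}+\frac{200^2}{9650}+\frac{200^2}{700}+\frac{200^2}{19300}\\
\approx 114.29+4.15+57.14+2.07\\
\approx 177.65
\)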



<p class="wp-block-paragraph">The p-value is infinitesimal: we reject the null hypothesis without hesitation. The click <strong>depends</strong> on the device, and the difference between the two CTRs cannot be attributed to chance.</p>



<p class="has-light-gray-background-color has-background wp-block-paragraph">N.B.: with the volumes typical of Search Console (tens of thousands of impressions and more) the chi-square test can reject the null hypothesis even for tiny, practically irrelevant differences. Statistical significance tells us that the difference is unlikely to be the product of chance, <strong>not</strong> that it matters: with very large samples the two things must be kept well apart.</p>
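

<p class="wp-block-paragraph">A small sketch makes the point concrete. The data below are invented purely for illustration (they do not come from any real property): the same two CTRs, 2.6% and 2.5%, are tested at two different traffic volumes. The helper <code>p_value_for</code> is a hypothetical function written for this example.</p>


<pre class="wp-block-preformatted"># Hypothetical data: identical CTRs (2.6% vs 2.5%) at two traffic volumes.
# p_value_for() builds the click / no-click table and returns the p-value.
p_value_for &lt;- function(clicks, impressions) {
  counts &lt;- cbind(click = clicks, no_click = impressions - clicks)
  chisq.test(counts, correct = FALSE)$p.value
}

# 20,000 impressions per device: 520 vs 500 clicks
p_small &lt;- p_value_for(c(520, 500), c(20000, 20000))

# 1,000,000 impressions per device: 26,000 vs 25,000 clicks
p_large &lt;- p_value_for(c(26000, 25000), c(1000000, 1000000))

c(small = p_small, large = p_large)</pre>


<p class="wp-block-paragraph">With the smaller volume the p-value stays well above 0.05, while with the larger one it drops far below it, even though the gap between the two CTRs is the same tenth of a percentage point in both cases.</p>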



<h2 class="wp-block-heading" id="try-it-yourself">Try It Yourself</h2>



<p class="wp-block-paragraph">To consolidate the mechanics, here is an exercise with made-up but realistic data. From the Search Console of an e-commerce site we extract one month of data, separating brand queries from non-brand ones:</p>



<figure class="wp-block-table"><table><thead><tr><th>Query type</th><th>Clicks</th><th>No clicks</th><th>Total</th></tr></thead><tbody><tr><td><strong>Brand</strong></td><td>240</td><td>1,760</td><td>2,000</td></tr><tr><td><strong>Non-brand</strong></td><td>540</td><td>17,460</td><td>18,000</td></tr><tr><td><strong>Total</strong></td><td>780</td><td>19,220</td><td>20,000</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">The question is the same as before: does the click depend on the query type? The exercise consists of formulating the hypotheses, choosing α=0.05, building the matrix in R and running the test (again with <code>correct=FALSE</code>). If everything goes smoothly, the chi-square should come out close to 389, with a microscopic p-value. And while we are at it: which of the two CTRs (12% versus 3%) &#8220;pulls&#8221; the result more? A look at the expected frequencies with <code>chisq.test(...)$expected</code> helps answer that.</p>



<p class="wp-block-paragraph">One question remains open, though, and it is subtler than it seems: the test told us <em>that</em> the dependence exists, not <em>how strong</em> it is. As we have just seen, with large samples almost everything turns out significant: measuring the strength of an association requires other tools (such as Cramér&#8217;s V), and that will be the subject of an upcoming article dedicated to effect size and the power of tests.</p>




<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h3 class="wp-block-heading">Further Reading</h3>



<p class="wp-block-paragraph">The chi-square test, with all its variants and applicability conditions, is covered in detail in <a href="https://www.amazon.it/dp/8891910651?tag=consulenzeinf-21&#038;ascsubtag=the-chi-square-test-goodness-of-fit-and-test-of-independence" rel="nofollow sponsored noopener" target="_blank"><em>Statistica</em></a> by Newbold, Carlson and Thorne (Italian edition), together with the other tests we have met along this path.</p>



<p class="wp-block-paragraph">And if the examples on these pages made you want to learn R properly, <a href="https://www.amazon.it/dp/1492097403?tag=consulenzeinf-21&#038;ascsubtag=the-chi-square-test-goodness-of-fit-and-test-of-independence" rel="nofollow sponsored noopener" target="_blank"><em>R for Data Science</em></a> by Hadley Wickham (second edition, also freely readable online) is the starting point I recommend.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/the-chi-square-test-goodness-of-fit-and-test-of-independence/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
