{"id":3324,"date":"2018-10-10T16:00:00","date_gmt":"2018-10-10T15:00:00","guid":{"rendered":"https:\/\/www.gironi.it\/blog\/?p=3324"},"modified":"2024-12-04T16:05:05","modified_gmt":"2024-12-04T15:05:05","slug":"descriptive-statistics-measures-of-position-and-central-tendency","status":"publish","type":"post","link":"https:\/\/www.gironi.it\/blog\/en\/descriptive-statistics-measures-of-position-and-central-tendency\/","title":{"rendered":"Descriptive Statistics: Measures of Position and Central Tendency"},"content":{"rendered":"\n<p>Measures of position, also known as <strong>position indices<\/strong>, or <strong>measures of central tendency<\/strong>, are values that summarize the <strong>position<\/strong> of a statistical <strong>distribution<\/strong>, providing a single figure that encapsulates the most important aspects of the data. In this brief discussion, we will explore some of the most common and practical indices, such as the various types of means, the median, quartiles, and percentiles.<\/p>\n\n\n\t\t\t\t<div class=\"wp-block-uagb-table-of-contents uagb-toc__align-left uagb-toc__columns-1  uagb-block-b694d2c6      \"\n\t\t\t\t\tdata-scroll= \"1\"\n\t\t\t\t\tdata-offset= \"30\"\n\t\t\t\t\tstyle=\"\"\n\t\t\t\t>\n\t\t\t\t<div class=\"uagb-toc__wrap\">\n\t\t\t\t\t\t<div class=\"uagb-toc__title\">\n\t\t\t\t\t\t\tTopics Covered\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"uagb-toc__list-wrap \">\n\t\t\t\t\t\t<ol class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#measures-of-central-tendency\" class=\"uagb-toc-link__trigger\">Measures of Central Tendency<\/a><li class=\"uagb-toc__list\"><a href=\"#arithmetic-mean\" class=\"uagb-toc-link__trigger\">Arithmetic Mean<\/a><li class=\"uagb-toc__list\"><a href=\"#the-mean-of-grouped-data\" class=\"uagb-toc-link__trigger\">The Mean of Grouped Data<\/a><li class=\"uagb-toc__list\"><a href=\"#the-weighted-mean\" class=\"uagb-toc-link__trigger\">The Weighted Mean<\/a><li 
class=\"uagb-toc__list\"><a href=\"#the-geometric-mean\" class=\"uagb-toc-link__trigger\">The Geometric Mean<\/a><li class=\"uagb-toc__list\"><a href=\"#the-harmonic-mean\" class=\"uagb-toc-link__trigger\">The Harmonic Mean<\/a><li class=\"uagb-toc__list\"><a href=\"#the-trimmed-mean\" class=\"uagb-toc-link__trigger\">The Trimmed Mean<\/a><li class=\"uagb-toc__list\"><a href=\"#the-median\" class=\"uagb-toc-link__trigger\">The Median<\/a><ul class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#the-median-of-grouped-data\" class=\"uagb-toc-link__trigger\">The Median of Grouped Data<\/a><\/li><\/ul><\/li><li class=\"uagb-toc__list\"><a href=\"#the-mode\" class=\"uagb-toc-link__trigger\">The Mode<\/a><ul class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#mode-of-grouped-data\" class=\"uagb-toc-link__trigger\">Mode of Grouped Data<\/a><\/li><\/ul><\/li><\/ul><\/li><li class=\"uagb-toc__list\"><a href=\"#relationship-between-mean-median-and-mode\" class=\"uagb-toc-link__trigger\">Relationship Between Mean, Median, and Mode<\/a><li class=\"uagb-toc__list\"><a href=\"#quartiles-deciles-and-percentiles\" class=\"uagb-toc-link__trigger\">Quartiles, Deciles, and Percentiles<\/a><ul class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#quartiles-deciles-and-percentiles-for-grouped-data\" class=\"uagb-toc-link__trigger\">Quartiles, Deciles, and Percentiles for Grouped Data<\/a><\/li><\/ul><\/li><\/ul><\/li><\/ul><\/li><li class=\"uagb-toc__list\"><a href=\"#an-overview-the-very-useful-5-numbers\" class=\"uagb-toc-link__trigger\">An Overview: The Very Useful 5 Numbers<\/a><li class=\"uagb-toc__list\"><a href=\"#lets-help-ourselves-with-a-clever-graph-the-boxplot\" class=\"uagb-toc-link__trigger\">Let\u2019s Help Ourselves with a Clever Graph: The Boxplot\u00a0<\/a><\/ul><\/ul><\/ul><\/ol>\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\n\n\n<!--more-->\n\n\n\n<h4 class=\"wp-block-heading\">Measures of Central 
Tendency<\/h4>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">A mean is a measure of the central tendency of a set of values.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"arithmeticmean\">Arithmetic Mean<\/h4>\n\n\n\n<p>The arithmetic mean, as we all learned in school, is the sum of the values in a data set divided by the number of values.<br><br>In statistics, a parameter of a <strong>population<\/strong> is represented by a <strong>Greek letter<\/strong>, while a descriptive measure of a <strong>sample<\/strong> is denoted with a <strong>Roman letter<\/strong>.<\/p>\n\n\n\nFor the mean, we use the symbol \\( \\mu \\) for the mean of a population of values, and the letter \\( \\overline X \\) (read as &#8220;x-bar&#8221;) for the mean of a sample of values.\n\n\n\n<p>Note: For this discussion, we will consistently refer to population parameters.<\/p>\n\n\n\n<p>Here is the formula to calculate the population mean:<\/p>\n\n\n\n\\( \n\\mu = \\frac {\\Sigma X}{N} \\\\ \\\\ \n\\)\n<p>\nIn R, the function is <strong>mean()<\/strong>.\nA simple example:\n<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">variable = c(-3,1,2,4,5,2,8);\nmean(variable);\n\n[1] 2.714286<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"mediaragg\">The Mean of Grouped Data<\/h4>\n\n\n\n<p>Let&#8217;s imagine we have <em>n<\/em> data points grouped into <em>k<\/em> classes. 
We&#8217;ll call x<sub>c<\/sub> the central value (midpoint) of each class and <em>f<\/em><sub>i<\/sub> the observed frequency of each class.<\/p>\n\n\n\n<p>The formula to calculate the mean in this case is:<\/p>\n\n\n\n\\(\n\\bar x = \\frac{\\sum^{k}_{i=1} x_c f_i}{n} \\ \\\n\\)\n\n\n\n<p>Let&#8217;s look at an example:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Class<\/td><td>x<sub>c<\/sub><\/td><td>f<sub>i<\/sub><\/td><td>x<sub>c<\/sub> * f<sub>i<\/sub><\/td><\/tr><tr><td>4&lt;x<strong>\u2264<\/strong>8<\/td><td>6<\/td><td>5<\/td><td>30<\/td><\/tr><tr><td>8&lt;x<strong>\u2264<\/strong>12<\/td><td>10<\/td><td>11<\/td><td>110<\/td><\/tr><tr><td>12&lt;x<strong>\u2264<\/strong>20<\/td><td>16<\/td><td>14<\/td><td>224<\/td><\/tr><tr><td>20&lt;x<strong>\u2264<\/strong>28<\/td><td>24<\/td><td>7<\/td><td>168<\/td><\/tr><tr><td>Total<\/td><td><\/td><td><strong>37<\/strong><\/td><td><strong>532<\/strong><\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Our example data<\/figcaption><\/figure>\n\n\n\n<p>Applying the formula, we multiply each class midpoint by its frequency, sum the products, and divide by the total number of observations:<\/p>\n\n\n\n\\(\n\\bar x = \\frac{(6 \\cdot 5) + (10 \\cdot 11) + (16 \\cdot 14) + (24 \\cdot 7)}{37} = \\frac{532}{37} \\approx 14.38\n\\)\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n\n\n<h4 class=\"wp-block-heading\" id=\"mediaponderata\">The Weighted Mean<\/h4>\n\n\n\n<p>The <strong>weighted mean<\/strong>, or <strong>weighted average<\/strong>, is an arithmetic mean where <strong>each value is weighted according to its importance<\/strong> within the group. <br>Each value in the group (X) is multiplied by its weight \\( w \\), and the products are summed and divided by the sum of the weights:<\/p>\n\n\n\n\\(\n\\mu_{w} = \\frac{\\Sigma(w X)}{\\Sigma w}\n\\)\n\n\n\n<p>Let&#8217;s look at an example. Imagine a sporting discipline where the jury&#8217;s score for each exercise is &#8220;weighted&#8221; by a difficulty coefficient. 
Suppose the scores for the various exercises are:<br> <\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Jury Score<\/td><td>Coefficient<\/td><\/tr><tr><td>9.3<\/td><td>4<\/td><\/tr><tr><td>9.8<\/td><td>2.8<\/td><\/tr><tr><td>8.8<\/td><td>3.3<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Let&#8217;s multiply each score by its respective coefficient and sum the products:<\/p>\n\n\n\n<p>(9.3 * 4) + (9.8 * 2.8) + (8.8 * 3.3) = 93.68<\/p>\n\n\n\n<p>The sum of the coefficients (i.e., the &#8220;weights&#8221;) is:<\/p>\n\n\n\n<p>4 + 2.8 + 3.3 = 10.1<\/p>\n\n\n\n<p>Finally, we calculate the weighted mean:<\/p>\n\n\n\n<p>93.68 \/ 10.1 &#8776; 9.275<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"mediageometrica\">The Geometric Mean<\/h4>\n\n\n\nGiven a set of positive numbers x<sub>1<\/sub>, x<sub>2<\/sub>, &#8230;, x<sub>n<\/sub>, the geometric mean is the <i>nth root<\/i> of the product of the <i>n<\/i> numbers. In formula:\n<br><br>\n\n\n\n\\(\n\\\n\\newcommand{\\vc}[3]{\\overset{#2}{\\underset{#3}{#1}}}\nGeometric \\ Mean = \\sqrt[n]{\\vc{\\Pi}{n}{i=1} \\ x_{i}}\n\\)\n<br><br>\n<p>\nN.B. The capital Greek letter pi is the symbol for <strong>product<\/strong>. The formula is therefore equivalent to:\n<\/p>\n\\(\n\\\\ \nGeometric \\ Mean = \\sqrt[n]{x_{1} x_{2} \\cdots x_{n}} \\\\ \\\\\n\\)\n\n\n\n<p>Base R doesn&#8217;t have a built-in function for calculating the geometric mean, but calculating this measure of position is very simple. Let&#8217;s look at an example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">data = c(2,9,12) # My values\nn = length(data) # the number of values\n \n# prod calculates the product of vector elements\n# Calculate the geometric mean\nprod(data)^(1\/n)\n\n[1] 6<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"harmonicmean\">The Harmonic Mean<\/h4>\n\n\n\n<p>The harmonic mean of a dataset x<sub>1<\/sub>, x<sub>2<\/sub>, &#8230;, x<sub>n<\/sub> is the <strong>reciprocal of the arithmetic mean of the reciprocals of the data<\/strong>. 
In formula:<\/p>\n\n\n\n\\(\nHarmonic \\ mean = \\frac{1}{\\frac{1}{n} \\vc{\\Sigma}{n}{i=1} \\frac{1}{x_{i}}} \\\\ \\\\\n\\)\n\n\n\n<p>As with the geometric mean, base R has no built-in function for the harmonic mean. However, since the harmonic mean is the reciprocal of the mean of the reciprocals of the data, the calculation is straightforward:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">1\/mean(1\/data)<\/pre>\n\n\n\n<p>Using the same simple data as in the geometric mean example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">data = c(2,9,12) # My values\n\n# Calculate the harmonic mean\n1\/mean(1\/data)\n\n[1] 4.32<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"trimmedmean\">The Trimmed Mean<\/h4>\n\n\n\n<p>A trimmed mean (e.g., at 5%) is the arithmetic mean of the ordered values after excluding the lowest 5% and the highest 5% of the dataset. In this example, the mean is therefore computed on the central 90% of the sorted observations.<\/p>\n\n\n\n<p>In R, the trimmed mean can be calculated by specifying the <strong>trim=<em>proportionToExclude<\/em><\/strong> option in the <strong>mean()<\/strong> function, where <em>proportionToExclude<\/em> is the fraction (between 0 and 0.5) of observations to drop from each end of the sorted data before calculating the arithmetic mean.<br><br>A simple R example to clarify:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">variable = c(-3,1,2,4,5,2,5,0.8,2.4,6,8);\nmean(variable, trim=0.05);  ### 5% trimmed mean\n\n[1] 3.018182<\/pre>\n\n\n\n<p>Note that R drops floor(n &times; trim) observations from each end. With only 11 values and trim=0.05 this is floor(0.55) = 0, so here the result coincides with the ordinary mean; a larger proportion such as trim=0.1 would remove the smallest and the largest value.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"mediana\">The Median<\/h4>\n\n\n\n<p>The median of a group of elements is the value of the central element when all the elements are arranged in ascending or descending order.<\/p>\n\n\n\n<p>In practice, <strong>the median of a dataset is the value that divides the series into two equal parts<\/strong>: as many values lie above the median as below it.<br><br>To find the median, we use the so-called <em>even-odd rule<\/em>:<br><\/p>\n\n\n\n<div class=\"wp-block-uagb-info-box uagb-block-773258b2 uagb-infobox__content-wrap  uagb-infobox-icon-left-title uagb-infobox-left uagb-infobox-image-valign-middle uagb-infobox__outer-wrap\"><div class=\"uagb-ifb-content\"><div 
class=\"uagb-ifb-left-title-image\"><div class=\"uagb-ifb-icon-wrap\"><svg xmlns=\"https:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 384 512\"><path d=\"M384 48V512l-192-112L0 512V48C0 21.5 21.5 0 48 0h288C362.5 0 384 21.5 384 48z\"><\/path><\/svg><\/div><div class=\"uagb-ifb-title-wrap\"><h3 class=\"uagb-ifb-title\">Even-Odd Rule<\/h3><\/div><\/div><p class=\"uagb-ifb-desc\"><em><strong>If a series has an odd number of values, the median is the middle value of the series; if the series has an even number of values, the median is the arithmetic mean of the two middle values.<\/strong><\/em><\/p><\/div><\/div>\n\n\n\n<p><br>In practice, the formula for finding the median can be summarized as follows:<\/p>\n\n\n\n\\(\nMed = X_{(n\/2)+(1\/2)}\n\\)\n\n\n\n<p><br>The median can be calculated in R using the <strong>median()<\/strong> function:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">var = c(0,1,2,3,6,7,11,14);\nmedian(var);\n\n[1] 4.5<\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">The Median of Grouped Data<\/h5>\n\n\n\n<p>For grouped data, you must:<br>1. Determine <strong>which class contains the median value<\/strong><br>and then<br>2. Use <strong>interpolation<\/strong> to determine the position of the median within that class.&nbsp;<br><br>The class that contains the median is the first class whose cumulative frequency equals or exceeds half the total number of observations. 
Once this class is identified, the specific value of the median is determined using the formula:<\/p>\n\n\n\n\\(\nMed = C_{I} + (\\frac{\\frac{N}{2}- fc_{p}}{f_{c}})i\n\\)\n<br><br>\nwhere:\n<br><br>\n<b>C<sub>I<\/sub><\/b> = lower boundary of the class containing the median\n<br>\n<b>N<\/b> = total number of observations in the frequency distribution\n<br>\n<b><i>f<\/i>c<sub>p<\/sub><\/b> = cumulative frequency of the class preceding the one containing the median\n<br>\n<b><i>f<\/i>c<\/b> = number of observations in the class containing the median\n<br>\n<b><i>i<\/i><\/b> = width of the class interval\n<br>\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>As always, an example illustrates how the operational reality is simpler than it first appears. Let\u2019s consider a distribution divided into frequency classes. In our example, we are dealing with height classes:<\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter\"><table><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Height (cm)<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Frequencies<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Cumulative Frequencies<\/strong><\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">150 &#8211; 160<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">160 &#8211; 170<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">12<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">170 &#8211; 180<\/td><td class=\"has-text-align-center\" data-align=\"center\">10<\/td><td class=\"has-text-align-center\" data-align=\"center\">22<\/td><\/tr><tr><td 
class=\"has-text-align-center\" data-align=\"center\">180 &#8211; 190<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">30<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">190 &#8211; 200<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">34<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The class containing the median is the first class whose cumulative frequency equals or exceeds half the total number of observations, which, as seen in the table, is 34\/2 = 17.<\/p>\n\n\n\n<p>The median class is therefore the 170\u2013180 class: its cumulative frequency (22) is the first to reach 17.<\/p>\n\n\n\n<p>We can then easily extract all the values to insert into our interpolation formula:<\/p>\n\n\n\n<p><strong>C<sub>I<\/sub><\/strong>&nbsp;= lower boundary of the class containing the median = 170<br><strong>N<\/strong>&nbsp;= total number of observations in the frequency distribution = 34<br><strong><em>f<\/em>c<sub>p<\/sub><\/strong>&nbsp;= cumulative frequency of the class preceding the one containing the median = 12<br><strong><em>f<\/em>c<\/strong>&nbsp;= number of observations in the class containing the median = 10<br><strong><em>i<\/em><\/strong>&nbsp;= width of the class interval = (180 &#8211; 170) = 10<\/p>\n\n\n\n\\(\nMed = C_{I} + (\\frac{\\frac{N}{2}- fc_{p}}{f_{c}})i \\\\\n\\)\n<p>therefore<\/p>\n\\(\nMed = 170 + (\\frac{(34\/2) - 12}{10}) \\cdot 10 = 175 \\ cm\n\\)\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>The median, which lies in the 170\u2013180 cm class, is therefore 175 cm.<\/p>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-uagb-info-box uagb-block-92a4b348 uagb-infobox__content-wrap  uagb-infobox-icon-left-title uagb-infobox-left uagb-infobox-image-valign-middle 
uagb-infobox__outer-wrap\"><div class=\"uagb-ifb-content\"><div class=\"uagb-ifb-left-title-image\"><div class=\"uagb-ifb-icon-wrap\"><svg xmlns=\"https:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 384 512\"><path d=\"M384 48V512l-192-112L0 512V48C0 21.5 21.5 0 48 0h288C362.5 0 384 21.5 384 48z\"><\/path><\/svg><\/div><div class=\"uagb-ifb-title-wrap\"><h3 class=\"uagb-ifb-title\">Tips for Better Clarity<\/h3><\/div><\/div><p class=\"uagb-ifb-desc\">When the classes have <strong>unequal widths<\/strong>, make sure that <em>i<\/em> in the formula is the width of the class that actually contains the median, to avoid misinterpreting the results.<\/p><\/div><\/div>\n\n\n<p><br>Because the median depends only on the ordering of the values, it is not affected by extreme observations. For this reason, <strong>in the case of a <a href=\"https:\/\/www.gironi.it\/blog\/statistica-descrittiva-misure-di-dispersione-o-variabilita\/\" data-type=\"post\" data-id=\"1003\">skewed distribution<\/a>, the median is a more reliable indicator than the mean<\/strong>, which is &#8220;pulled&#8221; toward the tail of the distribution. In a skewed unimodal distribution, the median typically falls between the mean and the mode.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"moda\">The Mode<\/h4>\n\n\n\n<p>The mode of a dataset is <strong>the value that appears most frequently<\/strong>.<\/p>\n\n\n\n<p>For example, consider the frequency of scores in a test:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>score<\/td><td class=\"has-text-align-left\" data-align=\"left\">frequency<\/td><\/tr><tr><td>5<\/td><td class=\"has-text-align-left\" data-align=\"left\">5<\/td><\/tr><tr><td>6 <em><span class=\"has-inline-color has-bright-red-color\">&lt;&lt; <strong>this is the mode<\/strong><\/span><\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">11<\/td><\/tr><tr><td>7<\/td><td class=\"has-text-align-left\" data-align=\"left\"><span class=\"has-inline-color has-dark-gray-color\">8<\/span><\/td><\/tr><tr><td>8<\/td><td class=\"has-text-align-left\" data-align=\"left\">7<\/td><\/tr><tr><td>9<\/td><td 
class=\"has-text-align-left\" data-align=\"left\">3<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>A distribution like this is called <em><strong>unimodal<\/strong><\/em>. <br>In the case of a small dataset where <strong>no measured value repeats<\/strong>, <strong>there is no mode<\/strong>. <br>When two non-adjacent values both have the same maximum frequency, the distribution is said to be <em><strong>bimodal<\/strong><\/em>. <br>Distributions with several modes are called <em><strong>multimodal<\/strong><\/em>.<\/p>\n\n\n\n<p>In R, we can find the mode of a vector of positive integers very easily by combining <strong>tabulate()<\/strong> and <strong>which.max()<\/strong> (if several values share the maximum frequency, only the first is returned):<br><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">var = c(3,6,7,7,9,11,12,12,12,14,15,16,17,22,29,31);\nfrequency=tabulate(var);\nwhich.max(frequency);\n\n[1] 12<\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">Mode of Grouped Data<\/h5>\n\n\n\n<p>For grouped data in a frequency distribution with equal class intervals, first identify the class containing the mode by finding the class with the highest number of observations. 
Then use the formula:<\/p>\n\n\n\n\\(\nMode = C_{I} + (\\frac{d_{1}}{d_{1}+d_{2}})i\n\\)\n<br><br>\nwhere:\n<br><br>\n<b>C<sub>I<\/sub><\/b> = lower boundary of the class containing the mode\n<br>\n<b>d<sub>1<\/sub><\/b> = difference between the frequency of the modal class and the frequency of the previous class\n<br>\n<b>d<sub>2<\/sub><\/b> = difference between the frequency of the modal class and the frequency of the next class\n<br>\n<b><i>i<\/i><\/b> = class interval width\n\n\n\n<h4 class=\"wp-block-heading\">Relationship Between Mean, Median, and Mode<\/h4>\n\n\n\n<p>In the case of grouped data represented by a frequency curve, <strong>the difference between the values of the mean, median, and mode reveals the shape of the curve in terms of symmetry<\/strong>.<\/p>\n\n\n\n<p>For a symmetric unimodal distribution, the mean, median, and mode coincide, meaning they have the same value.<\/p>\n\n\n\n<p>In the case of a <em><strong>positively skewed distribution<\/strong><\/em>, the <strong>mean is the largest value, and the median is larger than the mode<\/strong>.<br>Thus:<br><br><em><strong>Positive skewness = tail on the right = Mean &gt; Median &gt; Mode<\/strong><\/em><br><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"679\" height=\"432\" src=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/asimmetria-positiva.png\" alt=\"\" class=\"wp-image-1041\" srcset=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/asimmetria-positiva.png 679w, https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/asimmetria-positiva-300x191.png 300w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/figure>\n\n\n\n<p>In the case of <em><strong>negative skewness<\/strong><\/em>, the <strong>mean has the smallest value, and the median is smaller than the mode<\/strong>.<br>Thus, in 
summary:<\/p>\n\n\n\n<p><em><strong>Negative skewness = tail on the left = Mean &lt; Median &lt; Mode<\/strong><\/em><\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/asimmetria-negativa.png\" alt=\"\" class=\"wp-image-1045\" width=\"647\" height=\"413\" srcset=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/asimmetria-negativa.png 863w, https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/asimmetria-negativa-300x192.png 300w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/figure>\n\n\n\n<p>A well-known measure of skewness, based on the difference between the mean and the median of a group of values, is <strong>Pearson&#8217;s skewness index<\/strong>. We will explore it in more detail when introducing measures of variability, since it has the standard deviation in the denominator. For now, we simply state its formula:<\/p>\n\n\n\n\\(\nSkewness = \\frac{3(\\mu - Med)}{\\sigma}\n\\)\n\n\n\n<h4 class=\"wp-block-heading\" id=\"quartiles\">Quartiles, Deciles, and Percentiles<\/h4>\n\n\n\n<p>Quartiles, deciles, and percentiles are similar to the median in that they <strong>divide<\/strong> <strong>a distribution of measurements according to the proportion of observed frequencies<\/strong>.<\/p>\n\n\n\n<p>While the median divides the distribution into two halves, quartiles divide it into four quarters, deciles into ten tenths, and percentiles into one hundred hundredths. 
For ungrouped data, the formula for the median is modified depending on the desired fraction:<\/p>\n\n\n\n\\(\nQ_{1} \\ (first\\ quartile) = X_{(\\frac{n}{4} + \\frac{1}{2})} \\\\\nD_{3} \\ (third\\ decile) = X_{(\\frac{3n}{10} + \\frac{1}{2})} \\\\\nP_{60} \\ (sixtieth\\ percentile) = X_{(\\frac{60n}{100} + \\frac{1}{2})} \\\\\n\\)\n\n\n\n<p>In R, they can be calculated with <strong>quantile()<\/strong>. Note that its default interpolation rule (type 7) differs slightly from the positional formula above, so the results may not coincide exactly:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">var = c(3,6,7,7,9,11,12,12,14,15,16,17,22,29,31);\nquantile(var, probs=c(0.25,0.50,0.75)); ### quartiles\n\n 25%  50%  75% \n 8.0 12.0 16.5 \n\nquantile(var, probs=c(1:9)\/10); ### deciles\n\n 10%  20%  30%  40%  50%  60%  70%  80%  90% \n 6.4  7.0  9.4 11.6 12.0 14.4 15.8 18.0 26.2 <\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">Quartiles, Deciles, and Percentiles for Grouped Data<\/h5>\n\n\n\n<p>In this case, you must first use the cumulative frequencies to determine the class containing the point that corresponds to the desired fraction, and then interpolate. The symbols have the same meaning as in the formula for the median of grouped data, but refer to the class containing the desired quantile. For example:<\/p>\n\n\n\n\\(\nQ_{1} \\ (first\\ quartile) = C_{I} + (\\frac{\\frac{N}{4} - fc_{p}}{f_{c}})i \\\\\nD_{3} \\ (third\\ decile) = C_{I} + (\\frac{\\frac{3N}{10} - fc_{p}}{f_{c}})i \\\\\nP_{60} \\ (sixtieth\\ percentile) = C_{I} + (\\frac{\\frac{60N}{100} - fc_{p}}{f_{c}})i \\\\\n\\)\n\n\n\n<h4 class=\"wp-block-heading\" id=\"5numbers\">An Overview: The Very Useful 5 Numbers<\/h4>\n\n\n\n<p>Five numbers together give a compact summary of a dataset and let us see its key measures at a glance:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The minimum value of our data<\/li>\n\n\n\n<li>The value of the first quartile<\/li>\n\n\n\n<li>The median<\/li>\n\n\n\n<li>The value of the third quartile<\/li>\n\n\n\n<li>The maximum value<\/li>\n<\/ol>\n\n\n\n<p>The 5 numbers are often an excellent starting point for analyzing the characteristics of a distribution. 
In R, we have a specific function called (appropriately) <strong>fivenum()<\/strong>:<br><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">var = c(3,6,7,7,9,11,12,12,14,15,16,17,22,29,31);\nfivenum(var);\n\n[1]  3.0  8.0 12.0 16.5 31.0<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Let\u2019s Help Ourselves with a Clever Graph: The Boxplot&nbsp;<\/h4>\n\n\n\n<p>The <strong>boxplot<\/strong> (or <em>box diagram<\/em>) is a type of graph invented by the great <a href=\"https:\/\/en.wikipedia.org\/wiki\/John_Tukey\" target=\"_blank\" rel=\"noreferrer noopener\">John Tukey<\/a> to provide a clear overview of a dataset at a glance.&nbsp;<\/p>\n\n\n\n<p>The boxplot can be oriented horizontally or vertically and appears as a rectangle divided into two parts, with two lines extending from it. The rectangle (the &#8220;box&#8221;) is bounded by the first and third quartiles and is divided internally by the median. The &#8220;whiskers&#8221; extend to the minimum and maximum values. In R&#8217;s default boxplot, they stop at the most extreme observations lying within 1.5 times the interquartile range from the box, and any points beyond them are drawn individually as outliers.&nbsp;<\/p>\n\n\n\n<p>Note: The presence of &#8220;whiskers&#8221; is the reason this diagram is often called a <em><strong>box-and-whisker plot<\/strong><\/em> or <em><strong>box-and-whisker diagram<\/strong>.<\/em><\/p>\n\n\n\n<p>In this way, when there are no outliers, the four equally populated intervals defined by the quartiles are represented graphically.<br>Using a simple data distribution, here is how to call the boxplot function in R:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">dati <- c(24,17,21,23,15,30,24,21,24,19,25,28,22,20,14,19,26,29,23,25,24,18,27,21);\nboxplot(dati,ylab=\"\",col=gray(0.8));<\/pre>\n\n\n\n<p>Adding some labels here and there for better clarity, the result is this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"679\" height=\"432\" src=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/esempio-boxplot.png\" alt=\"\" class=\"wp-image-1082\" srcset=\"https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/esempio-boxplot.png 679w, 
https:\/\/www.gironi.it\/blog\/wp-content\/uploads\/2018\/10\/esempio-boxplot-300x191.png 300w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/figure>\n\n\n\n<p>The result is an extremely effective \"snapshot\" of the distribution of the observed values.<\/p>\n\n\n\n<p><\/p>\n\n","protected":false},"excerpt":{"rendered":"<p>Measures of position, also known as position indices, or measures of central tendency, are values that summarize the position of a statistical distribution, providing a single figure that encapsulates the most important aspects of the data. In this brief discussion, we will explore some of the most common and practical indices, such as the various &hellip; <a href=\"https:\/\/www.gironi.it\/blog\/en\/descriptive-statistics-measures-of-position-and-central-tendency\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;Descriptive Statistics: Measures of Position and Central Tendency&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[161],"tags":[1243,1239,1241],"class_list":["post-3324","post","type-post","status-publish","format-standard","hentry","category-statistics","tag-central-tendency","tag-mean","tag-median"],"lang":"en","translations":{"en":3324,"it":1001},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false},"uagb_author_info":{"display_name":"paolo","author_link":"https:\/\/www.gironi.it\/blog\/author\/paolo\/"},"uagb_comment_info":40,"uagb_excerpt":"Measures of position, also known as position indices, or measures of central tendency, are values that summarize the position of a statistical distribution, providing a single figure that encapsulates the 
most important aspects of the data. In this brief discussion, we will explore some of the most common and practical indices, such as the various&hellip;","_links":{"self":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3324","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/comments?post=3324"}],"version-history":[{"count":5,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3324\/revisions"}],"predecessor-version":[{"id":3330,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3324\/revisions\/3330"}],"wp:attachment":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/media?parent=3324"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/categories?post=3324"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/tags?post=3324"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}