{"id":3299,"date":"2021-10-17T14:41:00","date_gmt":"2021-10-17T13:41:00","guid":{"rendered":"https:\/\/www.gironi.it\/blog\/?p=3299"},"modified":"2024-10-29T14:45:31","modified_gmt":"2024-10-29T13:45:31","slug":"multicollinearity-heteroscedasticity-autocorrelation-three-difficult-sounding-concepts-explained-simply","status":"publish","type":"post","link":"https:\/\/www.gironi.it\/blog\/en\/multicollinearity-heteroscedasticity-autocorrelation-three-difficult-sounding-concepts-explained-simply\/","title":{"rendered":"Multicollinearity, Heteroscedasticity, Autocorrelation: Three Difficult-Sounding Concepts (Explained Simply)"},"content":{"rendered":" <p>In various posts, particularly those on <a href=\"https:\/\/www.gironi.it\/blog\/lanalisi-di-regressione-multipla-spiegata-semplice\/\" target=\"_blank\" data-type=\"post\" data-id=\"2225\" rel=\"noreferrer noopener\">regression analysis<\/a>, <a href=\"https:\/\/www.gironi.it\/blog\/lanalisi-della-varianza-anova-spiegata-semplice\/\" target=\"_blank\" data-type=\"post\" data-id=\"2342\" rel=\"noreferrer noopener\">variance analysis<\/a>, and <a href=\"https:\/\/www.gironi.it\/blog\/analisi-delle-serie-storiche0-e-previsioni-di-serie-temporali-in-r-con-il-metodo-holt-winters\/\" target=\"_blank\" data-type=\"post\" data-id=\"1496\" rel=\"noreferrer noopener\">time series<\/a>, we\u2019ve come across terms that seem deliberately designed to scare the reader.<br>The aim of these articles is to explain these key concepts simply, beyond the apparent complexity (something I really wanted when I was a student, instead of facing texts written in a purposely convoluted and unnecessarily difficult way).<br>So, it\u2019s time to spend a few words on three very important concepts that often recur in statistical analysis and need to be well understood. 
The reality is much, much clearer than it seems, so&#8230; don\u2019t be afraid!<\/p>   <!--more-->  \t\t\t\t<div class=\"wp-block-uagb-table-of-contents uagb-toc__align-left uagb-toc__columns-1  uagb-block-d89388c6      \"\n\t\t\t\t\tdata-scroll= \"1\"\n\t\t\t\t\tdata-offset= \"30\"\n\t\t\t\t\tstyle=\"\"\n\t\t\t\t>\n\t\t\t\t<div class=\"uagb-toc__wrap\">\n\t\t\t\t\t\t<div class=\"uagb-toc__title\">\n\t\t\t\t\t\t\tWhat We\u2019ll Cover\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"uagb-toc__list-wrap \">\n\t\t\t\t\t\t<ol class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#multicollinearity\" class=\"uagb-toc-link__trigger\">Multicollinearity<\/a><ul class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#how-can-we-reduce-the-problem\" class=\"uagb-toc-link__trigger\">How can we reduce the problem?<\/a><\/li><\/ul><\/li><li class=\"uagb-toc__list\"><a href=\"#heteroscedasticity\" class=\"uagb-toc-link__trigger\">Heteroscedasticity<\/a><li class=\"uagb-toc__list\"><a href=\"#autocorrelation\" class=\"uagb-toc-link__trigger\">Autocorrelation<\/a><ul class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#how-can-i-check-for-autocorrelation\" class=\"uagb-toc-link__trigger\">How can I check for autocorrelation?<\/a><\/ul><\/ul><\/ol>\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t  <h2 class=\"wp-block-heading\">Multicollinearity<\/h2>   <p>If you have followed me across various posts, you may remember that we mentioned this term when approaching regression analysis.<\/p>   <p class=\"has-light-gray-background-color has-background\">We talk about <strong>multicollinearity<\/strong> when there is a <strong>strong correlation between two or more explanatory variables in our regression model<\/strong>.<\/p>   <p>Multicollinearity is a rather tricky problem because it can undermine the validity of regression analysis, <strong>even if there\u2019s a high <a 
href=\"https:\/\/www.gironi.it\/blog\/regressione-lineare-semplice\/#il-coefficiente-di-determinazione-r2\" target=\"_blank\" rel=\"noreferrer noopener\">coefficient of determination<\/a> R<sup>2<\/sup><\/strong>, which might appear significant.<br>When multicollinearity exists, it\u2019s difficult to isolate the effect that each independent (explanatory) variable has on the dependent variable, and the coefficients estimated with the least squares method may turn out to be statistically insignificant because their standard errors are inflated.<\/p>   <h5 class=\"wp-block-heading\"><strong>How can we reduce the problem?<\/strong><\/h5>   <p>We have several options:<\/p>   <ul class=\"wp-block-list\"> <li>Using a larger amount of data. In other words, increasing the sample size.<\/li>   <li>Transforming the functional relationship.<\/li>   <li>Using prior information.<\/li>   <li>Excluding one of the variables that show a strong collinear relationship.<\/li> <\/ul>   <h2 class=\"wp-block-heading\">Heteroscedasticity<\/h2>   <p>This term seems designed to scare. If you want to reinforce someone\u2019s belief (or bias) in the inherently terrifying complexity of statistics, this is the magic word to use! 
\ud83d\ude42<br><br>Surprise: the concept isn\u2019t actually that complicated.<\/p>   <p class=\"has-light-gray-background-color has-background\"><strong>Heteroscedasticity<\/strong> simply means <strong>unequal dispersion<\/strong>.<br>It refers to situations where the <strong>variance of the error term isn\u2019t constant for all values of the independent variable<\/strong>.<\/p>   <p>In regression analysis, heteroscedasticity is problematic because <strong>ordinary least squares regression assumes that all residuals come from a population with a constant variance (<em>homoscedasticity<\/em>)<\/strong>.<br>Homoscedasticity is thus the opposite of heteroscedasticity&#8230;<\/p>   <p>Returning for a moment to the topic of regression, the assumption of homoscedasticity implies that the prediction errors in Y have roughly the same spread at all levels of X.<\/p>   <h2 class=\"wp-block-heading\">Autocorrelation<\/h2>   <p>We discussed autocorrelation in the <a href=\"https:\/\/www.gironi.it\/blog\/analisi-delle-serie-storiche0-e-previsioni-di-serie-temporali-in-r-con-il-metodo-holt-winters\/\" target=\"_blank\" rel=\"noreferrer noopener\">long post on time series analysis<\/a>, where we also looked at a practical example.<\/p>   <p>To define the most common case, we can say that<\/p>   <p class=\"has-light-gray-background-color has-background\">positive first-order autocorrelation exists when the error term of one period is positively correlated with the error term of the immediately preceding period.<\/p>   <p>In time series, this is quite a common scenario, and it can bias the estimated standard errors, leading to unreliable statistical tests and confidence intervals.<\/p>   <p>Autocorrelation, also referred to in some texts as <strong>serial correlation<\/strong>, can also be of a higher order (it is of the second order if the error term of one period is correlated with the error term from two periods earlier, and so forth) and can also be 
negative.<\/p>   <h5 class=\"wp-block-heading\">How can I check for autocorrelation?<\/h5>   <p>In my post on time series analysis, we used R&#8217;s handy acf() function and discussed the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Ljung%E2%80%93Box_test\" target=\"_blank\" rel=\"noreferrer noopener\">Ljung-Box test<\/a>.<br>A \u201cclassic\u201d method involves checking for autocorrelation using the <a href=\"https:\/\/it.wikipedia.org\/wiki\/Statistica_di_Durbin-Watson\" target=\"_blank\" rel=\"noreferrer noopener\">Durbin-Watson statistic<\/a>, calculating the <em>d<\/em> value and comparing it to the appropriate table values at the desired significance level, typically 5% or 1%. As a rule of thumb, <em>d<\/em> ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.<\/p>   <p>In the presence of autocorrelation, the <strong>estimates<\/strong> obtained using ordinary least squares remain unbiased and consistent, but the <strong>standard errors<\/strong> of the estimated regression parameters are unfortunately biased, potentially leading to inaccurate statistical tests and confidence intervals.<\/p>   <p>One way to correct for positive first-order autocorrelation (the most common type) is the Durbin two-stage method, which we won\u2019t cover here but which will likely be the subject of a future post.<\/p> ","protected":false},"excerpt":{"rendered":"<p>In various posts, particularly those on regression analysis, variance analysis, and time series, we\u2019ve come across terms that seem deliberately designed to scare the reader. The aim of these articles is to explain these key concepts simply, beyond the apparent complexity (something I really wanted when I was a student, instead of facing texts written in &hellip; <a href=\"https:\/\/www.gironi.it\/blog\/en\/multicollinearity-heteroscedasticity-autocorrelation-three-difficult-sounding-concepts-explained-simply\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;Multicollinearity, Heteroscedasticity, Autocorrelation: Three 
Difficult-Sounding Concepts (Explained Simply)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[161],"tags":[1216,1218,291,1222,292,267,1212],"class_list":["post-3299","post","type-post","status-publish","format-standard","hentry","category-statistics","tag-autocorrelation","tag-error","tag-eteroschedasticita","tag-heteroscedasticity","tag-omoschedasticita","tag-regressione","tag-time-series"],"lang":"en","translations":{"en":3299,"it":2404},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false},"uagb_author_info":{"display_name":"paolo","author_link":"https:\/\/www.gironi.it\/blog\/author\/paolo\/"},"uagb_comment_info":31,"uagb_excerpt":"In various posts, particularly those on regression analysis, variance analysis, and time series, we\u2019ve come across terms that seem deliberately designed to scare the reader. The aim of these articles is to explain these key concepts simply, beyond the apparent complexity (something I really wanted when I was a student, instead of facing texts written 
in&hellip;","_links":{"self":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3299","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/comments?post=3299"}],"version-history":[{"count":1,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3299\/revisions"}],"predecessor-version":[{"id":3300,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3299\/revisions\/3300"}],"wp:attachment":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/media?parent=3299"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/categories?post=3299"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/tags?post=3299"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}