{"id":3423,"date":"2026-02-24T12:02:41","date_gmt":"2026-02-24T11:02:41","guid":{"rendered":"https:\/\/www.gironi.it\/blog\/?p=3423"},"modified":"2026-02-26T21:53:35","modified_gmt":"2026-02-26T20:53:35","slug":"anomaly-detection-how-to-identify-outliers-in-your-data","status":"publish","type":"post","link":"https:\/\/www.gironi.it\/blog\/en\/anomaly-detection-how-to-identify-outliers-in-your-data\/","title":{"rendered":"Anomaly Detection: How to Identify Outliers in Your Data"},"content":{"rendered":"<p>Throughout this journey, we&#8217;ve examined tools to describe data, test hypotheses, and build models. But there&#8217;s a question that comes before all others\u2014one that&#8217;s too often ignored: <strong>are these data reliable?<\/strong><\/p>\n<p>In any dataset\u2014daily sessions, organic clicks, conversion rates\u2014values that don&#8217;t behave like the others can hide. Values that deviate abnormally from the rest of the distribution. In statistics, we call them <strong>outliers<\/strong>, or <strong>anomalous values<\/strong>.<\/p>\n<p>Let&#8217;s make one point clear immediately: an anomalous value isn&#8217;t necessarily an error. It can be a measurement error, certainly (a broken tracking tag, a bot inflating sessions). But it can also be the most important signal in the entire dataset: a Google algorithm update, content going viral, a technical issue crushing traffic. <strong>The issue isn&#8217;t eliminating anomalies\u2014it&#8217;s recognizing them<\/strong> and then deciding what to do about them.<\/p>\n<p>In this article, we&#8217;ll examine three statistical methods for identifying outliers, from the most intuitive to the most formal. 
For each, we&#8217;ll look at the logic, the limitations, and practical application with R.<\/p>\n<p><!--more--><\/p>\n<div style=\"border: 1px solid #ccc;padding: 1.2em 1.5em;margin: 1.5em 0;border-radius: 6px\">\n<h3 style=\"margin-top: 0\">What We&#8217;ll Cover<\/h3>\n<ul>\n<li><a href=\"#working-dataset\">The Working Dataset: Simulated Sessions with Injected Anomalies<\/a><\/li>\n<li><a href=\"#method-1-z-score\">Method 1: The Z-Score<\/a><\/li>\n<li><a href=\"#method-2-iqr-tukey\">Method 2: IQR and Tukey&#8217;s Method<\/a><\/li>\n<li><a href=\"#method-3-grubbs\">Method 3: Grubbs&#8217; Test<\/a><\/li>\n<li><a href=\"#comparing-three-methods\">Comparing the Three Methods<\/a><\/li>\n<li><a href=\"#try-it-yourself\">Try It Yourself<\/a><\/li>\n<li><a href=\"#further-reading\">Further Reading<\/a><\/li>\n<\/ul>\n<\/div>\n<hr \/>\n<h2 id=\"working-dataset\">The Working Dataset<\/h2>\n<p>To make things concrete, let&#8217;s build a simulated but realistic dataset: daily website sessions over the course of a year. 
The data roughly follows a normal distribution with mean 250 and standard deviation 50, but with five anomalies intentionally injected\u2014three sharp drops and two spikes.<\/p>\n<p>Let&#8217;s generate the data in R:<\/p>\n<pre><code class=\"language-r\">set.seed(42)\nn &lt;- 365\nsessions &lt;- round(rnorm(n, mean = 250, sd = 50))\nsessions[sessions &lt; 0] &lt;- 0\n\n# Inject 5 realistic anomalies\nsessions[45]  &lt;- 38   # day 45: technical issue\nsessions[120] &lt;- 580  # day 120: viral article\nsessions[200] &lt;- 22   # day 200: Google update\nsessions[300] &lt;- 510  # day 300: social media mention\nsessions[350] &lt;- 15   # day 350: server down<\/code><\/pre>\n<p>Let&#8217;s visualize the trend with a simple time plot:<\/p>\n<pre><code class=\"language-r\">plot(1:n, sessions, type = \"l\", col = \"steelblue\",\n     xlab = \"Day\", ylab = \"Sessions\",\n     main = \"Daily Sessions - One Year of Traffic\")\nabline(h = mean(sessions), col = \"red\", lty = 2)<\/code><\/pre>\n<p>By eye, we can spot some peaks and drops. But where do we draw the line between natural variation and anomaly? We need objective criteria.<\/p>\n<h2 id=\"method-1-z-score\">Method 1: The Z-Score<\/h2>\n<p>We encountered the z-score <a href=\"https:\/\/www.gironi.it\/blog\/la-distribuzione-normale\/\">when discussing the normal distribution<\/a>. The z-score tells us how many standard deviations a value is from the mean:<\/p>\n\\(<br \/>\nz = \\frac{x - \\mu}{\\sigma} \\\\<br \/>\n\\)\n<p>where \\(x\\) is the observed value, \\(\\mu\\) is the mean, and \\(\\sigma\\) is the standard deviation. A value with a z-score of 2 lies two standard deviations from the mean; one with a z-score of -3 lies three standard deviations below the mean.<\/p>\n<p>Recall the <strong>empirical rule<\/strong>: in a normal distribution, approximately 99.7% of data falls within three standard deviations of the mean. 
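The tail probability behind that rule of thumb is easy to verify in base R, independently of any dataset (a quick check, not part of the original analysis):

```r
# Two-sided tail probability beyond 3 standard deviations of a normal distribution
2 * pnorm(-3)   # 0.0026998: roughly 0.3% of values fall outside +/- 3 sd
```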
A value with |z| &gt; 3 is therefore extremely rare\u2014less than 0.3% probability under the normality assumption.<\/p>\n<p>Let&#8217;s calculate z-scores for our dataset and identify anomalies:<\/p>\n<pre><code class=\"language-r\">z &lt;- (sessions - mean(sessions)) \/ sd(sessions)\n\n# Conservative threshold: |z| &gt; 3\nanomalies_z3 &lt;- which(abs(z) &gt; 3)\ncat(\"Anomalous days (|z| &gt; 3):\", anomalies_z3, \"\\n\")\ncat(\"Sessions:\", sessions[anomalies_z3], \"\\n\")\ncat(\"Z-scores:\", round(z[anomalies_z3], 2), \"\\n\")<\/code><\/pre>\n<p>The result:<\/p>\n<pre><code>Anomalous days (|z| &gt; 3): 45 120 200 300 350\nSessions: 38 580 22 510 15\nZ-scores: -3.75 5.92 -4.03 4.67 -4.16<\/code><\/pre>\n<p>With the |z| &gt; 3 threshold, the z-score identifies exactly the five anomalies we injected. No false positives, no false negatives\u2014a nearly perfect result.<\/p>\n<p>But be careful: if we lower the threshold to |z| &gt; 2, anomalies jump to 14. Many of those values are simply data in the tail of the distribution, not real anomalies. <strong>The choice of threshold isn&#8217;t a technical detail\u2014it&#8217;s an analytical decision<\/strong> that depends on how willing we are to tolerate false alarms.<\/p>\n<p>There&#8217;s an important limitation to this method. The z-score assumes data follows (at least approximately) a <strong>normal distribution<\/strong>. If the distribution is heavily skewed\u2014and web traffic data often is, with long right tails\u2014the mean and standard deviation can be distorted by the very outliers we&#8217;re trying to find. 
It&#8217;s a vicious cycle: anomalies influence the statistics we use to detect them.<\/p>\n<h2 id=\"method-2-iqr-tukey\">Method 2: IQR and Tukey&#8217;s Method<\/h2>\n<p><a href=\"https:\/\/www.gironi.it\/blog\/statistica-descrittiva-misure-di-posizione\/\">Measures of position<\/a>\u2014quartiles and median\u2014offer an approach that doesn&#8217;t require assumptions about the distribution&#8217;s shape. Tukey&#8217;s method, named after the great statistician John Tukey, uses the <strong>interquartile range<\/strong> (IQR) as its measuring stick.<\/p>\n<p>The IQR, as we saw when discussing <a href=\"https:\/\/www.gironi.it\/blog\/misure-di-variabilita-o-dispersione\/\">measures of variability<\/a>, is the difference between the third quartile (\\(Q_3\\), the 75th percentile) and the first quartile (\\(Q_1\\), the 25th percentile). It represents the spread of the central 50% of data\u2014the &#8220;solid&#8221; part of the distribution, immune to the tails.<\/p>\n<p>Tukey&#8217;s rule is simple: a value is considered anomalous if it falls outside the so-called <strong>fences<\/strong>:<\/p>\n\\(<br \/>\n\\text{anomalous if } x &lt; Q_1 - 1.5 \\cdot IQR \\quad \\text{or} \\quad x &gt; Q_3 + 1.5 \\cdot IQR \\\\<br \/>\n\\)\n<p>Why 1.5? Tukey didn&#8217;t choose this value randomly. For a normal distribution, fences at 1.5 IQR correspond approximately to 2.7 standard deviations from the mean\u2014a reasonably conservative threshold that captures about 0.7% of observations in the tails. 
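Both numbers follow directly from the standard normal quantiles; this is a quick sanity check of Tukey's constant, not part of the original analysis:

```r
# Tukey's upper fence for a standard normal distribution, in sd units
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
fence <- q3 + 1.5 * (q3 - q1)   # the lower fence sits symmetrically at -fence
round(fence, 2)                 # 2.7 standard deviations
2 * pnorm(-fence)               # about 0.007: roughly 0.7% beyond the fences
```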
Strict enough to avoid too many false positives, sensitive enough not to miss important anomalies.<\/p>\n<p>Let&#8217;s apply the method to our dataset:<\/p>\n<pre><code class=\"language-r\">Q1 &lt;- quantile(sessions, 0.25)\nQ3 &lt;- quantile(sessions, 0.75)\nIQR_val &lt;- Q3 - Q1\n\nlower_limit &lt;- Q1 - 1.5 * IQR_val\nupper_limit &lt;- Q3 + 1.5 * IQR_val\n\ncat(\"Q1:\", Q1, \" Q3:\", Q3, \" IQR:\", IQR_val, \"\\n\")\ncat(\"Lower limit:\", lower_limit, \"\\n\")\ncat(\"Upper limit:\", upper_limit, \"\\n\")\n\nanomalies_iqr &lt;- which(sessions &lt; lower_limit | sessions &gt; upper_limit)\ncat(\"Anomalous days:\", anomalies_iqr, \"\\n\")\ncat(\"Sessions:\", sessions[anomalies_iqr], \"\\n\")<\/code><\/pre>\n<p>The result:<\/p>\n<pre><code>Q1: 215  Q3: 282  IQR: 67\nLower limit: 114.5\nUpper limit: 382.5\nAnomalous days: 45 59 118 120 200 300 350\nSessions: 38 100 385 580 22 510 15<\/code><\/pre>\n<p>Tukey&#8217;s method finds 7 anomalies: our 5 injected ones plus two borderline values (day 59 with 100 sessions and day 118 with 385). Are these truly anomalous? 100 sessions is indeed low for a site with mean 250, and 385 is high relative to the quartiles. The decision, once again, is up to the analyst.<\/p>\n<p>R offers an elegant way to visualize anomalies with Tukey&#8217;s method\u2014the boxplot:<\/p>\n<pre><code class=\"language-r\">boxplot(sessions, main = \"Daily Sessions\",\n        ylab = \"Sessions\", col = \"lightblue\", outline = TRUE)\n# Points beyond the whiskers are anomalies according to Tukey<\/code><\/pre>\n<p>The <strong>major advantage<\/strong> of this method over the z-score is robustness: median and quartiles aren&#8217;t influenced by outliers. We don&#8217;t need to assume data is normal. 
Tukey&#8217;s method works even with skewed distributions\u2014and for those working with web data, this is no small feature.<\/p>\n<p>The <strong>limitation<\/strong>: the method doesn&#8217;t distinguish between &#8220;large&#8221; and &#8220;enormous&#8221; anomalies. A value just outside the fence and one completely off the charts receive the same treatment\u2014they&#8217;re both &#8220;anomalous,&#8221; period.<\/p>\n<h2 id=\"method-3-grubbs\">Method 3: Grubbs&#8217; Test<\/h2>\n<p>The first two methods rely on empirical rules: z-score thresholds, IQR thresholds. But if we want a formal approach\u2014with a proper <a href=\"https:\/\/www.gironi.it\/blog\/il-test-delle-ipotesi\/\">hypothesis test<\/a>\u2014we can turn to <strong>Grubbs&#8217; test<\/strong>.<\/p>\n<p>The idea is this: we take the most extreme value in the dataset (the one furthest from the mean) and ask whether it&#8217;s compatible with the rest of the data, or whether it&#8217;s &#8220;too&#8221; extreme to be due to chance.<\/p>\n<p>The hypotheses are:<\/p>\n<ul>\n<li>\\(H_0\\): there are no outliers in the dataset<\/li>\n<li>\\(H_1\\): the most extreme value is an outlier<\/li>\n<\/ul>\n<p>The test statistic is:<\/p>\n\\(<br \/>\nG = \\frac{\\max |x_i - \\bar{x}|}{s} \\\\<br \/>\n\\)\n<p>where \\(\\bar{x}\\) is the mean and \\(s\\) is the standard deviation. In other words, \\(G\\) is the maximum absolute z-score. 
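Since G is just the largest absolute z-score, we can compute it by hand before reaching for a package (the snippet rebuilds the simulated dataset from the start of the article so it runs on its own):

```r
# Rebuild the simulated dataset from the beginning of the article
set.seed(42)
sessions <- round(rnorm(365, mean = 250, sd = 50))
sessions[sessions < 0] <- 0
sessions[c(45, 120, 200, 300, 350)] <- c(38, 580, 22, 510, 15)

# G is the maximum absolute deviation from the mean, in units of sd
G <- max(abs(sessions - mean(sessions))) / sd(sessions)
G   # about 5.9, driven by the day-120 spike of 580 sessions
```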
The critical value is derived from the <a href=\"https:\/\/www.gironi.it\/blog\/la-distribuzione-t-e-il-test-delle-ipotesi\/\">Student&#8217;s t distribution<\/a> with \\(n-2\\) degrees of freedom.<\/p>\n<p>Let&#8217;s apply the test in R using the <code>outliers<\/code> package:<\/p>\n<pre><code class=\"language-r\">library(outliers)\n\nresult &lt;- grubbs.test(sessions)\nprint(result)<\/code><\/pre>\n<p>The result:<\/p>\n<pre><code>Grubbs test for one outlier\ndata:  sessions\nG = 5.9228, U = 0.9037, p-value = 2.339e-07\nalternative hypothesis: highest value 580 is an outlier<\/code><\/pre>\n<p>The test identifies 580 (the day 120 spike, our &#8220;viral article&#8221;) as an outlier, with a vanishingly small p-value. The evidence is overwhelming: that value isn&#8217;t compatible with the rest of the distribution.<\/p>\n<p>But we must keep in mind a <strong>fundamental limitation<\/strong> of Grubbs&#8217; test: <strong>it tests only one outlier at a time<\/strong>\u2014the most extreme one. If we suspect multiple anomalies (as in our case), we need to apply the test iteratively: remove the identified outlier, recalculate, test again.<\/p>\n<p>Let&#8217;s do it:<\/p>\n<pre><code class=\"language-r\">data &lt;- sessions\noutliers_found &lt;- c()\n\nfor(i in 1:5) {\n  g &lt;- grubbs.test(data)\n  if(g$p.value &lt; 0.05) {\n    # Extract the flagged value from the alternative-hypothesis text\n    outlier_val &lt;- as.numeric(regmatches(g$alternative,\n                     regexpr(\"[0-9.]+\", g$alternative)))\n    outliers_found &lt;- c(outliers_found, outlier_val)\n    data &lt;- data[data != outlier_val]  # removes every copy of that value\n    cat(\"Iteration\", i, \"- Outlier:\", outlier_val,\n        \"- p-value:\", format(g$p.value, digits = 3), \"\\n\")\n  } else {\n    cat(\"Iteration\", i, \"- No outlier (p =\",\n        round(g$p.value, 3), \")\\n\")\n    break\n  }\n}<\/code><\/pre>\n<p>This iterative approach is effective but <strong>treacherous<\/strong>: every time we remove a 
value, we change the distribution. The mean and standard deviation shift, and what wasn&#8217;t anomalous before might become so. It&#8217;s a procedure to use with caution and awareness.<\/p>\n<h2 id=\"comparing-three-methods\">Comparing the Three Methods<\/h2>\n<p>We&#8217;ve applied three methods to the same dataset. Let&#8217;s see what each found:<\/p>\n<table>\n<thead>\n<tr>\n<th>Day<\/th>\n<th>Sessions<\/th>\n<th>Simulated Event<\/th>\n<th>Z-score (|z|&gt;3)<\/th>\n<th>IQR\/Tukey<\/th>\n<th>Grubbs<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>45<\/td>\n<td>38<\/td>\n<td>Technical issue<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<td>Yes (iter.)<\/td>\n<\/tr>\n<tr>\n<td>120<\/td>\n<td>580<\/td>\n<td>Viral article<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<td>Yes (1st iter.)<\/td>\n<\/tr>\n<tr>\n<td>200<\/td>\n<td>22<\/td>\n<td>Google update<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<td>Yes (iter.)<\/td>\n<\/tr>\n<tr>\n<td>300<\/td>\n<td>510<\/td>\n<td>Social mention<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<td>Yes (iter.)<\/td>\n<\/tr>\n<tr>\n<td>350<\/td>\n<td>15<\/td>\n<td>Server down<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<td>Yes (iter.)<\/td>\n<\/tr>\n<tr>\n<td>59<\/td>\n<td>100<\/td>\n<td>(none)<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<td>No<\/td>\n<\/tr>\n<tr>\n<td>118<\/td>\n<td>385<\/td>\n<td>(none)<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<td>No<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>All three methods find the five injected anomalies. Tukey&#8217;s method is the most sensitive: it also flags two borderline values that the other methods let pass. The z-score with threshold 3 is precise but depends on the normality assumption. Grubbs is the most formal but requires the iterative approach for multiple anomalies.<\/p>\n<p>The important lesson: <strong>there&#8217;s no universally right method<\/strong>. There&#8217;s the right method for those data and that question. 
In daily practice, a sensible approach is to apply more than one method and focus on values that are flagged consistently.<\/p>\n<p>Let&#8217;s summarize the three methods in R:<\/p>\n<pre><code class=\"language-r\"># Create a summary for each day\nsummary_df &lt;- data.frame(\n  day = 1:n,\n  sessions = sessions,\n  z_score = round(z, 2),\n  anomaly_z = abs(z) &gt; 3,\n  anomaly_iqr = sessions &lt; lower_limit | sessions &gt; upper_limit\n)\n\n# Show only rows anomalous by at least one method\nanomalous &lt;- summary_df[summary_df$anomaly_z | summary_df$anomaly_iqr, ]\nprint(anomalous)<\/code><\/pre>\n<h2 id=\"try-it-yourself\">Try It Yourself<\/h2>\n<p>An e-commerce site has tracked CTR on its product pages for 30 days. Here&#8217;s the data:<\/p>\n<pre><code class=\"language-r\">ctr &lt;- c(3.2, 2.8, 3.1, 2.9, 3.0, 3.3, 2.7, 3.1, 2.8, 3.0,\n         0.4, 3.2, 2.9, 3.1, 2.8, 3.0, 2.9, 7.8, 3.1, 2.7,\n         3.0, 3.2, 2.8, 3.1, 2.9, 3.0, 2.8, 3.1, 3.0, 2.9)<\/code><\/pre>\n<p>Days 11 and 18 look suspicious. Apply all three methods: z-score with threshold |z| &gt; 3, Tukey&#8217;s method, and Grubbs&#8217; test. Do all three agree? Which of the two values is more clearly anomalous, and why?<\/p>\n<hr \/>\n<p>So far, we&#8217;ve treated each observation as independent from the others. We&#8217;ve asked: &#8220;is this value compatible with the overall distribution?&#8221; But web traffic data has a temporal structure: trends, seasonality, weekly cycles. A 30% drop in December might be perfectly normal for a B2B site, while the same drop in September would be alarming.<\/p>\n<p>Distinguishing a real anomaly from simple seasonality requires different tools\u2014time series decomposition into trend, seasonal component, and residual. 
That will be the topic of a future article.<\/p>\n<hr \/>\n<h3 id=\"further-reading\">Further Reading<\/h3>\n<p>For those who want to deepen their understanding of outliers and statistical reasoning about unexpected data, <em>The Art of Statistics<\/em> by David Spiegelhalter is a book that tackles the problem with clarity and numerous real-world examples.<\/p>\n<p>For a more formal treatment of outlier tests (Grubbs, Rosner, Dixon), the textbook <em>Statistica<\/em> by Newbold, Carlson, and Thorne offers comprehensive coverage with exercises.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Throughout this journey, we&#8217;ve examined tools to describe data, test hypotheses, and build models. But there&#8217;s a question that comes before all others\u2014one that&#8217;s too often ignored: are these data reliable? In any dataset\u2014daily sessions, organic clicks, conversion rates\u2014values that don&#8217;t behave like the others can hide. Values that deviate abnormally from the rest of &hellip; <a href=\"https:\/\/www.gironi.it\/blog\/en\/anomaly-detection-how-to-identify-outliers-in-your-data\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;Anomaly Detection: How to Identify Outliers in Your Data&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[161],"tags":[],"class_list":["post-3423","post","type-post","status-publish","format-standard","hentry","category-statistics"],"lang":"en","translations":{"en":3423,"it":3414},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false},"uagb_author_info":{"display_name":"paolo","author_link":"https:\/\/www.gironi.it\/blog\/author\/paolo\/"},"uagb_comment_info":1,"uagb_excerpt":"Throughout 
this journey, we&#8217;ve examined tools to describe data, test hypotheses, and build models. But there&#8217;s a question that comes before all others\u2014one that&#8217;s too often ignored: are these data reliable? In any dataset\u2014daily sessions, organic clicks, conversion rates\u2014values that don&#8217;t behave like the others can hide. Values that deviate abnormally from the rest of&hellip;","_links":{"self":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3423","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/comments?post=3423"}],"version-history":[{"count":3,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3423\/revisions"}],"predecessor-version":[{"id":3456,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/posts\/3423\/revisions\/3456"}],"wp:attachment":[{"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/media?parent=3423"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/categories?post=3423"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gironi.it\/blog\/wp-json\/wp\/v2\/tags?post=3423"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}