  <?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>lisciatori &#8211; paologironi blog</title>
	<atom:link href="https://www.gironi.it/blog/en/tag/lisciatori/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.gironi.it/blog</link>
	<description>Scattered notes on (retro) computing, data analysis, statistics, SEO, and things that change</description>
	<lastBuildDate>Thu, 18 Jun 2026 13:21:27 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Time Series Analysis and Forecasting in R</title>
		<link>https://www.gironi.it/blog/en/time-series-analysis-and-forecasting-in-r/</link>
					<comments>https://www.gironi.it/blog/en/time-series-analysis-and-forecasting-in-r/#respond</comments>
		
		<dc:creator><![CDATA[paolo]]></dc:creator>
		<pubDate>Sat, 28 Dec 2019 13:59:00 +0000</pubDate>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[holt-winters]]></category>
		<category><![CDATA[lisciatori]]></category>
		<category><![CDATA[livellamento]]></category>
		<category><![CDATA[serie storica]]></category>
		<category><![CDATA[trend]]></category>
		<guid isPermaLink="false">https://www.gironi.it/blog/?p=3331</guid>

					<description><![CDATA[What is meant by a time series? A time series consists of values observed over a set of sequentially ordered periods. This, for those who do SEO, is already an element of utmost interest. Website traffic data, considered over a time sequence, is in fact an example of a time series. Time series analysis is &#8230; <a href="https://www.gironi.it/blog/en/time-series-analysis-and-forecasting-in-r/" class="more-link">Continue reading<span class="screen-reader-text"> "Time Series Analysis and Forecasting in R"</span></a>]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">What is meant by a time series?</h2>



<p class="wp-block-paragraph">A <strong>time series</strong> consists of values observed over a set of sequentially ordered periods. <strong>This, for those who do SEO, is already an element of utmost interest</strong>.</p>



<p class="has-light-gray-background-color has-background wp-block-paragraph"><strong>Website traffic data, considered over a time sequence, is in fact an example of a time series.</strong></p>



<p class="wp-block-paragraph">Time series analysis is a set of methods that allow us to derive significant patterns or statistics from data with temporal information.</p>



<p class="has-light-gray-background-color has-background wp-block-paragraph">In very general terms, we can say that <strong>a time series is a sequence of random variables indexed in time</strong>.</p>



<p class="wp-block-paragraph">The purpose of analyzing a time series can be <strong>descriptive</strong> (consider decomposing the series to remove seasonality elements or to highlight underlying trends) or <strong>inferential</strong>, with the latter including forecasting values for future time periods that have not yet occurred.</p>



<span id="more-3331"></span>



<div style="border:1px solid #ccc;padding:1.2em 1.5em;margin:1.5em 0;border-radius:6px">
<h3 style="margin-top:0">What we&#8217;ll cover</h3>
<ul>
<li><a href="#theory-decomposition">A bit of theory: decomposing a time series</a></li>
<li><a href="#create-series-r">Creating a time series in R</a></li>
<li><a href="#plot-series">Plotting one or more time series</a></li>
<li><a href="#smoothing">Smoothing techniques</a></li>
<li><a href="#seo-example-ga4">An SEO example: from Google Analytics 4 to the time series</a></li>
<li><a href="#seasonal-plot">The seasonal plot: seasonality at a glance</a></li>
<li><a href="#holt-winters">Exponential smoothing with Holt-Winters and forecasting</a></li>
<li><a href="#arima">Investigating time series with ARIMA models</a></li>
<li><a href="#arima-example">A practical example of an ARIMA model</a></li>
<li><a href="#try-it-yourself">Try it yourself</a></li>
</ul>
</div>



<hr class="wp-block-separator has-css-opacity"/>



<h3 class="wp-block-heading" id="theory-decomposition">A bit of theory. Classical time series analysis. Decomposing a time series.</h3>



<p class="wp-block-paragraph">The classical method of time series analysis identifies four influences, or <strong>components</strong>:</p>



<ol class="wp-block-list">
<li><strong>Trend (T)</strong>: the general movement of the values (Y) of the time series over a long period of time.</li>



<li><strong>Cyclical Fluctuations (C)</strong>: recurring long-duration movements.</li>



<li><strong>Seasonal Variations (S)</strong>: fluctuations due to the particular time of year, for example the summer season compared to the winter months.</li>



<li><strong>Erratic or Irregular Movements (I)</strong>: irregular deviations from the trend, which cannot be attributed to cyclical or seasonal influences.</li>
</ol>



<p class="wp-block-paragraph"><strong>According to the classical time series analysis model, the value of the variable in each period is determined by the influences of the four components.</strong></p>



<p class="wp-block-paragraph">The main purpose of classical time series analysis is precisely to <strong>decompose the series</strong>, to isolate the influences of the various components that determine the values of the time series.</p>



<h3 class="wp-block-heading">The four &#8220;classical&#8221; components and their relationship</h3>



<p class="wp-block-paragraph">The four components can be related to each other <strong>additively:</strong></p>



<p class="wp-block-paragraph">
<strong>Y = T + C + S + I</strong>
</p>



<p class="wp-block-paragraph">or <strong>multiplicatively:</strong></p>



<p class="wp-block-paragraph">
<strong>Y = T &#215; C &#215; S &#215; I</strong>
</p>



<p class="wp-block-paragraph">Recall that a multiplicative model can be transformed into an additive model by exploiting the properties of logarithms:</p>



<p class="wp-block-paragraph"><strong>log(Y) = log(T) + log(C) + log(S) + log(I)</strong></p>
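


<p class="wp-block-paragraph">A minimal sketch of this transformation, using made-up components (the values of trend, seasonal and irregular below are purely illustrative, and the cyclical component is left out for brevity):</p>



<pre class="wp-block-preformatted"># Hypothetical components for 24 monthly periods
periods   &lt;- 1:24
trend     &lt;- 100 + 2 * periods
seasonal  &lt;- rep(c(0.8, 0.9, 1.0, 1.1, 1.2, 1.0), times = 4)
set.seed(1)
irregular &lt;- exp(rnorm(24, sd = 0.05))   # positive noise around 1

# Multiplicative model
y &lt;- trend * seasonal * irregular

# On the log scale the same model is additive
log_sum &lt;- log(trend) + log(seasonal) + log(irregular)
all.equal(log(y), log_sum)   # TRUE, up to floating point tolerance</pre>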



<hr class="wp-block-separator has-css-opacity"/>



<h3 class="wp-block-heading">A brief review: the useful properties of logarithms</h3>



<p class="has-background has-light-gray-background-color">
The logarithm of a positive number <i>n</i> to the base <i>c</i> (with c &gt; 0 and c not equal to 1) is the exponent to which the base <i>c</i> must be raised to obtain <i>n</i>.<br><br>Therefore, if n = c<sup>b</sup> then log<sub>c</sub> n = b
</p>



<ul class="wp-block-list">
<li>When numbers are multiplied together, the logarithm of their product is the sum of their logarithms.</li>



<li>The logarithm of a fraction is the logarithm of the numerator minus the logarithm of the denominator.</li>



<li>The logarithm of a number raised to a power is the exponent multiplied by the logarithm of the number.</li>
</ul>



<hr class="wp-block-separator has-css-opacity"/>



<h2 class="wp-block-heading" id="create-series-r">Creating a time series in R from a vector or data frame</h2>



<p class="wp-block-paragraph">There are various ways to transform a data vector, a matrix, or a data frame into a time series.<br>Here, we will limit ourselves to the tools offered by base R to achieve this result.<br>The function of interest is simply called <strong>ts()</strong> and its use is rather intuitive.</p>



<p class="wp-block-paragraph">Let&#8217;s see a practical example. Suppose we have a data vector saved with the name mydata.csv in the /home folder of my PC.</p>



<p class="wp-block-paragraph">The first thing I will do is import the CSV file into R:</p>



<pre class="wp-block-preformatted"><strong># import the data from a csv file into a dataframe</strong><br>dfmydata &lt;- read.csv("/home/mydata.csv")</pre>



<pre class="wp-block-preformatted"><strong># create a vector with the data I'm interested in</strong><br>mydata &lt;- dfmydata$myobservation</pre>



<p class="wp-block-paragraph">Now all I have to do is call the ts() function with the appropriate start and frequency values to create my time series.</p>



<p class="wp-block-paragraph">Let&#8217;s assume that the data is monthly, starting with January 2012 and going up to December 2018:</p>



<pre class="wp-block-preformatted">timeseries &lt;- ts(mydata, start=c(2012,1), end=c(2018,12), frequency=12)</pre>



<p class="wp-block-paragraph">As you can see, the typical form of the ts() function is</p>



<pre class="wp-block-preformatted">ts(vector, start, end, frequency)</pre>



<p class="wp-block-paragraph">where <strong>frequency is the number of observations per unit of time</strong>.</p>



<p class="wp-block-paragraph"><strong>Therefore, we will have 1=annual, 4=quarterly, 12=monthly…</strong></p>
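


<p class="wp-block-paragraph">To see the effect of the frequency argument without needing a CSV file, here is a small sketch with invented quarterly values (the numbers and the object names are only an illustration):</p>



<pre class="wp-block-preformatted"># Illustrative values only: 8 quarterly observations starting in Q1 2020
quarterly_values &lt;- c(12, 15, 14, 18, 13, 16, 15, 20)
quarterly_ts &lt;- ts(quarterly_values, start = c(2020, 1), frequency = 4)

# Printed as a table with one row per year and one column per quarter
quarterly_ts

# 4 observations per unit of time (the year)
frequency(quarterly_ts)

# The time index advances in steps of 1/4 of a year
time(quarterly_ts)</pre>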



<h3 class="wp-block-heading">Useful functions related to a time series</h3>



<p class="wp-block-paragraph">I can easily find out the time of the first observation of a time series using the command:</p>



<pre class="wp-block-preformatted">start()</pre>



<p class="wp-block-paragraph">Similarly, I can find the last observation with:</p>



<pre class="wp-block-preformatted">end()</pre>



<p class="wp-block-paragraph">The command:</p>



<pre class="wp-block-preformatted">frequency()</pre>



<p class="wp-block-paragraph">returns the number of observations per unit of time, while the very useful command</p>



<pre class="wp-block-preformatted">window()</pre>



<p class="wp-block-paragraph">allows you to extract a subset of data.</p>



<p class="wp-block-paragraph">Using as an example <em>Nile</em>, a dataset included in R containing 100 annual flow measurements of the river Nile at Aswan for the years 1871 to 1970 (we will also use it in the following section), the command looks like this:</p>



<pre class="wp-block-preformatted">nile_sub &lt;- window(Nile, start=1940,end=1960)</pre>
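


<p class="wp-block-paragraph">The four functions can be tried together on the same built-in dataset. This is just a quick sketch to show what each call returns:</p>



<pre class="wp-block-preformatted"># Time of the first and last observation (year, position within the year)
start(Nile)       # 1871 1
end(Nile)         # 1970 1

# Annual data: one observation per unit of time
frequency(Nile)   # 1

# Extract the years 1940-1960 and check what we got
nile_sub &lt;- window(Nile, start = 1940, end = 1960)
start(nile_sub)   # 1940 1
length(nile_sub)  # 21 annual observations</pre>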



<h2 class="wp-block-heading" id="plot-series">Plotting one or more time series</h2>



<p class="wp-block-paragraph">One of the advantages of using time series is the simplicity in graphical representation. Using the <em><strong>Nile</strong></em> example dataset present in R, which is already a time series object &#8211; and I can verify this with the command <strong>is.ts(Nile)</strong> &#8211; I just need the command:</p>



<pre class="wp-block-preformatted">plot.ts(Nile)</pre>



<p class="wp-block-paragraph">to get a graph of the trend of my variable over time. Obviously, I can use the various attributes <em>xlab</em>, <em>ylab</em>, <em>main</em> etc&#8230; to make my graph even more meaningful and clear:</p>



<pre class="wp-block-preformatted">plot(Nile, xlab="Year", ylab="Annual Nile River Flow", main="Example Time Series: Nile")</pre>


<div class="wp-block-image is-resized">
<figure class="aligncenter size-large"><img fetchpriority="high" decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/07/nilo-esempio.png" alt="example time series: Nile" class="wp-image-1747" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/07/nilo-esempio.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/07/nilo-esempio-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /><figcaption class="wp-element-caption">plot of the Nile time series</figcaption></figure>
</div>


<p class="wp-block-paragraph">If the dataset contains more than one time series object, I can get the graph of the various objects.</p>



<p class="wp-block-paragraph">Let&#8217;s use another example dataset present in R called <em>EuStockMarkets</em> which contains, as you can easily imagine, the daily closing prices of four European stock indices (DAX, SMI, CAC and FTSE):</p>



<pre class="wp-block-preformatted">plot(EuStockMarkets, plot.type = "multiple")</pre>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/07/eustockmarkets-multiple.png" alt="Plot of multiple time series objects - example" class="wp-image-1748" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/07/eustockmarkets-multiple.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/07/eustockmarkets-multiple-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /><figcaption class="wp-element-caption">Plot of multiple time series objects</figcaption></figure>



<p class="wp-block-paragraph">Alternatively, I can plot the various time series objects together—with different colors—in the same graph:</p>



<pre class="wp-block-preformatted">plot.ts(EuStockMarkets, plot.type = "single", col=c("red","black","blue","green"))</pre>


<div class="wp-block-image is-resized">
<figure class="aligncenter size-large"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/07/eustockmarkets-insieme.png" alt="Plot of multiple time series objects in the same graph - example" class="wp-image-1750" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/07/eustockmarkets-insieme.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/07/eustockmarkets-insieme-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /><figcaption class="wp-element-caption">Plot of multiple time series objects in the same graph</figcaption></figure>
</div>


<hr class="wp-block-separator has-css-opacity"/>



<h2 class="wp-block-heading" id="smoothing">Smoothing Techniques</h2>



<p class="wp-block-paragraph">If we graphically represent a time series, we will almost always notice a series of small variations that can make it very difficult to identify important trends and make predictions for the future.  To address this problem, various &#8220;<strong>smoothing</strong>&#8221; techniques have been developed, which we can, for simplicity, divide into two main families: techniques based on <strong>moving averages</strong> and <strong>exponential techniques</strong>.</p>



<p class="has-yellow-background-color has-background wp-block-paragraph"><strong>MOVING AVERAGE</strong><br>Instead of the data for month X, I calculate the average of n months, where X is the central point. <br>The random component is compensated for if we put several months together; its average is equal to 0 for a reasonable number of periods.<br>The seasonal component repeats regularly throughout the year, so if I distribute the seasonal effect over all 12 months, the effect disappears. <br>With the moving average, I achieve both desired effects: I compensate for randomness and &#8220;distribute&#8221; seasonality.</p>
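


<p class="wp-block-paragraph">The reasoning in the box can be checked on a synthetic series. The sketch below assumes a linear trend and an invented seasonal pattern that sums to zero over the year, and it uses the base R function stats::filter() to compute a centred 12-month moving average:</p>



<pre class="wp-block-preformatted"># Synthetic monthly series: linear trend plus a seasonal pattern summing to zero
seasonal_pattern &lt;- c(-30, -20, -10, 0, 10, 20, 30, 20, 10, 0, -10, -20)
n_years &lt;- 4
trend   &lt;- seq(from = 100, by = 2, length.out = 12 * n_years)
x &lt;- ts(trend + rep(seasonal_pattern, n_years), start = c(2020, 1), frequency = 12)

# Centred 12-month moving average: 13 terms, half weight on the two end months
ma_weights &lt;- c(0.5, rep(1, 11), 0.5) / 12
x_smooth   &lt;- stats::filter(x, filter = ma_weights, sides = 2)

# The first and last 6 values are NA: the window does not fit there
plot(x, main = "Series and centred moving average")
lines(x_smooth, col = "red")

# Where it is defined, the moving average gives back the trend
valid &lt;- 7:42
all.equal(as.numeric(x_smooth)[valid], trend[valid])</pre>



<p class="wp-block-paragraph">With real data the match is never exact, because the irregular component only averages out approximately.</p>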



<p class="wp-block-paragraph">R, as we will soon see, provides fundamental help by making a whole series of tools available to us to carry out our analyses with maximum practicality.</p>



<h2 class="wp-block-heading" id="seo-example-ga4">An SEO Example: from Google Analytics 4 to the Time Series</h2>



<p class="wp-block-paragraph">As we have seen, we can easily create a time series in R using the basic command ts().</p>



<p class="wp-block-paragraph">Since I intend to use time series analysis for SEO purposes to derive a trend and make a forecast, I need to <strong>import the data of my Google Analytics 4 property into R</strong>. (N.b.: the first version of this article used the Universal Analytics API, which Google permanently shut down in July 2024; the code that follows is updated to the <strong>GA4 Data API</strong>.)<br><br>I can do this &#8220;automatically&#8221; using the very useful <strong><a aria-label="googleAnalyticsR (opens in a new tab)" rel="noreferrer noopener" href="https://code.markedmondson.me/googleAnalyticsR/" target="_blank">googleAnalyticsR</a></strong> library — which supports GA4 through the <strong>ga_data()</strong> function — or by manually exporting the data to a csv file.<br><br>Let&#8217;s see the first case:</p>



<pre class="wp-block-preformatted"># use the googleAnalyticsR library
# which obviously I must have
# correctly installed
library(googleAnalyticsR)
# Authorize Google Analytics
ga_auth()
# set the numeric ID of the GA4 property
# (Admin &gt; Property &gt; Property details)
property_id &lt;- 123456789
# Retrieve the data I need:
# three years of monthly sessions
# from the GA4 Data API
gadata &lt;- ga_data(property_id, 
          metrics = "sessions", 
          dimensions = "yearMonth",
          date_range = c("2023-01-01", "2025-12-31"),
          limit = -1)
# The Data API does not guarantee row order:
# sort by month before building the series
gadata &lt;- gadata[order(gadata$yearMonth), ]
# Convert the data into a time series
# with monthly frequency by indicating frequency=12:
ga_ts &lt;- ts(gadata$sessions, start = c(2023,1), frequency = 12)</pre>



<p class="has-white-background-color has-background wp-block-paragraph">The procedure to obtain the same result without going through the API is just as simple.<br><br>From the GA4 interface I open the report I am interested in (for example <em>Reports &gt; Life cycle &gt; Acquisition &gt; Traffic acquisition</em>), set the date range and use the share button to download the csv file. Alternatively, I can query the Data API directly from the browser with the <a aria-label="GA4 Query Explorer (opens in a new tab)" rel="noreferrer noopener" href="https://ga-dev-tools.google.com/ga4/query-explorer/" target="_blank"><strong>GA4 Query Explorer</strong></a>: I choose the property, the metric (<em>sessions</em>), the dimension (<em>yearMonth</em>), the date range, and download the result.<br><br>I then open the file with a text editor and delete any general header lines and the totals at the bottom. If I want, I can rename the columns (&#8220;date&#8221; and &#8220;sessions&#8221;) for better readability.<br><br>All that remains is to import the csv and create the time series. A matter of two lines:</p>



<pre class="wp-block-preformatted"># Import a very simple dataset
# with month and sessions
sitedata &lt;- read.csv("path/monthly-sessions.csv", header = TRUE)

# Convert the data into a time series
sitedata_ts &lt;- ts(sitedata$sessions, start = c(2023,1), frequency = 12)</pre>



<p class="has-white-background-color has-background wp-block-paragraph">Now that I have my time series, I have a multitude of R packages available that provide me with all the useful tools for any type of analysis, from the most basic to the most in-depth.</p>



<h3 class="wp-block-heading">Limiting the Effect of Seasonality Through Moving Averages</h3>



<p class="wp-block-paragraph">Install the <em>forecast</em> package to be able to use the very useful <strong>ma()</strong> function:</p>



<pre class="wp-block-preformatted">library(forecast)<br>sitedata.filt &lt;- ma(sitedata_ts, order=12)<br>sitedata.filt</pre>



<p class="wp-block-paragraph">In this way, a <strong>weighted average</strong> has been applied to our time series to <strong>limit the effect of seasonality</strong>. I can now visualize the estimated trend by drawing it in red over the original series:</p>



<pre class="wp-block-preformatted">plot(sitedata_ts)<br>lines(sitedata.filt, col="red")</pre>



<h3 class="wp-block-heading">Removing Seasonality Using Differencing</h3>



<p class="wp-block-paragraph">Using differencing with the diff() function and an appropriate lag, I can eliminate the seasonal component, if present. In the case of a monthly time series that includes several years of observations, I just need to use lag=12, as in this trivial example where I assume I&#8217;m working on a time series x:</p>



<pre class="wp-block-preformatted"># remove the seasonal trend
# assuming monthly data and the presence of seasonality
dx &lt;- diff(x, lag=12)
ts.plot(dx, main="Deseasonalized User Trend")</pre>
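


<p class="wp-block-paragraph">To see why a lag of 12 works, here is a small sketch on an invented monthly series made of a linear trend plus a seasonal pattern that repeats identically every year (all values are hypothetical):</p>



<pre class="wp-block-preformatted"># Synthetic monthly series: three years, same seasonal pattern each year
seasonal_pattern &lt;- rep(c(5, 3, 0, -2, -4, -6, -8, -6, -2, 2, 6, 12), times = 3)
x &lt;- ts(50 + 1:36 + seasonal_pattern, start = c(2021, 1), frequency = 12)

# Each value minus the value of the same month one year earlier
dx &lt;- diff(x, lag = 12)

# The seasonal pattern cancels out: only the yearly growth (12) is left
dx

# The price to pay: the first 12 observations are lost
start(dx)
length(dx)</pre>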



<h3 class="wp-block-heading">Decomposing the Time Series Through Moving Averages</h3>



<p class="wp-block-paragraph">We have seen in the course of the article that the classical method of time series analysis identifies four influences, or components, and that the main purpose is precisely to decompose the series to isolate the influences of the various components that determine the values of the time series.</p>



<p class="wp-block-paragraph">Moving from theory to practice, let&#8217;s see how we can proceed.<br>I can use the <strong>decompose()</strong> function of the <em>stats</em> package to perform a classical decomposition of my time series into its components using the moving average system and representing everything in a single clear graph:</p>



<pre class="wp-block-preformatted"># decompose the time series. I have chosen a multiplicative decomposition<br># obviously, I could have chosen an additive one<br>components &lt;- decompose(sitedata_ts, type ="multiplicative")<br>names(components)<br># explore the components of the time series in the graph<br>plot(components)</pre>
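


<p class="wp-block-paragraph">For readers without their own traffic data at hand, the same decomposition can be sketched on <em>AirPassengers</em>, a monthly dataset included in R. The object names below are only illustrative. The check at the end shows that, in the multiplicative case, the three estimated components multiplied together give back the original series:</p>



<pre class="wp-block-preformatted"># Multiplicative decomposition of a built-in monthly series
air_parts &lt;- decompose(AirPassengers, type = "multiplicative")

# The 12 seasonal indices (values above 1 mean "busier than average")
round(air_parts$figure, 2)

# Trend and random are NA for the first and last 6 months,
# because the centred moving average is not defined there
rebuilt &lt;- air_parts$trend * air_parts$seasonal * air_parts$random
defined &lt;- !is.na(rebuilt)

all.equal(as.numeric(rebuilt)[defined], as.numeric(AirPassengers)[defined])</pre>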



<h3 class="wp-block-heading">Decomposing the Series with the LOESS Method</h3>



<p class="wp-block-paragraph">A more refined alternative for decomposing the series uses the <strong>LOESS</strong> (locally estimated scatterplot smoothing) method, a non-parametric technique that fits polynomial regression models to local subsets of the data. We use the <strong>stl()</strong> function of the <em>stats</em> package for this purpose:</p>



<pre class="wp-block-preformatted"># use stl for a LOESS type decomposition<br>sitedata_loess &lt;- stl(sitedata_ts, s.window="periodic")<br>head(sitedata_loess$time.series)<br>plot(sitedata_loess)</pre>



<p class="wp-block-paragraph">Compared to decompose(), <strong>stl()</strong> has two advantages that carry real weight in everyday practice: it allows the seasonal component to evolve over time (by passing a numeric value to s.window instead of &#8220;periodic&#8221;) and it is more robust in the presence of outliers. And from the decomposition I can instantly derive the <strong>seasonally adjusted series</strong> — the traffic &#8220;cleansed&#8221; of seasonality, the one that tells us whether we are really growing or it is just high season — thanks to the seasadj() function of the <em>forecast</em> package:</p>



<pre class="wp-block-preformatted">library(forecast)
# the seasonally adjusted series: trend + remainder
sitedata_adj &lt;- seasadj(sitedata_loess)
plot(sitedata_adj, main="Seasonally adjusted sessions")</pre>
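


<p class="wp-block-paragraph">Since an stl() decomposition is additive, the seasonally adjusted series can also be obtained by hand, subtracting the seasonal component from the data. The following sketch does this with base R only, on the built-in <em>co2</em> dataset (chosen here just as an example of a monthly series):</p>



<pre class="wp-block-preformatted"># LOESS decomposition of a built-in monthly series
co2_fit   &lt;- stl(co2, s.window = "periodic")
co2_parts &lt;- co2_fit$time.series   # columns: seasonal, trend, remainder

# Seasonally adjusted series: the data minus the seasonal component
co2_adj &lt;- co2 - co2_parts[, "seasonal"]

# Which is the same as trend + remainder
trend_plus_rem &lt;- co2_parts[, "trend"] + co2_parts[, "remainder"]
all.equal(as.numeric(co2_adj), as.numeric(trend_plus_rem))

plot(co2_adj, main = "Seasonally adjusted CO2 concentration")</pre>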



<h3 class="wp-block-heading" id="seasonal-plot">The Seasonal Plot: Seasonality at a Glance</h3>



<p class="wp-block-paragraph">There is one chart that, in everyday practice, is worth half the seasonality analysis on its own: the <strong>seasonal plot</strong>. The idea is simple: instead of drawing the series as a single continuous line, I overlay the years — months on the horizontal axis, one line per year. If the lines resemble each other (the August dip, the November peak), seasonality is there, visible and recurring; if one line breaks away from the others, something happened that year and it deserves a closer look. I draw the seasonal plot with the ggseasonplot() function of the <em>forecast</em> package:</p>



<pre class="wp-block-preformatted">library(forecast)
# one line per year, months on the x axis
ggseasonplot(sitedata_ts, year.labels = TRUE)

# "polar" variant: months arranged in a circle
ggseasonplot(sitedata_ts, polar = TRUE)</pre>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img decoding="async" width="962" height="608" src="https://www.gironi.it/blog/wp-content/uploads/2026/06/seasonal-plot-sessioni-mensili.png" alt="Seasonal plot of three years of monthly sessions: one line per year, with the recurring summer dip" class="wp-image-3670" srcset="https://www.gironi.it/blog/wp-content/uploads/2026/06/seasonal-plot-sessioni-mensili.png 962w, https://www.gironi.it/blog/wp-content/uploads/2026/06/seasonal-plot-sessioni-mensili-300x190.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /><figcaption class="wp-element-caption">The seasonal plot: three years overlaid, the seasonal profile repeats</figcaption></figure>
</div>


<p class="wp-block-paragraph">Three years compared at a glance: the year-over-year growth (the lines shift upwards) and the seasonal profile repeating itself, with the summer low and the autumn recovery. Alternatively, the base R command monthplot(sitedata_ts) groups the observations by month, showing the average and the evolution across years for each month.</p>



<h2 class="wp-block-heading" id="holt-winters">Exponential Smoothing with the Holt-Winters Method and Forecasting</h2>



<p class="wp-block-paragraph">Smoothing and forecasting techniques offer us powerful operating modes for predicting future values of time series data.</p>



<p class="wp-block-paragraph">At the most basic level, smoothing can be achieved using moving averages.</p>



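<p class="wp-block-paragraph">In R, we can use <strong>HoltWinters</strong>, a function to perform time series smoothing.<br>The same function covers three exponential smoothing methods, selected through the beta and gamma parameters: simple exponential smoothing (beta=FALSE and gamma=FALSE), Holt&#8217;s method with trend (gamma=FALSE), and the full Holt-Winters method with trend and seasonality (the default). The alpha, beta, and gamma parameters are the smoothing weights for level, trend, and seasonality; when they are not specified, R estimates them from the data.</p>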
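


<p class="wp-block-paragraph">As a quick sketch, these are the three ways of calling the function, shown on two datasets included in R (<em>Nile</em>, which has no seasonality, and the monthly <em>AirPassengers</em>). The object names are only illustrative:</p>



<pre class="wp-block-preformatted"># 1. Simple exponential smoothing: level only
ses_fit  &lt;- HoltWinters(Nile, beta = FALSE, gamma = FALSE)

# 2. Holt's method: level and trend, no seasonality
holt_fit &lt;- HoltWinters(AirPassengers, gamma = FALSE)

# 3. Full Holt-Winters: level, trend and (additive) seasonality
hw_fit   &lt;- HoltWinters(AirPassengers)

# The smoothing parameters estimated by R for the full model
c(alpha = unname(hw_fit$alpha),
  beta  = unname(hw_fit$beta),
  gamma = unname(hw_fit$gamma))

# Fitted values against the observed series
plot(hw_fit)</pre>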



<p class="wp-block-paragraph">The forecasts of a Holt-Winters model can be considered reliable only if the forecast errors (the residuals) show no autocorrelation, which can be verified, as we will see shortly in practice, with the <strong>acf</strong> function and a <strong>Box-Pierce or Ljung-Box test</strong>.</p>



<p class="wp-block-paragraph">After creating a forecasting model, we must evaluate it to understand if it correctly represents the data. As with a regression model, we can use the <strong>residuals</strong> for this purpose. If the residuals behave like <strong>white noise</strong>, they contain no systematic structure that the model has failed to capture. In that case, our model represents the time series well.</p>



<p class="wp-block-paragraph">Let&#8217;s see an example. Suppose we have a time series x:</p>



<pre class="wp-block-preformatted"># load the forecast package, which provides forecast()
library(forecast)
# fit the Holt-Winters model
x.hw &lt;- HoltWinters(x)
# forecast the next 6 periods
future.forecast &lt;- forecast(x.hw, h=6)
# print a summary to the screen
summary(future.forecast)
# draw the graph
plot(future.forecast)
# draw the correlogram of the residuals to check for autocorrelation
acf(future.forecast$residuals, na.action = na.pass)
# perform a Ljung-Box autocorrelation test on the residuals
Box.test(future.forecast$residuals, type = "Ljung-Box")</pre>



<p class="wp-block-paragraph"><strong>Autocorrelation tells us whether the terms of a time series depend on its past.</strong><br><br>If we consider a time series x of length n, the lag 1 autocorrelation can be estimated as the correlation between the pairs of consecutive observations (x[t], x[t-1]).</p>



<p class="wp-block-paragraph">R provides us with a convenient command: <strong>acf()</strong>.<br>Using:</p>



<pre class="wp-block-preformatted">acf(x, lag.max = 1, plot = FALSE)</pre>



<p class="wp-block-paragraph">on the x series, the lag 1 autocorrelation is automatically calculated.</p>
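


<p class="wp-block-paragraph">A small sketch on the built-in <em>Nile</em> series compares the value returned by acf() with the plain correlation between each observation and the previous one. The two numbers are close but not identical, because acf() uses the overall mean and variance of the series for both terms:</p>



<pre class="wp-block-preformatted">n &lt;- length(Nile)

# acf() returns lag 0 first, so the lag 1 value is the second element
r1_acf &lt;- acf(Nile, lag.max = 1, plot = FALSE)$acf[2]

# Correlation of the pairs (x[t], x[t-1])
r1_cor &lt;- cor(Nile[-1], Nile[-n])

c(acf = r1_acf, cor = r1_cor)</pre>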



<p class="wp-block-paragraph">By default, the command acf(x) draws a graph with two dashed blue horizontal lines, which mark the 95% confidence bounds expected for a white noise series.<br>The autocorrelation estimate is indicated by the height of the vertical bars (<em>obviously, the autocorrelation at lag 0 is always 1</em>).</p>



<p class="wp-block-paragraph">The confidence interval is used to determine the statistical significance of the autocorrelation.</p>



<p class="wp-block-paragraph">I show, as an example, the output of the acf() function on the Nile time series provided by R:</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/07/acf-nile.png" alt="Example of an ACF graph of the autocorrelation of a time series" class="wp-image-1772" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/07/acf-nile.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/07/acf-nile-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></figure>



<p class="wp-block-paragraph">If the autocorrelation coefficients fall rapidly within the bounds and stay there, the residuals behave like white noise: there is no apparent autocorrelation.<br>Conversely, if many coefficients lie outside the bounds, the residuals are autocorrelated.</p>



<p class="wp-block-paragraph">A <strong>Ljung-Box</strong> autocorrelation test is a particular form of hypothesis test, and it provides a <em>p</em>-value as output, a value that allows us to understand whether to reject the null hypothesis or not.</p>



<p class="wp-block-paragraph">Let&#8217;s apply the Box.test function to the residual sequence and read the p-value. If it is greater than the chosen significance level α, we cannot reject the null hypothesis of no autocorrelation. That is, the residuals are compatible with white noise, which suggests that our model &#8220;works well&#8221; in predicting the values.</p>
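


<p class="wp-block-paragraph">To get a feeling for how the p-value should be read, the sketch below runs the test on two series: simulated white noise and the built-in <em>Nile</em> data, which is clearly autocorrelated. The choice of 10 lags and of α = 0.05 is only an assumption made for the example:</p>



<pre class="wp-block-preformatted">alpha &lt;- 0.05

# Simulated white noise: we expect NOT to reject the null hypothesis
set.seed(42)
wn &lt;- rnorm(200)
p_wn &lt;- Box.test(wn, lag = 10, type = "Ljung-Box")$p.value

# The Nile series: strongly autocorrelated, so the p-value is tiny
p_nile &lt;- Box.test(Nile, lag = 10, type = "Ljung-Box")$p.value

c(white_noise = p_wn, nile = p_nile)

# Decision rule
if (p_nile &lt; alpha) {
  message("Nile: null hypothesis rejected, the series is autocorrelated")
}</pre>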



<p class="wp-block-paragraph">Let&#8217;s see it all in action in our SEO example related to website traffic, also using the <em>highcharter</em> library for better visualization of the output:</p>



<pre class="wp-block-preformatted"># First, I load the libraries I need<br><strong>library(googleAnalyticsR)</strong> <br># to read Google Analytics 4 data<br><strong>library(forecast)</strong> <br># for time series forecasting<br><strong>library(highcharter) </strong><br># to get the chart<br># numeric ID of the GA4 property<br># (Admin &gt; Property &gt; Property details)<br><strong>property_id &lt;- 123456789</strong><br># Authorize Google Analytics<br><strong>ga_auth()</strong><br># and then retrieve the data from the GA4 Data API<br>sitedata<strong> &lt;- ga_data(property_id, <br>            metrics = "sessions", <br>            dimensions = "yearMonth",<br>            date_range = c("2023-01-01", "2025-12-31"),<br>            limit = -1)</strong><br># nb: the dimension of my data is yearMonth<br># and I sort the rows by month<br>sitedata &lt;- sitedata[order(sitedata$yearMonth), ]<br># Now I express the data as a time series<br>sitedata_ts<strong> &lt;- ts(</strong>sitedata$sessions<strong>, start = c(2023,1), frequency = 12)</strong><br> <br># Calculate Holt-Winters smoothing<br>hw_forecast<strong> &lt;- HoltWinters(</strong>sitedata_ts<strong>)</strong><br># Generate a forecast for the next 12 months<br><strong>hchart(forecast(</strong>hw_forecast<strong>, h = 12))</strong></pre>



<p class="wp-block-paragraph">The reader is tasked with testing the quality of the forecasting model.</p>



<h2 class="wp-block-heading" id="arima">Investigating Time Series with ARIMA Models</h2>



<p class="wp-block-paragraph">Using the exponential smoothing method requires that the residuals are not correlated. In real-world cases, this is quite unlikely. However, we have other tools available to address these cases: R provides us with the arima() function to build time series models that take autocorrelation into account.</p>



<h3 class="wp-block-heading">White Noise</h3>



<p class="wp-block-paragraph">The very useful <code>arima.sim</code> function allows you to simulate an ARIMA process by generating ad hoc time series data.</p>



<p class="wp-block-paragraph">Through this function, therefore, we can begin to look at two basic time series models: <strong>white noise</strong> and the <strong>random walk</strong>.</p>



<p class="wp-block-paragraph">An ARIMA model is identified by three orders, written ARIMA(p,d,q):</p>



<ul class="wp-block-list">
<li>p is the order of autoregression</li>



<li>d is the order of integration, that is, the number of times the series is differenced to make it stationary</li>



<li>q is the order of the moving average</li>
</ul>



<p class="wp-block-paragraph">White noise is the most basic example of a stationary process.  Its salient characteristics are:</p>



<ol class="wp-block-list">
<li>It has a constant mean.</li>



<li>It has constant variance.</li>



<li>Its values are not correlated over time.</li>
</ol>



<p class="wp-block-paragraph">The white noise model in ARIMA terms is therefore ARIMA(0,0,0).</p>



<p class="wp-block-paragraph">Let&#8217;s simulate a time series of this type:</p>



<pre class="wp-block-preformatted">wn &lt;- arima.sim(model = list(order = c(0,0,0)), n=100)
ts.plot(wn)</pre>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/08/rumore-bianco-ts.png" alt="Example of a White Noise Time Series" class="wp-image-1781" style="width:641px;height:405px" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/08/rumore-bianco-ts.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/08/rumore-bianco-ts-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /><figcaption class="wp-element-caption">An example of a time series of a White Noise process</figcaption></figure>
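


<p class="wp-block-paragraph">The three properties listed above can also be checked numerically. This is a minimal sketch, assuming a longer simulated series (1000 points) and an arbitrary seed, neither of which belongs to the example above; the variable names are illustrative.</p>



<pre class="wp-block-preformatted">set.seed(123)  # arbitrary seed, only for reproducibility
wn_long &lt;- arima.sim(model = list(order = c(0, 0, 0)), n = 1000)

mean(wn_long)  # should be close to 0
var(wn_long)   # should be close to 1 (the default innovations are standard normal)

# sample autocorrelations, excluding lag 0 (which is always 1)
rho &lt;- acf(wn_long, plot = FALSE)$acf[-1]

# approximate 95% band under the white noise hypothesis
band &lt;- 1.96 / sqrt(length(wn_long))

# share of lags outside the band: roughly 5% is expected by chance alone
mean(abs(rho) &gt; band)</pre>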



<h3 class="wp-block-heading">Random Walk</h3>



<p class="wp-block-paragraph">The Random Walk is a simple example of a <strong>non-stationary process</strong>.  It has the following salient characteristics:</p>



<ul class="wp-block-list">
<li>Its variance is not constant: it grows with time, so the series does not revert to a fixed level.</li>



<li>It shows strong temporal dependence.</li>



<li>Its changes or increments are of the White Noise type.</li>
</ul>



<p class="wp-block-paragraph">The random walk is also a basic time series model and can be easily simulated with our <code>arima.sim</code> function.<br>A Random Walk is the cumulative sum of a zero-mean White Noise series.<br>It follows that the first difference of a Random Walk is a White Noise series!</p>



<p class="wp-block-paragraph">The ARIMA model for a Random Walk series is ARIMA(0,1,0).</p>
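


<p class="wp-block-paragraph">The link between the two models can be checked by hand, without <code>arima.sim</code>. In this sketch the seed and the names <code>eps</code> and <code>rw_manual</code> are illustrative assumptions: we build a random walk as a cumulative sum and verify that differencing gives back the original noise.</p>



<pre class="wp-block-preformatted">set.seed(1)                      # arbitrary seed
eps &lt;- rnorm(100)                # zero-mean white noise
rw_manual &lt;- ts(cumsum(eps))     # random walk as a cumulative sum
ts.plot(rw_manual)

# differencing undoes the cumulative sum (the first value is lost)
all.equal(as.numeric(diff(rw_manual)), eps[-1])

# note: with d = 1, arima.sim() returns n + 1 values, because the
# integration step prepends a starting value of 0
length(arima.sim(model = list(order = c(0, 1, 0)), n = 100))</pre>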



<p class="wp-block-paragraph">Let&#8217;s generate a series of this type and visualize it:</p>



<pre class="wp-block-preformatted">RW &lt;- arima.sim(model = list(order = c(0,1,0)), n = 100)
ts.plot(RW)</pre>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/08/RW-ts.png" alt="Example of a Random Walk model" class="wp-image-1783" style="width:641px;height:405px" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/08/RW-ts.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/08/RW-ts-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /><figcaption class="wp-element-caption">Graph of a Random Walk time series</figcaption></figure>



<p class="wp-block-paragraph">Let&#8217;s see the proof of what was stated above:</p>



<pre class="wp-block-preformatted">RWdiff &lt;- diff(RW)
ts.plot(RWdiff)</pre>



<p class="wp-block-paragraph">We obtain precisely a White Noise series:</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/08/RWdiff.png" alt="Differenced Random Walk Series" class="wp-image-1784" style="width:641px;height:405px" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/08/RWdiff.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/08/RWdiff-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></figure>
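


<p class="wp-block-paragraph">A plot is suggestive, but the autocorrelation function gives a more direct comparison between the two series. The sketch below regenerates a random walk with an arbitrary seed (an assumption, so the numbers will differ from the charts above) and looks at the series before and after differencing.</p>



<pre class="wp-block-preformatted">set.seed(7)  # arbitrary seed
RW &lt;- arima.sim(model = list(order = c(0, 1, 0)), n = 100)
RWdiff &lt;- diff(RW)

acf(RW)      # slow decay: strong temporal dependence
acf(RWdiff)  # for white noise, the spikes should mostly stay inside the dashed band

# Ljung-Box test: the null hypothesis is the absence of autocorrelation
Box.test(RWdiff, lag = 10, type = "Ljung-Box")</pre>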



<h2 class="wp-block-heading">The ARIMA Model in Action</h2>



<p class="wp-block-paragraph">The Autoregressive Integrated Moving Average (ARIMA) model is also known as the <strong>Box-Jenkins model</strong>, named after statisticians George Box and Gwilym Jenkins.</p>



<p class="wp-block-paragraph">The purpose of ARIMA is to find the model that best represents the values of a time series.</p>



<p class="wp-block-paragraph">An ARIMA model can be expressed as ARIMA(p, d, q), where, as we have already seen, p is the order of the autoregressive model, d indicates the degree of differencing, and q indicates the order of the moving average.</p>



<p class="wp-block-paragraph"><strong>Operationally, we can define five steps to fit an ARIMA model to a time series:</strong></p>



<ol class="wp-block-list">
<li>Visualize the time series with a graph.</li>



<li>Difference non-stationary time series to obtain stationary time series.</li>



<li>Plot ACF and PACF graphs to find the optimal values of p and q, or derive them using the <code>auto.arima</code> function.</li>



<li>Build the ARIMA model.</li>



<li>Make the forecast.</li>
</ol>



<h2 class="wp-block-heading" id="arima-example">Let&#8217;s See a Practical Example of an ARIMA Model</h2>



<p class="wp-block-paragraph">1. Simulate an ARIMA process using the <code>arima.sim()</code> function and plot the graph:</p>



<pre class="wp-block-preformatted">simts &lt;- arima.sim(list(order = c(1,1,0), ar = 0.64), n = 100)
plot(simts)</pre>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/08/arimasim.png" alt="ARIMA model simulation" class="wp-image-1788" style="width:641px;height:405px" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/08/arimasim.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/08/arimasim-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></figure>



<p class="wp-block-paragraph">2. Difference the series to obtain a stationary time series and plot the graph:</p>



<pre class="wp-block-preformatted">simts.diff &lt;- diff(simts)
plot(simts.diff)</pre>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/08/simts-diff.png" alt="Stationary Series" class="wp-image-1789" style="width:641px;height:405px" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/08/simts-diff.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/08/simts-diff-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></figure>



<p class="wp-block-paragraph">3. Use the <code>auto.arima</code> function to estimate the best values for p, d, and q:</p>



<pre class="wp-block-preformatted">auto.arima(simts, ic="bic")

Series: simts 
ARIMA(1,1,0) 

Coefficients:
         ar1
      0.6331
s.e.  0.0760

sigma^2 estimated as 0.9433:  log likelihood=-138.73
AIC=281.46   AICc=281.58   BIC=286.67</pre>
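


<p class="wp-block-paragraph">The manual route mentioned in the five steps, reading p and q from the ACF and PACF graphs, can be sketched as follows. The seed is an arbitrary assumption, and the model is fitted here with base R <code>arima()</code> so that the snippet runs without additional packages. Sample plots are noisy, so the patterns described in the comments are the theoretical ones for an AR(1) process.</p>



<pre class="wp-block-preformatted">set.seed(10)  # arbitrary seed
simts &lt;- arima.sim(list(order = c(1, 1, 0), ar = 0.64), n = 100)
simts.diff &lt;- diff(simts)

acf(simts.diff)   # AR(1): autocorrelations decay gradually
pacf(simts.diff)  # AR(1): one significant spike at lag 1, then a cut-off

# a PACF that cuts off after lag 1 suggests p = 1 and q = 0;
# d = 1 because we differenced the series once
fit_manual &lt;- arima(simts, order = c(1, 1, 0))
coef(fit_manual)  # the estimate of ar1 should be in the neighbourhood of 0.64</pre>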



<p class="wp-block-paragraph">4. Create the ARIMA model with the indicated p, d, and q values (in our example, 1, 1, 0):</p>



<pre class="wp-block-preformatted">fit &lt;- Arima(simts, order=c(1,1,0))
summary(fit)</pre>



<p class="wp-block-paragraph">5. Based on our ARIMA model, we can now proceed to forecast future values of the series and plot the graph:</p>



<pre class="wp-block-preformatted">fit.forecast &lt;- forecast(fit)
summary(fit.forecast)
plot(fit.forecast)</pre>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/08/arimaprev.png" alt="ARIMA forecast" class="wp-image-1790" style="width:641px;height:405px" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/08/arimaprev.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/08/arimaprev-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></figure>



<p class="wp-block-paragraph">The shaded areas show the 80% and 95% prediction intervals.</p>
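


<p class="wp-block-paragraph">The horizon and the interval levels can also be set explicitly, and the numbers behind the shaded areas can be read from the forecast object. A minimal sketch, assuming the series is regenerated with an arbitrary seed and a 12-step horizon chosen only for illustration:</p>



<pre class="wp-block-preformatted">library(forecast)

set.seed(10)  # arbitrary seed
simts &lt;- arima.sim(list(order = c(1, 1, 0), ar = 0.64), n = 100)
fit &lt;- Arima(simts, order = c(1, 1, 0))

# explicit horizon (h) and interval levels
fc &lt;- forecast(fit, h = 12, level = c(80, 95))

fc$mean    # point forecasts
fc$lower   # lower bounds, one column per level
fc$upper   # upper bounds, one column per level

# width of the 95% interval: it grows as the horizon moves further out
fc$upper[, "95%"] - fc$lower[, "95%"]</pre>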



<p class="wp-block-paragraph">Finally, let&#8217;s evaluate the goodness of our model with an ACF graph:</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="855" height="540" src="https://www.gironi.it/blog/wp-content/uploads/2020/08/prev-res.png" alt="Autocorrelogram" class="wp-image-1791" style="width:641px;height:405px" srcset="https://www.gironi.it/blog/wp-content/uploads/2020/08/prev-res.png 855w, https://www.gironi.it/blog/wp-content/uploads/2020/08/prev-res-300x189.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></figure>



<p class="wp-block-paragraph">As with the exponential smoothing model, we can pass the residuals of the model to the <code>acf</code> function to draw their autocorrelation plot. When the autocorrelations at the non-zero lags stay within the significance bounds, the residuals can be treated as white noise, which is what we expect from a well-specified model.</p>
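


<p class="wp-block-paragraph">The code for a chart of this kind is short. In this sketch the series is regenerated with an arbitrary seed and the model is fitted with base R <code>arima()</code> (both are assumptions made to keep the snippet self-contained); with the objects created earlier, <code>acf(fit.forecast$residuals)</code> plays the same role.</p>



<pre class="wp-block-preformatted">set.seed(10)  # arbitrary seed
simts &lt;- arima.sim(list(order = c(1, 1, 0), ar = 0.64), n = 100)
fit &lt;- arima(simts, order = c(1, 1, 0))

# residuals of the fitted model
res &lt;- residuals(fit)

# autocorrelogram: the dashed lines are the approximate 95% bounds
acf(res, main = "ACF of the ARIMA(1,1,0) residuals")</pre>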



<p class="wp-block-paragraph">We can also perform a Box-Pierce test:</p>



<pre class="wp-block-preformatted">Box.test(fit.forecast$residuals)

Box-Pierce test 
data: fit.forecast$residuals
X-squared = 0.020633, df = 1, p-value = 0.8858</pre>



<p class="wp-block-paragraph">The p-value of about 0.89 is far above 0.05, so we cannot reject the null hypothesis that the residuals are uncorrelated: a result consistent with white noise.</p>
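


<p class="wp-block-paragraph">By default <code>Box.test</code> runs the Box-Pierce version of the test on a single lag, as the <code>df = 1</code> in the output shows. A common variant is the Ljung-Box test over several lags, with the degrees of freedom reduced by the number of estimated coefficients. The sketch below assumes 10 lags and an arbitrary seed, and uses base R <code>arima()</code> to stay self-contained.</p>



<pre class="wp-block-preformatted">set.seed(10)  # arbitrary seed
simts &lt;- arima.sim(list(order = c(1, 1, 0), ar = 0.64), n = 100)
fit &lt;- arima(simts, order = c(1, 1, 0))

# fitdf = 1 because the model estimates one coefficient (ar1)
lb &lt;- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb

# TRUE means no evidence of residual autocorrelation at the 5% level
lb$p.value &gt; 0.05</pre>



<p class="wp-block-paragraph">The forecast package also offers <code>checkresiduals()</code>, which combines the residual plot, the ACF and a Ljung-Box test in a single call.</p>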



<h2 class="wp-block-heading" id="try-it-yourself">Try It Yourself</h2>



<p class="wp-block-paragraph">To consolidate everything, here is a complete exercise that requires neither a website nor access to Google Analytics: we generate the data ourselves. Let&#8217;s build in R three years of realistic monthly sessions — an underlying growth, a marked summer dip (as happens to many B2B sites) and some random noise:</p>



<pre class="wp-block-preformatted">set.seed(42)

# 36 months: growing trend + seasonality + noise
trend &lt;- seq(8000, 14000, length.out = 36)
seasonality &lt;- rep(c(1.05, 1.08, 1.12, 1.04, 0.98, 0.88,
                     0.72, 0.65, 1.02, 1.12, 1.16, 1.10), 3)
noise &lt;- rnorm(36, mean = 1, sd = 0.04)
sessions &lt;- round(trend * seasonality * noise)

# the time series: monthly, from January 2023
traffic_ts &lt;- ts(sessions, start = c(2023, 1), frequency = 12)</pre>



<p class="wp-block-paragraph">The exercise retraces the whole article, in five steps:</p>



<ol class="wp-block-list">
<li>Plot the series with <code>plot.ts(traffic_ts)</code>: can you already see trend and seasonality with the naked eye?</li>



<li>Draw the seasonal plot with <code>ggseasonplot(traffic_ts, year.labels = TRUE)</code>: it is exactly the chart shown earlier in this article. Is the August dip recognizable at first glance?</li>



<li>Decompose with <code>stl(traffic_ts, s.window = "periodic")</code>, plot the components with <code>plot()</code> and derive the seasonally adjusted series with <code>seasadj()</code>.</li>



<li>Estimate the model and the forecast: <code>future &lt;- forecast(HoltWinters(traffic_ts), h = 6)</code>, then <code>plot(future)</code>.</li>



<li>Evaluate the residuals: <code>acf(residuals(future), na.action = na.pass)</code> and <code>Box.test(residuals(future), type = "Ljung-Box")</code>. Does the model hold up?</li>
</ol>



<p class="wp-block-paragraph">If everything goes smoothly, the forecast for the first six months of 2026 should continue the growth while respecting the seasonal profile (n.b.: with seed 42, the point forecast for January is around 15,300 sessions), and the Ljung-Box test on the residuals should return a p-value comfortably above 0.05: the residuals are white noise, the model has captured the structure of the series.</p>



<p class="wp-block-paragraph">One final warning before we close, because the tool is powerful but treacherous: a forecasting model — be it Holt-Winters or ARIMA — <strong>projects the regularities of the past into the future</strong>. It works as long as the world keeps behaving the way it has behaved: a Google core update, a botched migration, an aggressive competitor are not in the model, and no confidence band can predict them. This is why, in SEO practice, forecasting is less about guessing the future and more about noticing quickly that the present is drifting away from it: when real traffic exits the forecast bands, that is where we need to go and look. We discussed this &#8220;negative&#8221; use of models in the article on <a href="https://www.gironi.it/blog/en/anomaly-detection-how-to-identify-outliers-in-your-data/">anomaly detection</a>, and we will soon return to the decomposition of organic traffic with a dedicated case study.</p>
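


<p class="wp-block-paragraph">The idea of watching for traffic that exits the forecast bands can be sketched in a few lines of base R. The series is the simulated one from the exercise; the "observed" values for 2026 are hypothetical, invented here only to show the mechanism, with the third month pushed below its lower bound on purpose.</p>



<pre class="wp-block-preformatted">set.seed(42)
trend &lt;- seq(8000, 14000, length.out = 36)
seasonality &lt;- rep(c(1.05, 1.08, 1.12, 1.04, 0.98, 0.88,
                     0.72, 0.65, 1.02, 1.12, 1.16, 1.10), 3)
noise &lt;- rnorm(36, mean = 1, sd = 0.04)
traffic_ts &lt;- ts(round(trend * seasonality * noise),
                 start = c(2023, 1), frequency = 12)

# point forecast and 95% prediction bounds for the next 3 months
hw &lt;- HoltWinters(traffic_ts)
bands &lt;- predict(hw, n.ahead = 3, prediction.interval = TRUE, level = 0.95)
fit_val &lt;- as.numeric(bands[, "fit"])
lower   &lt;- as.numeric(bands[, "lwr"])
upper   &lt;- as.numeric(bands[, "upr"])

# HYPOTHETICAL observed sessions (made up for illustration):
# two months on forecast, then a sudden drop below the lower bound
actual &lt;- c(fit_val[1], fit_val[2], lower[3] - 1000)

# flag the months in which reality leaves the band
outside &lt;- actual &lt; lower | actual &gt; upper
data.frame(month = c("2026-01", "2026-02", "2026-03"),
           actual = round(actual), lower = round(lower),
           upper = round(upper), outside = outside)</pre>



<p class="wp-block-paragraph">A flagged month does not say what happened, only that the past regularities no longer explain the present: that is the cue to go and investigate.</p>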



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h3 class="wp-block-heading">Further Reading</h3>



<p class="wp-block-paragraph">To dig deeper into time series analysis &mdash; stationarity, autocorrelation, forecasting models &mdash; <a href="https://www.amazon.it/dp/8891906190?tag=consulenzeinf-21&#038;ascsubtag=time-series-analysis-and-forecasting-in-r" rel="nofollow sponsored noopener" target="_blank"><em>Introduzione all&#8217;econometria</em></a> by Stock and Watson (Italian edition) devotes clear and rigorous chapters to the subject.</p>



<p class="wp-block-paragraph">On the popular side, <a href="https://www.amazon.it/dp/0141975652?tag=consulenzeinf-21&#038;ascsubtag=time-series-analysis-and-forecasting-in-r" rel="nofollow sponsored noopener" target="_blank"><em>The Signal and the Noise</em></a> by Nate Silver tells the story of why forecasts fail so often — and how good forecasters think: the ideal companion read, no computer required.</p>


]]></content:encoded>
					
					<wfw:commentRss>https://www.gironi.it/blog/en/time-series-analysis-and-forecasting-in-r/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
