Time Series Analysis and Forecasting
Forecasting methods can be classified as qualitative or quantitative. Qualitative methods are appropriate when historical data on the variable being forecast are either unavailable or not applicable. Quantitative methods are used when past information is available and it is reasonable to assume that past is prologue.
There is an R package called tswge, which is available on CRAN. The tswge package contains about 40 functions and about 100 data files, and those numbers may continue to grow.
A time series is a sequence of observations on a variable measured at successive points in time or over successive periods of time. We need a formal definition.
It is customary to use the subscript notation Xt for a discrete-parameter process, and the function notation X(t) for a continuous-parameter process.
Forecasting is required in many situations: deciding whether to build another building on the university campus in the next five years requires forecasts of future student enrollment; scheduling staff in a call center next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (as in the case of capital investments), or only a few minutes beforehand (as in telecommunication routing).
Some things are easier to forecast than others. The time of sunset tomorrow evening can be forecast very precisely. On the other hand, tomorrow's lotto numbers cannot be forecast with any accuracy. Forecasting is a common statistical task in business, where it helps to inform decisions about the scheduling of production, transportation, and personnel; it also provides a guide to long-term strategic planning. Most quantitative forecasting problems use either time series data (collected at regular intervals over time) or cross-sectional data (collected at a single point in time).
Time series can be observed either at regular intervals of time (e.g., hourly, daily, weekly, monthly, quarterly, annually) or at irregular intervals; the former is far more common in practice. Many time series include trend, cycles, and seasonality. When choosing a forecasting method, one first needs to identify the patterns in the data, and then choose a method that is able to capture those patterns properly.
Horizontal Pattern
A horizontal pattern is a time series pattern that exists when the data fluctuate around a constant mean. A time series whose statistical properties are independent of time is called a stationary time series. This means:
- The process generating the data has a constant mean.
- The variability of the time series is constant over time.
Example: to demonstrate, we will create a set of test data using the following commands:
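A minimal sketch of such test data: white noise around a constant mean gives a horizontal, stationary series (the numbers here are simulated, not from any real dataset).

set.seed(1)                                      # make the simulation reproducible
horizontal <- ts(rnorm(100, mean = 50, sd = 5))  # constant mean, constant variance
plot(horizontal)
abline(h = 50, col = "red")                      # the constant mean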
Trend Pattern
A trend pattern is exactly like it sounds; it is a time series in which the series values move in a direction (up or down) as t increases. A trend is usually the result of long-term factors such as population increases or decreases, changing demographic characteristics of the population, technological advances, and/or consumer preferences. Although time series data generally exhibit random fluctuations, a time series may also show gradual shifts to relatively higher or lower values over a long period of time.
Example: we can create a sample dataset for bicycle sales. This time, we will simply input a vector, and then use the forecast package from CRAN to create a time series object within R, which we then display with the plot() command.
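A sketch of what this might look like; the sales figures below are hypothetical, invented for illustration:

library(forecast)                   # loaded for the later forecasting functions
sales <- c(21.6, 22.9, 25.5, 21.9, 23.9, 27.5, 31.5, 29.7, 28.6, 31.4)
bicycle <- ts(sales, start = 2011)  # annual time series object
plot(bicycle)                       # the upward drift is the trend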
Seasonal Pattern
The trend of a time series can be identified by analyzing movements in historical data over multiple time periods. A plot that exhibits a repeating pattern over a one-year period due to seasonal influences is called a seasonal pattern. However, time series data can also exhibit seasonal behavior in periods of less than one year in duration. Both are considered examples of seasonal patterns.
Example: we will look at bicycle sales data over a 5-year period. It makes sense for these data to be affected seasonally, as bike sales are likely more prominent in times of warm weather. We will use ts(), which turns our vector into a time series object with our specified seasonality, and seasonplot() to compare the results stacked on top of each other. The seasonplot() function comes from the forecast package; once it is installed, you can use this special command, as the following example shows.
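A sketch under these assumptions; the quarterly sales vector is hypothetical, and frequency = 4 declares the seasonality:

library(forecast)
sales <- c(10, 31, 43, 16, 11, 33, 45, 17, 13, 34, 48, 19,
           14, 37, 51, 20, 16, 39, 54, 22)        # 5 years x 4 quarters
bikes <- ts(sales, start = 2015, frequency = 4)   # quarterly time series object
seasonplot(bikes, year.labels = TRUE, col = 1:5)  # one line per year, stacked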
Along with time plots, there are other useful ways of plotting data to emphasize seasonal patterns and show changes in these patterns over time.
A seasonal plot is similar to a time plot except that the data are plotted against the individual “seasons” in which the data were observed. You can create one using the ggseasonplot() function the same way you do with autoplot(). An interesting variant of a season plot uses polar coordinates, where the time axis is circular rather than horizontal; to make one, simply add a polar argument and set it to TRUE.
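For instance, with a built-in monthly series such as ldeaths (used here purely for illustration):

library(forecast)
ggseasonplot(ldeaths)                # conventional seasonal plot
ggseasonplot(ldeaths, polar = TRUE)  # the same plot on a circular time axis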
A subseries plot comprises mini time plots for each season. Here, the mean for each season is shown as a blue horizontal line. One way of splitting a time series is by using the window() function, which extracts a subset from the object x observed between the times start and end:
window(x, start = NULL, end = NULL)
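A sketch of both, again with the built-in ldeaths series standing in for real data:

library(forecast)
y <- window(ldeaths, start = 1975, end = c(1978, 12))  # extract 1975-1978
ggsubseriesplot(y)   # mini time plot per month; blue lines mark monthly means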
Example: Consider the a10 data that contain monthly sales volumes for anti-diabetic drugs in Australia. In the plots, you can see which month has the highest sales volume each year. It is helpful to load the fpp2 package and use its autoplot() and ggseasonplot() functions to produce plots of the a10 data.
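A minimal sketch:

library(fpp2)        # attaches forecast and ggplot2 and provides the a10 data
autoplot(a10)        # time plot of monthly anti-diabetic drug sales
ggseasonplot(a10)    # the same data plotted against months, one line per year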
Here there is a clear and increasing trend. There is also a strong seasonal pattern that increases in size as the level of the series increases. The sudden drop at the end of each year is caused by a government subsidization scheme that makes it cost-effective for patients to stockpile drugs at the end of the calendar year. Any forecasts of this series would need to capture the seasonal pattern, and the fact that the trend is changing slowly.
An alternative plot that emphasizes the seasonal patterns is one where the data for each season are collected together in separate mini time plots.
The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons. In this example, the plot is not particularly revealing; but in some cases, this is the most useful way of viewing seasonal changes over time.
Example: Consider the following data that shows the weekly economy passenger load on Ansett Airlines between Australia's two largest cities.
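A sketch, assuming the melsyd data from the fpp2 package record these weekly Melbourne-Sydney passenger loads:

library(fpp2)
autoplot(melsyd[, "Economy.Class"]) +
  ggtitle("Economy class passengers: Melbourne-Sydney") +
  ylab("Thousands")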
The time plot immediately reveals some interesting features. There was a period in 1989 when no passengers were carried --- this was due to an industrial dispute. There was a period of reduced load in 1992. This was due to a trial in which some economy class seats were replaced by business class seats. A large increase in passenger load occurred in the second half of 1991. There are some large dips in load around the start of each year. These are due to holiday effects. There is a long-term fluctuation in the level of the series which increases during 1987, decreases in 1989 and increases again through 1990 and 1991.
Example: For illustration purposes, I’m using the male and female monthly deaths from lung diseases in the UK.
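The mdeaths and fdeaths series ship with base R; binding them together lets autoplot() draw both in one panel:

library(fpp2)                        # attaches forecast and ggplot2
autoplot(cbind(mdeaths, fdeaths)) +
  ylab("Monthly deaths from lung diseases in the UK")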
For plotting functions that do not use an S3 plot() method, there is now a ggplot2 version with “gg” prefixed to the function name. There is also a new geom_forecast() function which uses forecast.ts() to obtain forecasts of the time series passed to autoplot().
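A sketch of geom_forecast() on a built-in series:

library(forecast)
library(ggplot2)
autoplot(USAccDeaths) + geom_forecast()   # adds point forecasts and intervals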
We emphasize what we learn from the above examples.
- A trend exists when there is a long-term increase or decrease in the data. There is a trend in the antidiabetic drug sales data shown above.
- A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. The monthly sales of antidiabetic drugs above show seasonality partly induced by the change in the cost of the drugs at the end of the calendar year.
- A cycle occurs when the data exhibit rises and falls that are not of a fixed period. These fluctuations are usually due to economic conditions and are often related to the "business cycle". The economy class passenger data above showed some indications of cyclic effects.
As an example of cross-sectional data, the mpg dataset (from the ggplot2 package) contains a subset of the fuel economy data that the EPA makes available at http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008; this was used as a proxy for the popularity of the car.
Cyclical Pattern
A cyclical pattern exists if the time series plot shows an alternating sequence of points below and above the trendline that lasts for more than one year. Many economic time series exhibit cyclical behavior with regular runs of observations below and above the trendline. Often the cyclical component of a time series is due to multiyear business cycles. The following example clarifies the patterns.
Example: Consider the total quarterly beer production in Australia (in megalitres) from 1956 to 2008. In order to identify the underlying model, we will break the series into 3 components: Trend, Seasonality, and Random Noise. We will calculate them separately, and then bring them back together to build our model piece by piece.
- Step 1: Import the data
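A sketch, assuming the quarterly ausbeer series from the fpp2 package as the data source, trimmed to whole years for the matrix step below:

library(fpp2)
beer <- window(ausbeer, start = 1956, end = c(2007, 4))  # 52 whole years
plot(beer)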
- Step 2: Detect the Trend
To detect the underlying trend, we smooth the time series using a “centered moving average”. To perform the decomposition, it is vital to use a moving window of exactly the size of the seasonality. Therefore, to decompose a time series we need to know its seasonality period: weekly, monthly, etc. If you do not know the seasonality period, you can detect it using the Fourier transform.
The Australian beer production clearly follows an annual seasonality. As it is recorded quarterly, there are 4 data points per year, so we use a moving average window of 4.
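Continuing the sketch, ma() from the forecast package computes the centered moving average:

trend_beer <- ma(beer, order = 4, centre = TRUE)  # window of 4 = one year
plot(beer)
lines(trend_beer, col = "red")                    # the smoothed trend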
- Step 3: De-trend the time series
Removing the previously calculated trend from the time series results in a new time series which clearly exposes the seasonality.
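Continuing the sketch:

detrend_beer <- beer - trend_beer   # remove the trend
plot(detrend_beer)                  # the seasonal swing is now plainly visible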
- Step 4: Average Seasonality
From the detrended time series, it is easy to compute the average seasonality: we add the values for each season together and divide by the number of periods. Technically speaking, to average the time series we feed it into a matrix arranged so that all elements of the same period (same day, same month, or same quarter) fall in the same row or column. Finally, we compute the mean of each such group.
Quarterly seasonality: we use a matrix of 4 rows. The average seasonality is repeated 16 times for the graphical comparison later (see below).
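Continuing the sketch; matrix() fills column by column, so with 4 rows each row collects every observation from the same quarter:

m <- matrix(detrend_beer, nrow = 4)          # one column per year
seasonal_beer <- rowMeans(m, na.rm = TRUE)   # average effect of each quarter
plot(rep(seasonal_beer, 16), type = "l")     # repeated 16 times for comparison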
- Step 5: Random Noise
The previous steps have already extracted most of the structure from the original time series; we are left with only some “random” noise. The additive formula defines “Time series = Seasonal + Trend + Random”, which means “Random = Time series − Seasonal − Trend”.
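Continuing the sketch:

seasonal_full <- ts(rep(seasonal_beer, length.out = length(beer)),
                    start = start(beer), frequency = 4)
random_beer <- beer - trend_beer - seasonal_full  # Random = series - T - S
plot(random_beer)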
- Step 6: Reconstruct the Original Signal
The decomposed time series can logically be recomposed using the model formula; this should reproduce the original signal. Some data are missing at the beginning and the end of the recomposed time series. This is due to the moving average window, which needs to ingest some data points before producing averaged values.
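Continuing the sketch; the NAs at both ends come from the centered moving average:

recomposed <- trend_beer + seasonal_full + random_beer
plot(recomposed)   # matches the original series wherever all components exist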
Finally, we have the ability both to examine the true dynamics of the relationships in our time series data and to start using our model for inference going forward! Fortunately, R allows us to use the function decompose() to calculate these aspects of our data in just a few seconds. Using our same dataset, we will quickly decompose the time series into 4 aspects: the Observed, the Trend, the Seasonal, and the Random.
Note: in the decompose() command line, we have an entry within the command that reads “additive.” We must tell the function whether to treat the model as an additive or a multiplicative model. In our 6-step approach to manually calculating the Trend, Seasonality, and Randomness, every aspect was built from the additive formula. If we were looking at a multiplicative model, our formula becomes:
\[ \mbox{Time series} = \mbox{Seasonal} \times \mbox{Trend} \times \mbox{Random} . \]
The forecast error is simply \( e_t = y_t - \hat{y}_t , \) where yt and ŷt are the actual and forecast values of the time series for period t. The forecast error is on the same scale as the data. Accuracy measures that are based on et are therefore scale-dependent and cannot be used to make comparisons between series that are on different scales.
Therefore, we can take the average of these values, and find our Mean Forecast Error:
\[ \mbox{MFE} = \mbox{mean}(e_t ) = \frac{1}{n-k}\, \sum_{t=k+1}^n e_t . \]
The two most common scale-dependent measures are based on the absolute errors or squared errors.
\[ \begin{split} \mbox{Mean Absolute Error: } &\quad \mbox{MAE} = \mbox{mean}(|e_t |) = \frac{1}{n-k} \, \sum_{t=k+1}^n \left\vert e_t \right\vert , \\ \mbox{Root Mean Squared Error: } &\quad \mbox{RMSE} = \sqrt{\mbox{mean}( e_t^2 )} = \left( \frac{1}{n-k} \, \sum_{t=k+1}^n e_t^2 \right)^{1/2} . \end{split} \]
When comparing forecast methods on a single data set, the MAE is most popular as it is easy to understand and compute.
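Minimal helper sketches for these two measures, where y holds the actual values and yhat the forecasts for the same periods:

mae  <- function(y, yhat) mean(abs(y - yhat))       # mean absolute error
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))  # root mean squared error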
The size of the MAE and RMSE depends on the scale of the data. As a result, it is difficult to make comparisons for different time intervals (such as comparing a method of forecasting monthly gasoline sales to a method of forecasting weekly sales) or to make comparisons across different time series (such as monthly sales of gasoline and monthly sales of oil filters). To make comparisons such as these we need to work with relative or percentage error measures.
The percentage error is given by \( p_t = 100\,e_t / y_t \); it is scale-independent, and so percentage errors are frequently used to compare forecast performance between different data sets. The most commonly used measure is:
\[ \mbox{Mean Absolute Percentage Error}: \quad \mbox{MAPE} = \frac{1}{n-k} \, \sum_{t=k+1}^n \left\vert \left( \frac{e_t}{y_t} \right) 100 \right\vert . \]
Measures based on percentage errors have the disadvantage of being infinite or undefined if yt = 0 for any t in the period of interest, and having extreme values when any yt is close to zero. Another problem with percentage errors that is often overlooked is that they assume a meaningful zero. For example, a percentage error makes no sense when measuring the accuracy of temperature forecasts on the Fahrenheit or Celsius scales.
They also have the disadvantage that they put a heavier penalty on negative errors than on positive errors. This observation led to the use of the so-called "symmetric" MAPE (sMAPE) proposed by Armstrong (1985), which was used in the M3 forecasting competition. It is defined by
\[ \mbox{sMAPE} = \mbox{mean} \left( 200 \left\vert y_t - \hat{y}_t \right\vert / \left( y_t + \hat{y}_t \right) \right) . \]
However, if yt is close to zero, ŷt is also likely to be close to zero. Thus, the measure still involves division by a number close to zero, making the calculation unstable. Also, the value of sMAPE can be negative, so it is not really a measure of "absolute percentage errors" at all. Hyndman and Koehler (2006) recommend that the sMAPE not be used.
Scaled errors were proposed by Hyndman and Koehler (2006) as an alternative to using percentage errors when comparing forecast accuracy across series on different scales. They proposed scaling the errors based on the training MAE from a simple forecast method. For a non-seasonal time series, a useful way to define a scaled error uses naïve forecasts:
\[ q_t = \dfrac{e_t \left( T -1 \right)}{\sum_{k=2}^T \left\vert y_k - y_{k-1} \right\vert} . \]
Because the numerator and denominator both involve values on the scale of the original data, qt is independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average naïve forecast computed on the training data. Conversely, it is greater than one if the forecast is worse than the average naïve forecast computed on the training data. For seasonal time series, a scaled error can be defined using seasonal naïve forecasts:
\[ q_t = \dfrac{e_t \left( T -m \right)}{\sum_{k=m+1}^T \left\vert y_k - y_{k-m} \right\vert} . \]
For cross-sectional data, a scaled error can be defined as
\[ q_t = \dfrac{e_t \,N}{\sum_{i=1}^N \left\vert y_i - \overline{y} \right\vert} . \]
In this case, the comparison is with the mean forecast.
The mean absolute scaled error is simply
\[ \mbox{MASE} = \mbox{mean} \left( | q_t | \right) . \]
Similarly, the mean squared scaled error (MSSE) can be defined where the errors (on the training data and test data) are squared instead of using absolute values.
Example: Consider forecasts of the quarterly beer production.
The figure shows three forecast methods applied to the quarterly Australian beer production, using data only to the end of 2005. The actual values for the period 2006-2008 are also shown. We compute the forecast accuracy measures for this period.
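A sketch of this comparison, assuming ausbeer from the fpp2 package; accuracy() reports MAE, RMSE, MAPE, MASE, and more in one table:

library(fpp2)
beer2 <- window(ausbeer, start = 1992, end = c(2005, 4))  # training data
beerfit1 <- meanf(beer2, h = 12)      # mean method
beerfit2 <- rwf(beer2, h = 12)        # naive method
beerfit3 <- snaive(beer2, h = 12)     # seasonal naive method
accuracy(beerfit1, ausbeer)
accuracy(beerfit2, ausbeer)
accuracy(beerfit3, ausbeer)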
It is obvious from the graph that the seasonal naïve method is best for these data, although it can still be improved, as we will discover later. Sometimes, different accuracy measures will lead to different results as to which forecast method is best. However, in this case, all the results point to the seasonal naïve method as the best of these three methods for this data set.
Example: To take a non-seasonal example, consider the Dow Jones Index. The following graph shows the 250 observations ending on 15 July 1994, along with forecasts of the next 42 days obtained from three different methods.
Here, the best method is the drift method (regardless of which accuracy measure is used).
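A sketch, assuming the dj series from the fpp package holds daily Dow Jones closing values like those plotted above:

library(fpp)
dj2 <- window(dj, end = 250)               # first 250 observations as training data
djfc1 <- meanf(dj2, h = 42)                # mean method
djfc2 <- rwf(dj2, h = 42)                  # naive method
djfc3 <- rwf(dj2, drift = TRUE, h = 42)    # drift method
accuracy(djfc3, dj)                        # compare against the held-out values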
5. Determining the Best Forecasting Model
This section presents a forecasting method that is appropriate for a time series with a horizontal pattern: moving averages. This method adapts well to changes in the level of a horizontal pattern such as the one discussed previously. The moving averages method uses the average of the most recent k data values in the time series as the forecast for the next period.
\[ \begin{split} \hat{y}_{t+1} &= \frac{1}{k} \,\sum (\mbox{most recent k data values}) = \frac{1}{k} \,\sum_{i=t-k+1}^t y_i \\ &= \frac{1}{k} \left( y_{t-k+1} + \cdots + y_{t-1} + y_t \right) , \end{split} \]
where
- ŷt+1 = forecast of the time series for period t+1
- yt = actual value of the time series in period t
- k = number of periods of time series data used to generate the forecast.
The term moving is used because every time a new observation becomes available for the time series, it replaces the oldest observation in the equation and a new average is computed. Hence, the periods over which the average is calculated change, or move, with each ensuing period.
Simple Moving Average is a method of time series smoothing and is a very basic forecasting technique. It does not require estimation of model parameters, but rather is based on order selection.
Example: we will look at the elecsales dataset in R, and smooth it with the ma() function.
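A sketch; elecsales (annual electricity sales for South Australia) comes with the fpp2 package:

library(fpp2)
plot(elecsales)
lines(ma(elecsales, order = 5), col = "red")   # 5-year moving average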
We can see that the moving average uses a window of 5 years, taking an average of the data around each point. This results in a “smoothed” curve for our data.
Another method of forecasting is known as “Exponential Forecasting.” This refers to the use of an exponentially weighted moving average (EWMA) to “smooth” a time series. It is defined as:
\[ \hat{y}_{t+1} = \alpha\,y_t + \left( 1- \alpha \right) \hat{y}_t = \hat{y}_t + \alpha \left( y_t - \hat{y}_t \right) , \]
where
- ŷt+1 = forecast of the time series for period t+1
- yt = actual value of the time series in period t
- ŷt = forecast of the time series for period t
- α = smoothing constant
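A sketch of this formula in action via ses() from the forecast package, applied to a built-in series; alpha is the smoothing constant above:

library(forecast)
fit <- ses(WWWusage, alpha = 0.2, h = 10)   # fixed smoothing constant
plot(fit)                                   # the series plus 10-step-ahead forecasts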
We use the Holt-Winters method as the most common method of exponential smoothing for data that experience both seasonal and trend influences. For an example, we will look at our “beer” dataset from earlier.
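A sketch with hw() from the forecast package, assuming the ausbeer data from fpp2 stand in for the beer series:

library(fpp2)
fit <- hw(window(ausbeer, start = 1990), seasonal = "additive", h = 8)
autoplot(fit)   # captures both the trend and the quarterly seasonality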
To the extent that seasonality exists, we need to incorporate it into our forecasting models to ensure accurate forecasts.
Seasonality without Trend
To look at seasonality without trend, we could perform the same steps as we previously did. However, because no trend should be present, it is equally suitable to transform the x-variable so that we only refer to the spot within the period, not the overall time value (i.e., if there is a period of 4 and we are at the 6th observation, we show this value as 2 for the x-value), as the sketch below shows.
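A sketch of this re-indexing on simulated quarterly data; cycle() returns exactly that spot-within-period value:

set.seed(1)
y <- ts(rep(c(10, 20, 30, 15), 6) + rnorm(24), frequency = 4)  # seasonal, no trend
pos <- cycle(y)              # 1,2,3,4,1,2,3,4,...: position within the period
fit <- lm(y ~ factor(pos))   # seasonal means model with no trend term
summary(fit)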
ARIMA (autoregressive integrated moving average) models combine autoregressive and moving average components, estimate our regression parameters, and allow for correlation between predictors. ARIMA models can also be used to find confidence intervals for future forecasts based on our multiple regression equation.
Example: we will look at the number of chicken eggs laid by one hen per year.
The arima() function in the stats package provides seasonal and non-seasonal ARIMA model estimation, including covariates. However, it does not allow a constant unless the model is stationary; it does not return everything required for forecast(); and it does not allow re-fitting a model to new data.
Use the Arima() function in the forecast package, which acts as a wrapper to arima(), or use the auto.arima() function in the forecast package. Its full signature shows the available options:
auto.arima(x, d = NA, D = NA, max.p = 5, max.q = 5, max.P = 2, max.Q = 2, max.order = 5, start.p = 2, start.q = 2, start.P = 1, start.Q = 1, stationary = FALSE, ic = c("aic", "aicc", "bic"), stepwise = TRUE, trace = FALSE, approximation = (length(x)>100 | frequency(x)>12), xreg = NULL, test = c("kpss", "adf", "pp"), seasonal.test = c("ocsb", "ch"), allowdrift = TRUE, lambda = NULL)
This is the automatic ARIMA algorithm due to Hyndman and Khandakar (2008). For high frequency data:
- ets() has maximum period 24.
- Arima() has maximum period 350, but usually runs out of memory if period > 200.
- stl() allows a decomposition of any frequency.
- dshw() allows two seasonal periods.
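A sketch of auto.arima() in action on a built-in series:

library(forecast)
fit <- auto.arima(WWWusage)     # the algorithm selects p, d, q automatically
summary(fit)
plot(forecast(fit, h = 20))     # point forecasts with prediction intervals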
The methods discussed for estimating linear trends and seasonal effects make use of patterns in historical values of the variable to be forecast. These methods are classified as time series methods because they rely on past values of the variable to be forecast when developing the model. However, the relationship of the variable to be forecast with other variables may also be used to develop a forecasting model. Generally, such models include only variables that are believed to cause changes in the variable to be forecast.