16.2 Exponential smoothing models

Moving-average forecast models appeal to our intuition. Using the average of several of the most recent data values to forecast the next value of the time series is easy to understand conceptually. However, two criticisms can be made against moving-average models.

First, our forecast for the next time period ignores all but the last \(k\) observations in our data set. If you have 100 observations and use a span of \(k=5\), your forecast will not use 95% of your data!
Second, the data values used in our forecast are all weighted equally. In many settings, the current value of a time series depends more on the most recent value and less on past values. We may improve our forecasts if we give the most recent values greater weight in our forecast calculation.

Exponential smoothing models address both of these criticisms.

Exponential Smoothing Model

The moving-average forecast model uses the average of the last \(k\) values of the time series as the forecast for time period \(t\).

The number of preceding values included in the moving average is called the span of the moving average.
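As a point of comparison for what follows, the moving-average forecast just defined can be sketched in a few lines. The function name `moving_average_forecast` and the sample values are illustrative assumptions, not part of the text.

```python
def moving_average_forecast(y, k):
    """Forecast the next value as the mean of the last k observations."""
    if k < 1 or k > len(y):
        raise ValueError("span k must satisfy 1 <= k <= len(y)")
    # Only the last k values enter the forecast, each with equal weight 1/k.
    return sum(y[-k:]) / k


series = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]  # made-up illustrative data
forecast = moving_average_forecast(series, k=3)
print(forecast)
```

Note that the first three observations have no effect on this forecast, which is exactly the first criticism raised above.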

Like a moving average, simple exponential smoothing is a transformation of the original variable. Applying simple exponential smoothing to a variable \(y_t\) yields the smoothed variable \(a_t\). In theory, \(a_t\) is given by the expression:

\[\begin{equation} a_t = (1 - w) y_t + (1 - w) w y_{t-1} + (1 - w) w^2 y_{t-2} + (1 - w) w^3 y_{t-3} + \ldots \label{eq:alisadosimple01} \end{equation}\]

where \(w\) is a parameter taking values between 0 and 1, and the dots indicate that the sum can contain infinitely many terms. The expression is simply a weighted arithmetic mean of infinitely many values of \(y\).
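To see why this is a weighted mean, note that for \(0 \le w < 1\) the weights \((1-w) w^{j}\) are non-negative and, by the geometric series, sum to one:

\[ \sum_{j=0}^{\infty} (1-w) w^{j} = (1-w) \cdot \frac{1}{1-w} = 1 \]

For instance, taking \(w = 0.8\) purely as an illustration, the first three weights are 0.2, 0.16 and 0.128.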

The variable is called smoothed because, being a weighted average of different values, it smooths out the oscillations of the series. The term exponential reflects the fact that the weights of the observations decrease exponentially as we move away from the current time \(t\), so that distant observations have very little influence on the value of \(a_t\). Finally, the adjective simple distinguishes this method from other cases in which, as we will see later, a variable is subjected to a double smoothing operation.

With these concepts in place, we turn to how the smoothed variable is actually computed, since the expression cannot be applied directly: it contains infinitely many terms. Lagging the previous expression by one period gives:

\[ a_{t-1} = (1 - w) y_{t-1} + (1 - w) w y_{t-2} + (1 - w) w^2 y_{t-3} + \ldots \]

Multiplying both sides by \(w\) gives:

\[\begin{equation} w a_{t-1} = (1 - w) w y_{t-1} + (1 - w) w^2 y_{t-2} + (1 - w) w^3 y_{t-3} + \ldots \label{eq:alisadosimple02} \end{equation}\]

Subtracting this equation from the original expression for \(a_t\), term by term, every term except \((1-w) y_t\) cancels. Rearranging, we have:

\[ a_{t}=(1-w) y_{t}+w a_{t-1} \] or:

\[ a_{t}=\alpha y_{t}+(1-\alpha) a_{t-1} \] where \(\alpha=1-w\).

All that remains is to choose values for \(\alpha\) and \(a_{0}\). From these two parameters the smoothed variable can be computed recursively:

\[ \begin{array}{l} a_{1}=\alpha y_{1}+(1-\alpha) a_{0} \\ a_{2}=\alpha y_{2}+(1-\alpha) a_{1} \\ a_{3}=\alpha y_{3}+(1-\alpha) a_{2} \end{array} \]
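The recursion above translates directly into a short loop. The following is a minimal sketch, assuming the series is held in a plain Python list; the function name `exponential_smoothing`, the sample data, and the choices \(\alpha = 0.5\) and \(a_0 = y_1\) are illustrative assumptions.

```python
def exponential_smoothing(y, alpha, a0):
    """Return [a_1, ..., a_n] with a_t = alpha * y_t + (1 - alpha) * a_{t-1}."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    smoothed = []
    previous = a0  # a_{t-1}, starting from the initial value a_0
    for value in y:
        previous = alpha * value + (1.0 - alpha) * previous
        smoothed.append(previous)
    return smoothed


series = [10.0, 12.0, 11.0, 13.0]  # made-up illustrative data
smoothed = exponential_smoothing(series, alpha=0.5, a0=series[0])
print(smoothed)  # [10.0, 11.0, 11.0, 12.0]
```

Each step needs only the current observation and the previous smoothed value, so the infinite sum never has to be evaluated explicitly.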

When choosing \(\alpha\), keep in mind that a small value gives a lot of weight to past observations through the term \(a_{t-1}\), whereas a large value gives more weight to the current observation \(y_t\). A value of \(\alpha = 0.2\) is often suggested as a reasonable default. Alternatively, one can select the value of \(\alpha\) that yields the smallest root mean squared error (RMSE) of the predictions over the sample period.
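The RMSE criterion can be sketched as a simple grid search. This sketch assumes that the prediction for period \(t\) is the previous smoothed value \(a_{t-1}\), which the text does not state explicitly; the function names, the candidate grid, and the data are likewise hypothetical.

```python
import math


def one_step_rmse(y, alpha, a0):
    """RMSE of in-sample one-step predictions, assuming a_{t-1} predicts y_t."""
    level = a0
    squared_errors = []
    for value in y:
        squared_errors.append((value - level) ** 2)  # error of predicting y_t by a_{t-1}
        level = alpha * value + (1.0 - alpha) * level  # update to a_t
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_alpha(y, a0, candidates):
    """Return the candidate alpha with the smallest in-sample RMSE."""
    return min(candidates, key=lambda alpha: one_step_rmse(y, alpha, a0))


series = [20.0, 22.0, 19.0, 23.0, 21.0, 24.0, 22.0, 25.0]  # made-up data
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
alpha_star = best_alpha(series, a0=series[0], candidates=candidates)
print(alpha_star, one_step_rmse(series, alpha_star, series[0]))
```

A finer grid or a numerical optimizer could replace the nine candidates; the grid is used here only to keep the sketch short.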

For the initial value \(a_{0}\), the following conventions are commonly used:

when the series has many oscillations, \(a_{0}=y_{1}\) is taken;
on the contrary, when the series has a certain stability, \(a_{0}=\bar{y}\) is taken, where \(\bar{y}\) is the sample mean of the series.
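The two initializations can be compared with a small sketch. It relies on the fact, which follows from unrolling the recursion, that \(a_0\) enters \(a_t\) with weight \((1-\alpha)^t\), so the gap between the two smoothed series shrinks geometrically. The helper name `smooth`, the data, and \(\alpha = 0.2\) are illustrative assumptions.

```python
from statistics import mean


def smooth(y, alpha, a0):
    """Simple exponential smoothing: a_t = alpha * y_t + (1 - alpha) * a_{t-1}."""
    out, level = [], a0
    for value in y:
        level = alpha * value + (1.0 - alpha) * level
        out.append(level)
    return out


series = [20.0, 22.0, 19.0, 23.0, 21.0, 24.0]  # made-up illustrative data
alpha = 0.2

from_first = smooth(series, alpha, a0=series[0])     # a_0 = y_1
from_mean = smooth(series, alpha, a0=mean(series))   # a_0 = sample mean

# The gap at time t equals (1 - alpha)**t * |y_1 - mean|, so it fades with t.
gaps = [abs(u - v) for u, v in zip(from_first, from_mean)]
print(gaps)
```

With a small \(\alpha\) the influence of \(a_0\) fades slowly, so the choice of initial value matters most for short series.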