Time Series Analysis of 1979-1981 Data

Time Series Analysis of 1979/12/01 and 1981/12/12 Series: Data Exploration, Analysis, Modelling, and

Table of Contents

Data Exploration
    Summary Statistics
    Analysis Tests
        Test for Stationarity
        Patterns
Modelling
    Data Preparation
    Model Evaluation
Results and Discussion
    Discussion of Expected Accuracy

Data Exploration

This section provides an overview of the 1981/12/12 and 1979/12/01 series using summary statistics, tests for stationarity, and patterns in each series, including seasonality, trend, and ACF and PACF analysis.

Summary Statistics

Table 1 below provides an overview of the summary statistics of the data. The mean of the 1979 series (M = 6168) is higher than that of the 1981 series (M = 5389).

Table 1: Summary statistics

Statistic    X1981    X1979
Min.         4750     3054
1st Qu.      5228     5059
Median       5360     6153
Mean         5389     6168
3rd Qu.      5530     7143
Max.         6075     10159

Analysis Tests

Test for Stationarity

An Augmented Dickey-Fuller test was used to test for stationarity in the 1981 series at a significance level of 0.05. With Dickey-Fuller = -2.0901 and p-value = 0.5391 (see appendix 1), we fail to reject the null hypothesis of non-stationarity and therefore treat the series as non-stationary. To correct for non-stationarity, the data was transformed using first-order differencing (see appendix 2). From appendix 3, however, the 1979 series was found to be stationary (p = 0.01), so that series was not treated for stationarity.

Patterns

Seasonality

From figure 1 below, the time series shows a repeating pattern of highs and lows, which suggests a possible seasonal component. However, the nsdiffs() function indicated that the data is not seasonal. The lack of seasonality was thus confirmed when we tested for seasonal differences, and no seasonality treatment was applied.

Figure 1: 1981 series

Trend

Figure 2 below shows the 1981 and 1979 time series data with a trend line. The 1981 series has a decreasing trend over time, whereas the 1979 series shows only a slightly increasing trend, i.e., the series tends to center around the mean. The trend in the 1981 series is consistent with the finding that the series is non-stationary.

Figure 2: 1979 and 1981 trend

ACF and PACF Plots

The ACF (autocorrelation function) and PACF (partial autocorrelation function) were used to determine the AR and MA components of an ARIMA model.
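As a rough illustration of the quantity plotted in an ACF chart, the sketch below computes the sample autocorrelation of a series together with the approximate 95% band of ±1.96/sqrt(n) that is commonly drawn on such plots. It is written in plain Python rather than the R used in the report, the function names sample_acf and significance_bound are hypothetical, and the toy series is made up and is not the report's data.

```python
import math


def sample_acf(series, max_lag):
    """Return sample autocorrelations r_0 .. r_max_lag of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% band for the autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)


# Made-up toy series, for illustration only (not the 1979 or 1981 data).
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = sample_acf(toy, 2)
bound = significance_bound(len(toy))

# Lags whose autocorrelation falls outside the band would be read as significant.
significant_lags = [k for k, r in enumerate(acf) if k > 0 and abs(r) > bound]
```

A PACF can be derived from these autocorrelations (for example with the Durbin-Levinson recursion); that step is omitted here for brevity.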
Since each of the two series was established to be non-seasonal, the following patterns are expected. For an AR(p) component, the ACF decays geometrically while the PACF is significant up to lag p. For an MA(q) component, the PACF decays geometrically while the ACF is significant up to lag q. Figure 3 below provides an overview of the ACF and PACF plots for the 1981 series.

Figure 3: ACF and PACF plots of 1981 time series

Based on figure 3, the ACF of the 1981 series shows a geometric decay, indicating that the series is non-stationary, while the PACF cuts off immediately after lag 1 with no further significant lags, suggesting that an AR(1) model would be appropriate for the time series. From figure 4, the ACF is significant at lag 2, indicating that the 1979 series has MA(2) components, while the PACF is significant at lag 2, indicating that an AR(2) model would be appropriate. Therefore, an ARIMA(2, 0, 2) model would also be used during experimentation for the series.

Figure 4: ACF and PACF plots of 1979 series

Modelling

In this study, we sought to implement three time series models: Exponential Smoothing (ES), autoregressive integrated moving average (ARIMA), and a Time Series Regression model.

Data Preparation

ARIMA

As noted earlier, both series required a differencing order of 1 when tested using the ndiffs() function. Therefore, during data preparation, the series were differenced using a first-order difference, after which they were examined using ACF and PACF plots, as shown in figures 5 and 6 below for the differenced 1979 and 1981 series respectively.

Figure 5: Differenced 1979 series

From figure 5, the differenced 1979 series, while having no trend, still has AR(2) and MA(2) components. Therefore, when implementing the ARIMA models, one of the models will be ARIMA(2, 1, 2). From figure 6, however, the differenced 1981 series has AR(4) and MA(3) components.
Therefore, for the 1981 series, one of the models would be ARIMA(4, 1, 3).

Figure 6: Differenced 1981 series

Exponential Smoothing Model

The data used for exponential smoothing was obtained from the first-order differencing process described above. For the exponential model, we implemented one automatic model, which selects the best value of the smoothing parameter (alpha), and nine other models with alpha set manually from 0.1 to 0.9 in steps of 0.1. The best value of alpha was selected based on the average error of the models on the training data.

Regression Model

Since the regression model needs a predictor attribute, each of the differenced time series was lagged with k = 1. The lagged series was then used as the predictor and the actual differenced series as the response attribute, so that the previous value was used to predict the subsequent observation. During implementation, several regression models were fitted and their fit examined by varying the number of lags. The lags tested were 1 and 10 for each of the two series.

Model Evaluation

The performance of the models was evaluated using the mean error (ME) of the models on the train and test data. During model selection, the model with the lowest absolute ME was proposed for deployment.

Results and Discussion

Discussion of Expected Accuracy

Table 2 below provides an overview of the mean error of each of the models, i.e., exponential smoothing, ARIMA, and time series regression, on the train data of each time series.
Table 2: Model performance on the train data

Model                    Variant        1979            1981
Exponential Smoothing    Automatic      -5.438138       -0.04
                         Best Manual    -19.45896       0.12
ARIMA                    Automatic      -3.135916       0.214522
                         Best Manual    -4.31259        -3.594793
Regression               1-Lag          3.648656e-15    1.53213e-15
                         10-Lag         2.371244e-14    1.448935e-16

Based on the mean error scores given in table 2 above, the time series regression models have the lowest absolute mean error on both series, with values that are effectively zero: for example, 1.53213e-15 ≈ 0 for the 1-lag model on the 1981 series and 2.371244e-14 ≈ 0 for the 10-lag model on the 1979 series. By this measure, the regression models make the best predictions on the train data. The performance of each of the models in predicting 14 future values from the test set is given in table 3 below.

Table 3: Model performance on test data

Model                    Variant        1979         1981
Exponential Smoothing    Automatic      48.11646     -5.438138
                         Best Manual    47.89359     -19.45896
ARIMA                    Automatic      21.03209     -3.135916
                         Best Manual    -2.834716    -4.31259
Regression               Best model     325.1298     -7.181823

From table 3 above, the best manual ARIMA model, with an error of -2.834, has the lowest absolute ME and is hence the best model for forecasting the 1979 series, while the automatic ARIMA model had the best performance in forecasting the 1981 series. Overall, some of the models produced forecasts that were on average lower than the actual values.
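The mean error reported in tables 2 and 3 can be illustrated with a small sketch. It assumes that ME is the average of actual minus predicted values, which is an assumption about the report's sign convention, and it fits a lag-1 regression by ordinary least squares on a made-up differenced series. The names mean_error and fit_lag1_ols are hypothetical, and the code is plain Python rather than the R used in the report. If the report's regression was a least-squares fit with an intercept (also an assumption), the training residuals sum to zero by construction, so a training ME on the order of 1e-15 would be expected from floating-point rounding alone.

```python
def mean_error(actual, predicted):
    """Mean error, assumed here to be the average of (actual - predicted)."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


def fit_lag1_ols(series):
    """Fit y_t = intercept + slope * y_(t-1) by ordinary least squares."""
    x, y = series[:-1], series[1:]  # predictor is the series lagged by one step
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up differenced series, for illustration only (not the report's data).
diffed = [3.0, -1.0, 2.0, -4.0, 1.0, 0.5, -2.5, 3.5]
intercept, slope = fit_lag1_ols(diffed)

# In-sample predictions: each value is predicted from the one before it.
fitted = [intercept + slope * prev for prev in diffed[:-1]]
train_me = mean_error(diffed[1:], fitted)  # zero up to floating-point rounding
```

Because positive and negative errors cancel in ME, a value near zero indicates low bias rather than small individual errors, which is worth keeping in mind when comparing the train and test results.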
