Chapter 13 Time Series Analysis and Forecasting
Ming Lee, Assistant Professor
Department of Civil and Environmental Engineering
University of Alaska Fairbanks

Cross-sectional vs. Time Series Data
- Cross-sectional data are gathered at approximately the same time point from a cross section of the population.
  - Example: the President’s approval rating in March 2007
- Time series data track a set of variables across different time points.
  - Example: Intel (INTC) stock prices and the NASDAQ index from 1995 to 2005
- Regression analysis can be applied to both cross-sectional and time series data.

Examples of Time Series Analysis and Forecasting
- Forecast customer arrivals at a Carl’s Jr.
- Forecast movements of stock prices or real estate market prices
- Forecast movements in macroeconomic variables such as inflation, interest rates, and unemployment

Components of Time Series Data
- Trend component: a pattern of increase or decrease over the long run
- Seasonal component: a pattern of regular fluctuation across seasons that often repeats itself every year
- Cyclic component: a pattern of irregular fluctuation with irregular time periods
- Random component: a pattern of “vibration”, movements that go up and down with no predictable regularity
[Figure: Trend Component. Linear trend line fitted to Y.]
[Figure: Seasonal Component. Y by time point, 1 to 24.]
[Figure: Seasonal Component with Trend. Y by time point, 1 to 24.]
[Figure: Cyclic Component. Y by time point, 1 to 24.]
[Figure: Random Component. Y by time point, 1 to 24.]
[Figure: Random Component with Trend. Y by time point, 1 to 24.]
[Figure: Time Series of Estimated Market Values of a Real Estate Property. Panels: 1 Year View of the Time Series; Time Series since Last Sale.]
Forecast Error
$E_{t-k,t} = Y_t - F_{t-k,t}$
- $E_{t-k,t}$: forecast error (for the forecast made k time points before t)
- $Y_t$: actual value at time t
- $F_{t-k,t}$: forecasted value (forecast made k time points before t)

Error Measures
- Apply to all models that estimate and forecast.
- Mean Absolute Error (MAE): $\mathrm{MAE} = \dfrac{\sum_{t=1}^{N} |E_t|}{N}$
- Root Mean Square Error (RMSE): $\mathrm{RMSE} = \sqrt{\dfrac{\sum_{t=1}^{N} E_t^2}{N}}$
- Mean Absolute Percentage Error (MAPE): $\mathrm{MAPE} = 100\% \times \dfrac{\sum_{t=1}^{N} \left| E_t / Y_t \right|}{N}$
- N is the number of forecasts made.
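The three error measures are easy to compute directly. The following is a minimal sketch, not part of the original slides; the function names and the sample numbers are made up for illustration.

```python
import math

def forecast_errors(actual, forecast):
    """E_t = Y_t - F_t for each time point that was forecasted."""
    return [y - f for y, f in zip(actual, forecast)]

def mae(actual, forecast):
    """Mean Absolute Error."""
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)

def rmse(actual, forecast):
    """Root Mean Square Error."""
    errors = forecast_errors(actual, forecast)
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (undefined if any Y_t is 0)."""
    errors = forecast_errors(actual, forecast)
    return 100.0 * sum(abs(e / y) for e, y in zip(errors, actual)) / len(errors)

# Made-up actual values and forecasts, for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [98.0, 113.0, 118.0, 135.0]
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

MAE and RMSE are in the units of the series, while MAPE is unit-free, which makes it convenient when comparing series of different magnitudes.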
Regression-Based Trend Models
- Many time series follow a long-term trend except for the variation due to the random component.
- Linear trend model:
  - Assumes that the time series variable changes by a constant amount each time point.
  - A regression model in which time t is the single explanatory variable.
  $Y_t = a + bt + \varepsilon_t$  (13.6)
- $\varepsilon_t$ has mean 0 and a constant standard deviation $\sigma$.
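As an illustration of equation (13.6), the sketch below estimates a and b by ordinary least squares with time t = 1, 2, ..., n as the only explanatory variable. It assumes equally spaced observations; the function names and the data are hypothetical.

```python
def fit_linear_trend(y):
    """Least-squares estimates (a, b) for Y_t = a + b*t, with t = 1..n."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx              # estimated change per time point
    a = y_mean - b * t_mean    # estimated intercept
    return a, b

def linear_trend_forecast(a, b, t):
    """Forecast for time point t from the fitted trend line."""
    return a + b * t

# Made-up series, for illustration only.
y = [12.1, 13.9, 16.2, 17.8, 20.1, 22.0]
a, b = fit_linear_trend(y)
print(a, b, linear_trend_forecast(a, b, len(y) + 1))
```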
Exponential Trend
- Sometimes the time series variable changes by a constant percentage at each time point.
- The exponential trend model is suitable for such cases:
  $Y_t = c\,e^{bt}\,u_t$  (13.7)
- Logarithm conversion of (13.7), with $a = \ln c$ and $\varepsilon_t = \ln u_t$:
  $\ln(Y_t) = a + bt + \varepsilon_t$  (13.8)

Example 13.3 Quarterly PC Device Sales
Objective
To estimate the company’s exponential growth and to see whether it has been maintained during the entire period from 1990 until the end of 2004.
PCDevice.xls
- This file contains quarterly sales data for a large PC device manufacturer from the first quarter of 1990 through the first quarter of 2004.
- Each sales value is expressed in millions of dollars.
- Are the company’s sales growing exponentially through this entire period?
Solution
- We will first estimate and interpret an exponential trend for the years 1990-2000. Then we will see how well the projection of this trend into the future fits the data after 2000.
- The time series plot through 2000 appears on the next slide.
- We can use Excel’s Chart/Add Trendline menu item, with the Exponential option, to superimpose an exponential trend line on this plot.

Solution -- continued
[Figure: Time Series Plot of Sales with Exponential Trend Superimposed]
- The fit is evidently quite good.
- Equivalently, the next slide illustrates the time series of log sales for this same period, with the linear trend line superimposed.
- It fits well too.
[Figure: Time Series Plot of Log Sales with Linear Trend Superimposed]
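The equivalence between an exponential trend in sales and a linear trend in log sales can be checked outside Excel as well. The sketch below is an assumption-laden illustration, not the slides' method: it regresses ln(Y) on t = 1..n and converts the intercept back with c = exp(a). The series is synthetic (exact 5% growth per period), not the PCDevice.xls data.

```python
import math

def fit_exponential_trend(y):
    """Fit ln(Y_t) = a + b*t by least squares (t = 1..n); return (c, b) with c = exp(a)."""
    n = len(y)
    t = list(range(1, n + 1))
    log_y = [math.log(v) for v in y]
    t_mean = sum(t) / n
    l_mean = sum(log_y) / n
    sxy = sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, log_y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = l_mean - b * t_mean
    return math.exp(a), b

# Synthetic series that grows by exactly 5% per period (hypothetical data).
y = [50.0 * 1.05 ** t for t in range(1, 21)]
c, b = fit_exponential_trend(y)
print(c, b, math.exp(b) - 1)  # exp(b) - 1 is the implied growth rate per period
```

On real data the fit will not be exact, and the residuals of the log regression are worth plotting before trusting the trend.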
[Table: Regression Output for Estimating Exponential Trend]

What does it all mean?
- The estimated equation is $\text{Forecasted Sales} = 61.376\,e^{0.0663t}$.
- The most important constant in this equation is the regression coefficient of Time, b = 0.0663. Expressed as a percentage, this coefficient implies that the company’s sales were increasing by approximately 6.63% per quarter throughout this 11-year period.
- To use this equation for forecasting into the future, we substitute later values of Time into the regression equation, so that each future forecast is about 6.63% larger than the previous forecast.

Forecasts
- Has this exponential growth continued beyond 2000?
- As you might have guessed, it has not, due to slumping sales in the computer industry and increased competition from other manufacturers.
- We checked this by creating the Forecast column in the table on the next slide.
- This column implements our estimate of the equation.
[Table: Creating Forecasts of Sales]
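A short sketch of how the Forecast column can be produced from the estimated equation. The constants 61.376 and 0.0663 are the estimates quoted above; the assumption that Time = 1 corresponds to the first quarter of 1990 (so that Time = 45 is the first quarter of 2001) is ours and is not stated in the slides.

```python
import math

C = 61.376   # estimated constant, from the regression output quoted above
B = 0.0663   # estimated coefficient of Time, from the same output

def forecast_sales(t):
    """Forecasted Sales = C * exp(B * t), in millions of dollars."""
    return C * math.exp(B * t)

# Assumed time index: t = 1 is Q1 1990, so t = 45..48 are the four quarters of 2001.
forecasts = [forecast_sales(t) for t in range(45, 49)]
ratio = forecasts[1] / forecasts[0]
print(forecasts)
print(ratio - 1)  # growth from one forecast to the next, equal to exp(B) - 1
```

The ratio of successive forecasts is exp(0.0663), about 1.0685, so the exact implied growth is roughly 6.85% per quarter; reading b directly as 6.63% is the usual small-coefficient approximation.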
Forecasts -- continued
- The time series graph of the two series, Sales and Forecast, is shown on the next slide.
- It is clear that sales in the forecast period remained rather constant, nowhere near the 6.63% growth they exhibited in the estimation period.
[Figure: Time Series Plot of Forecasts Superimposed on Sales]
[Table: Measures of Forecast Errors]
Random Walk Model
- In a random walk model, the time series variable itself is not random, but the changes from one time point to the next are random.
- Such behavior is typical of stock price data.
- Random walk model:
  $Y_t = Y_{t-1} + \mu + \varepsilon_t$  (13.9)
  $Y_t - Y_{t-1} = \mu + \varepsilon_t$  (13.10)
- Forecast of the next value, using the average of the N observed changes as the estimate of $\mu$:
  $F_{t+1} = Y_t + \dfrac{1}{N}\sum_{i=1}^{N} (Y_i - Y_{i-1})$  (13.11)
- $\mu$ is a constant: $\mu > 0$ if the series tends to increase and $\mu < 0$ if it tends to decrease.
- $\varepsilon_t$ has mean 0 and a constant standard deviation $\sigma$.
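Equations (13.10) and (13.11) translate into a few lines of code: difference the series, average the differences to estimate the drift, and add that average to the last observed value. This is a minimal sketch with hypothetical function names and made-up prices.

```python
import statistics

def differences(y):
    """Period-to-period changes Y_t - Y_{t-1}."""
    return [y[i] - y[i - 1] for i in range(1, len(y))]

def estimate_drift(y):
    """Mean and sample standard deviation of the differences."""
    d = differences(y)
    return statistics.mean(d), statistics.stdev(d)

def one_step_forecast(y):
    """Last observed value plus the mean difference, as in (13.11)."""
    mu_hat, _ = estimate_drift(y)
    return y[-1] + mu_hat

# Made-up closing prices, for illustration only.
prices = [20.0, 21.5, 20.8, 22.4, 23.1, 22.9, 24.6]
print(estimate_drift(prices), one_step_forecast(prices))
```

Note that the mean difference depends only on the first and last values, (Y_N - Y_0) / N, so a single unusual endpoint can shift the drift estimate noticeably.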
Example 13.4 Random Walk Model of Stock Prices
Objective
To check whether the company’s monthly closing prices follow a random walk model with an upward trend, and to see how future prices can be forecasted.

The Problem
The monthly closing prices of the tractor company’s stock from January 1999 through April 2005, shown on the next slide, indicate some upward trend. Do they follow a random walk model with an upward drift? If so, how should future values of these stock prices be forecasted?
[Figure: Time Series Plot]
Solution
We have already seen that the closing price series itself is not random, due to the upward trend. To check for the adequacy of a random walk model, we need the differenced series. Each value in the differenced series is that month’s closing price minus the previous month’s closing price. This series can be calculated easily with an Excel formula.

Solution -- continued
This differenced series appears in column C of the table on the next slide. This table also shows the mean and the standard deviation of the differences, 0.418 and 4.245, which will be used in forecasting.
[Table: Differences of Closing Prices]
[Figure: Time Series Plot of Differences]
Solution -- continued
A visual inspection of the plot of differences also supports the conclusion of random differences, although they do not vary around a mean of 0. Rather, they vary around a mean of 0.418, indicated by the horizontal line. This positive value measures the upward drift: the closing prices increase, on average, by 0.418 per month. Finally, note that the variability in this figure is fairly constant. Specifically, the zigzags do not tend to get appreciably wider through time. Therefore, we can conclude that the random walk model with an upward drift fits these data quite well.

Forecasting
To forecast future closing prices, we multiply the number of months ahead being forecasted by the mean difference and add the result to the final closing price.
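The forecasting rule can be sketched as follows. The mean difference 0.418 and standard deviation 4.245 are the values quoted in the example; the final closing price of 40.0 is a hypothetical placeholder, since the slides do not show it. The standard-error function is our addition: it follows from the model's assumption of independent monthly changes and ignores the uncertainty in the estimated drift.

```python
import math

MEAN_DIFF = 0.418  # mean monthly difference, from the example
SD_DIFF = 4.245    # standard deviation of the differences, from the example

def random_walk_forecast(last_price, k, drift=MEAN_DIFF):
    """Forecast k months ahead: final closing price plus k times the mean difference."""
    return last_price + k * drift

def forecast_std_error(k, sd=SD_DIFF):
    """Approximate standard error of a k-month-ahead forecast: sd * sqrt(k)."""
    return sd * math.sqrt(k)

last_price = 40.0  # hypothetical final closing price, not taken from the data file
for k in (1, 2, 3):
    print(k, random_walk_forecast(last_price, k), forecast_std_error(k))
```

The growing standard error shows why random walk forecasts become unreliable quickly: after a few months the likely error dwarfs the accumulated drift.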
Autoregression Models
- Regress the current value of the time series variable on past (lagged) values.
- A sophisticated way of extrapolation.
- Trial and error is often required to see how many lags of past values are needed.

Example 13.5 Forecasting Hammer Sales
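To make the idea of regressing on lagged values concrete, the sketch below fits the single-lag case, y_t = a + b y_{t-1}, by least squares. It is an illustration under our own assumptions: the function names are hypothetical, the series is made up, and models with more than one lag would need multiple regression rather than this simple formula.

```python
def lagged_pairs(y, lag=1):
    """Pair each value with the value `lag` periods earlier: (y[t-lag], y[t])."""
    return list(zip(y[:-lag], y[lag:]))

def fit_ar1(y):
    """Least-squares fit of y_t = a + b * y_{t-1}; returns (a, b)."""
    pairs = lagged_pairs(y, 1)
    n = len(pairs)
    x_mean = sum(x for x, _ in pairs) / n
    z_mean = sum(z for _, z in pairs) / n
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in pairs)
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    b = sxz / sxx
    a = z_mean - b * x_mean
    return a, b

# Made-up series that follows y_t = 2 + 0.5 * y_{t-1} exactly (no noise).
y = [100.0]
for _ in range(7):
    y.append(2 + 0.5 * y[-1])

a, b = fit_ar1(y)
print(round(a, 6), round(b, 6))
```

Each additional lag costs one observation at the start of the series, which is one reason to keep only the lags that are clearly needed.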
Objective
To use autoregression, with the appropriate number of lagged terms, to forecast hammer sales.

Hammers.xls
A retailer has recorded its weekly sales of hammers (units purchased) for the past 42 weeks. The data are found in the file Hammers.xls. The graph of this time series appears on the next slide and reveals a “meandering” behavior.
[Figure: Time Series Plot of Sales]

The Plot and Data
The values begin high and stay high awhile, then get lower and stay lower awhile, then get higher again. This behavior could be caused by any number of things. How useful is autoregression for modeling these data, and how would it be used for forecasting?
[Table: Autoregression Output with Three Lagged Variables]

Autoregression -- continued
We see that $R^2$ is fairly high, about 57%, and that $s_e$ is about 15.7. However, the p-values for lags 2 and 3 are both quite large. It appears that once the first lag is included in the regression equation, the other two are not really needed. Therefore, we re-ran the regression with only the first lag included.
[Table: Autoregression Output with a Single Lagged Variable]

Forecasts from Autoregression
This graph shows the original Sales variable and its forecasts.
Regression Equation
The estimated regression equation is $\text{Forecasted Sales}_t = 13.763 + 0.793\,\text{Sales}_{t-1}$. The associated $R^2$ and $s_e$ values are approximately 65% and 15.4. The $R^2$ is a measure of the reasonably good fit we see in the previous graph, whereas the $s_e$ is a measure of the likely forecast error for short-term forecasts. It implies that a short-term forecast could easily be off by as much as two standard errors, or about 31 hammers.

Regression Equation -- continued
To use the regression equation for forecasting future sales values, we substitute known or forecasted sales values in the right-hand side of the equation. Specifically, the forecast for week 43, the first week after the data period, is approximately 98.6, using the equation $\text{Forecasted Sales}_{43} = 13.763 + 0.793\,\text{Sales}_{42}$.
The forecast for week 44 is approximately 92.0 and requires the forecasted value of sales in week 43 in the equation: $\text{Forecasted Sales}_{44} = 13.763 + 0.793\,\text{Forecasted Sales}_{43}$.
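The substitution of forecasts back into the equation can be sketched as a short loop. The coefficients 13.763 and 0.793 are those of the estimated equation; the week-42 sales value of 107 is an assumption on our part, chosen because it reproduces the quoted forecast of about 98.6, and should be replaced by the actual value from Hammers.xls.

```python
A = 13.763  # estimated intercept, from the regression equation
B = 0.793   # estimated coefficient of lagged sales, from the regression equation

def ar1_forecasts(last_value, steps, a=A, b=B):
    """Recursive forecasts: each forecast is fed back in as the next lagged value."""
    forecasts = []
    previous = last_value
    for _ in range(steps):
        previous = a + b * previous
        forecasts.append(previous)
    return forecasts

sales_week_42 = 107  # assumed value, not read from the data file
week_43, week_44 = ar1_forecasts(sales_week_42, 2)
print(round(week_43, 1), round(week_44, 1))
```

Because the coefficient 0.793 is below 1, repeated substitution pulls the forecasts toward a fixed level, 13.763 / (1 - 0.793), so long-range forecasts from this equation flatten out.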
Moving Averages
- One of the simplest and most often used extrapolation methods is the method of moving averages.
- A moving average is the average of the time series variable over the past few time points, where the number of terms used to calculate the average is the span.
- For example, we can use the average price of a stock from January to April (a span of 4 months) to forecast its price in May.
- The determination of the span requires a careful examination of the data.

Estimating Seasonality with Dummy Variables
- When we have a time series of quarterly data, we can use regression with dummy variables to make forecasts.
  $\hat{Y}_t = a + bt + b_1 Q_1 + b_2 Q_2 + b_3 Q_3$
- $Q_1$, $Q_2$, and $Q_3$ are dummy variables with values 0 and 1, indicating which quarter it is.
  - First quarter: $Q_1 = 1$, $Q_2 = 0$, $Q_3 = 0$
  - Second quarter: $Q_1 = 0$, $Q_2 = 1$, $Q_3 = 0$
  - Third quarter: $Q_1 = 0$, $Q_2 = 0$, $Q_3 = 1$
  - Fourth quarter: $Q_1 = 0$, $Q_2 = 0$, $Q_3 = 0$
- You CANNOT have four dummy variables in the regression, because that creates perfect multicollinearity: the four dummies always sum to 1, which duplicates the constant term a, so the equations used to calculate the regression coefficients and statistics have no unique solution.
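For the moving-average method, the forecast for the next time point is simply the mean of the last `span` observations. The sketch below is illustrative only; the function names and the four monthly prices are made up.

```python
def moving_average_forecast(y, span):
    """Forecast the next value as the average of the last `span` observations."""
    if span < 1 or span > len(y):
        raise ValueError("span must be between 1 and len(y)")
    return sum(y[-span:]) / span

def moving_average_forecasts(y, span):
    """One-step-ahead forecasts for every point that has `span` earlier observations."""
    return [sum(y[i - span:i]) / span for i in range(span, len(y))]

# Made-up January-April prices; the span-4 average is the forecast for May.
prices = [40.0, 42.0, 41.0, 45.0]
print(moving_average_forecast(prices, 4))
print(moving_average_forecasts(prices, 2))
```

A larger span smooths out more of the random component but reacts more slowly to real changes in level, which is why the choice of span calls for a look at the data.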
Estimating Seasonality with Regression
Example 13.7b Quarterly Soft Drink Sales
Objective
To use a multiplicative regression equation, with dummy variables for seasons and a time variable for trend, to model soft drink sales.

Softdrink.xls
We return to the data file, which contains the sales history of soft drinks from 1986 to quarter 2 of 1996. Does a regression approach provide forecasts that are as accurate as those provided by the other seasonal methods in this chapter?

Solution
We illustrate a multiplicative approach, although an additive approach is also possible. The data setup is as follows:
Solution -- continued
Besides the Sales and Time variables, we need dummy variables for three of the four quarters and a Log(Sales) variable. We can then use multiple regression, with Log(Sales) as the dependent variable and Time, Q1, Q2, and Q3 as the explanatory variables. The regression output appears as follows. Again, to make a fair comparison with previous methods, we base the regression only on the data through quarter 1 of 1999. That is, we hold out the last 8 quarters.
[Table: Regression Output]
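The data setup and the way a fitted log equation becomes a sales forecast can be sketched as below. This is not the regression itself: the coefficient values are hypothetical placeholders (only the Time coefficient 0.021 echoes the 2.1% figure quoted in this example), and we assume Log(Sales) means the natural logarithm.

```python
import math

def quarter_dummies(quarter):
    """Return (Q1, Q2, Q3) for quarter 1..4; quarter 4 is the reference category."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return tuple(1 if quarter == q else 0 for q in (1, 2, 3))

def design_row(t, quarter):
    """Explanatory values for one observation: constant, Time, Q1, Q2, Q3."""
    return (1, t) + quarter_dummies(quarter)

def forecast_sales(coefs, t, quarter):
    """Predict Log(Sales) from the row, then convert back to sales with exp."""
    log_forecast = sum(c * x for c, x in zip(coefs, design_row(t, quarter)))
    return math.exp(log_forecast)

# Hypothetical coefficients (a, b, b1, b2, b3); replace with the regression output.
coefs = (7.0, 0.021, -0.30, 0.05, 0.10)
print(design_row(5, 1))
print(forecast_sales(coefs, 5, 1) / forecast_sales(coefs, 5, 4))  # Q1 relative to Q4
```

In the multiplicative model each dummy coefficient acts as a ratio: exp(b1) is the factor by which first-quarter sales differ from fourth-quarter sales at the same point on the trend.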
Interpreting the Output
Of particular interest are the coefficients of the explanatory variables. Recall that for a log dependent variable, these coefficients can be interpreted as percentage changes in the original sales variable. Specifically, the coefficient of Time means that deseasonalized sales increase by 2.1% per quarter. This pattern is quite comparable to the pattern of seasonal indexes we saw in the last two examples.
[Table: Forecast Errors and Summary Measures]
Forecast Accuracy
Note that these summary measures are considerably larger for this regression model than for the previous seasonality models, especially in the holdout period. We can get some idea why the holdout period does so poorly by looking at the plot of observations versus forecasts on the next slide. The multiplicative regression model with Time included really implies exponential growth, with seasonality superimposed. However, this company’s growth tapered off in the last couple years and did not keep up with the exponential growth curve.
[Figure: Plot of Forecasts for Multiplicative Model]

Conclusions
In short, the dummy variables do a good job of tracking seasonality, but the underlying exponential trend curve outpaces actual sales.