Multivariate Discount Weighted Regression and Local Level Models Kostas Triantafyllopoulosa a
University of Newcastle, Newcastle upon Tyne, UK.
Abstract In this paper we propose a multivariate discount weighted regression technique to give a tractable solution to the problem of variance estimation and forecasting for the multivariate local level model. We give the correspondence between discount regression and matrix normal dynamic linear models and we show that the local level model can be treated with discount regression techniques. We illustrate the proposed methodology with London metal exchange data consisting of aluminium spot and future contract closing prices. The proposed estimate of the noise covariance matrix suggests these data exhibit high cross-correlation, which is discussed in some detail. The performance of the weighted regression model is evaluated with a simple outlier analysis. A sensitivity analysis shows that a low discount factor should be used and practical guidelines are given for general future use.
Keywords: Time series, dynamic models, Kalman filter, state space models, regression, Bayesian forecasting.
1
1
Introduction
Multivariate time series have, recently, received significant development from both theoretical and practical standpoints. Whittle (1963), Hannan (1970) and L¨ utkepohl (1993) examine the widely used stationary VARMA models for observation vectors, whilst L¨ utkepohl (1993, ch. 13), West and Harrison (1997, ch. 16) and Durbin and Koopman (2001, ch. 3) give detail consideration of state space or dynamic linear models. These models provide a sound statistical framework with inclusion of practical modelling mechanisms such as feed-forward and feed-back intervention. However, in most multivariate models the noise covariance matrices have to be assumed known, if tractability is desired. Local level models constitute the simplest class of dynamic models with significant application. Enns et al. (1982), Machak et al. (1983), Harvey (1986, 1989, ch. 8) and Durbin and Koopman (2001, p. 44) discuss multivariate local level models, although the problem of the specification of the noise covariance matrices does not receive much attention. Sequential Markov chain Monte Carlo techniques (Doucet et al., 2001) as well as EM algorithm techniques (Durbin and Koopman, 2001) are available in multivariate time series, but their efficiency needs to be studied when the dimension of the vector of the observations is large and when online forecasting is on demand. The only tractable multivariate dynamic model, which assumes a prior inverted Wishart distribution for the observation noise covariance matrix, is the so called matrix dynamic linear model, reported in West and Harrison (1997, p. 597) and applied in Salvador et al. (2003) and Salvador and Gargallo (2004). Triantafyllopoulos and Pikoulas (2002) proposed a new multivariate dynamic linear model, which is termed here as multivariate discount weighted regression. The novelty of this model is that, although it does not assume a prior Wishart distribution for the unknown noise covariance matrix, it produces neat recurrence equations for estimation and forecasting. In this paper, using discount weighted regression, we provide an efficient methodology for estimation and forecasting for the multivariate local level model. The observation
2
covariance matrix of the noise vector is assumed unknown and it is estimated from the data using a standard maximum likelihood procedure. The correspondence of the discount regression and the matrix dynamic linear model is established and some properties of the regression model are discussed. As a practical illustration of the proposed methodology we consider data consisting of aluminium spot and future contract prices. It is our belief that these data are interesting for two reasons: they have strong similarities with international exchange rates data and they are not often discussed in the literature. This paper is organized as follows. Section 2.1 gives the general description of discount regression as developed in Triantafyllopoulos and Pikoulas (2002). In section 2.2 the correspondence of discount regression and matrix normal dynamic models is established, while sections 2.3 and 2.4 show that multivariate local level models with unknown noise covariance matrices can be analyzed with discount regression techniques. The proposed local level model is illustrated by analyzing London metal exchange data in section 3. Conclusions follow in section 4 and the proofs are given in the appendix (section 5).
2 2.1
Multivariate discount weighted regression The model
Let {yt } be an p-dimensional vector of observations, which become available at roughly equal intervals of time. The multivariate discount weighted regression model (DWR) is defined by yt0 = Ft0 Θt + νt0 ,
νt |Σ ∼ Np×1 (0, Σ),
(observation equation)
(2.1a)
Θt = Θt−1 + Ωt ,
Ωt |Σ ∼ Nd×p (0, Wt , Σ),
(transition equation)
(2.1b)
where Ft is a d-dimensional design vector, Θt is a d × p state matrix, Σ is the p × p covariance matrix of the innovations νt , and Wt is a d × d covariance matrix. Conditional on Σ, the innovations νt and Ωt follow, respectively, multivariate and matrix-variate normal 3
distributions, the latter of which is typically defined us Ωt |Σ ∼ Nd×p (0, Wt , Σ) ⇔ vec(Ωt )|Σ ∼ Ndp×1 (0, Σ ⊗ Wt ), where vec(·) denotes the column stacking operator of a lower portion of a square matrix, ⊗ denotes the Kronecker product of two matrices and Np×1 (·) denotes the p-dimensional normal distribution. For more details on matrix-variate distributions the reader is referred to Gupta and Nagar (1999). For a positive integer T > 0, let y t = (y1 , . . . , yt ) denote the information set comprising the available data up to time t (t = 1, . . . , T ) and assume that the innovation series (νt )t=1,...,T and (Ωt )t=1,...,T are internally and mutually uncorrelated and also they are uncorrelated with the assumed prior Θ0 |Σ ∼ Nd×p (m0 , C0 , Σ),
(2.2)
for some known m0 and C0 . The design vector Ft is assumed known for all t and the covariance matrix Wt is specified with a discount factor δ as Wt = (1 − δ)Ct−1 /δ, where Ct−1 is a known covariance matrix at time t − 1 (see below). If we define Xt =
t−1 X
0 δ j Ft−j Ft−j
and Ht =
j=0
t−1 X
0 δ j Ft−j yt−j ,
j=0
then Triantafyllopoulos and Pikoulas (2002) show that the unbiased linear squares estimate, which is the same as the maximum likelihood estimate, of Θt based on the data y t is mt = Xt−1 Ht = Ct Ht = mt−1 + At e0t ,
(2.3)
where At = Ct Ft is the adaptive vector and et = yt − m0t−1 Ft is the one-step forecast error. The estimate mt is optimal in the sense that it minimizes the discounted sum of squares Sδ (Θ) =
t−1 X
¡ 0 ¢¡ 0 ¢0 0 0 δ j yt−j − Ft−j Θ yt−j − Ft−j Θ ,
j=1
for all d × p state matrices Θ.
4
With rt = yt − m0t Ft , the maximum likelihood estimate of the covariance matrix Σ is nt St = nt−1 St−1 + rt e0t
and nt = nt−1 + 1,
(2.4)
which, for uninformative prior degrees of freedom n0 ≈ 0 and a bounded S0 , reduces to t
St =
1X 0 rj ej . t j=1
The covariance matrix Ct is updated by: 1 Ct = δ
µ
Ct−1 Ft Ft0 Id − δ + Ft0 Ct−1 Ft
¶ Ct−1 ,
(2.5)
where Id is the d × d identity matrix. Conditional on Σ = St and y t , one can easily derive the posterior distribution of Θt and the k-step ahead predictive distribution of yt+k , i.e. Θt |Σt = St , y t ∼ Nd×p (mt , Ct , St ) and yt+1 |Σ = St , y t ∼ Np×1 {m0t Ft , (Ft0 Ct−1 Ft /δ+1)St }. Full details and derivation of these results appear in Triantafyllopoulos and Pikoulas (2002).
2.2
Relationship with matrix normal dynamic models
Model (2.1a) and (2.1b) corresponds to the matrix normal dynamic linear models (DLMs) as described in West and Harrison, (1997, p. 597). The matrix normal DLMs developed independently in Harvey (1986) and Quintana and West (1987) and they have been further explored in Salvador et al. (2003) and Salvador and Gargallo (2004). The matrix DLM is defined by yt0 = Ft0 Θt + νt0 ,
νt |Σ ∼ Np×1 (0, Σ),
(observation equation)
(2.6a)
Θt = Gt Θt−1 + Ωt ,
Ωt |Σ ∼ Nd×p (0, Wt , Σ),
(transition equation)
(2.6b)
where Gt is a d × d transition matrix and the remaining model components are as defined in the model (2.1a) and (2.1b). It is further assumed that in addition to the prior (2.2), 5
a prior inverted Wishart distribution is assumed for the covariance matrix Σ as Σ ∼ IWp (n0 + 2p, n0 S0 ) with density (n0 +p−1)/2
p(Σ) = c(n0 )n0 and
det(S0 )(n0 +p−1)/2 det(Σ)−(p+n0 /2) exp{−n0 trace(S0 Σ−1 )/2} (2.7) (
c(n0 ) =
µ ¶)−1 p Y n + p − j 0 Γ , 2p(n0 +p−1)/2 π p(p−1)/4 2 j=1
where n0 are the prior degrees of freedom and S0 is the prior variance estimate. Then inference and forecasting for model (2.6a) and (2.6b) apply providing a similar to Kalman filter algorithm. This algorithm is a fully Bayesian algorithm and it is summarized as follows. For any time t > 0, conditional on Σ, the posterior distribution of Θt is Θt |Σ, y t ∼ Nd×p (mt , Ct , Σ) and the one-step predictive distribution is yt+1 |Σ, y t ∼ Np×1 {mt Ft0 , (Ft0 Rt Ft + 1)Σ}, where mt = Gt mt−1 + At e0t
and Ct = Rt − (Ft0 Rt Ft + 1)At A0t ,
(2.8)
for At = Rt Ft /(Ft0 Rt Ft + 1), Rt = Gt Ct−1 G0t + Wt and m0 , C0 are taken from the prior (2.2). The distribution of Σ given y t is the inverted Wishart Σ|y t ∼ IWp (nt + 2p, nt St ), where nt St = nt−1 St−1 +
et e0t Ft0 Rt Ft + 1
and nt = nt−1 + 1.
(2.9)
Unconditionally of Σ, the posterior distribution of Θt and the predictive distribution of yt+1 are, respectively, multivariate and matrix-variate Student t distributions. Looking now at models (2.1a), (2.1b) and (2.6a), (2.6b), we see that if we set Gt = Id and Wt = (1 − δ)Ct−1 /δ, the two models coincide. We can see that the estimates mt , Ct and St produced by each model are essentially the same. First note that the estimates mt in equations (2.3) and (2.8) are the same. From equation (2.8) with Gt = Id we have Ct−1 Ct−1 Ft Ft0 Ct−1 1 Ct = − 2 0 = 2 δ δ (Ft Ct−1 Ft /δ + 1) δ
6
µ Id −
Ct−1 Ft Ft0 δ + Ft0 Ct−1 Ft
¶ Ct−1 ,
which is equation (2.5). From the definition of rt = yt − m0t Ft and from the recursion of mt , we have rt = yt − m0t−1 Ft − et A0t Ft = (1 − A0t Ft )et . Also it is 1 − A0t Ft = 1 −
Ft0 Ct−1 Ft 1 = 0 . 0 δ(Ft Ct−1 Ft /δ + 1) Ft Ct−1 Ft /δ + 1
Then from equation (2.9) nt St = nt−1 St−1 +
et e0t = nt−1 St−1 + (1 − At0 Ft )et e0t = nt−1 St−1 + rt e0t , Ft0 Ct−1 Ft /δ + 1
which is equation (2.4). The above show that the DWR is equivalent to the matrix normal DLM with identity transition matrix. However, we need to note that the DWR does not make any assumption for the distribution of Σ, while the DLM assumes an inverted Wishart prior.
2.3
The invertibility assumption of Xt
The DWR development is based on the assumption of the non-singularity of the matrix Xt . However, in some practical situations Xt is singular, e.g. when Ft = F is constant over time. It is evident that we can make use of the generalized Moore-Penrose inverse matrix to overcome this difficulty. In particular, let Xt+ denote the Moore-Penrose inverse matrix of Xt , which always exists and it is unique (Harville, 2000, ch. 20). The next result states that under a condition, we can still have the same recurrences of mt and Ct as in section 2.1. Theorem 1. Consider the DWR, defined by equations (2.1a) and (2.1b). With Xt as defined in section 2.1, let Xt+ denote the Moore-Penrose inverse of Xt and write mt = Xt+ Ht , At = Xt+ Ft and Ct = Xt+ , where Xt and Ht are as defined in section 2.1. If the following condition holds + rank([Xt−1 : Xt+ ]) = rank(Xt−1 ) = rank(Xt ),
7
(2.10)
+ + where [Xt−1 : Xt+ ] denotes the extended matrix of Xt−1 and Xt+ , then the recursive forms
of mt and Ct are given as in equations (2.3) and (2.5). The proof of the recurrence of St does not make use of the matrix Xt and so the updating of equation (2.4) holds, regardless of the singularity of Xt .
2.4
Local level models
The multivariate local level model is defined by yt = ψt + νt ,
νt |Σ ∼ Np×1 (0, Σ),
(observation equation)
(2.11a)
ψt = ψt−1 + ωt ,
ωt |Vt ∼ Np×1 (0, Vt ),
(transition equation)
(2.11b)
where yt is a p-dimensional vector of observations, ψt is a p-vector of states, Σ and Vt are the p × p noise covariance matrices. Typically, the covariance matrices will be unknown and we seek to obtain a fast algorithm which will allow estimation of these covariance matrices as well as it will allow efficient forecasting of yt , for t = 1, . . . , T , for a positive integer T > 0. When p = 1 and Vt = V is constant over time, the analysis of the local level model follows by considering the signal noise ratio q = V /Σ (West and Harrison, 1997, ch. 2; Franco and Souza, 2002). Enns et al. (1982) and Machak et al. (1983) suggest to use the signal noise ratio for p > 1 and these authors provide some details on the scalar function q(·) with V = qΣ. Harvey (1986) provides an estimator for Σ, based on maximum likelihood estimation of q, but that method cannot be used for online estimation since each time a new estimator for Σ is required a new estimate of q is needed and clearly this cannot be applied for every consecutive t > 0. In addition to that the log likelihood function used to estimate q, and hence Σ, is a complicated non-linear function of q, which depends on some reference priors the choice of which has not been discussed. We believe that these estimation problems have limited the widespread of the above estimation method. In this section we assume that the transition covariance matrix Vt depends on time t and so we define the time-varying signal noise ratio as qt Ip = Vt Σ−1 , where it is assumed 8
that the variance Σ is strictly positive definite. Then in the transition equation (2.11b) we can write ωt |Σ ∼ Np×1 (0, qt Σ). This may be seen as a restriction of the model, although we should note that every conjugate state space model (univariate or multivariate), which is able to model the observation variance or covariance matrix uses this setting for Vt . West and Harrison (1997, p. 109) state that no practical problem arises, if the transition covariance matrix is scaled by the observation covariance matrix. The next result shows that local level models can be seen as special cases of DWR. Theorem 2. Consider the DWR of equations (2.1a) and (2.1b) and suppose that Ft = F = [1 0 · · · 0]0 , for all t ≥ 1. Then the DWR reduces to the local level model of equations (2.11a) and (2.11b) with Vt = qt Σ, where qt > 0 is an appropriate scalar and Σ the p × p covariance matrix of the observation innovations νt . In addition, let the prior distribution of ψ0 be ψ0 |Σ = S0∗ ∼ Np×1 (m∗0 , C0∗ S0∗ ), for some known m∗0 , C0∗ and S0∗ . Then, conditional on Σ = St∗ , the posterior distribution of ψt is ψt |Σ = St∗ , y t ∼ Np×1 (m∗t , Ct∗ St∗ ), with m∗t = m∗t−1 + A∗t e∗t ,
A∗t =
∗ Ct−1 , ∗ δ + Ct−1
Ct∗ =
1 , ∗ δ + Ct−1
∗ nt St∗ = nt−1 St−1 + rt∗ e∗t ,
where e∗t = yt − m∗t−1 , rt∗ = yt − m∗t , nt = nt−1 + 1 and 0 < δ ≤ 1. For any positive integer k, the k-step predictive distribution follows from Triantafyllopoulos and Pikoulas (2002) and from Theorem 2 as yt+k |Σ = St∗ , y t ∼ Np×1 {yt (k), Rt (k)St∗ }, where yt (k) = m∗t ,
Rt (k) = 9
k(1 − δ) + δ ∗ Ct + 1. δ
We note that yt (k) does not depend on the lead time k, sometimes referred to as forecast horizon. This fact has been well known and it is characteristic in local level models. However, the covariance matrix Rt (k) does depend on k; in fact Rt (k) is an increasing sequence in k. This is expected, since the bigger the forecast horizon the larger the variance. We also note that in the static model, where δ = 1, we have Vt = 0 and Rt (k) = Ct∗ (all uncertainty comes from Σ), while in the other extreme (totally unstable model) where δ = 0, we have Vt = Rt (k) = ∞. Of course both these cases are to be avoided. In practice discount factors in the range [0.1, 0.99] should be used.
3
The London metal exchange data
3.1
Description of the LME data
The London metal exchange (LME) is the world’s premier non-ferrous metals market, with highly liquid contracts. Its trading customers may be metal industries or individuals (sellers or buyers). LMEX, the London metal exchange index, is a base metals index comprising the six primary non-ferrous metals traded on the exchange: aluminium, copper grade A, standard lead, primary nickel, tin, and zinc. More details about the LME may be found on its web site: http://www.lme.co.uk. Here we examine official prices for the aluminum, which is the most important metal traded in the exchange. There are two categories of prices which appear in the market: the ask price and the bid price. The ask price refers to the price which occurs from the relevant amount of aluminium required by each customer/trading company. The bid price refers to the price that the exchange suggests to the customer companies and its evaluation is based on complicated models, which are not widely available. After consultation with Aluminium of Greece S.A.I.C., we found out that there was increasing interest in predicting the ask prices. Therefore, in this study we concentrate on the ask prices. The data are
10
y4
1550 1540 1500
1520
1540
1560
1480
1500
1520
y3
1560
1580
1600 1450
1500
y2
1600
1650 1400
1500
y1
1600
1700
London Metal Exchange Alluminium Closing Prices
0
50
100
Time
150
200
250
Figure 1: London metal exchange aluminium closing prices yt = [y1 y2 y3 y4 ]0
11
collected for every trading day from March 2000 to February 2001, and are plotted in Figure 1. After excluding week-ends and bank holidays, there are T = 246 trading days. There are four variables in interest, y1t : spot ask price, y2t : 3 months future contract ask price, y3t : 15 months future contract ask price and y4t : 27 months future contract ask price, where each price is US dollars per tonne of aluminium. These four variables are summarized in the vector time series yt = [y1t y2t y3t y4t ]0 , t = 1, . . . , 246. The data have been kindly provided by Aluminium of Greece S.A.I.C., a member of the Pechiney Group (http://www.pechiney.com/). Over the last 10 years there has been a noticeable interest in modelling the LME market. Slade (1988) and Meyer (1994) discuss the problems of pricing and hedging in the nonferrous metal market. The efficiency of the LME is examined in Sephton and Cochrane (1990), Agbeyegbe (1992) and Moore and Cullen (1995). The relationship of future and spot prices and how the futures can be used to predict the spot prices are considered in Gilbert (1997) and in Heaney (2002). The important subject of volatility for the LME market is discussed by Hall (1991) and McKenzie et al. (2001). Panas (2001) suggests long memory and chaos models to assess the metal prices when nonlinear structure is evident. A good review of the London metal exchange literature can be found in Watkins and McAleer (2004). It is evident that modelling and forecasting LME data requires the adaptation of efficient multivariate time series techniques. As Figure 1 indicates the four variables of interest, yit , are cross-correlated and therefore any sound modelling system should be able to model this correlation. Although the economics literature has dealt with modelling issues in the non-ferrous metal market, in particular in the LME market, we must submit that an automatic forecasting procedure is currently unavailable. Triantafyllopoulos and Montana (2004) have used standard Markov chain Monte Carlo techniques to model the LME data, but they found that simulation was too slow and not appropriate for online forecasting. In the next sections we use the local level procedure of section 2.4 in order to 12
0.015 0.010 0.000
0.005
Density
0.020
0.025
Histograms and normal QQ plots for the difference series
−60
−40
−20
0
20
40
60
−40
−20
0
20
40
60
−20
0
20
40
−40
−20
0
x3
x2
20
40
x4
40 20 0 −20 −40
Sample Quantiles
60
x1
−3
−2
−1
0
1
2
3 −3
−2
−1
0
1
2
3 −3
−2
−1
0
1
2
3 −3
−2
−1
0
1
2
3
Theoretical Quantiles
Figure 2: Histogram and normal QQ plots for the difference series xt
effectively model the LME data.
3.2
The model
In efficient markets (see, for instance, Sephton and Cochrane, 1990) a common practice for commodity price short-term forecasting is to take as one-step forecast for time t just the current observed value at time t − 1 (t = 2, . . . , 246). However, possible shocks can be identified by modelling the difference yt − yt−1 instead of the actual series yt , and we follow this approach here. We define the difference series xt = yt − yt−1 for 2 ≤ t ≤ 246. Given yt−1 , the difference xt = yt − yt−1 is expected to have zero mean (yt is predicted as yt−1 ) and some unknown variance. Histograms and normal probability plots for the 4−dimensional time series {xt } are shown in Figure 2. Of course these are only marginal 13
histograms which do not show the sample cross correlation between the four time series. However, such summary plots can validate the assumption of a normal distribution for the first difference. A more detailed analysis involving multivariate normal distribution tests can be done following the comprehensive review of Mecklin and Mundfrom (2004). Here, we assume that given a state ψt and a variance matrix Σ, we have xt |ψt , Σ ∼ N4×1 (ψt , Σ), where N4 (·) denotes the 4-dimensional multivariate normal distribution. The level of the series ψt may be considered constant, in which case we have a static regression model with ψ1 = ψ2 = · · · = ψ246 = ψ. It seems more reasonable to assume that ψt is expected to be static, but there is some uncertainty associated with it. This reasoning leads to the adaptation of a local level model (2.11a) and (2.11b) for the time series {xt }, where p = 4.
3.3
Statistical analysis
We apply Theorem 2 to provide one-step forecasts for the difference time series {xt }, xt = yt − yt−1 , where {yt } is the LME time series of Figure 1. We are also interested in estimating the covariance matrix Σ and hence looking at the cross-correlation structure of the four component time series {x1t , x2t , x3t , x4t }. Figure 3 plots the one-step forecasts xi,t−1 (1) of xit against the actual values xit , for i = 1 and i = 4 (spot and 27 months contract component time series). We observe that the forecasts are generally good except from some outliers. These outliers can be identified by a simple time series plot of the standard forecast errors, shown in Figure 4. The standard errors are plotted against ±1.96 confidence intervals identifying 19 outliers for {x1t }, 15 outliers for {x2t }, 11 outliers for {x3t } and 8 outliers for {x4t }. Out of 246 total observations these figures give 7.7%, 6.1%, 4.5% and 3.2% outliers respectively. For such kind of data we believe that the above proportions are relatively low. However, as the forecasts are produced further analysis 14
−40
0 20
60
Forecasts of the spot difference time series
0
50
100
150
200
250
time
−40 −20
0
20
Forecasts of the 27 months difference time series
0
50
100
150
200
250
time
Figure 3: One-step forecasts for the difference spot and 27 months contract series {x1t } and {x4t }, where xt = [x1t x2t x3t x4t ]0 . The solid line shows the actual values, while the dashed line shows the forecasts.
can be done, i.e. the approach of Salvador and Gargallo (2004) for automatic intervention can be applied. We also note that, comparing the forecasts of the four series, the future contract series are predicted better with {x4t } having the best forecast accuracy, while the spot time series {x1t } has the less forecast accuracy. This is expected since the spot variable is more volatile than the future contracts and this can be verified by looking at the range of the four variables. For these results, priors in Theorem 2 have specified as: m∗0 = [0 0 0 0]0 , C0∗ = 1/1000, S0∗ = 1000I4 , n0 = 1/1000 and δ = 0.7. These priors have been chosen with care in order
15
−6
−2 0 2 4
Standard errors of the spot difference time series
0
50
100
150
200
250
time
−4 −2 0
2
4
Standard errors of the 27 months difference time series
0
50
100
150
200
250
time
Figure 4: One-step forecast standard errors for the difference series {x1t } and {x4t }, where xt = [x1t x2t x3t x4t ]0 . The solid lines shows the standard errors and the dashed line plots 95% confidence intervals at ±1.96.
16
to provide a non-informative Bayesian prior specification. This is reflected by the high values of the covariance matrix S0∗ and the low value of the degrees of freedom n0 . In order to judge and evaluate the correlation structure throughout the series we need to obtain good estimates of Σ. Running several simulations we have found that the best performance of the estimate of Σ, St∗ , is obtained when n0 S0∗ = Ip and in this case n0 S0∗ = I4 . The prior C0∗ has been chosen to be very low, because again, we wish the covariance matrix of ψ0 |Σ = S0∗ to be equal to I4 . The prior m∗0 is set to zero, because the time series {xt } fluctuates around zero and a relatively low value for the discount factor δ is chosen so that the shocks could be captured. A better forecast accuracy can be achieved if δ is chosen to be even lower, i.e. δ = 0.6. A sensitivity analysis for δ is given in section 3.4. As it is mentioned before, it is of particular modelling interest to estimate the covariance matrix Σ efficiently. The respective correlation matrix will reflect upon the variance throughout each of the series {xit } as well as the correlation among the four component time series {xit }. Figure 5 plots the diagonal elements of the covariance matrix St∗ produced by Theorem 2. We observe that the variance estimate for the spot and 3 months contract time series are more volatile, while the remaining future contract time series are less volatile. Figure 6 shows the cross correlation between the four scalar time series xit . If CR = {ρ(ij) }, is the correlation matrix of xt given ψt , then the solid line shows the estimate of ρ(12) , the dashed line shows the estimate of ρ(23) , the dotted line shows the estimate of ρ(34) and the dashed/dotted line shows the estimate of ρ(14) . Of course it is ρ(ii) = 1, for i = 1, 2, 3, 4. We observe that the spot and the 3 months contract time series are much more correlated than the other future contracts. We also observe that the correlation of the spot and the 27 months contract time series is much smaller than the relevant correlation of the spot and the 3 months contract time series.
17
100 0
50
variance
150
200
Observation variance estimates
0
50
100
150
200
250
time
Figure 5: Variance estimates of Σ = {σ (ij) } (i, j = 1, 2, 3, 4). The solid line shows the estimate of σ (11) ; the dashed line shows the estimate of σ (22) ; the dotted line shows the estimate of σ (33) ; and the dashed/dotted line shows the estimate of σ (44) .
18
0.8 0.6
0.7
correlation
0.9
1.0
Observation correlation estimates
0
50
100
150
200
250
time
Figure 6: Correlation estimates of the series xt given ψt . If CR = {ρ(ij) } (i, j = 1, 2, 3, 4), is the correlation matrix of xt (corresponding correlation matrix to the covariance matrix Σ), then the solid line shows the estimate of ρ(12) , the dashed line shows the estimate of ρ(23) , the dotted line shows the estimate of ρ(34) and the dashed/dotted line shows the estimate of ρ(14) .
19
0
2
4
variance
6
8
10
Estimation of the state variance
0
50
100
150
200
250
time
Figure 7: The effect of the discount factor δ in the estimated state scalar variance Ct∗ . The solid line shows Ct∗ for δ = 0.1 and the dashed line shows Ct∗ for δ = 0.7.
3.4
Sensitivity analysis of the discount factor δ
The design of a multivariate forecasting model, as the one used above, requires the specification of the starting values m∗0 , n0 , C0∗ , S0∗ and δ. Specification of m∗0 , n0 , C0∗ and S0∗ follows from the discussion of section 3.3, but the sensitivity of δ is of particular interest. Many authors have argued (e.g. West and Harrison, 1997) that in local level models and generally in state space models the discount factor δ should be in the range [0.8, 0.99] because otherwise the system variance Wt will have very large values yielding an unstable model with very large values in the estimated state covariance matrix. This argument may be generally valid, however, we 20
Table 1: Effect of the discount factor δ in the sum of squared forecast errors (SSE). SSE δ
x1t
x2t
0.1
393.6
341.7 249.8
209.0
0.2
424.3
320.2 255.6
223.8
0.3
437.3
310.6 257.4
228.3
0.4
444.5
306.5 258.3
230.4
0.5
449.1
304.8 258.7
231.6
0.6
452.3
304.2 258.8
232.3
0.7
454.5
304.2 258.9
232.7
0.8
456.1
304.6 258.8
233.0
0.9
457.2
305.1 258.7
233.1
0.99 458.0 305.5
x3t
258.6
x4t
233.2
note that a high discount factor will not allow us to predict accurately the shocks in time series like in section 3. The estimated covariance matrix of ψt is Ct∗ St∗ , where Ct∗ and St∗ ∗ are calculated from Theorem 2. From Ct∗ = 1/(δ + Ct−1 ) it is clear that the real sequence
{Ct∗ } is neither increasing nor decreasing, but its limit exists and it is trivial to see that √ it is C ∗ = limt→∞ Ct∗ = ( δ 2 + 4 − δ)/2. Figure 7 plots the variance Ct∗ for δ = 0.1 and δ = 0.7. We see that for different values of δ the difference in Ct∗ is not very big with limiting values C ∗ = 0.95 (δ = 0.1) and C ∗ = 0.71 (δ = 0.7). This means that δ does not have a dramatic effect in the covariance matrix of ψt and this is an advantage since it gives more flexibility to the modeller. In order to judge better the effect of δ to the forecasts we have looked at the change in the sum of squared forecast errors (SSE) for several values of δ throughout the range [0.1, 0.99]. Table 1 shows that small values of δ give smaller SSE for the variables x1t , x3t , x4t , 21
but for x2t the lowest SSE is achieved for δ = 0.6 and δ = 0.7. This is why we have used in section 3.3 δ = 0.7, which we consider low enough to give good forecast accuracy. Although Table 1 suggests that δ = 0.1 gives the smallest SSE for x1t , x3t , x4t , we noticed slight forecast improvement using δ = 0.1 as far as the outlier proportion is concerned. Table 1 also suggests that perhaps two different discount factors should be used, i.e. δ1 = 0.1 for the time series x1t , x3t , x4t and δ2 = 0.7 for x4t .
4
Conclusions
This paper proposes the application of discount weighted regression (DWR) of Triantafyllopoulos and Pikoulas (2002) in order to model multivariate time series. DWR models are appropriately modified to accommodate for multivariate local level models. The noise covariance matrices are left unspecified and estimated with a maximum likelihood procedure, while the overall model exhibits maximum likelihood and weighted least squares optimalities. The proposed methodology is applied to London metal exchange time series data. Estimation and forecasting are performed with a fast and efficient online algorithm and the cross-correlation of the component series is discussed in some detail. The starting values of the algorithm as well as the discount factor are discussed giving practical guidelines which can be generally applied. A simple sensitivity analysis for the discount factor, based on the sum of squared forecast errors, has given inspirations on employing several discount factors in the models and it is expected that future research will be devoted in this direction.
Acknowledgements We thank Spiros Ikonomakos and Angelique Papageorgiou from the Commercial Department of Aluminum of Greece S.A.I.C. for providing the LME data. Special thanks are due
22
to Giovanni Montana who helped on the computational part of the paper. The work of the author was supported by grant NAL/00642/G of the Nuffield Foundation.
5
Appendix
+ Proof of Theorem 1. The condition (2.10) implies that both linear systems Xt−1 X = Xt+ + (in X) and Xt+ X = Xt−1 (in X) are consistent and so (Harville, 1997, p. 120) we have + + + Xt−1 Xt−1 Xt+ = Xt+ and Xt+ Xt Xt−1 = Xt−1 . Now see that + + + Xt−1 − δXt+ = Xt+ (Xt − δXt−1 )Xt−1 = Xt+ Ft Ft0 Xt−1 .
(5.1)
Then using mt = Xt+ Ht we get + + mt − mt−1 = Xt+ Ht − Xt−1 Ht−1 = Xt+ (δHt−1 + Ft yt0 ) − Xt−1 Ht−1 + + = (δXt+ − Xt−1 )Ht−1 + Xt+ Ft yt0 = Xt+ Ft yt0 − Xt+ Ft Ft0 Xt−1 Ht−1 + = Xt+ Ft (yt0 − Ft0 Xt−1 Ht−1 ) = Xt+ Ft (yt0 − Ft0 mt−1 ) = At e0t .
From equation (5.1), the covariance matrix Ct = Xt+ is Ct−1 = Ct (δId + Ft Ft0 Ct−1 ) ⇒ Ct Ft = (δ + Ft0 Ct−1 Ft )−1 Ct−1 Ft ⇒ µ ¶ 1 Ct−1 Ft Ft0 0 −1 0 Ct−1 − δCt = (δ + Ft Ct−1 Ft ) Ct−1 Ft Ft Ct−1 ⇒ Ct = Id − Ct−1 δ δ + Ft0 Ct−1 Ft and the proof is complete. Proof of Theorem 2. With Ft = F = [1 0 · · · 0]0 and the definition of Xt , we have 1 − δt Xt = diag(1, 0, . . . , 0) 1−δ and so we have rank(Xt ) = rank(Xt−1 ) = 1. The Moore-Penrose inverse of Xt is Xt+ =
1−δ diag(1, 0, . . . , 0) 1 − δt
23
+ and the row space of the extended matrix [Xt−1 : Xt+ ] is · ¸0 1−δ 1−δ + + R([Xt−1 : Xt ]) = λ 0 ··· 0 0 ··· 0 , 1 − δ t−1 1 − δt
for any λ ∈ R. Hence + + : Xt+ ])} = 1 = rank(Xt ) = rank(Xt−1 ) : Xt+ ]) = dim{R([Xt−1 rank([Xt−1
and so assumption (2.10) holds. Thus from Theorem 1, the estimates mt , Ct and St are updated as in equations (2.3), (2.5) and (2.4), respectively. With the transformation ψt = Θ0t F
and ωt = Ω0t F,
model (2.1a) and (2.1b) reduces to the local level model (2.11a) and (2.11b) with Ωt = qt Σ and qt = F 0 Wt F . With mt , Ct and St as in the equations (2.3), (2.5) and (2.4), define m∗t = m0t F , A∗t = F 0 At and Ct∗ St∗ = var(ψt |Σ = St∗ , y t ). Then we have et = yt − m0t−1 F = yt − m∗t−1 = e∗t ,
rt = yt − m0t F = yt − m∗t = rt∗
mt = mt−1 + At e0t =⇒ m∗t = m∗t−1 + A∗t e∗t . Since e∗t = et and rt∗ = rt the estimate St∗ = St with St as in equation (2.4) providing the required updating recursion for St∗ . Also var(ψt |Σ = St∗ , y t ) = var{vec(F 0 Θt )|Σ = St , y t } = (Id ⊗ F 0 )var{vec(Θt )|Σ = St , y t }(Id ⊗ F ) = F 0 Ct F St∗ = Ct∗ St∗ , where Ct is updated as in equation (2.5). From Ct∗ = F 0 Ct F , it follows that A∗t = F 0 At =
∗ Ct−1 F 0 Ct−1 F = . ∗ F 0 Ct−1 F + δ δ + Ct−1
24
By multiplying left and right equation (2.5) by F 0 and F , respectively, we obtain µ ¶ ∗ Ct−1 1 1 ∗ ∗ Ct−1 = . Ct = 1− ∗ ∗ δ δ + Ct−1 δ + Ct−1 The proof is complete by stating the prior ψ0 |Σ = S0∗ ∼ Np×1 (m∗0 , C0∗ S0∗ ), where m∗0 = m00 F .
References [1] Agbeyegbe T.D. (1992). Common stochastic trends: evidence from the London Metal Exchange. Bulletin of Economic Research, 44, 141-151. [2] Doucet, A., de Freitas, N. and Gordon, N.J. (2001) Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York. [3] Durbin, J. and Koopman, S.J. (2001) Time Series Analysis by State Space Methods. Oxford University Press, Oxford. [4] Enns, P.G., Machack, J.A., Spivey, W.A. and Wrobleski, W.J. (1982) Forecasting applications of an adaptive multiple exponential smoothing model. Management Science, 28, 1035-1044. [5] Franco, G.C. and Souza, R.C. (2002) A comparison of methods for bootstrapping in the local level model. Journal of Forecasting, 21, 27-38. [6] Gilbert C.L. (1997). Manipulation of metals futures: lessons from Sumitomo. Centre for Economic Policy Research, 26, Discussion Paper: 1537. [7] Gupta, A.K. and Nagar, D.K. (1999) Matrix Variate Distributions. Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics 104, New York. 25
[8] Hall S.G. (1991). An Application of the stochastic GARCH-in-mean model to risk premia in the London Metal Exchange. Manchester School of Economic and Social Studies (Supplement), 59, 57-71. [9] Hannan, E.J. (1970) Multiple Time Series. New York: Wiley. [10] Harrison, P.J. and Stevens, S. (1976) Bayesian forecasting (with discussion). Journal of the Royal Statistical Society (Series B), 38, 205-247. [11] Harvey, A.C. (1986) Analysis and generalisation of a multivariate exponential smoothing model. Management Science, 32, 374-380. [12] Harvey, A.C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge. [13] Heaney R. (2002). Does knowledge of the cost model improve commodity futures price forecasting ability? A case study using the London Metal Exchange lead contract. International Journal of Forecasting, 18, 45-65. [14] L¨ utkepohl, H. (1993) Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin. [15] Machak, J.A., Spivey, W.A. and Wrobleski, W.J. (1983) Analyzing permanent and transitory influences in multiple time series models. Journal of Business and Economic Statistics, 1, 57-65. [16] McKenzie M. Michell H. Brooks R.D. and Faff R.W. (2001) Power ARCH modelling of commodity futures data on the London Metal Exchange. European Journal of Finance, 7, 22-38. [17] Mecklin, C.J. and Mundfrom, D.J. (2004) An appraisal and bibliography of tests for multivariate normality. International Statistical Review, 72, 123-138. 26
[18] Meyer T.O. (1994). The difficulty in cross-hedging London Metal Exchange spot-price risk using U.S. metal and British pound futures. Journal of Multinational Financial Management, 4, 141-153. [19] Moore M.J. and Cullen U. (1995). Speculative efficiency on the London Metal Exchange. Manchester School of Economic and Social Studies, 63, 235-256. [20] Panas E. (2001). Long memory and chaotic models of prices on the London Metal Exchange. Resources Policy, 27, 235-246. [21] Quintana, J.M. and West, M. (1987). An analysis of international exchange rates using multivariate DLMs. The Statistician, 36, 275-281. [22] Salvador, M. and Gargallo, P. (2004) Automatic monitoring and intervention in multivariate dynamic linear models. Computational Statistics and Data Analysis, 47, 401-431. [23] Salvador, M., Gallizo, J.L. and Gargallo, P. (2003) A dynamic principal components analysis based on multivariate matrix normal dynamic linear models. Journal of Forecasting, 22, 457-478. [24] Sephton P.S. and Cochrane D.K. (1990). A note on the efficiency of the London Metal Exchange. Economics Letters, 33, 341-345. [25] Shephard, N. (1993) Distribution of the ML estimator of an MA(1) and a local level model. Econometric Theory, 9, 377-401. [26] Shephard, N. and Harvey, A.C. (1990) On the probability of estimating a deterministic component in the local level model. Journal of Time Series Analysis, 11, 339-347. [27] Slade M.E. (1988). Pricing of Metals. CRS Monograph series 22, Queens University Centre for Resource Studies, Kingston Ontario.
27
[28] Triantafyllopoulos, K. and Montana, G. (2004) Forecasting the London metal exchange with a dynamic model. In J. Antoch (ed.) Proceedings in Computational Statistics 2004, Physica-Verlag, Heidelberg, 1885-1892. [29] Triantafyllopoulos, K. and Pikoulas, J. (2002) Multivariate regression applied to the problem of network security. Journal of Forecasting, 21, 579-594. [30] Watkins, C. and McAleer, M. (2004) Econometric modelling of non-ferrous metal prices. Journal of Economic Surveys (to appear). [31] West, M. and Harrison, P.J. (1997) Bayesian Forecasting and Dynamic Models. 2nd edition. Springer Verlag, New York. [32] Whittle, P. (1963) Prediction and Regulation. English Universities Press, London.
28