Computational Statistics & Data Analysis 52 (2008) 1269–1280. www.elsevier.com/locate/csda
Forecasting time series using principal component analysis with respect to instrumental variables

P.-A. Cornillon (a), W. Imam (b), E. Matzner-Løber (a,*)

(a) Équipe de Statistique, IRMAR UMR 6625, Université Rennes 2 Haute Bretagne, Av. G. Berger CS 24307, 35043 Rennes, France
(b) Higher Institute for Demographic Studies and Researches, Damascus, Syria
Received 16 September 2005; received in revised form 31 January 2007; accepted 11 June 2007 Available online 29 June 2007
Abstract

Two new methods for forecasting time series are introduced. Both are based on a factorial analysis method called spline principal component analysis with respect to instrumental variables (spline PCAIV). The first method is a straightforward application of spline PCAIV, while the second is an adaptation of spline PCAIV in which the criterion is differentiated with respect to the unknown value that needs to be predicted. Both forecasting methods are shown to be well suited to time series. © 2007 Elsevier B.V. All rights reserved.

Keywords: Additive spline; Forecasting; PCAIV; Time series
1. Introduction

Forecasting future observations is probably the most important task of time series analysis, and it has long fascinated statisticians and economists. Among recent papers, see for instance Heij et al. (2007) and Koopman and Ooms (2006). The purpose of this paper is to introduce two new forecasting methods, both based on a factorial analysis method called spline principal component analysis with respect to instrumental variables (spline PCAIV), a multivariate method introduced by Durand (1993).

A classical approach to forecasting time series is to postulate a parametric model, estimate a few coefficients and compute the forecasts. Let $\{Z_t\}_{1 \le t \le T}$ denote a univariate time series; the most popular model is the autoregressive model of order $p$,
\[ Z_t = \sum_{i=1}^{p} \phi_i Z_{t-i} + \varepsilon_t, \qquad (1) \]
which is a particular case of the well-known autoregressive moving average (ARMA) models. Model (1) represents the current state of $Z_t$ through its immediate $p$ past values, in a linear form without intercept. We suppose $E(Z_t) = 0$; in practice the sample mean is subtracted from the data before fitting. The limitations of this type of linear model are well known, and a great many forms of nonlinear models have been explored since the eighties. Developments in nonparametric regression provide techniques for modelling time series through the model $Z_t = g(Z_{t-1}, \ldots, Z_{t-p}) + \varepsilon_t$. However, when $p$ is large, the nonparametric approach suffers from the "curse of dimensionality", and a natural simplification is the nonlinear additive autoregressive model
\[ Z_t = f_1(Z_{t-1}) + \cdots + f_p(Z_{t-p}) + \varepsilon_t. \qquad (2) \]
Additive models are very useful for approximating the high-dimensional autoregressive function $g(\cdot)$ given above. They and their extensions have become one of the most widely used nonparametric techniques. The functions $f_i$ can be estimated by several nonparametric methods (splines, kernels, ...); an excellent survey is given in Fan and Yao (2003). Note, however, that one can add a constant to one component and subtract that constant from another, so the functions $f_1, \ldots, f_p$ are not identifiable. To prevent this ambiguity, the usual conditions are imposed:
\[ E(f_i(Z)) = 0, \qquad i = 1, \ldots, p. \]
With these constraints, $E(Z_t) = 0$. We therefore assume either that the mean has already been removed from the series or that an intercept is added to the model.

Recall that, following the seminal paper of Rao (1964), the goal of linear principal component analysis with respect to instrumental variables (PCAIV) is to explain a few response variables $Y_1, \ldots, Y_q$ by a linear model based on explanatory variables $X_1, \ldots, X_p$. The key idea is to use the same linear combinations of explanatory variables for every response variable. These linear combinations are chosen to be pairwise orthogonal and are ordered from the most explanatory to the least. This classical presentation can be viewed as a PCA of the projection of $Y$ on the vector space spanned by $X_1, \ldots, X_p$. Another classical presentation of PCAIV, which leads to the same linear combinations of explanatory variables, is given in Section 2.1. This definition of PCAIV was used by Durand (1993) to propose a nonlinear PCAIV using B-splines; a presentation of this nonlinear multivariate tool is given in Section 2.2. Imam et al. (1999) used spline PCAIV to estimate the order $p$ of a nonlinear autoregressive model and the functions $f_i$ in model (2). Following their work, two methods derived from spline PCAIV are proposed here to forecast time series. The first one is classical: the functions $f_i$ are estimated and then the forecasts are computed. The second one is completely new, as we optimize a criterion with respect to the unknown value we want to forecast. Section 3 is devoted to the presentation of these forecasting methods. In Section 4, the forecasting methods are applied to two well-known real time series and the forecasts are compared with other classical forecasting methods.

2. Presentation of spline PCAIV

2.1. Linear PCAIV

The purpose of PCAIV is to analyze linear relations between two sets of variables X and Y. Let X be an $n \times p$ matrix representing $p$ explanatory variables $X_1, \ldots, X_p$ measured on the same $n$ individuals, and let Y be an $n \times q$ matrix representing $q$ response variables $Y_1, \ldots, Y_q$ measured on the same $n$ objects. The approach proposed by Escoufier (1987) introduces metrics to compute distances between objects and between variables. The variable space $\mathbb{R}^n$ is equipped with the metric $D = \mathrm{diag}(\ldots, d_i, \ldots)$ corresponding to the individual weights, in general $D = \frac{1}{n} I_n$. The individual space corresponding to Y (respectively, to X) is $\mathbb{R}^q$ (resp. $\mathbb{R}^p$) and is equipped with the metric $Q$ (resp. $R$, which is unknown). A first triple $(Y, Q, D)$ has to be explained by a second one $(X, R, D)$. From now on, we take $D = \frac{1}{n} I_n$ and $Q = I_q$. Let us summarize the procedure in its most classical presentation.
Definition 1. Let $(Y, I_q, \frac{1}{n} I_n)$ be the triple to be explained and $(X, R, \frac{1}{n} I_n)$ an explanatory triple with unknown $R$. PCAIV consists of two steps:

(1) Find a metric $\hat{R}$ on the object space of X such that the discrepancy between $YY'$ and $XRX'$ is minimal according to the Hilbert-Schmidt norm (the norm on the space of symmetric matrices):
\[ \hat{R} = \arg\min_{R} \| YY' - XRX' \|^2 . \]
(2) Perform the principal component analysis (PCA) of the triple $(X, \hat{R}, \frac{1}{n} I_n)$.

A formal solution of this minimization problem is given by
\[ \hat{R} = [X'X]^+ X' Y Y' X [X'X]^+ , \]
where $A^+$ denotes the Moore-Penrose generalized inverse; see Escoufier (1987) for further details.

Remark. Linear PCAIV leads to the same representation of the observations as the PCA of $(P_X Y, I_q, \frac{1}{n} I_n)$, where $P_X$ is the orthogonal projector on the linear subspace of $\mathbb{R}^n$ spanned by the columns of X.
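As an illustration of Definition 1 (not part of the original paper), the two steps can be carried out numerically as in the following sketch; the synthetic centred data, the variable names and the use of NumPy are our own choices.

```python
# Minimal sketch of linear PCAIV (Definition 1), assuming D = (1/n) I_n and Q = I_q.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 4, 3
X = rng.normal(size=(n, p))                                       # explanatory variables
Y = X @ rng.normal(size=(p, q)) + 0.1 * rng.normal(size=(n, q))   # responses
X -= X.mean(axis=0)                                               # centre both blocks
Y -= Y.mean(axis=0)

# Step 1: optimal metric R_hat = (X'X)^+ X' Y Y' X (X'X)^+
XtX_pinv = np.linalg.pinv(X.T @ X)
R_hat = XtX_pinv @ X.T @ Y @ Y.T @ X @ XtX_pinv

# Step 2: PCA of (X, R_hat, (1/n) I_n), equivalently PCA of the projection P_X Y (see the Remark).
P_X_Y = X @ XtX_pinv @ X.T @ Y
_, sing_vals, Vt = np.linalg.svd(P_X_Y / np.sqrt(n), full_matrices=False)
components = P_X_Y @ Vt.T          # successive linear combinations, most explanatory first
print(sing_vals ** 2)              # explained variances, in decreasing order
```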
2.2. Spline PCAIV
The idea of spline PCAIV (Durand, 1993) is to perform PCAIV of Y with respect to $X(s)$, where $X(s)$ is a transformation of X via B-splines. For each explanatory variable $X_j$, the user defines the degree of the spline and the number and position of the knots, leading to $r_j$ basis functions. For the sake of simplicity, the $r_j$ are chosen equal to $r$ for $j = 1, \ldots, p$. In practice, as in Durand (1993), we use three equally spaced knots and degree-2 splines. These $r$ basis functions are evaluated on the design: for each element $x_{ij}$ of the $j$th explanatory variable $X_j$, the corresponding spline basis is written $(B^j_1(x_{ij}), B^j_2(x_{ij}), \ldots, B^j_r(x_{ij}))$. The nonlinear spline transformation of $X_j$ is given by
\[
X_j(s^j) = \begin{pmatrix} \sum_{l=1}^{r} B^j_l(x_{1j}) s^j_l \\ \vdots \\ \sum_{l=1}^{r} B^j_l(x_{nj}) s^j_l \end{pmatrix}
= [B^j_1 \,|\, \ldots \,|\, B^j_r] \begin{pmatrix} s^j_1 \\ \vdots \\ s^j_r \end{pmatrix}
= B^j_{(n \times r)}\, s^j_{(r \times 1)},
\]
where $B^j$ is a coding matrix of $X_j$ and $s^j$ is the vector of spline coefficients. In matrix notation, the transformation of X is
\[ X_{(n \times p)}(s) = B_{(n \times pr)}\, S_{(pr \times p)}, \]
where $B = [B^1 \,|\, \ldots \,|\, B^p]$ and
\[
S = \begin{pmatrix}
s^1 & 0 & \cdots & 0 \\
0 & s^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & s^p
\end{pmatrix},
\]
i.e. column $j$ of $S$ contains the coefficient vector $s^j = (s^j_1, \ldots, s^j_r)'$ in rows $(j-1)r + 1$ to $jr$ and zeros elsewhere. The matrix S is the unknown spline coefficient matrix and $s = (s^{1\prime}, \ldots, s^{p\prime})'$ is the unknown spline coefficient vector.
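For concreteness, the coding matrices B and S can be assembled as in the sketch below; the knot placement, the use of scipy.interpolate.BSpline and the unit starting coefficients are illustrative assumptions of ours (in particular they are not the nodal values used later by the algorithm).

```python
# Hedged sketch of the B-spline coding X(s) = B S of Section 2.2.
import numpy as np
from scipy.interpolate import BSpline

def spline_basis(x, degree=2, n_interior=3):
    """Evaluate a degree-2 B-spline basis with equally spaced interior knots at x -> (n, r)."""
    lo, hi = x.min(), x.max()
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    t = np.r_[[lo] * (degree + 1), interior, [hi] * (degree + 1)]   # full knot vector
    r = len(t) - degree - 1                                         # number of basis functions
    return np.column_stack([BSpline(t, np.eye(r)[k], degree)(x) for k in range(r)])

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))

blocks = [spline_basis(X[:, j]) for j in range(p)]    # B^1, ..., B^p, each (n, r)
r = blocks[0].shape[1]
B = np.hstack(blocks)                                  # B = [B^1 | ... | B^p], shape (n, p*r)

s = [np.ones(r) for _ in range(p)]                     # spline coefficient vectors s^j (illustrative)
S = np.zeros((p * r, p))                               # block-diagonal coding matrix
for j in range(p):
    S[j * r:(j + 1) * r, j] = s[j]

X_of_s = B @ S                                         # transformed explanatory variables X(s)
print(X_of_s.shape)                                    # (n, p)
```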
With these notations, spline PCAIV can be defined as follows.

Definition 2. Spline PCAIV consists of two steps:

(1) Find a metric $\hat{R}$ and a transformation $\hat{s}$ such that the representation of the triple to be explained $(Y, I_q, \frac{1}{n} I_n)$ is as close as possible to the representation of the explanatory triple $(X(s), R, \frac{1}{n} I_n)$:
\[
(\hat{R}, \hat{s}) = \arg\min_{(R,s)} \| YY' - X(s) R X'(s) \|^2
= \arg\min_{(R,s)} \mathrm{tr}\{ [\,YY' - X(s) R X'(s)\,]^2 \}. \qquad (3)
\]
(2) Perform the PCA of the triple $(X(\hat{s}), \hat{R}, \frac{1}{n} I_n)$.

A solution to the minimization problem (3) is obtained by setting the partial derivatives equal to zero, leading to the so-called "normal" equations. If $s$ is fixed, an explicit solution for $R$ is given by
\[ \hat{R} = [X'(s)X(s)]^+ X'(s)\, Y Y'\, X(s)\, [X'(s)X(s)]^+ . \qquad (4) \]
If $R$ is fixed, no explicit solution for $s$ is available, and minimization algorithms have to be used. Once $s$ is computed, $R$ is recomputed, and so on. This iterative procedure was proposed by Durand (1993) (steepest descent on $s$) and modified by Imam and Durand (1997) (steepest descent on each $s^j$ alternately). In both algorithms, the starting value $s_0$ for $s$ is given by the nodal values, i.e. the values such that $X(s_0) = X$. Roughly speaking, nodal values are computed as means of knots; for an exact expression see, for example, Schumaker (1981) or Durand (1993). The first step of spline PCAIV is thus linear PCAIV, since $X(s_0)$ equals $X$.

We obtain the same representation of the individuals by performing the PCA of $(X(\hat{s}), \hat{R}, \frac{1}{n} I_n)$ or the PCA of $(P_{X(\hat{s})} Y, Q, \frac{1}{n} I_n)$, where $P_{X(\hat{s})} Y$ is the projection of $Y$ on the space spanned by the columns of $X(\hat{s})$. This is the second step of spline PCAIV, and it can be written as the PCA of $(\hat{Y}_{\mathrm{spl.acpvi}}, I_q, \frac{1}{n} I_n)$, where
\[ \hat{Y}_{\mathrm{spl.acpvi}} = P_{X(\hat{s})} Y = B \hat{S} (\hat{S}' B' B \hat{S})^+ \hat{S}' B' Y . \qquad (5) \]
This formula can be compared to the one obtained by spline regression on the same basis functions,
\[ \hat{Y}_{\mathrm{spl.reg}} = P_B Y = B (B'B)^+ B' Y . \qquad (6) \]
It can be proved that $\hat{Y}_{\mathrm{spl.reg}} = \hat{Y}_{\mathrm{spl.acpvi}}$ (Cornillon and Matzner-Løber, 2007). As a closed form exists for (6), one may wonder why algorithms should be used to solve (5). The use of such algorithms allows one to circumvent the curse of dimensionality inherent in the use of B-splines. Moreover, in the same spirit as spline PCAIV, one can think of other algorithmic procedures.
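The alternating scheme just described (closed-form $R$ for fixed $s$, numerical minimization over $s$ for fixed $R$) can be sketched as follows. This is only an approximation of the authors' algorithm: a generic quasi-Newton optimizer (SciPy's BFGS) replaces the steepest-descent updates of Durand (1993) and Imam and Durand (1997), the starting coefficients are ones rather than the nodal values, and the data are synthetic.

```python
# Hedged sketch of the alternating fit for the spline PCAIV criterion (3).
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

def spline_basis(x, degree=2, n_interior=3):
    lo, hi = x.min(), x.max()
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    t = np.r_[[lo] * (degree + 1), interior, [hi] * (degree + 1)]
    r = len(t) - degree - 1
    return np.column_stack([BSpline(t, np.eye(r)[k], degree)(x) for k in range(r)])

def X_of_s(blocks, s_flat, r, p):
    s = s_flat.reshape(p, r)
    return np.column_stack([blocks[j] @ s[j] for j in range(p)])

def R_given_s(Xs, Y):
    G = np.linalg.pinv(Xs.T @ Xs)
    return G @ Xs.T @ Y @ Y.T @ Xs @ G          # closed-form metric, Eq. (4)

def criterion(s_flat, blocks, Y, R, r, p):
    Xs = X_of_s(blocks, s_flat, r, p)
    D = Y @ Y.T - Xs @ R @ Xs.T
    return np.sum(D * D)                         # squared Hilbert-Schmidt (Frobenius) norm

rng = np.random.default_rng(1)
n, p = 150, 3
X = rng.normal(size=(n, p))
Y = np.sin(X[:, [0]]) + X[:, [1]] ** 2 + 0.1 * rng.normal(size=(n, 1))

blocks = [spline_basis(X[:, j]) for j in range(p)]
r = blocks[0].shape[1]
s_flat = np.ones(p * r)

for it in range(10):                             # alternate: R step, then s step
    Xs = X_of_s(blocks, s_flat, r, p)
    R = R_given_s(Xs, Y)
    res = minimize(criterion, s_flat, args=(blocks, Y, R, r, p),
                   method="BFGS", options={"maxiter": 50})
    s_flat = res.x
    print(it, criterion(s_flat, blocks, Y, R, r, p))   # criterion decreases at each pass
```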
3. Spline PCAIV and time series analysis

In the time series context, let $Y$ be $(Z_T, \ldots, Z_{p+1})'$ and let $X$ be the matrix of the $p$ lagged values of $Z_t$, so that $Y$ (respectively, $X$) is an $n \times 1$ (respectively, $n \times p$) matrix. In PCAIV terms, the instrumental variables are the lagged variables. As $Y$ is a vector, there is no need for the second step of spline PCAIV. Expanding (5) gives
\[ \hat{y}_i = \sum_{l=1}^{p} \hat{\beta}_l \sum_{k=1}^{r} \hat{s}^l_k B^l_k(x_{il}) = \hat{f}_1(x_{i1}) + \cdots + \hat{f}_p(x_{ip}). \qquad (7) \]
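The response vector $Y$ and the matrix $X$ of lagged values described above can be assembled as in the following short sketch; the function name and the toy series are ours.

```python
# Building Y = (Z_T, ..., Z_{p+1})' and the matrix X of p lagged values (Section 3).
import numpy as np

def lag_design(z, p):
    """Return Y (n x 1) and X (n x p) with row i = (Z_{t-1}, ..., Z_{t-p}) for Y[i] = Z_t."""
    z = np.asarray(z, dtype=float)
    T = len(z)
    Y = z[p:][::-1].reshape(-1, 1)                                   # Z_T, ..., Z_{p+1}
    X = np.column_stack([z[p - k:T - k][::-1] for k in range(1, p + 1)])
    return Y, X

z = np.arange(1, 11.0)          # toy series Z_1, ..., Z_10
Y, X = lag_design(z, p=3)
print(Y.ravel())                # [10.  9.  8.  7.  6.  5.  4.]
print(X[0])                     # first row: Z_9, Z_8, Z_7 -> [9. 8. 7.]
```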
Applying this formula to the last $p$ observed values of the time series leads to the first forecasting method:
\[ \hat{Z}_{T+1} = \hat{f}_1(Z_T) + \cdots + \hat{f}_p(Z_{T-p+1}). \qquad (8) \]
From now on, this method is referred to as Direct. As the nodal values are used as the starting point of the algorithm, the first step of spline PCAIV is linear PCAIV, which in the time series context gives the estimation of a linear autoregressive model. Then, step by step, the estimates become more and more nonlinear, but still in an additive form.

Suppose for a moment that the value $Z_{T+1}$ were available (which it is not in reality); then $\tilde{Y}$ and $\tilde{X}$ would have $n + 1$ rows:
\[
\tilde{Y} = \begin{pmatrix} Z_{T+1} \\ Z_T \\ \vdots \\ Z_{p+1} \end{pmatrix},
\qquad
\tilde{X} = \begin{pmatrix} Z_T & \cdots & Z_{T-p+1} \\ Z_{T-1} & \cdots & Z_{T-p} \\ \vdots & \ddots & \vdots \\ Z_p & \cdots & Z_1 \end{pmatrix}.
\]
We could then conduct a spline PCAIV analysis of $\tilde{Y}$ with respect to $\tilde{X}$ to obtain estimates of (2) based on $T - p + 1$ observations instead of $T - p$. Unfortunately $Z_{T+1}$ is unknown, but the spline PCAIV criterion can be written as
\[
(\hat{R}, \hat{s}, \hat{Z}_{T+1}) = \arg\min_{(R, s, Z_{T+1})} \mathrm{tr}\big( (\tilde{Y}\tilde{Y}' - \tilde{X}(s) R \tilde{X}'(s)) (\tilde{Y}\tilde{Y}' - \tilde{X}(s) R \tilde{X}'(s))' \big). \qquad (9)
\]
The minimization is now conducted over $(R, s)$ and also over the unknown value $Z_{T+1}$. The partial derivative of (9) with respect to $R$ gives the expression of the optimal metric, which is a function of both $Z_{T+1}$ and $s$:
\[ \tilde{R} = (\tilde{X}'(s)\tilde{X}(s))^+ \big(\tilde{X}'(s)\, \tilde{Y} \tilde{Y}'\, \tilde{X}(s)\big) (\tilde{X}'(s)\tilde{X}(s))^+ . \]
Differentiating the objective function (9) with respect to the spline coefficients $s$, as in spline PCAIV, does not lead to a separable expression for $s$, hence the need for an iterative algorithm. Finally, setting the partial derivative of (9) with respect to $\tilde{Z}_{T+1}$ equal to 0 leads to an exact formula for $\tilde{Z}_{T+1}$.

Proposition 1. When $(R, s)$ remain fixed, the value of $Z_{T+1}$ minimizing the objective function
\[ g(R, s, Z_{T+1}) = \| \tilde{Y}\tilde{Y}' - \tilde{X}(s) R \tilde{X}'(s) \|^2 \]
is a real root of the degree-3 polynomial
\[ p(y) = y^3 + (Y'Y - a_{11})\, y - Y' A_{12}, \]
where $a_{11} \in \mathbb{R}$ and $A_{12} \in \mathbb{R}^{T-p}$ are such that
\[
\tilde{A} = \tilde{X}(s) R \tilde{X}'(s) = \begin{pmatrix} a_{11} & A_{12}' \\ A_{12} & A_{22} \end{pmatrix}. \qquad (10)
\]
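A minimal numerical sketch of the update given by Proposition 1 is shown below; the inputs (a transformed lag matrix and a metric $R$) are illustrative placeholders, and keeping only the real roots anticipates the remark that follows Proposition 2.

```python
# Sketch of the Z_{T+1} update of Proposition 1: real root of y^3 + (Y'Y - a11) y - Y'A12.
import numpy as np

def update_ZT1(Xs_tilde, R, Y):
    """Xs_tilde: (n+1, p) transformed lag matrix, first row for time T+1; Y: (n,) known values."""
    A = Xs_tilde @ R @ Xs_tilde.T
    a11 = A[0, 0]
    A12 = A[1:, 0]
    coeffs = [1.0, 0.0, Y @ Y - a11, -(Y @ A12)]       # y^3 + (Y'Y - a11) y - Y'A12
    roots = np.roots(coeffs)
    real_roots = roots[np.abs(roots.imag) < 1e-8].real
    return real_roots                                   # usually a single real root (see Remark)

rng = np.random.default_rng(2)
n, p = 50, 3
Xs_tilde = rng.normal(size=(n + 1, p))                  # placeholder for the transformed lags
R = np.eye(p)                                           # placeholder metric
Y = rng.normal(size=n)
print(update_ZT1(Xs_tilde, R, Y))
```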
Using this proposition we can define the forecasting PCAIV, denoted Fpcaiv, via the following algorithm.

Proposition 2 (Indirect spline PCAIV forecast: forecasting PCAIV).

    Z^(0)_{T+1} <= median(Z_1, ..., Z_T)
    s^(0) such that X~(s^(0)) = X~ (nodal values)
    Y~^(0) <= (Z^(0)_{T+1} : Y')'
    i <= 1
    g(R^(0), s^(0), Z^(0)_{T+1}) <= -infinity
    g(R^(1), s^(1), Z^(1)_{T+1}) <= 0
    while |g(R^(i), s^(i), Z^(i)_{T+1}) - g(R^(i-1), s^(i-1), Z^(i-1)_{T+1})| > epsilon or i < N do
        R^(i) <= (X~'(s^(i-1)) X~(s^(i-1)))^+ X~'(s^(i-1)) Y~^(i-1) Y~^(i-1)' X~(s^(i-1)) (X~'(s^(i-1)) X~(s^(i-1)))^+
        Z^(i)_{T+1} <= y in R such that p(y) = 0 (see Proposition 1)
        s^(i) <= s^(i-1) - (dg/ds)(s^(i-1)) (gradient descent, see Durand (1993) or Imam and Durand (1997))
        Y~^(i) <= (Z^(i)_{T+1} : Y')'
        g(R^(i), s^(i), Z^(i)_{T+1}) <= || Y~^(i) Y~^(i)' - X~(s^(i)) R^(i) X~'(s^(i)) ||^2
        i <= i + 1
    end do
Remark. If at stage $(i)$ three real roots are found for the polynomial $p(y) = y^3 + (Y'Y - a_{11}) y - Y' A_{12}$, it suffices to choose the one which is closest to $Z^{(i-1)}_{T+1}$. Usually, however, because $n$ is large there is only one real root, as shown hereafter. The polynomial $p(y)$ has only one real root if $(Y'Y - a_{11})$ is positive. Remember that spline PCAIV tries to approximate the scalar product $\tilde{Y}\tilde{Y}'$ by $\tilde{X}(s)R\tilde{X}'(s)$. Thus $a_{11}$ is close to the square of the forecast value, a quantity of order 1, whereas $Y'Y$ is the sum of $n$ terms of the same order as $a_{11}$, that is, a quantity of order $n$. The probability that $Y'Y - a_{11}$ is positive therefore increases with $n$, and also with the stage $(i)$.

4. Examples

In this section, the proposed methods are applied to two real time series, both classical but with different features. The first one is short and regular; the second one has many more observations, is irregular and has explanatory variables. In both examples we use three equally spaced knots and degree-2 splines, as in Durand (1993). We use the time series directly, without subtracting the sample mean; as explained in the introduction, our model includes an intercept.

4.1. Choice of the model

As shown in Imam et al. (1999), spline PCAIV can be used to estimate the order $p$ of the nonlinear autoregressive model by proceeding as follows (a hedged numerical sketch is given after the list):

(1) Choose $P$ large enough to incorporate a possible seasonal component.
(2) Build an $n \times 1$ matrix $Y$ and an $n \times P$ matrix $X$.
(3) Perform a spline PCAIV.
(4) Compute the residual variance of order $j$, for all $j$ less than $P$:
\[ \hat{\sigma}^2_{1,\ldots,j} = \frac{1}{n-1} \sum_{t=P+1}^{T} \big( Z_t - \hat{f}_1(Z_{t-1}) - \cdots - \hat{f}_j(Z_{t-j}) \big)^2 . \]
(5) Plot $\hat{\sigma}^2_{1,\ldots,j}$ against $j$ and choose $p$ as the value beyond which $\hat{\sigma}^2_{1,\ldots,j}$, $j > p$, stabilizes.
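As announced above, here is a hedged sketch of steps (4) and (5). The fitted component values f_j(Z_{t-j}) are assumed to be available as a matrix; here they are generated synthetically for illustration only, not produced by spline PCAIV, and the 1% stabilization rule is our own stand-in for reading the plot.

```python
# Sketch of the order-selection criterion: residual variance for increasing order j.
import numpy as np

rng = np.random.default_rng(3)
P, n = 15, 100
Zt = rng.normal(size=n)                       # Z_t for t = P+1, ..., T (n = T - P values)
Fhat = 0.05 * rng.normal(size=(n, P))         # column j-1 would hold f_j(Z_{t-j}) from the fit
Fhat[:, 0] = 0.6 * Zt                         # make the first two lags informative (synthetic)
Fhat[:, 1] = 0.3 * Zt

sigma2 = [np.sum((Zt - Fhat[:, :j].sum(axis=1)) ** 2) / (n - 1) for j in range(1, P + 1)]
# choose p where the curve stabilizes, e.g. the first j whose relative improvement is below 1%
improvement = -np.diff(sigma2) / np.asarray(sigma2[:-1])
p_hat = int(np.argmax(improvement < 0.01)) + 1
print(np.round(sigma2, 3), p_hat)
```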
Fig. 1. Time plot of the log10 lynx data (114 observations).
At this point the last $p$ lagged values have been selected and we have two matrices: an $n \times 1$ matrix $Y$ and an $n \times p$ matrix $X$. If the aim is to select the most explanatory variables in order to get a smaller model, proceed as follows (a hedged sketch of this greedy selection is given after the list):

(1) Compute the residual variance for each of the $p$ lagged variables,
\[ \hat{\sigma}^2_j = \frac{1}{n-1} \sum_{t=p+1}^{T} \big( Z_t - \hat{f}_j(Z_{t-j}) \big)^2 , \]
and choose the most explanatory variable, denoted $i_1$.
(2) Then compute
\[ \hat{\sigma}^2_{i_1, j} = \frac{1}{n-1} \sum_{t=p+1}^{T} \big( Z_t - \hat{f}_{i_1}(Z_{t-i_1}) - \hat{f}_j(Z_{t-j}) \big)^2 . \]
(3) Stop when the estimated variance stabilizes with the selected variables $i_1, \ldots$, the maximum number of variables being $\hat{p} \le p$.
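The announced sketch of the greedy selection follows; an ordinary least-squares fit on quadratic features is used as a stand-in for the spline PCAIV fit of each subset, so the fitted functions differ from the authors' procedure, and the 1% stopping rule is our own assumption.

```python
# Hedged sketch of the greedy variable selection of Section 4.1.
import numpy as np

def residual_variance(Y, X, subset):
    """Residual variance of Y regressed on [1, X_j, X_j^2] for j in subset (stand-in fit)."""
    feats = [np.ones(len(Y))]
    for j in subset:
        feats += [X[:, j], X[:, j] ** 2]
    F = np.column_stack(feats)
    beta, *_ = np.linalg.lstsq(F, Y, rcond=None)
    resid = Y - F @ beta
    return np.sum(resid ** 2) / (len(Y) - 1)

rng = np.random.default_rng(4)
n, p = 200, 5
X = rng.normal(size=(n, p))
Y = np.sin(X[:, 1]) + 0.5 * X[:, 3] ** 2 + 0.1 * rng.normal(size=n)

selected, remaining = [], list(range(p))
best_prev = np.inf
while remaining:
    scores = {j: residual_variance(Y, X, selected + [j]) for j in remaining}
    j_best = min(scores, key=scores.get)
    if best_prev - scores[j_best] < 0.01 * best_prev:   # stop when the variance stabilizes
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_prev = scores[j_best]
print(selected)                                          # expected: variables 1 and 3 (0-indexed)
```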
4.2. Canadian lynx data

We study the classic Canadian lynx data, consisting of the annual record of the number of Canadian lynx trapped in the Mackenzie River district of North-West Canada for the period 1821-1934. This time series has attained the status of a "benchmark data set". Tong (1990, Chapter 7) describes in great detail the historical background of the series and presents a complete analysis of the data. More recently, Lin and Pourahmadi (1998) confined their attention to the class of additive and projection pursuit regression models and relied on the estimated prediction error variance to compare the prediction performance of various linear and nonlinear models.

The series consists of 114 observations. The last 14 observations are set aside and used to evaluate the forecasting performance of the methods proposed in the previous section. As in Lin and Pourahmadi (1998), we compute one-step-ahead and multi-step-ahead forecasts. We use the common $\log_{10}$ transformation and denote by $z_t$ the transformed series. Fig. 1 gives the plot of the series. In a first step, we set $P = 15$ and conduct a spline PCAIV. We select the first two lags, assuming the model $z_t = f_1(z_{t-1}) + f_2(z_{t-2}) + \varepsilon_t$. A new spline PCAIV is conducted and the two functions are estimated via Eq. (7); the estimates are plotted in Fig. 2.

The relative performance of the predictors is assessed by computing $s_1 = (S_1/14)^{1/2}$, where $S_1$ is the sum of squares of the one-step (respectively multi-step) prediction errors for $t = 101, \ldots, 114$, and $s_2 = S_2/14$, where $S_2$ is the sum of the absolute percentage errors of the one-step (respectively multi-step) predictions for $t = 101, \ldots, 114$. The forecasts for the lynx data set are given in Table 1. Our predictive methods compare favourably with the results obtained by Lin and Pourahmadi (1998) for their best model (PAD(1,2)).
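The accuracy measures s1 and s2 defined above can be computed as in the following snippet; the three observed/predicted pairs are placeholders, not the lynx forecasts themselves.

```python
# Computing the accuracy measures s1 (root mean squared error) and s2 (mean absolute percentage error).
import numpy as np

def s1_s2(observed, predicted):
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    err = observed - predicted
    s1 = np.sqrt(np.sum(err ** 2) / len(err))                 # (S1 / m)^(1/2)
    s2 = np.sum(np.abs(err / observed) * 100) / len(err)      # S2 / m, in percent
    return s1, s2

obs  = [2.36, 2.60, 3.05]          # placeholder values
pred = [2.35, 2.70, 2.85]
print(s1_s2(obs, pred))
```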
Fig. 2. Plots of functions: (a) f1(zt-1) and (b) f2(zt-2) fitted to the log10 lynx data using spline PCAIV.
Table 1
One-step and multi-step-ahead forecasts for the lynx data set

Observed        Direct               Fpcaiv               PAD(1,2)
                One-step  Multi-step One-step  Multi-step One-step  Multi-step
Z101 = 2.36     2.35      2.35       2.34      2.43       2.33      2.33
Z102 = 2.60     2.70      2.69       2.67      2.66       2.73      2.68
Z103 = 3.05     2.85      2.98       2.84      2.92       2.91      3.04
Z104 = 3.39     3.37      3.24       3.35      3.16       3.39      3.32
Z105 = 3.55     3.57      3.43       3.55      3.35       3.51      3.43
Z106 = 3.47     3.40      3.42       3.38      3.38       3.44      3.33
Z107 = 3.19     3.05      3.10       3.03      3.15       3.15      3.09
Z108 = 2.72     2.74      2.69       2.73      2.78       2.85      2.86
Z109 = 2.69     2.52      2.58       2.50      2.63       2.49      2.76
Z110 = 2.82     2.85      2.72       2.84      2.74       2.81      2.82
Z111 = 3.00     3.04      2.93       3.02      2.93       3.02      2.96
Z112 = 3.20     3.22      3.17       3.20      3.14       3.17      3.12
Z113 = 3.42     3.36      3.38       3.35      3.33       3.30      3.22
Z114 = 3.53     3.46      3.43       3.44      3.36       3.43      3.23

s1              0.091     0.084      0.098     0.114      0.093     0.125
s2              2.27      2.45       2.35      3.02       2.47      3.13
As shown in Table 1, Direct and forecasting PCAIV both have good sums of absolute percentage errors (s2), better than the PAD(1,2) model. For the standard error of forecast, s1, direct PCAIV is also better than the other methods. Forecasting PCAIV is penalized by a bad forecast of the third highest value (Z106), and by the fact that s1 does not take the percentage of error into account, but it performs well for multi-step forecasting. Finally, for this short and very regular time series, both direct PCAIV and forecasting PCAIV improve the quality of the forecasts, and their implementation, as nonparametric procedures, is easy. We do not try to optimize our methods by tuning important parameters such as the number and location of the knots and the degree of the splines, because the aim of the paper is to show that with default values (three equally spaced knots and degree 2) our methods work very well. The next example, the river Jökulsá Eystri of Iceland, has explanatory variables.

4.3. River Jökulsá Eystri of Iceland

The data consist of daily riverflow (yt), precipitation (zt) and temperature (xt) from January 1, 1972, to December 31, 1974, that is 1096 observations. An important hydrological feature of this river is the presence of a glacier in the drainage area; for further information, see Tong (1990, Chapter 7). Using only the data of the first two years and keeping the last year for out-of-sample prediction, we first applied a spline PCAIV.
Fig. 3. Time plots of river Jökulsá Eystri riverflow data: (a) daily riverflow yt (m3/s); (b) daily precipitation zt (mm/day); (c) daily temperature xt (°C).
Fig. 4. Plots of functions fitted to the Jökulsá Eystri riverflow data using spline PCAIV: (yt-1) f1(yt-1) versus yt-1, ..., (xt-5) g6(xt-5) versus xt-5. The threshold value of Tong (temperature of -2°C) is indicated by a dotted vertical line.
Here the explanatory matrix X contains the 10 lagged values of the riverflow, the precipitation and the temperature (Fig. 3). We select the orders (4, 1, 5) and, after a new spline PCAIV, we retain the model
\[ y_t = f_1(y_{t-1}) + f_2(y_{t-2}) + f_3(y_{t-3}) + f_4(y_{t-4}) + g_1(z_t) + g_2(x_t) + g_3(x_{t-1}) + g_4(x_{t-2}) + g_5(x_{t-3}) + g_6(x_{t-5}) + \varepsilon_t . \]
The estimated functions are drawn in Fig. 4. Our model is very similar to the models of Chen and Tsay (1993) and Tong (1990); we refer the reader to these references for a complete discussion of the hydrological features. For instance, the second line of Fig. 4 (fitted functions for temperature) indicates the presence of a threshold around $x_t = -2$, as in the threshold model of Tong. In terms of residual variance (over the 3 years), our model (27.26) performs better than the Chen and Tsay model (35.32) and the Tong model (31.77). We then assess the out-of-sample predictive power of these two models and of our two predictive procedures, keeping the observations of the last year to compute prediction errors.
Table 2
Multi-step-ahead mean square errors of forecasts for the Jökulsá data set

Model            Forecasts for the following lead times
                 1        2        3        4        5        6        7
Tong             66.67    153.39   196.37   231.85   258.92   283.16   300.27
Chen and Tsay    65.52    142.01   175.88   203.03   230.02   256.89   279.32
Direct           58.05    123.87   157.83   183.94   215.12   259.68   304.75
Fpcaiv           57.01    126.42   160.82   185.59   209.35   245.07   286.33
We compute the mean square errors of the forecasts of the four models, focusing on one-day-ahead to one-week-ahead predictions; the results are given in Table 2. From Table 2, it is clear that our methods produce the best forecasts among the four models. As in the lynx data example, direct forecasting and forecasting PCAIV (Fpcaiv) lead to close forecast values.

5. Conclusion

In recent years, owing to the availability of low-cost computing power, there has been a great proliferation of classes of nonlinear models for fitting and predicting time series. Two new forecasting methods, both based on the same criterion, have been presented. As described in Section 2, this criterion minimizes the discrepancy between two scalar products. This objective function appears to be well suited to time series, both for model building (Imam et al., 1999) and for forecasting. These methods use iterative algorithms which start from linear autoregressive models through the use of nodal values; then, iteration by iteration, the nonlinearity of the components of the time series is estimated more and more accurately. The price to be paid for nonlinearity, while circumventing the curse of dimensionality, is the use of an iterative algorithm. However, with today's computers the computational times are small: for example, it took 10 s on a personal computer to compute the 14 forecasts of the lynx data. In this paper we do not try to optimize our methods by tuning important parameters such as the number and location of the knots and the degree of the splines. Instead, we show that with default values (three equally spaced knots and degree 2) our methods work very well. By optimizing these parameters we could improve the results, but fine tuning parameters for each time series is beyond the scope of the paper. To distinguish between the two procedures, direct PCAIV seems, in our experience, to perform better for short and regular time series. On the other hand, for long or irregular time series, forecasting PCAIV allows more flexibility, leading to better forecasts.

Acknowledgment

The authors express their gratitude to the anonymous referee and Co-Editor whose comments greatly improved this paper.

Appendix A. Proof of Proposition 1

Recall that the objective function $g$ is defined by
\[
g(R, s, Y_T) = \mathrm{tr}\big( \tilde{Y}\tilde{Y}' - \tilde{X}(s) R \tilde{X}'(s) \big)^2
= \underbrace{\mathrm{tr}(\tilde{Y}\tilde{Y}'\tilde{Y}\tilde{Y}')}_{(\mathrm{A.1})}
\;\underbrace{-\, 2\,\mathrm{tr}(\tilde{Y}\tilde{Y}'\tilde{X}(s) R \tilde{X}'(s))}_{(\mathrm{A.2})}
\; +\; \mathrm{tr}\big(\tilde{X}(s) R \tilde{X}'(s)\big)^2 .
\]
We refer to Magnus and Neudecker (1988) for differentiation with respect to a vector. Differentiation of the term (A.1) with respect to $\tilde{Y}$ leads to
\[
d\{\mathrm{tr}(\tilde{Y}\tilde{Y}'\tilde{Y}\tilde{Y}')\}
= \mathrm{tr}(d\tilde{Y}\,\tilde{Y}'\tilde{Y}\tilde{Y}') + \mathrm{tr}(\tilde{Y}\,d\tilde{Y}'\,\tilde{Y}\tilde{Y}') + \mathrm{tr}(\tilde{Y}\tilde{Y}'\,d\tilde{Y}\,\tilde{Y}') + \mathrm{tr}(\tilde{Y}\tilde{Y}'\tilde{Y}\,d\tilde{Y}')
= 4\,\tilde{Y}'\tilde{Y}\,\tilde{Y}'\,d\tilde{Y}.
\]
From the last equation we easily get the partial derivative of (A.1) with respect to $Y_T$, which is the first coordinate of $\tilde{Y}$:
\[ \frac{\partial\, \mathrm{tr}(\tilde{Y}\tilde{Y}'\tilde{Y}\tilde{Y}')}{\partial Y_T} = 4\, Y_T\, \tilde{Y}'\tilde{Y}. \]
Differentiating the term (A.2) with respect to $\tilde{Y}$ leads to
\[
d\{-2\,\mathrm{tr}(\tilde{Y}\tilde{Y}'\tilde{X}(s)R\tilde{X}'(s))\}
= -2\,\mathrm{tr}(d\tilde{Y}\,\tilde{Y}'\tilde{X}(s)R\tilde{X}'(s)) - 2\,\mathrm{tr}(\tilde{Y}\,d\tilde{Y}'\,\tilde{X}(s)R\tilde{X}'(s))
= -4\,\mathrm{tr}(\tilde{Y}'\tilde{X}(s)R\tilde{X}'(s)\,d\tilde{Y})
= -4\,\tilde{Y}'\tilde{A}\,d\tilde{Y}.
\]
Recall that we are only interested in the derivative with respect to $Y_T$, which is the first coordinate of $\tilde{Y}$; thus we apply $d\tilde{Y} = (1, 0, \ldots, 0)'$ to the last equation to get the derivative. Recall also that we partition the matrix $\tilde{A}$ as
\[ \tilde{A} = \tilde{X}(s) R \tilde{X}'(s) = \begin{pmatrix} a_{11} & A_{12}' \\ A_{12} & A_{22} \end{pmatrix}, \]
and use the partition $\tilde{Y}' = (Y_T, Y')$. Using these equations we easily obtain the partial derivative of $g$ with respect to $Y_T$:
\[ \frac{\partial g}{\partial Y_T} = 4 Y_T^3 + 4 (Y'Y - a_{11})\, Y_T - 4\, Y' A_{12} = 4\, p(Y_T). \]
We want a real root of this degree-3 polynomial. The derivative of $p$ is $p'(Y_T) = 3 Y_T^2 + (Y'Y - a_{11})$. If $(Y'Y - a_{11})$ is positive, then $p(Y_T)$ has only one real root. Spline PCAIV tries to approximate the scalar product $\tilde{Y}\tilde{Y}'$ by $\tilde{X}(s)R\tilde{X}'(s)$; thus $a_{11}$ is close to $Y_T^2$ and is a quantity of order 1. On the other hand, $Y'Y$ is the sum of $n$ terms of the same order as $a_{11}$, thus $Y'Y - a_{11}$ is positive.

References

Chen, R., Tsay, R.S., 1993. Nonlinear additive ARX models. J. Amer. Statist. Assoc. 88, 955-967.
Cornillon, P.A., Matzner-Løber, E., 2007. Relation between spline PCAIV and spline regression. In: Proceedings of the 12th ASMDA Conference.
Durand, J.F., 1993. Generalized principal component analysis with respect to instrumental variables via univariate spline transformations. Comput. Statist. Data Anal. 16, 423-440.
Escoufier, Y., 1987. Principal components analysis with respect to instrumental variables. European Courses in Advanced Statistics, University of Napoli, pp. 285-299.
Fan, J., Yao, Q., 2003. Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York.
Heij, C., Groenen, P.J.F., van Dijk, D., 2007. Forecast comparison of principal component regression and principal covariate regression. Comput. Statist. Data Anal. 51, 3612-3625.
Imam, W., Durand, J.F., 1997. Une extension spline additive de l'analyse en composantes principales sur variables instrumentales. In: 29e Journées de Statistique ASU, pp. 468-471.
Imam, W., Matzner-Løber, E., Aifoute, A.S., 1999. Choix de l'ordre des modèles autorégressifs fonctionnels additifs. Rev. Statist. Appl. 47, 63-80.
Koopman, S.J., Ooms, M., 2006. Forecasting daily time series using periodic unobserved components time series models. Comput. Statist. Data Anal. 51, 885-903.
Lin, T., Pourahmadi, M., 1998. Nonparametric and non-linear models and data mining in time series: a case study in the Canadian lynx data. Appl. Statist. 47, 187-201.
Magnus, J.R., Neudecker, H., 1988. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York.
Rao, C.R., 1964. The use and interpretation of principal component analysis in applied research. Sankhya A 26, 329-358.
Schumaker, L.L., 1981. Spline Functions: Basic Theory. Wiley, New York.
Tong, H., 1990. Nonlinear Time Series: A Dynamical System Approach. Clarendon Press, Oxford.