Detecting change-points in multidimensional stochastic processes

Jan G. De Gooijer
Department of Quantitative Economics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands

Received 11 January 2005; received in revised form 1 December 2005; accepted 8 December 2005. Available online 28 December 2005.
Abstract

A general test statistic for detecting change-points in multidimensional stochastic processes with unknown parameters is proposed. The test statistic is specialized to the case of detecting changes in sequences of covariance matrices. Large-sample distributional results are presented for the test statistic under the null hypothesis of no change. The finite-sample properties of the test statistic are compared with those of two other test statistics proposed in the literature. Using a binary segmentation procedure, the potential of the various test statistics is investigated in a multidimensional setting, both via simulations and via the analysis of a real-life example. In general, all test statistics become more effective as the dimension increases, avoiding the determination of too many "incorrect" change-point locations that can occur in a one-dimensional setting.

Keywords: Binary segmentation; Cramér–von Mises; Covariance matrices; Cumulative sum; Likelihood ratio
1. Introduction

Procedures for detecting changes of variance in an ordered sequence of (possibly independent) observations taken from a multidimensional stochastic process can help to elucidate the structure of the process. For instance, it is well known that homogeneity of variance in a sequence of observations taken from a single financial risk factor does not necessarily imply homogeneous behaviour in variance of all possible risk factors simultaneously. Hence, a univariate change-point detection procedure may well fail to reject the assumption of constant variance underlying the model fitted to each individual sequence of observations.

The case of abrupt changes in some of the parameters of a statistical model has been an area of interest for many years. There is now a literature too large to review on detecting a change in the mean level of a regression function at some (unknown) point in the sample; see, e.g., Csörgő and Horváth (1997) for an extensive review. Relatively little attention has been given to variance changes. Interest seems to start with the paper by Wichern et al. (1976), who offered a "moving-block" procedure for detecting step changes of variance in residuals obtained from a fitted univariate AR model. Baufays and Rasson (1985) improved on this procedure by handling several points of variance change simultaneously. Hsu (1977) looked at detecting a single change in variance at an unknown point in time in a series of independent observations. Davis (1979) studied tests for a single change in the innovations variance at a specified point in time in an AR process. Abraham and Wei (1984) used a Bayesian framework to study changes in the innovation
variance of an ARMA process. Srivastava (1993) found a cumulative sum of squares (CUSUM) procedure to perform better than an exponentially weighted moving average procedure for detecting an increase in variance in white noise series. Vostrikova (1981) and others have pointed out that a method for detecting a single change can be used to detect multiple changes via binary segmentation (BS); see Section 4.3 for details. Inclán and Tiao (1994) adapted the BS procedure, called the iterative cumulative sum of squares algorithm, to detect and locate univariate variance change-points. Their test statistic is a CUSUM of squares test for variance changes in a given sequence of independent observations taken from a univariate normal distribution. In a similar vein, Chen and Gupta (1997) used the BS procedure combined with the Schwarz information criterion (SIC).

The problem of detecting change-points with observations taken from a multivariate normal distribution has been considered by Chen and Gupta (2000). Specifically, let $X_1, X_2, \ldots, X_n$ be a sequence of independent $m$-dimensional normal random vectors with parameters $(\mathbf{0}, \Sigma_1), (\mathbf{0}, \Sigma_2), \ldots, (\mathbf{0}, \Sigma_n)$, respectively. These authors are interested in testing the hypothesis

\[ H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_n = \Sigma_0 \ (\text{unknown}) \tag{1} \]

versus the alternative

\[ H_1: \Sigma_1 = \cdots = \Sigma_{k^*} \neq \Sigma_{k^*+1} = \cdots = \Sigma_n, \tag{2} \]

where $m < k^* < n - m$ and $k^*$ is the unknown position of the change-point. The proposed test statistic is a likelihood ratio (LR) test combined with either SIC (LR–SIC) or an unbiased version of it (LR–SIC$_u$). A data-generating sketch of this setup is given at the end of this section.

In this paper, we propose a general statistic for detecting parameter constancy in multidimensional stochastic processes. The resulting general test statistic will be defined in terms of sequences of partial sums of the efficient score functions of the densities from which the multivariate observations are taken (Section 2). Next, the test statistic is specialized to the case of testing (1) versus (2) (Section 3.1). Further, we summarize the main features of the LR–SIC and LR–SIC$_u$ change-point test statistics (Section 3.2). Using simulations, we compare the null distribution and empirical power of the proposed test statistic with those of the LR–SIC and LR–SIC$_u$ tests in Section 4.1. The performance of these three tests is also investigated in conjunction with the BS detection procedure (Section 4.2). In Section 5 we apply the tests to a data set of three financial risk factors. Section 6 concludes.
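To make the testing problem (1)–(2) concrete, the following minimal sketch (an illustration, not part of the original paper) generates independent zero-mean normal vectors whose covariance matrix switches at a change-point $k^*$; the sample size, dimension, covariance matrices, and location of $k^*$ are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup for (1)-(2): n independent m-dimensional zero-mean
# normal vectors whose covariance switches from Sigma_1 to Sigma_2 at the
# (in practice unknown) change-point k_star. All numbers are arbitrary.
n, m, k_star = 200, 3, 120
Sigma_1 = np.eye(m)                          # pre-change covariance
Sigma_2 = np.diag([1.0, 2.0, 4.0])           # post-change covariance

X = np.vstack([
    rng.multivariate_normal(np.zeros(m), Sigma_1, size=k_star),
    rng.multivariate_normal(np.zeros(m), Sigma_2, size=n - k_star),
])
# Under H0 in (1) the full sample would instead share one covariance Sigma_0.
```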
2. Testing multivariate parameter constancy

2.1. Hypotheses

Suppose that $X = (X_1, \ldots, X_n)$ is a finite sequence of $n$ consecutively observed independent and identically distributed (i.i.d.) random variables taken from an $m$-parameter density $f(x|\theta)$, where the parameter vector $\theta$ is an $m$-dimensional vector such that $\theta = (\theta_1, \theta_2, \ldots, \theta_m)' \in \Theta \subseteq \mathbb{R}^m$, with $n > m$. The problem is to test the hypothesis

\[ H_0^*: \theta_1 = \theta_2 = \cdots = \theta_n = \theta_0 \tag{3} \]

versus the alternative

\[ H_1^*: \exists\, m < k^* < n - m \text{ with } \theta_1 = \cdots = \theta_{k^*} \neq \theta_{k^*+1} = \cdots = \theta_n, \]

where $\theta_0 = (\theta_0^{(1)}, \ldots, \theta_0^{(m)})'$ denotes the initial value of the parameter vector, with $\theta_0^{(j)}$ $(j = 1, \ldots, m)$ the initial value of the $j$th parameter at observation point 0. It is important to distinguish between the case where $\theta_0$ is known and where $\theta_0$ is unknown.

2.2. Test statistic

Let $\varepsilon_i$ be an indicator random variable, taking the value 1 if there is a change in $\theta$ between the $i$th and the $(i+1)$th observation points. The associated probabilities are defined by $P(\varepsilon_i = 1) = p_i$ and $P(\varepsilon_i = 0) = 1 - p_i$, with $\sum_{i=1}^{n-1} p_i = 1$. Also, let $\delta_{j,i}$ denote the amount of change in the value of the $j$th parameter over the interval
$(i, i+1)$ $(i = 1, \ldots, n;\ j = 1, \ldots, m)$. Assume that $\delta_{1,i}, \ldots, \delta_{m,i}$ are independent for every $i$, and that $\delta_{j,1}, \ldots, \delta_{j,n-1}$ are i.i.d. Clearly, the variables $\delta_{j,i}$, $\varepsilon_i$ are considered nuisance parameters since their values are unknown. To derive a statistic for testing (3) we use a Bayesian method, originally due to Chernoff and Zacks (1964), that puts a priori distributions on the nuisance parameters and on the change-point $k^*$. Then, under $H_1^*$, the nuisance parameters are eliminated by integrating them out of the joint density, yielding an unconditional likelihood function of $X$. Let $\delta_j = (\delta_{j,1}, \ldots, \delta_{j,n-1})'$ $(j = 1, \ldots, m)$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_{n-1})'$. Then, expanding $f(x_i|\cdot)$ under $H_1^*$ in a first-order Taylor expansion about $\theta_0$, we obtain

\[
f(x_i|\theta_0, \delta_1, \ldots, \delta_m, \varepsilon) = f(x_i|\theta_0) + \sum_{j=1}^{m} \frac{\partial f(x_i|\theta_0)}{\partial \theta_j} \sum_{u=1}^{i-1} \delta_{j,u}\varepsilon_u + o(1) \quad (i = 1, \ldots, n)
\]
\[
\approx f(x_i|\theta_0)\Bigl(1 + \sum_{j=1}^{m} \frac{\partial \ln f(x_i|\theta_0)}{\partial \theta_j} \sum_{u=1}^{i-1} \delta_{j,u}\varepsilon_u\Bigr).
\]

Under $H_0^*$, $f_0(x|\theta_0) = \prod_{i=1}^{n} f(x_i|\theta_0)$, while under $H_1^*$ the joint density of $x$ is

\[
f(x|\theta_0, \delta_1, \ldots, \delta_m, \varepsilon) \approx \prod_{i=1}^{n} f(x_i|\theta_0) \prod_{i=2}^{n}\Bigl(1 + \sum_{j=1}^{m} \frac{\partial \ln f(x_i|\theta_0)}{\partial \theta_j} \sum_{u=1}^{i-1} \delta_{j,u}\varepsilon_u\Bigr)
\]
\[
\approx f_0(x|\theta_0)\exp\Bigl\{\sum_{i=1}^{n-1} \sum_{j=1}^{m} \delta_{j,i}\varepsilon_i \sum_{u=i+1}^{n} \frac{\partial \ln f(x_u|\theta_0)}{\partial \theta_j}\Bigr\}.
\]

We assume $\delta_j \sim N(0, \tilde{\sigma}_j^2)$, for some known $\tilde{\sigma}_j^2 > 0$, and integrate with respect to the $m(n-1)$ $\delta_{j,i}$'s $(i = 1, \ldots, n-1;\ j = 1, \ldots, m)$. Then, on noting that $\varepsilon_i^2 = \varepsilon_i$, we obtain

\[
f_1(x|\theta_0, \varepsilon) \approx f_0(x|\theta_0)\exp\Bigl(\frac{\sum_{j=1}^{m}\tilde{\sigma}_j^2}{2m}\Bigr)\Biggl[1 + \sum_{i=1}^{n-1} \varepsilon_i \sum_{j=1}^{m} \frac{\tilde{\sigma}_j^2}{2}\Bigl(\sum_{u=i+1}^{n} \frac{\partial \ln f(x_u|\theta_0)}{\partial \theta_j}\Bigr)^2 + o_p(\tilde{\sigma}^2)\Biggr].
\]
Hence, the LR statistic is

\[
\Lambda_n(x|\varepsilon) \equiv \frac{f_1(x|\theta_0, \varepsilon)}{f_0(x|\theta_0)} = \exp\Bigl(\frac{1}{2m}\sum_{j=1}^{m}\tilde{\sigma}_j^2\Bigr)\Biggl[1 + \sum_{i=1}^{n-1} \varepsilon_i \sum_{j=1}^{m} \frac{\tilde{\sigma}_j^2}{2}\Bigl(\sum_{u=i+1}^{n} \frac{\partial \ln f(x_u|\theta_0)}{\partial \theta_j}\Bigr)^2\Biggr].
\]
Eliminating $\varepsilon$, and letting $\tilde{\sigma}_j^2 \to 0$ for each $j = 1, \ldots, m$, we get the two-sided test statistic

\[
U_p^*(X) = \sum_{\varepsilon} P(\varepsilon)\Lambda_n(x|\varepsilon) = \sum_{i=1}^{n-1} p_i \Bigl(\sum_{u=i+1}^{n} v_u(\theta_0)\Bigr)'\Bigl(\sum_{u=i+1}^{n} v_u(\theta_0)\Bigr), \tag{4}
\]

where $v_u(\theta_0) = (\partial \ln f(x_u|\theta_0)/\partial\theta_1, \ldots, \partial \ln f(x_u|\theta_0)/\partial\theta_m)'$ is the gradient of the log-likelihood function. $H_0^*$ is rejected iff $U_p^*(X) > c_\alpha(m)$, where $\alpha$ is the given level of significance.
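As a numerical illustration (not from the original paper), a minimal sketch of (4) follows, assuming $\theta_0$ is known and that a user-supplied function `score` returns the score vector $v_u(\theta_0)$ for a single observation; the function names and the uniform default for the prior weights $p_i$ are assumptions made for the example.

```python
import numpy as np

def u_p_statistic(X, score, p=None):
    """Sketch of U_p*(X) in Eq. (4): a p-weighted sum over i of the squared
    Euclidean norm of the reverse partial score sums sum_{u=i+1}^n v_u."""
    n = len(X)
    V = np.asarray([score(x) for x in X])        # (n, m) rows v_u(theta_0)
    if p is None:                                # illustrative uniform prior
        p = np.full(n - 1, 1.0 / (n - 1))
    # tails[i-1] = sum_{u=i+1}^{n} v_u(theta_0), for i = 1, ..., n-1
    tails = np.cumsum(V[::-1], axis=0)[::-1][1:]
    return float(np.sum(p * np.einsum('ij,ij->i', tails, tails)))

# Example score: a possible change in the variance of univariate N(0, 1)
# data with sigma_0^2 = 1 known, so v_u = d ln f / d sigma^2 = (x_u^2 - 1)/2.
score_var = lambda x: np.atleast_1d((x ** 2 - 1.0) / 2.0)
```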
2.3. Large-sample distribution, $\theta_0$ is known

Since, under $H_0^*$, $\{X_i\}_{1 \le i \le n}$ and $\{X_{n-i+1}\}_{1 \le i \le n}$ are equal in distribution, it follows that

\[
U_p^*(X) \stackrel{L}{=} \sum_{i=1}^{n-1} p_{n-i} \Bigl(\sum_{u=1}^{i} v_u(\theta_0)\Bigr)'\Bigl(\sum_{u=1}^{i} v_u(\theta_0)\Bigr).
\]
The limit process becomes simpler if $U_p^*(\cdot)$ is modified to

\[
V_p^*(X) = \sum_{i=1}^{n-1} p_{n-i} \Bigl(I^{-1/2}(\theta_0)\sum_{u=1}^{i} v_u(\theta_0)\Bigr)'\Bigl(I^{-1/2}(\theta_0)\sum_{u=1}^{i} v_u(\theta_0)\Bigr), \tag{5}
\]
where $I(\theta_0)$ is the $m \times m$ Fisher information matrix, which is assumed to exist in a neighbourhood of $\theta_0$ in the interior of $\Theta$.

To prove convergence in distribution we introduce a sequence of random vector functions $\{Y_n(t) = (Y_n^{(1)}(t), \ldots, Y_n^{(m)}(t))',\ t \in [0,1]\}_{n=1}^{\infty}$, on the space $C_{[0,1]}^m$ of continuous real-valued vector functions, possessing continuous sample paths, by the relation

\[
Y_n(t) = \frac{1}{\sqrt{n}}\Bigl\{S_{n,[nt]}(\theta_0) + (nt - [nt])\,I^{-1/2}(\theta_0)\,v_{[nt]+1}(\theta_0)\Bigr\}, \tag{6}
\]

where each component $Y_n^{(j)}(t)$ is the continuous polygonal line through the points $\{(i/n,\ n^{-1/2}S_{n,i}^{(j)}(\cdot))\}_{0 \le i \le n}$, with $S_{n,i}^{(j)}(\cdot)$ the $j$th component of $S_{n,i}(\theta_0) = I^{-1/2}(\theta_0)\sum_{u=1}^{i} v_u(\theta_0)$, $S_{n,0}(\theta_0) \equiv 0$, and with $[nt]$ the integer part of $nt$. Since the $S_{n,i}^{(j)}(\cdot)$ are partial sums of i.i.d. random variables, Donsker's theorem states that $Y_n^{(j)}(t) \stackrel{L}{\to} B^{(j)}(t)$ in $C_{[0,1]}$ for all $1 \le j \le m$, where $B^{(j)}(t)$ is univariate standard Brownian motion. Davidson (1994, Theorem 27.17) extended this result to vector-valued processes. If mild conditions on the stochastic process $I^{-1/2}(\theta_0)v_u(\theta_0)$ are imposed, the following
functional central limit theorem (FCLT) can be proved: $Y_n(t) \stackrel{L}{\to} B(t) = (B^{(1)}(t), \ldots, B^{(m)}(t))'$ in $C_{[0,1]}^m$, with the $B^{(j)}(t)$ $(1 \le j \le m)$ independent.

To obtain the limiting distribution of (5), assume $\{p_i\}_{i=1}^{n-1}$ to be a non-negative weight sequence on the unit interval $[0,1]$ for the unknown change-point $k^*$ such that $p_{n-i} = \int_{(2i-1)/2n}^{(2i+1)/2n} \psi(t)\,dt$ $(i = 1, \ldots, n-1)$. If $\Psi(\cdot)$ denotes the primitive of $\psi(\cdot)$, (5) can be written as

\[
n^{-1}V_p^*(X) = \sum_{i=1}^{n-1}\Bigl\{\Psi\Bigl(\frac{2i+1}{2n}\Bigr) - \Psi\Bigl(\frac{2i-1}{2n}\Bigr)\Bigr\}\bigl(n^{-1/2}S_{n,i}(\theta_0)\bigr)'\bigl(n^{-1/2}S_{n,i}(\theta_0)\bigr)
= \sum_{i=1}^{n-1}\Bigl\{\Psi\Bigl(\frac{2i+1}{2n}\Bigr) - \Psi\Bigl(\frac{2i-1}{2n}\Bigr)\Bigr\}Y_n\Bigl(\frac{i}{n}\Bigr)'Y_n\Bigl(\frac{i}{n}\Bigr) = h_n(Y_n),
\]

with mapping $h_n: C_{[0,1]}^m \to \mathbb{R}$. Thus, one has to show that $\{h_n\}$ converges to $h: C_{[0,1]}^m \to \mathbb{R}$ in the sense of Billingsley's (1968, Theorem 5.5) extended continuous mapping theorem (CMT). Here $h(\cdot)$ is given by

\[
h(\mathbf{f}) = \int_0^1 \sum_{\ell=1}^{m}\bigl(f^{(\ell)}(t)\bigr)^2\,\psi(t)\,dt = \mathrm{tr}\Bigl(\int_0^1 \psi(t)\,\mathbf{f}(t)\mathbf{f}(t)'\,dt\Bigr),
\]

where $\mathbf{f}(t) = (f^{(1)}(t), \ldots, f^{(m)}(t))' \in C_{[0,1]}^m$. Then, under the assumptions of the FCLT and the extended CMT, we see that the test statistic

\[
n^{-1}V_p^*(X) = n^{-1}\sum_{i=1}^{n-1} p_{n-i}\Bigl(\sum_{u=1}^{i} v_u(\theta_0)\Bigr)' I^{-1}(\theta_0)\Bigl(\sum_{u=1}^{i} v_u(\theta_0)\Bigr) \stackrel{D}{\to} \mathrm{tr}\Bigl(\int_0^1 \psi(t)B(t)B(t)'\,dt\Bigr),
\]

where $\stackrel{D}{\to}$ denotes convergence in distribution.
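Since closed-form percentage points of $\mathrm{tr}(\int_0^1 \psi(t)B(t)B(t)'\,dt)$ are not generally available, they can be approximated by Monte Carlo. The sketch below is an illustration, not taken from the paper: it discretizes each Brownian motion as a scaled random walk and the integral as a Riemann sum, with arbitrary grid and replication sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def bm_limit_draw(m, psi=lambda t: np.ones_like(t), n_grid=2000):
    """One Monte Carlo draw of tr( int_0^1 psi(t) B(t)B(t)' dt ): each of the
    m independent Brownian motions is approximated by a scaled random walk,
    and the integral by a Riemann sum on an equally spaced grid."""
    t = np.arange(1, n_grid + 1) / n_grid
    B = np.cumsum(rng.standard_normal((n_grid, m)), axis=0) / np.sqrt(n_grid)
    return np.sum(psi(t) * np.sum(B ** 2, axis=1)) / n_grid

draws = np.array([bm_limit_draw(m=3) for _ in range(10_000)])
print(np.quantile(draws, 0.95))   # approximate upper 5% critical value
```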
2.4. Large-sample distribution, $\theta_0$ is unknown

In many real situations $\theta_0$ is unknown rather than known. Then, replacing $\theta_0$ in (4) by a consistent estimator, say $\hat{\theta}$, the resulting test statistic is given by

\[
V_p(X) = \sum_{i=1}^{n-1} p_{n-i}\Bigl(\sum_{u=1}^{i} v_u(\hat{\theta})\Bigr)'\Bigl(\sum_{u=1}^{i} v_u(\hat{\theta})\Bigr), \tag{7}
\]

where $v_u(\hat{\theta}) = (\partial \ln f(x_u|\theta)/\partial\theta_1|_{\theta=\hat{\theta}}, \ldots, \partial \ln f(x_u|\theta)/\partial\theta_m|_{\theta=\hat{\theta}})'$. Finite-sample distribution theory for (7) is complicated, but asymptotic theory is tractable. Using a Taylor expansion it follows that $\sqrt{n}(\hat{\theta} - \theta_0) = n^{-1/2}I^{-1}(\theta_0)\sum_{u=1}^{n} v_u(\theta_0) + o_p(1)$. Also, expanding the log-likelihood function about $\theta_0$, we obtain $\ln f(x_u|\hat{\theta}) = \ln f(x_u|\theta_0) + \sum_{i=1}^{m}(\hat{\theta}^{(i)} - \theta_0^{(i)})\,\partial \ln f(x_u|\theta_0)/\partial\theta_i + o_p(1)$. Differentiating this expression with respect to $\theta_i$ and applying Khinchine's law of large numbers, as $n \to \infty$ we have

\[
\frac{1}{\sqrt{n}}\sum_{u=1}^{[nt]} v_u(\hat{\theta}) = \frac{1}{\sqrt{n}}\sum_{u=1}^{[nt]} v_u(\theta_0) - t\sqrt{n}\,I_n(\hat{\theta})(\hat{\theta} - \theta_0) + o_p(1)
= \frac{1}{\sqrt{n}}\sum_{u=1}^{[nt]} v_u(\theta_0) - \frac{t}{\sqrt{n}}\sum_{u=1}^{n} v_u(\theta_0) + o_p(1),
\]

where $I_n(\hat{\theta}) = -n^{-1}\sum_{u=1}^{n}\partial v_u(\theta)/\partial\theta'|_{\theta=\hat{\theta}}$ is the second-derivative estimate of the information matrix. This result holds uniformly in $t \in [0,1]$.

Similar to Section 2.3, let $\{S_{n,i}(\hat{\theta})\}_{i=0}^{\infty}$ denote a sequence of partial sums with $S_{n,i}(\hat{\theta}) = I_n^{-1/2}(\hat{\theta})\sum_{u=1}^{i} v_u(\hat{\theta})$, $S_{n,0}(\hat{\theta}) \equiv 0$. Then the sequence $\{n^{-1/2}(S_{n,u}(\hat{\theta}) - (u/n)S_{n,n}(\hat{\theta}))\}_{u=1}^{n}$ can be used to define the sequence of random vector functions $\{\hat{Y}_n(t),\ t \in [0,1]\}_{n=1}^{\infty}$ as follows:

\[
\hat{Y}_n(t) = \frac{1}{\sqrt{n}}\Bigl\{S_{n,[nt]}(\hat{\theta}) - t\,S_{n,n}(\hat{\theta}) + (nt - [nt])\,I_n^{-1/2}(\hat{\theta})\,v_{[nt]+1}(\hat{\theta})\Bigr\}.
\]

We already showed that $Y_n(t) \stackrel{L}{\to} B(t) = (B^{(1)}(t), \ldots, B^{(m)}(t))'$ in $C_{[0,1]}^m$. Therefore, the following FCLT holds: $\hat{Y}_n(t) \stackrel{L}{\to} B_0(t) = (B_0^{(1)}(t), \ldots, B_0^{(m)}(t))'$ in $C_{[0,1]}^m$, where the components $B_0^{(j)}(\cdot)$ $(1 \le j \le m)$ are independent Brownian bridges. This in turn leads to $h_n(\hat{Y}_n) \stackrel{L}{\to} h(B_0)$ by another application of the extended CMT. Then, given (7), we see that

\[
n^{-1}V_p(X) = \sum_{i=1}^{n-1} p_i\Bigl(\sum_{u=1}^{i} v_u(\hat{\theta}) - \frac{i}{n}\sum_{u=1}^{n} v_u(\hat{\theta})\Bigr)' I_n^{-1}(\hat{\theta})\Bigl(\sum_{u=1}^{i} v_u(\hat{\theta}) - \frac{i}{n}\sum_{u=1}^{n} v_u(\hat{\theta})\Bigr) \stackrel{D}{\to} \mathrm{tr}\Bigl(\int_0^1 \psi(t)B_0(t)B_0(t)'\,dt\Bigr). \tag{8}
\]

The weight function $\psi(t) \equiv 1$ corresponds to a uniform prior on the unknown change-point $k^*$. Then from (8) we have

\[
n^{-1}Q_n(m) = (n-1)^{-1}\sum_{i=1}^{n-1}\Bigl(\sum_{u=1}^{i} v_u(\hat{\theta}) - \frac{i}{n}\sum_{u=1}^{n} v_u(\hat{\theta})\Bigr)' I_n^{-1}(\hat{\theta})\Bigl(\sum_{u=1}^{i} v_u(\hat{\theta}) - \frac{i}{n}\sum_{u=1}^{n} v_u(\hat{\theta})\Bigr) \stackrel{D}{\to} \mathrm{tr}\Bigl(\int_0^1 B_0(t)B_0(t)'\,dt\Bigr), \tag{9}
\]

with $Q_0(m) = 0$. Kiefer (1959) provided percentage points for $P\bigl(\mathrm{tr}\int_0^1 B_0(t)B_0(t)'\,dt < c_\alpha(m)\bigr)$ for selected values of $\alpha$ and $m$.
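These percentage points can also be reproduced approximately by simulating the Brownian-bridge limit in (9). The following Monte Carlo sketch is an illustration (grid and replication sizes are arbitrary); each bridge is constructed by tying down a simulated Brownian motion at $t = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

def bridge_limit_draw(m, n_grid=2000):
    """One Monte Carlo draw of tr( int_0^1 B_0(t)B_0(t)' dt ): each Brownian
    bridge is obtained from a simulated Brownian motion B via the standard
    transformation B_0(t) = B(t) - t B(1)."""
    B = np.cumsum(rng.standard_normal((n_grid, m)), axis=0) / np.sqrt(n_grid)
    t = (np.arange(1, n_grid + 1) / n_grid)[:, None]
    B0 = B - t * B[-1]                        # tie the paths down at t = 1
    return np.sum(B0 ** 2) / n_grid           # Riemann sum of tr(B0 B0')

draws = np.array([bridge_limit_draw(m=2) for _ in range(10_000)])
print(np.quantile(draws, [0.90, 0.95, 0.99]))   # cf. Kiefer (1959) tables
```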
3. Testing changes in covariance matrices

3.1. Multivariate Cramér–von Mises type test statistic

The test statistic $Q_n(m)$ can be specialized to various interesting testing problems. For instance, assuming $X$ follows a multivariate normal distribution, $Q_n(m)$ can be used for testing changes in the mean vector, testing for changes in the covariance matrix, and testing for changes in the mean and covariance matrix jointly. Here we consider the case of testing for changes in covariance matrices. To allow for a comparison between our test and the test statistics given in Section 3.2, we assume that, under $H_0$, $X_1, \ldots, X_n$ are i.i.d. $N_m(\mathbf{0}, \Sigma)$ where $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2)$. Let $\hat{\sigma}_j^2 = n^{-1}\sum_{i=1}^{n} x_{j,i}^2$ be the ML estimator of $\sigma_j^2$. Then, it is easy to see that

\[
v_u(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_m^2) = \Bigl(\frac{1}{-2\hat{\sigma}_1^2}\Bigl(1 - \frac{x_{1,u}^2}{\hat{\sigma}_1^2}\Bigr), \ldots, \frac{1}{-2\hat{\sigma}_m^2}\Bigl(1 - \frac{x_{m,u}^2}{\hat{\sigma}_m^2}\Bigr)\Bigr)',
\]

while

\[
I_n^{-1}(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_m^2) = \mathrm{diag}\Bigl(\frac{2\hat{\sigma}_1^4}{n}, \ldots, \frac{2\hat{\sigma}_m^4}{n}\Bigr).
\]

Substituting these expressions into (9), the derived test statistic is

\[
Q_n(m) = \frac{n^2}{2(n-1)}\sum_{i=1}^{n-1}\sum_{j=1}^{m}\Bigl(\frac{C_{j,i}}{C_{j,n}} - \frac{i}{n}\Bigr)^2, \tag{10}
\]

where $C_{j,i} = \sum_{u=1}^{i} x_{j,u}^2$ $(i = 1, \ldots, n;\ j = 1, \ldots, m)$. Test statistic $Q_n(m)$, as given by (10), will be the main focus of attention in the rest of the paper.

When using (10), the next problem to be considered is the estimation of the location of the change-point $k^*$. Let $SS_k = |\sum_{i=1}^{k}(x_i - \bar{x}_k)(x_i - \bar{x}_k)'|$ and $SS_{n-k} = |\sum_{i=k+1}^{n}(x_i - \bar{x}_{n-k})(x_i - \bar{x}_{n-k})'|$, where $\bar{x}_k = k^{-1}\sum_{i=1}^{k} x_i$ and $\bar{x}_{n-k} = (n-k)^{-1}\sum_{i=k+1}^{n} x_i$. Then an estimate of $k^*$ is obtained by minimizing $SS_{n,k} = SS_k + SS_{n-k}$, i.e. $SS_{\hat{k}^*} = \min_{m < k < n-m} SS_{n,k}$.
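For concreteness, a short implementation sketch of (10) and of the location estimator $\hat{k}^*$ follows. It is not code from the paper: the function names are invented, the observations are assumed to be stored row-wise in an $(n \times m)$ array with known zero mean under $H_0$, and $SS_k$ is computed as the determinant of the centred scatter matrix.

```python
import numpy as np

def q_n(X):
    """Cramer-von Mises type statistic Q_n(m) of Eq. (10). X is an (n, m)
    array of observations assumed to have known mean zero under H0."""
    n, m = X.shape
    C = np.cumsum(X ** 2, axis=0)             # C[i-1, j] equals C_{j,i}
    i = np.arange(1, n)[:, None]              # i = 1, ..., n-1
    dev = C[:-1] / C[-1] - i / n              # C_{j,i}/C_{j,n} - i/n
    return n ** 2 / (2.0 * (n - 1)) * float(np.sum(dev ** 2))

def locate_change(X):
    """Estimate k* by minimizing SS_{n,k} = SS_k + SS_{n-k}, with SS_k the
    determinant of the centred scatter matrix of the first k observations
    and SS_{n-k} that of the remaining n - k observations."""
    n, m = X.shape

    def scatter_det(Y):
        Z = Y - Y.mean(axis=0)
        return np.linalg.det(Z.T @ Z)

    candidates = range(m + 1, n - m)          # search over m < k < n - m
    return min(candidates,
               key=lambda k: scatter_det(X[:k]) + scatter_det(X[k:]))
```

Applied to a sample such as the one simulated in Section 1, `q_n(X)` can be compared with a Monte Carlo critical value for $\mathrm{tr}\int_0^1 B_0(t)B_0(t)'\,dt$, and `locate_change(X)` then returns a candidate change-point location $\hat{k}^*$.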