Statistical Methods of Spectrum Change Detection Barry G. Quinn Statistics Dept, EFS, Macquarie University Sydney, NSW, 2109, Australia Email:
[email protected] Abstract— We discuss in this paper the problem of determining whether two time series have been produced by the same mechanism. In particular, we examine models and techniques for determining whether two series could have come from stochastic processes with the same continuous and discrete spectral characteristics.
I. I NTRODUCTION The terms “spectrum” and “spectra” are used in many areas of science. They are used rather loosely, and authors often use the terms to mean very different things. One important question concerning “spectra” is common to most of these areas of application. Can two spectra calculated at different times or places be attributed to the same object? Do two voice recordings contain speech from the same person? Do two samples of soil come from the same place. Do two compounds contain the same elements? The last three questions are often answered using “spectra”. Coates and Diggle [1] have discussed the problem of testing whether two time series come from stationary stochastic processes with the same spectral shape. Their technique is “nonparametric” in that the processes are not modelled, and the periodograms of the processes are used to produce a test. In this paper we discuss the statistical modelling of the problem. We consider two problems. In the first, we assume that the time series come from stationary autoregressive processes, and test whether these processes have the same autoregressive parameters, and therefore the same spectral shape. In the second, the processes have additive (deterministic) sinusoids, and we test whether these sinusoids have the same frequencies, and that the stationary noise components have the same spectral shape. This may be relevant when the two series have been collected by the same apparatus at close points in time, and there is a suspicion that there are ‘spectral line’ components in one but not the other. The test is expected to be asymptotically optimal, as it is based on a (Gaussian) likelihood ratio test. The (asymptotic) distribution of the statistic under the null hypotheses are known, and so the value of the test statistic may be used to compare the ‘closeness’ of one series with the other. It should be stressed that the testing is not standard, since we do not assume that the number of sinusoids or autoregressive orders are known, but treat these as parameters to be estimated. II. M ODELS FOR MIXED SPECTRA The first problem to be examined is that of testing whether the spectral shapes of two processes are the same. It will be assumed that two time series {xt ; t = 0, 1, . . . , T1 − 1}
and {yt ; t = 0, 1, . . . , T2 − 1} are partial realisations of two stationary stochastic processes {Xt } and {Yt } . Let {γX (j)} and {γY (j)} denote the two autocovariance sequences. We wish to test whether or not γY (j) /γX (j) depends on j, given the data. Coates and Diggle [1] suggest for the case where T1 = T2 = T, the use of the test statistic ½ ¾ ½ ¾ IX (2πj/T ) IX (2πj/T ) log − min log W = max IY (2πj/T ) IY (2πj/T ) 1≤j≤b T −1 1≤j≤b T −1 2 c 2 c ¯P ¯2 ¯ T −1 ¯ where IZ (ω) = (2/T ) ¯ t=0 Zt e−iωt ¯ . This agrees with the common use of ratios in ad hoc visual comparison methods to spot differences between spectra. One problem with the technique in practice is that it would be expected to have low statistical power, as the set of alternative hypotheses is too broad. Another is that the technique relies on Gaussian white noise assumptions for the distribution of the statistic to be known when the null hypothesis is true. We thus cannot reliably define critical regions. In this paper, we instead adopt a parametric approach, without imposing distributional assumptions. We shall in fact assume that the stationary processes are autoregressive, with orders unknown. We assume without loss of generality that their means are 0, and always mean-correct the series before analysis. We thus assume under both null and alternative hypotheses that Xt +
pX X j=1
βX (j) Xt−j = εt , Yt +
pY X
βY (j) Yt−j = ut (1)
j=1
for some parameters βX (j) and βY (j) and orders pX and pY . We further assume that {εt } and {ut } are independent of each other, and that E {εt |εt−1 , εt−2 , . . .} = E {ut |ut−1 , ut−2 , . . .} = 0 © ª © ª E ε2t |εt−1 , εt−2 , . . . = σε2 , E u2t |ut−1 , ut−2 , . . . = σu2 These assumptions are much weaker than the usual independence assumptions, but stronger than mere zero-correlation. They are enough to ensure that all of the usual estimation procedures yield good estimators. The spectral density of {Xt } above is σε2 fX (ω) = ¯P ¯2 , ¯ pX ¯ ¯ j=0 βX (j) e−iωj ¯ so that the spectral shapes of {Xt } and {Yt } are the same if and only if βX (j) = βY (j) , ∀j.
The second problem is to discriminate between the processes {Xt } and {Yt } which are assumed to satisfy Xt =
fX X
{aj cos (ωj t) + bj sin (ωj t)} + Et
(2)
j=1
Yt =
fY X
{Aj cos (λj t) + Bj sin (λj t)} + Ut
j=1
where C (λ) is the p × p matrix with (i, j)th element ( T −1 ) TX 1 2 −1 X 1 λ Xt Xt−|i−j| + Yt Yt−|i−j| T1 + T2 t=0 t=0 and c (λ) is the p × 1 vector with jth element C0,j (λ). Put 2 p TX 1 −1 X Xt + σ bε2 = T1−1 βbj Xt−j ,
where Et +
t=0 pE X j=1
βE (j) Et−j = εt , Ut +
pU X
βU (j) Ut−j = ut , (3)
j=1
where {εt } and {ut } have the same properties as above. The processes {Xt } and {Yt } are no longer stationary, but their periodograms will have spikes evident near the true frequencies ωj and λj . We now wish to test the hypothesis that the underlying continuous spectra and the locations of the “spectral lines”, i.e. the frequencies, are the same. A related problem is to test that the frequencies are the same, regarding the continuous spectral components as “nuisance”. III. M ETHODOLOGY The likelihood ratio principle has been responsible for generating most well-known statistical tests. The likelihood functions are maximised under H0 , the null hypothesis, and HA , the alternative hypothesis, and the ratio of the maximum value under HA to that under H0 is used as test statistic. In fact, twice the natural logarithm of the ratio often has an asymptotic distribution which is χ2 under H0 . Moreover, the statistic has very good power properties, i.e. rejects H0 when it is false with probability as large as possible. However, we do not wish to impose any distributional properties. We can still compute the likelihood-ratio statistic, computed under some distributional assumptions such as Gaussianity, but do not rely on the Gaussianity to establish any of the technique’s properties. This is similar to computing Gaussian maximum likelihood estimators, but establishing the properties of these estimators under broader assumptions. A. Bivariate autoregressions with identical autoregressive parameters There are estimation procedures for the unknown parameters of multivariate autoregessive models, although the analog of the Levinson-Durbin algorithm, known as the Whittle recursion, is a little more complicated, since the recursion must be carried both forwards and backwards. We are concerned here, however, with a peculiar subset of these models, for which there are no known estimation techniques. These estimation procedures, therefore, have to be developed from scratch. We use Gaussian maximum likelihood estimation, with some shortcuts. Suppose {Xt } and {Yt } satisfy (1) , with pX = pY = p and βX (j) = βY (j) , ∀j. Firstly, estimate σε2 and σu2 by fitting separate autoregressions. Call these estimators σ bε2 2 2 2 b=σ and σ bu . Let λ bu /b σε . Let ³ ´−1 ³ ´ b b , βb = −C λ c λ
σ bu2 = T2−1
TX 2 −1
Yt +
t=0
j=1 p X
2
βbj Yt−j
j=1
and iterate the above procedure until “convergence” has occurred. B. Estimating common autoregressive order The above algorithm presupposes that the common order p is known. We thus need a method for estimating it. From Gaussian likelihood considerations, the minimiser of 2 2 φg (k) = T1 log σ bε;k + T2 log σ bu;k + kg (T1 + T2 ) , 2 2 over k = 0, 1, . . . , K, where σ bε;k and σ bu;k are the estimators assuming that the order is k, estimates the order (strongly) consistently if lim supT →∞ g (T ) / {2 log (log T )} > 1.We suggest using g (T ) = log T, as this is commonly used in other modern criteria.
C. The spectral muddle One of the first problems arises because of the “mixed model muddle”: in practice it is difficult to tell whether a time series is sinusoidal, or comes from a stationary process whose transfer function has a pole close to the unit circle. The problem is much more complex when an arbitrary number of sinusoids and autoregressions with arbitrary order are allowed as models. Any technique which fits these models automatically will need to be reliable. Kavalieris and Hannan [2] have suggested the following method for fitting the orders involved. The parameter estimation scheme is discussed below. D. Automatic order criterion Automatic criteria for determining system orders have been available for more than 30 years, and were initially devised for the estimation of autoregressive order. Quinn [3] suggested their use in determining the number of sinusoids, and Kavalieris and Hannan [2] suggested this generalisation. Suppose 2 {Xt } satisfies (3) and (2) with fX = f and pX = p. Let σ bk,m be defined by T −1 k X X 1 2 Xt + σ bk,m = min βj Xt−j βj ,j=1,...,k T t=0 j=1 e aj ,e bj ,ωj ;j=1,...,m
2 m n o X − e aj cos (ωj t) + ebj sin (ωj t) . j=1
(4)
For fixed m = 0, 1, . . . , F, let pb (m) be the minimiser of
Periodogram of 1st sample 6
4
2 T log σ bk,m + k log T,
2
2 for k = 0, 1, . . . , P and put σ bm = σ bp2b(m),m . Choose as estimator of f, fb, the minimiser of
IX (ω)
0
−2
−4
2 + {b p (m) + 5m} log T, T log σ bm
−6
−8
for m = 0, 1, . . . , F. We must assume ³ ´ that F > f and P > p. Then it can be shown that fb and pb fb converge almost surely to f and p respectively.
0
0.5
1
1.5
ω
2
2.5
3
3.5
The autoregressive estimation can be carried out using the Levinson-Durbin algorithm or by least squares. The results will be asymptotically equivalent. F. Special bivariate generalisation of Quinn and Fernandes
E. Fitting sinusoidal autoregressions
Computing (4) is not straightforward, because of the ωj . If these were known, the other parameters could easily be estimated by standard regression methods. That they are often not known, and have themselves to be estimated has generated an enormous literature (see e.g. Quinn and Hannan [4]). This is the technique we have used. Firstly, smooth the periodogram using, for example, a median smoother to “equalise the noise floor”. Use the method of Quinn and Fernandes [5] to estimate a sinusoidal frequency. Remove the sinusoid by regression from the original data, and repeat the smoothing, estimation and regression loop until m sinusoids have been removed. As an additional computational step, an iterative maximization of the sum of squares function starting at these estimators can be performed. This improves the accuracy of the estimators. Finally, an autoregession of order k is fitted to the residuals after removing these sinusoids from the original data by regression. The estimated variance of the innovation sequence is then 2 . The smoothing is important, since in practice a local σ bk,m maximum of the periodogram can be due to large values in the underlying noise spectral density rather than to a deterministic sinusoid. An example is given in the following figure. If the unsmoothed periodogram is maximised, a sinusoid with frequency near the maximiser of the continuous spectrum will be estimated, instead of near the frequency associated with the periodograms’s most noticeable spike. An alternative to smoothing, based on automatic “noise-floor equalisation” is discussed in Quinn [6].
Suppose two sets of random variables {X0 , X1 , . . . , XT1 −1 } and {Y0 , Y1 , . . . , YT2 −1 } are known to have sinusoidal components at the same frequency, with additive noise components which have the same variance. The following algorithm produces a frequency estimator which is equivalent to the Gaussian maximum likelihood estimator. As mentioned above, the sums of smoothed versions of the periodograms should be used in order to obtain the initial maximiser. 1) Let ω b0 be the maximiser of the periodogram IX (ω) + IY (ω) , © ¥ ¦ª over the frequencies 2πj/T ; j = 1, 2, . . . , T −1 , 2 where T = max {T1 , T2 } . Let b a = 2 cos ω b0 . 2) For t = 0, 1, . . . , T1 − 1, define ξt by ξ−1 = ξ−2 = 0 and ξt = b aξt−1 − ξt−2 + Xt , and for t = 0, 1, . . . , T2 −1, define ηt by η−1 = η−2 = 0 and ηt = b aηt−1 − ηt−2 + Yt . 3) Replace b a by PT1 −1 b a+2
PT2 −1 t=0 Xt ξt−1 + t=0 Yt ηt−1 , PT1 −1 2 PT2 −1 2 t=0 ξt−1 + t=0 ηt−1
and repeat steps 2 and 3, once only. 4) The estimator of ω is the current value of arccos (b a/2) . G. Bivariate processes with identical frequencies and autoregressive parameters We can use the ideas above to estimate the parameters iteratively:
500
Frequency
1) For fixed fX = f, estimate pX and a sinusoidal model 450 for {X0 , X1 , . . . , XT1 −1 }. Remove the sinusoids by b regression on the Xt and call the residuals Et . 400 2) Separately for fY = f, estimate pY , and a sinusoidal 350 model for {Y0 , Y1 , . . . , YT2 −1 }. Remove the sinusoids b by regression on the Yt and call 300 n the o residuals n oUt . bt and U bt . Denote 3) Fit a common AR model to E 250 by pb (f ) the order of the AR estimated. Let σ bε2Pand σ bu2 be p 200 the estimators of σε2 and σu2 , let βb (z) = 1 + j=1 βbj z j bt = βb (z) Xt and U bt = βb (z) Yt , where z is 150 and put E the lag-operator. 100 ej , B ej , and ωj , 4) Minimise with respect to the e aj , ebj , A 50 )2 ( f TX 1 −1 X e b bj e aj 0 Et − −4 −2 0 2 4 6 8 10 12 14 S= cos (ωj t) + sin (ωj t) σ b σ b σ b ε ε ε Likelihood ratio t=0 j=1 ( )2 f TX 2 −1 bt X ej ej U A B + − cos (ωj t) + sin (ωj t) . if λ is larger than the 100α percentile of the χ2 distribution ³ ´ σ b σ b σ b u u u t=0 j=1 with degrees of freedom f + p, estimated by fb + pb fb . The following describes one n such method for doing this. o V. S IMULATIONS bt /b Fit a single sinusoid to E σε ; t = 0, 1, . . . , T1 − 1 1000 simulations were carried out for each of the following o n bt /b σu ; t = 0, 1, . . . , T2 − 1 using the method scenarios. In each case, the maximum autoregressive order was and U in subsection F. Regress sinusoids at the estimated set to 10, and the maximum number of sinusoids to one more frequency from these sequences. Fit another sinu- than the actual number. That this was not set higher was for soid to the residuals from these regressions us- speed of computation. ing the method, and repeat until f sinusoids have 1) been fitted. Then nrecompute the residuals by o reT1 = T2 = 1024, σ1 = 0.2, σ2 = 0.5, f = 1, p = 1, bt /b gressing each of E σε ; t = 0, 1, . . . , T1 − 1 and n o ω1 = 2π × 125.4/1024, β1 = −0.7, bt /b U σu ; t = 0, 1, . . . , T2 − 1 on cosines and sines at a1 = 1, b1 = 0, A1 = 0, B1 = 1 the f estimated frequencies. Notes: 11 of the 1000 simulations yielded test statistics 5) Repeat steps 3 and 4 once only. which were negative. This can only have happened 6) Maximise the logarithm of the likelihood with the ωj , 2 2 because the algorithm for maximising the likelihood σε and σu replaced by their current estimators. This is under HA did not get close enough to the true maximum. straightforward as l is quadratic in the remaining para2 2 Further work will be done to strengthen the algorithm. meters. Re-maximise with respect to σε and σu , with all The frequency histogram of the likelihood ratio values other parameters fixed at their current estimators. Denote 2 2 2 2 is given in the following figure. by σ bε;f and σ bu;f the estimators of σε and σu . Let The mean and variance of the values were 2.0967 and 2 2 φ (f ) = T1 log σ bε;f + T2 log σ bu;f + (b p (f ) + 7f ) log (T1 + T2 ) . 4.9888. The theoretical mean and variance of the χ22 distribution are 2 and 4. The mean and variance of the Then the minimiser fb of φ (f ) over f = 0, 1, . . . , F is positive values were 2.1251 and 4.9625. chosen as the estimator of f. 2) IV. T HE LIKELIHOOD RATIO TEST T1 = T2 = 1024, σ1 = 0.2, σ2 = 0.5, f = 1, p = 2, The likelihood ratio test statistic is ( 2 ) ( 2 ) σ bε;fb σ bu;fb λ = T1 log + T log , 2 σ b2 b σ b2 b ε;fX
u;fY
2 2 where σ bε; and σ bu; are the estimators of σε2 and σu2 fbX fbY obtained by fitting sinusoidal autoregressions separately to the two series. For various reasons, the orders p and f fitted under H0 should each pX , pbY ) and ³ ´ be less than or equal to min (b b b min fX , fy . The test rejects H0 at level of significance α
ω1 = 2π × 125.4/1024, β1 = −1.6 cos (π/4) , β2 = 0.64, a1 = 1, b1 = 0, A1 = 0, B1 = 1 Notes: The autoregressive component has a spectral density which is peaky near ω = π/4, and thus might be confused with a spectral line. This is thus a good test of whether the algorithms can sort out the ‘mixed spectral muddle’. 5 of the 1000 simulations yielded test statistics were negative. The frequency histogram of the likelihood ratio values is given in the following figure.
500
450
450
400
400
350
350
Frequency
Frequency
500
300
250
300
250
200
200
150
150
100
100
50
50
0 −5
0
5
10
15
20
25
Likelihood ratio
0
0
100
200
300
400
500
600
Likelihood ratio
400
9.8056. The theoretical mean and variance of the χ24 distribution are 4 and 8. The mean and variance of the positive values were 4.2016 and 9.7956. The upper 5% of the asymptotic distribution is 9.4877. The number of simulations with likelihood-ratio statistics greater than this was 53, in close agreement with the expected 50. 4) We repeat the first simulations, but with the frequencies of the sinusoids in each series slightly different: the second frequency is
350
Frequency
300
250
200
150
2π × 125.9/1024.
100
50
0 −5
0
5
10
15
20
25
Likelihood ratio
The mean and variance of the values were 3.0171 and 6.9794. The theoretical mean and variance of the χ23 distribution are 3 and 6. The mean and variance of the positive values were 3.1281 and 6.9114. 3) T1 = T2 = 1024, σ1 = 0.2, σ2 = 0.5, f = 2, p = 2, ω1 = 2π × 125.4/1024, ω2 = 2π × 256/1024, β1 = −1.6 cos (π/4) , β2 = 0.64, a1 = 1, b1 = 0, A1 = 0, B1 = 1, a2 = 0, b2 = 0.5, A2 = 0, B2 = −0.7 Comment: Now there are two sinusoids. The noise components have spectral densities peaky with location of peak coinciding with one of the sinusoidal frequencies. 1 of the 1000 simulations yielded a test statistic which was negative. The frequency histogram of the likelihood ratio values is given in the following figure. The mean and variance of the values were 4.1971 and
None of the 1000 simulations yielded test statistics which were negative. The frequency histogram of the likelihood ratio values is given in the following figure. The mean and variance of the values were 458.3131 and 1,557.5, which are not indicative of a random variable which is χ22 The procedure is thus able to discriminate between series with the same continuous spectra but with slightly different sinusoidal frequencies. ACKNOWLEDGMENT Funding for this research was from the Defence Science and Technology Organisation, Maritime Operations Division, under contract requisition number 128113, project “Statistical Methods of Spectrum Change Detection”. R EFERENCES [1] D.S. Coates and P.J. Diggle, “Tests for comparing two estimated spectral densities,” J. Time Ser. Anal., vol. 7, pp. 7–20, 1986. [2] L. Kavalieris and E.J. Hannan, “Determining the number of terms in a trigonometric regression,” J. Time Ser. Anal., vol. 15, pp. 613–625, 1994. [3] B.G. Quinn, “Estimating the number of terms in a sinusoidal regression,” J. of Time Series Anal., vol. 10, pp. 71–75, 1989. [4] B.G. Quinn and E.J. Hannan, The Estimation and Tracking of Frequency, New York: Cambridge University Press, 2001. [5] B.G. Quinn and J.M. Fernandes, “A fast efficient technique for the estimation of frequency,” Biometrika, vol. 78, pp. 489–98, 1991. [6] B.G. Quinn, “Estimating a sinusoid in low SNR coloured noise,” Proceedings of the 2004 Intelligent Sensors, Sensor Networks & Information Processing Conference, IEEE Press, pp. 301–306, 2004.
Main Menu