
IEEE SIGNAL PROCESSING LETTERS, VOL. 12, NO. 9, SEPTEMBER 2005


On Second-Order Statistics of Log-Periodogram With Correlated Components Yariv Ephraim and William J. J. Roberts

Abstract—We derive an explicit expression for the covariance of the log-periodogram power spectral density estimator for a zero-mean Gaussian process. We do not make the assumption that the spectral components of the process are uncorrelated. Applications to spectral estimation and to cepstral modeling in automatic speech recognition are discussed.

Index Terms—Log-periodogram, speech recognition.

I. INTRODUCTION

WE derive an explicit expression for the covariance of the log-periodogram power spectral density estimator. The estimator is obtained from a time series \{x_n, n = 0, \ldots, N-1\}, which is assumed to have been drawn from a zero-mean Gaussian process with arbitrary autocorrelation function. Let

    X_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x_n e^{-j \omega_k n}    (1)

denote the discrete Fourier transform (DFT) component of the time series at frequency \omega_k = 2\pi k/N rad, where k = 0, \ldots, N-1. We also refer to X_k as a spectral component of the time series. The periodogram of the time series is given by \{|X_k|^2\}. We are interested in an explicit expression for the covariance

    \mathrm{cov}\{\ln |X_k|^2, \ln |X_l|^2\}    (2)

when X_k and X_l, k \neq l, are not assumed uncorrelated. The natural logarithm is used throughout the letter. The covariance of any two spectral components of a stationary process with autocorrelation function r(m) is given by

    \mathrm{cov}\{X_k, X_l\} = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} r(n-m) e^{-j(\omega_k n - \omega_l m)}.    (3)

For finite N, the spectral components are generally correlated. Two noted exceptions are obtained when the process is either white or almost surely periodic with period N, and the DFT is sampled at frequencies that are multiples of 2\pi/N.

The log-periodogram covariance expression we derive here refines a standard form of the covariance obtained under the assumption that the spectral components \{X_k\} are statistically independent Gaussian random variables (see, e.g., [4, Eq. 11]). This assumption follows from asymptotic properties of spectral components of a stationary process with a fast-decaying autocorrelation function (see, e.g., Brillinger [2, Th. 4.4.1]). In some applications, however, this assumption may not hold, either because of lack of stationarity or due to a relatively small N. This situation may be encountered in automatic speech recognition applications, where the signal is not strictly stationary, and hence, the frame length is limited.

In this letter, we assume that the process is Gaussian, and hence, its spectral components are jointly Gaussian. We relax the common assumption that the jointly Gaussian spectral components are uncorrelated. In [3, Eq. 2.6], the spectral components were assumed statistically independent, but their densities differed from normal.

The covariance expression derived here generalizes another result from [4]. In [4, Eq. 13], an explicit expression was developed for the covariance between the log-periodogram of a clean signal and the log-periodogram of a corresponding noisy signal at the same frequency. The noise was assumed additive and statistically independent of the signal. Both the signal and noise were assumed zero-mean Gaussian. In that case, the clean and noisy spectral components at any given frequency are correlated, and the log-periodogram covariance can be obtained from the expression we develop for (2).

A key function in our development is the hypergeometric function [6, Ch. 9]. In [4], analytic continuation of this function was used in deriving the covariance between the clean and noisy log-periodogram estimates. Here, we provide a new proof for the more general result, which does not require analytic continuation of the hypergeometric function when 0 < k, l < N/2.

Explicit expressions for the second-order statistics of the log-periodogram of a given signal are important in the theory of spectral estimation [9], [10]. In addition, they provide some insight into the theory of cepstral coefficients that are extensively used in statistical modeling of speech signals for automatic speech recognition applications (see, e.g., [4], [8] and the references therein). Cepstral coefficients are obtained from the inverse Fourier transform of a log-power spectral density estimate of the signal. Merhav and Lee [8] have shown that cepstral coefficients of a zero-mean stationary Gaussian process, obtained from the windowed autocorrelation power spectral density estimator [9, Sec. 6.2.3], are asymptotically uncorrelated when N \to \infty. For finite N, the covariance of cepstral components obtained from the log-periodogram power spectral density estimator may be estimated from the log-periodogram covariance expression developed here.

Manuscript received January 6, 2005; revised March 14, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gerald Matz.
Y. Ephraim is with the Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030 USA (e-mail: [email protected]).
W. J. J. Roberts is with Atlantic Coast Technologies, Inc., Silver Spring, MD 20904-2545 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/LSP.2005.853049
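The finite-N correlation between distinct spectral components is easy to see numerically. A minimal sketch, assuming an AR(1)-type autocorrelation r(m) = a^|m| (an illustrative choice, not taken from the letter), compares a correlated process with a white one using the double sum in (3):

```python
import numpy as np

# Finite-N covariance between two DFT bins, cf. (3). For a white process the
# off-diagonal covariance vanishes exactly; for a correlated process it does not.
def dft_bin_covariance(r, N, k, l):
    """cov{X_k, X_l} = (1/N) sum_n sum_m r(n-m) e^{-j(w_k n - w_l m)}."""
    n = np.arange(N)
    R = r(n[:, None] - n[None, :])                 # Toeplitz autocorrelation matrix
    f = lambda q: np.exp(-2j * np.pi * q * n / N)  # DFT exponential at bin q
    return (f(k) @ R @ f(l).conj()) / N

N, a = 32, 0.9
cov_ar1 = dft_bin_covariance(lambda m: a ** np.abs(m), N, 3, 4)          # correlated
cov_white = dft_bin_covariance(lambda m: (m == 0).astype(float), N, 3, 4)  # white
print(abs(cov_ar1), abs(cov_white))  # nonzero vs. (numerically) zero
```

The white case recovers one of the two noted exceptions: with r(m) = δ(m), the double sum collapses to a geometric sum that vanishes for k ≠ l.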

1070-9908/$20.00 © 2005 IEEE



II. MAIN RESULTS

Consider a zero-mean Gaussian process with spectral components \{X_k\} as defined in (1). We are primarily interested in the log-periodogram covariance (2) for 0 < k, l < N/2. The log-periodogram covariance associated with the frequencies 0 and N/2 is given in Section III. The log-periodogram covariance is expressed in terms of the correlation coefficient between X_k and X_l defined as

    \rho_{kl} = \frac{E\{X_k X_l^*\}}{\sqrt{E\{|X_k|^2\} E\{|X_l|^2\}}}    (4)

where X^* denotes the complex conjugate of X. For 0 < k < N/2, it is well known that the expected value of the log-periodogram is given by (see, e.g., [3, Eq. 2.5], [4, Eq. 10])

    E\{\ln |X_k|^2\} = \ln E\{|X_k|^2\} - \gamma    (5)

where \gamma \approx 0.5772 is the Euler constant. Furthermore, \mathrm{var}\{\ln |X_k|^2\} = \pi^2/6. Our main result is the explicit expression for the covariance of the log-periodogram given by

    \mathrm{cov}\{\ln |X_k|^2, \ln |X_l|^2\} = \sum_{m=1}^{\infty} \frac{|\rho_{kl}|^{2m}}{m^2}.    (6)
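The series in (6) is straightforward to check by Monte Carlo. A minimal sketch for the clean/noisy setting of the Introduction (the variances chosen here are arbitrary): with independent circular complex Gaussian components X and W and Y = X + W, the squared correlation coefficient between X and Y is the Wiener gain, and the empirical covariance of the two log-periodograms should match the series:

```python
import numpy as np

# Monte Carlo sketch of (6). X and W are independent circular complex Gaussian
# "spectral components" (E{|X|^2} = sx2, E{|W|^2} = sw2, values arbitrary), and
# Y = X + W, so |rho|^2 = sx2/(sx2 + sw2) (the Wiener gain).
rng = np.random.default_rng(0)
M, sx2, sw2 = 400_000, 1.0, 0.5

def ccn(var, m):
    # circular complex Gaussian samples with E{|Z|^2} = var
    return np.sqrt(var / 2) * (rng.standard_normal(m) + 1j * rng.standard_normal(m))

X = ccn(sx2, M)
Y = X + ccn(sw2, M)
lx, ly = np.log(np.abs(X) ** 2), np.log(np.abs(Y) ** 2)

emp_cov = np.mean(lx * ly) - lx.mean() * ly.mean()
rho2 = sx2 / (sx2 + sw2)                                 # Wiener gain = |rho|^2
theory = sum(rho2 ** m / m ** 2 for m in range(1, 200))  # series in (6)

euler_gamma = 0.57721566490153286
print(emp_cov, theory)                        # covariance check of (6)
print(lx.mean(), np.log(sx2) - euler_gamma)   # mean check of (5)
print(np.var(lx), np.pi ** 2 / 6)             # variance check, pi^2/6
```

With 4 × 10^5 samples the empirical and theoretical values agree to within about 10^-2, which also exercises (5) and the variance π²/6 as the ρ = 1 limit of (6).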

When the spectral components X_k and X_l are assumed uncorrelated, |\rho_{kl}| = 1 for k = l and |\rho_{kl}| = 0 otherwise. In that case, (6) is reduced to a well-known form in which it is equal to \pi^2/6 when k = l and to zero otherwise (see, e.g., [3, Eq. 2.6], [4, Eq. 11]).

The covariance expression in (6) may be applicable to spectral components that were not necessarily derived from the same process. For example, let X_k and W_k denote, respectively, zero-mean Gaussian spectral components from a signal and an additive statistically independent noise process. The squared correlation coefficient between X_k and Y_k = X_k + W_k is given by E\{|X_k|^2\} / (E\{|X_k|^2\} + E\{|W_k|^2\}), which is recognized as the Wiener gain factor for estimating X_k from Y_k. For this case, the log-periodogram covariance is obtained from (6) by substituting this Wiener gain for |\rho_{kl}|^2. This result was originally derived in [4, Eq. (13)] and used for designing a linear estimator for the log-periodogram of the clean signal from the log-periodogram of the noisy signal. The derivation in [4] relied on analytic continuation of the hypergeometric function. The proof of the more general result (6) we provide here does not require such analytic continuation.

III. DERIVATION OF THE LOG-PERIODOGRAM COVARIANCE

Let V_k = X_k / \sqrt{E\{|X_k|^2\}} and V_l = X_l / \sqrt{E\{|X_l|^2\}} denote two normalized spectral components, which, under our assumptions, are zero-mean jointly Gaussian random variables. Let

    \Sigma = \mathrm{cov}\{[\mathrm{Re}\,V_k, \mathrm{Im}\,V_k, \mathrm{Re}\,V_l, \mathrm{Im}\,V_l]^T\}    (7)

denote the covariance matrix of the vector of real and imaginary parts of V_k and V_l, where T denotes vector transpose.

Consider first the case where 0 < k, l < N/2, and V_k and V_l are complex random variables. The 4 \times 4 covariance matrix \Sigma of the real and imaginary components of V_k and V_l has the usual structure for spectral components as detailed in [11, Eq. 11-20]. This implies, for example, that the real and imaginary components of V_k are uncorrelated, and each has the same variance. For this notation, \rho_{kl} = E\{V_k V_l^*\}. The desired covariance expression (6) is obtained from the second-order derivative of the moment generating function

    \Psi(\nu_1, \nu_2) = E\{|V_k|^{2\nu_1} |V_l|^{2\nu_2}\}    (8)

with respect to \nu_1 and \nu_2 at \nu_1 = \nu_2 = 0, and from (5). The moment generating function is evaluated as

    \Psi(\nu_1, \nu_2) = E\{|V_l|^{2\nu_2} E\{|V_k|^{2\nu_1} \mid V_l\}\}.    (9)

The conditional density of V_k given V_l is given by [11, Eq. 11-22]

    p(V_k \mid V_l) = \frac{1}{\pi (1 - |\rho_{kl}|^2)} \exp\left(-\frac{|V_k - \rho_{kl} V_l|^2}{1 - |\rho_{kl}|^2}\right)    (10)

where 1 - |\rho_{kl}|^2 denotes the conditional variance of V_k given V_l. Using (10), we evaluate the inner expected value E\{|V_k|^{2\nu_1} \mid V_l\} in (9) in polar coordinates. Define \lambda = |\rho_{kl}|^2 |V_l|^2 / (1 - |\rho_{kl}|^2). Using [5, Eqs. 8.411.1, 6.631.1], we obtain

    E\{|V_k|^{2\nu_1} \mid V_l\} = \Gamma(1 + \nu_1) (1 - |\rho_{kl}|^2)^{\nu_1} e^{-\lambda} \Phi(1 + \nu_1; 1; \lambda)    (11)

where \Gamma denotes the Gamma function, and

    \Phi(\alpha; \gamma; z) = \sum_{m=0}^{\infty} \frac{(\alpha)_m}{(\gamma)_m} \frac{z^m}{m!}    (12)

denotes the degenerate hypergeometric function with convergence region of |z| < \infty [5, Eq. 9.210.1]. Substituting (11) into (9), applying the expected value to each term of the infinite sum resulting from (12), and evaluating the individual expected values in polar coordinates using [5, Eqs. 3.381.4, 8.331], we obtain

    \Psi(\nu_1, \nu_2) = \Gamma(1 + \nu_1) \Gamma(1 + \nu_2) (1 - |\rho_{kl}|^2)^{1 + \nu_1 + \nu_2} F(1 + \nu_1, 1 + \nu_2; 1; |\rho_{kl}|^2)    (13)

where

    F(\alpha, \beta; \gamma; z) = \sum_{m=0}^{\infty} \frac{(\alpha)_m (\beta)_m}{(\gamma)_m} \frac{z^m}{m!}    (14)

is the hypergeometric function [6, Ch. 9] with convergence region of |z| < 1. Note that the argument of the hypergeometric function in (13) is in its convergence region, and application of the expected value term by term is justified. Applying the linear transformation [6, Eq. 9.5.3]

    F(\alpha, \beta; \gamma; z) = (1 - z)^{\gamma - \alpha - \beta} F(\gamma - \alpha, \gamma - \beta; \gamma; z)    (15)

to (13), we obtain

    \Psi(\nu_1, \nu_2) = \Gamma(1 + \nu_1) \Gamma(1 + \nu_2) F(-\nu_1, -\nu_2; 1; |\rho_{kl}|^2).    (16)

The second-order derivative of (16) at \nu_1 = \nu_2 = 0 is obtained using \Gamma'(1) = -\gamma and

    \left. \frac{\partial^2}{\partial \nu_1 \partial \nu_2} F(-\nu_1, -\nu_2; 1; z) \right|_{\nu_1 = \nu_2 = 0} = \sum_{m=1}^{\infty} \frac{z^m}{m^2}

for |z| < 1 [4, Eq. A.23].
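The identities used in this derivation are easy to verify numerically with a truncated series. A minimal pure-Python sketch of the Gauss series (14), the transformation (15), and the mixed-derivative identity from [4, Eq. A.23] (the test values and step size are arbitrary):

```python
import math

# Truncated Gauss hypergeometric series (14), valid for |z| < 1. The running
# term ratio (a+m)(b+m)z/((c+m)(m+1)) avoids factorial overflow.
def F(a, b, c, z, terms=200):
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (a + m) * (b + m) / ((c + m) * (m + 1)) * z
    return total

a, b, c, z = 0.3, -0.2, 1.0, 0.5

# Euler transformation (15): F(a,b;c;z) = (1-z)^(c-a-b) F(c-a,c-b;c;z)
lhs = F(a, b, c, z)
rhs = (1 - z) ** (c - a - b) * F(c - a, c - b, c, z)

# Mixed derivative of F(-v1,-v2;1;z) at v1 = v2 = 0, by central differences;
# it should equal sum_{m>=1} z^m / m^2, the dilogarithm series in (6).
h = 1e-3
d2 = (F(h, h, 1, z) - F(h, -h, 1, z) - F(-h, h, 1, z) + F(-h, -h, 1, z)) / (4 * h * h)
series = sum(z ** m / m ** 2 for m in range(1, 200))
print(lhs, rhs, d2, series)
```

The finite-difference value of the mixed derivative matches the dilogarithm series to several digits, which is exactly the term that survives in (6) after the Gamma-function cross terms cancel.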


For the log-periodogram covariance between a clean and a noisy spectral component at a given frequency, it is easy to check that the required structure of the covariance matrix of the real and imaginary components of the clean and noisy spectral components is satisfied. Thus, the log-periodogram covariance can be obtained from (6) using the Wiener gain for |\rho_{kl}|^2. In the original proof of this result in [4], the moment generating function was evaluated as in (9) with the roles of the two components interchanged. As a result, the argument of the hypergeometric function was the negative of the signal-to-noise ratio at the given frequency, which can exceed one in absolute value. In such cases, the series representing the function does not converge, and analytic continuation of the hypergeometric function is necessary [6]. This difficulty can be circumvented if the moment generating function is evaluated as above, conditioning the clean component on the noisy one in (9).

The covariance of the log-periodogram when \omega_k, \omega_l \in \{0, \pi\}, and when \omega_k \in \{0, \pi\} and 0 < \omega_l < \pi, may be obtained using a similar approach to that used above. In these two cases, however, analytic continuation of the hypergeometric function is necessary for valid covariance expressions over the full range of \rho_{kl}. For \omega_k, \omega_l \in \{0, \pi\}, the moment generating function was evaluated using [5, Eqs. 3.562.2, 9.240, 8.331, 8.335.1, 9.212.1, 3.381.4] in that order. The resulting hypergeometric function converges over only part of the range of \rho_{kl}. Applying the linear transformation [6, Eq. 9.5.1]

    F(\alpha, \beta; \gamma; z) = (1 - z)^{-\beta} F(\gamma - \alpha, \beta; \gamma; \frac{z}{z - 1})    (17)

gives (18), where convergence holds over the full range of \rho_{kl}. For \omega_k \in \{0, \pi\} and 0 < \omega_l < \pi, we have applied the same sequence of steps as above and also used [5, 1.320.5]. Here, too, the resulting hypergeometric function converges over only part of the range of \rho_{kl}, and applying the linear transformation (17) gives (19), with the desired convergence region. The covariance of the log-periodogram corresponding to these two cases can be obtained from the second-order derivatives at \nu_1 = \nu_2 = 0 of (18) and (19), respectively, and by using (5) and [4, Eqs. 10, A.10]. For \omega_k, \omega_l \in \{0, \pi\}, the resulting covariance is given by (20), and for \omega_k \in \{0, \pi\} and 0 < \omega_l < \pi, it is given by (21).

Note that the transformations (15) and (17) are related. In fact, the former can be obtained from the latter [6]. The transformation (15), however, does not affect the convergence region of the hypergeometric function and, thus, does not provide analytic continuation of that function.


IV. CEPSTRAL COEFFICIENTS

Consider cepstral coefficients \{c_n\} obtained from the inverse DFT of a windowed log-periodogram as follows:

    c_n = \frac{1}{N} \sum_{k=0}^{N-1} w_k \ln |X_k|^2 e^{j \omega_k n}, \qquad n = 0, \ldots, N-1    (22)

where \{w_k\} denotes the window function. The window may be used to exclude undesired components of the periodogram, such as at k = 0 or k = N/2, or to achieve consistency of the cepstral estimates. For a window that nulls out the components at k = 0 and k = N/2, it is straightforward to verify, using (6), that the covariance of the cepstral coefficients is given by

    \mathrm{cov}\{c_n, c_m\} = \frac{1}{N^2} \sum_{k} \sum_{l} w_k w_l \left( \sum_{i=1}^{\infty} \frac{|\rho_{kl}|^{2i}}{i^2} \right) e^{j(\omega_k n - \omega_l m)}.    (23)

The variance of c_n is easily obtained from (23) using n = m. For uncorrelated spectral components, the variance approaches zero when N \to \infty. For correlated spectral components, consistency may be achieved by a proper choice of the window \{w_k\} in (22), similarly to [9, Sec. 6.2.4].

V. APPLICATIONS

In this section, we briefly mention two applications of our results. In the first application, we provide a theoretical estimate of the covariance of the logarithm of the squared magnitude of the spectral error in autoregressive parameter estimation. In the second application, we have embedded the covariance matrix (23) in a speech recognition system. We have used that matrix to predict the covariance of each state of the hidden Markov process (HMP) that models the acoustic signal from a given word.

For the first application, let \{x_n\} denote a zero-mean Gaussian autoregressive process with gain \sigma and coefficients a = (a_1, \ldots, a_p)^T. Let P denote the true probability measure of the process. Suppose that a and \sigma are estimated in the maximum likelihood sense from n samples of the process. In particular, the estimates are obtained from the sample estimate of the p \times p covariance matrix R of the process. Let \hat{a}_n denote the estimate of a. It was shown by Mann and Wald [7] that

    n^{1/2} (\hat{a}_n - a) \to \mathcal{N}(0, \sigma^2 R^{-1}) \quad P\text{-weakly as } n \to \infty.    (24)

This central limit theorem states that the asymptotic distribution of the error in estimating the autoregressive coefficients is zero-mean Gaussian with asymptotic covariance \sigma^2 R^{-1}/n. If we adopt these asymptotic results as valid for sufficiently large n, then (6) can be used to assess the covariance of \ln |A_k - \hat{A}_k|^2, where A_k and \hat{A}_k denote, respectively, the N-length DFTs of the coefficient vectors a and \hat{a}_n, each augmented with zeros, and where N > p. In particular, the variance of the logarithm of the squared magnitude of the spectral error for any 0 < k < N/2 is constant and is given by \pi^2/6.
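The covariance prediction (23) used in the second application below is a windowed double DFT sum over the log-periodogram covariance (6) and is easy to compute. A minimal sketch (the correlation-coefficient matrix here is synthetic, a stand-in for one estimated from data):

```python
import numpy as np

# Sketch of (23): predict the cepstral covariance from a correlation-coefficient
# matrix. The matrix rho2 (|rho_kl|^2 with exponentially decaying off-diagonal
# terms) is synthetic and purely illustrative.
N = 64
k = np.arange(N)
rho2 = 0.5 ** np.abs(k[:, None] - k[None, :])        # synthetic |rho_kl|^2

# log-periodogram covariance matrix from (6): sum_{m>=1} |rho_kl|^{2m} / m^2
S = sum(rho2 ** m / m ** 2 for m in range(1, 200))

# rectangular window nulling the bins k = 0 and k = N/2, as in Section IV
w = np.ones(N)
w[0] = w[N // 2] = 0.0

# (23): cov{c_n, c_m} = (1/N^2) sum_{k,l} w_k w_l S_kl e^{j 2 pi (n k - m l)/N}
E = np.exp(2j * np.pi * np.outer(np.arange(N), k) / N) * w
C = E @ S @ E.conj().T / N ** 2

print(np.max(np.abs(C.imag)), C[1, 1].real)  # (numerically) real; positive variance
```

Because the synthetic |ρ_kl|² depends only on |k − l| and the window is symmetric under k → N − k, the resulting cepstral covariance comes out real and Hermitian, as it must for real cepstra.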

Fig. 1. Empirical estimate of |\rho_{kl}|, \omega_k = (2\pi/N)k, for the digit “two.”

For the second application, automatic speech recognition, we have first studied the extent of the correlation among spectral components of speech signals. We have used the training section of male speakers from the TI-digit database. This database contains two utterances per digit from each of its 55 speakers. For each of the ten English digits and the word “oh,” we have estimated \rho_{kl} from frames of N samples at an 8-kHz sampling rate from all utterances of all speakers. Fig. 1 depicts |\rho_{kl}|, k \neq l, as obtained from sample covariance estimates for the digit “two.” In this figure, we have suppressed the diagonal terms \{|\rho_{kk}|\}, since all have a unity value. Other digits exhibited varying correlation ranging from zero to approximately 0.8, with the heaviest spectral correlation seen for the digit “six.”

The estimate of \rho_{kl} for each digit was subsequently used in (23) to predict the covariance matrix of cepstral vectors from that digit. We have used a rectangular window that suppressed the log-periodogram components at k = 0 and k = N/2. The predicted covariance was then used in an HMP-based speech recognition system with Gaussian densities. The predicted covariance was attributed to all states of the HMP. The Baum algorithm was used to estimate only the mean vector of each Gaussian density, as well as the transition matrix of the Markov chain. We have maintained only the diagonal terms of the predicted covariance, since the off-diagonal terms of this matrix were negligibly small.

The speech recognition setup we used here is similar to that described in [4]. In particular, we have recognized the ten English digits and the letter “oh” from the male testing section of the TI-digit database. For each digit, two utterances were available from each of 56 speakers. In all cases, we have used ten states, two mixture components per state, and frames of N samples at an 8-kHz sampling rate.
We have compared our system, which uses the predicted diagonal cepstral covariance matrices, with the standard approach in which diagonal covariance matrices are estimated from the training data for all states of the

HMP using the Baum algorithm. We have also compared our system with the system in [4], where a fixed, data-independent, diagonal covariance matrix was attributed to all states of the HMP. In [4], we could use a fixed data-independent matrix since the spectral components were assumed statistically independent.

We have practically achieved the same performance in all three systems. The benchmark recognition accuracy for the system using the Baum algorithm for estimating the covariance matrices was 99.19%. The system using the predicted covariance matrices developed here achieved 99.03%. The system in [4], which uses a fixed data-independent covariance matrix for all states, achieved 98.95%. The small differences in performance obtained in these three systems were verified to be statistically insignificant using the bootstrap sampling probability-of-improvement estimate proposed by Bisani and Ney [1]. These preliminary results are encouraging, since they show that the individually estimated covariance matrices for all states of an HMP for a particular digit could be substituted for by the predicted covariance. This is particularly important when only limited data are available for training the HMP for each word.

VI. COMMENTS

We have generalized a standard result for the covariance of the log-periodogram of a zero-mean Gaussian process by relaxing the assumption that the spectral components at various frequencies are statistically independent. An interesting related problem is that of estimating the covariance of the log-power spectral density estimate of an autoregressive process, given by \ln(\hat{\sigma}^2 / |\hat{A}_k|^2). This, however, turns out to be a much harder problem, since \hat{A}_k has nonzero mean that asymptotically is given by A_k. An asymptotic result for this problem was provided by Merhav and Lee [8].

REFERENCES

[1] M. Bisani and H. Ney, “Bootstrap estimates for confidence intervals in ASR performance evaluation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2004, pp. 409–412.
[2] D. R. Brillinger, Time Series: Data Analysis and Theory. Philadelphia, PA: SIAM, 2001.
[3] H. T. Davis and R. H. Jones, “Estimation of the innovation variance of a stationary time series,” J. Amer. Stat. Assoc., vol. 63, no. 321, pp. 141–149, Mar. 1968.
[4] Y. Ephraim and M. Rahim, “On second order statistics and linear estimation of cepstral coefficients,” IEEE Trans. Speech Audio Process., vol. 7, no. 2, pp. 162–176, Mar. 1999.
[5] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products. New York: Academic, 1979.
[6] N. N. Lebedev, Special Functions and Their Applications. New York: Dover, 1972.
[7] H. B. Mann and A. Wald, “On the statistical treatment of linear stochastic difference equations,” Econometrica, vol. 11, no. 3/4, pp. 173–220, Jul.–Oct. 1943.
[8] N. Merhav and C.-H. Lee, “On the asymptotic statistical behavior of empirical cepstral coefficients,” IEEE Trans. Signal Process., vol. 41, no. 5, pp. 1990–1993, May 1993.
[9] M. B. Priestley, Spectral Analysis and Time Series. New York: Academic, 1992.
[10] K. S. Riedel and A. Sidorenko, “Adaptive smoothing of the log-spectrum with multiple tapering,” IEEE Trans. Signal Process., vol. 44, no. 7, pp. 1794–1800, Jul. 1996.
[11] A. D. Whalen, Detection of Signals in Noise. New York: Academic, 1971.