
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 48, NO. 10, OCTOBER 2000

Improved Instantaneous Frequency Estimation Using an Adaptive Short-Time Fourier Transform

Henry K. Kwok and Douglas L. Jones

Abstract—Instantaneous frequency estimation (IFE) arises in a variety of important applications, including FM demodulation. We present here a new time–frequency representation (TFR)-based approach to IFE based on an adaptive short-time Fourier transform (ASTFT). This TFR leads naturally to a type of short-term ML estimator of the IF. To further improve the performance, we apply a multistate hidden Markov model (HMM)-based post-estimation tracker. The end result is up to a 16-dB reduction in the threshold SNR over the frequency discriminator (FD) and an 8-dB improvement over a phase-locked loop (PLL) for a Rayleigh fading channel.

I. INTRODUCTION

The Fourier transform (FT) has long been the standard tool for spectral analysis in signal processing. However, the FT is inadequate when the signal is nonstationary. Many real-world signals (for example, speech) exhibit spectra that change with time, and classical Fourier techniques reveal only their overall frequency content. Often, it is more important to know when those frequency components exist and how they change with time. The concept of instantaneous frequency (IF) was introduced in response to this problem. The IF of a signal x(t) is defined as

f_i(t) = \frac{1}{2\pi} \frac{\partial \phi(t)}{\partial t}    (1)

where the signal has the form x(t) = A(t) e^{j\phi(t)}. One common example of an IF signal is frequency-modulated (FM) radio: the FM radio signal is the frequency-modulated audio signal, and the IF of the FM signal is simply the modulating audio signal. In order to retrieve the IF, we need an IF estimator (IFE) [1].

A. An Overview of Existing Instantaneous Frequency Estimation Methods

Many methods exist for IF estimation. Most of them can be categorized as either filter-based or time–frequency-representation-based. Filter-based algorithms adjust their filtering structure based on the incoming signal sequence. The phase-locked loop (PLL), which tracks the phase of the FM signal, falls within this category. In addition, both LMS and RLS adaptive-filter-based IFEs have been proposed [2]. However, methods of this class often share the problem of being unable to track rapid variations in IF. In addition, the step size must be adjusted a priori; otherwise, they may not converge to the true answer.

A time–frequency representation (TFR) is a signal representation in which time and frequency information are displayed jointly on a 2-D plane. There are many existing TFRs, leading to different types of TFR-based IF estimators. Among them, the short-time Fourier transform (STFT) and the Wigner distribution (WD) [3] are two popular choices. Although desirable in many ways, the WD suffers the major handicap of large cross terms. The cross terms can be twice as large as the auto terms and can be negative. The nonpositivity of the WD

causes erroneous IF estimates as well as difficulty in interpreting the TFR. Some researchers have proposed a cross WD (XWD) [4] to remedy this problem, but this method is computationally expensive and requires iterations, which makes on-line applications (such as FM demodulation) costly. Another way to use the WD for IF estimation is to compute the first moment in frequency at each time; however, this method is equivalent to a frequency discriminator (FD), which is very sensitive to noise. The STFT guarantees positivity, is computationally efficient, and is very robust against noise. However, it suffers from poor time–frequency resolution.

Several methods fall into neither of the categories mentioned above. The first is based on applying the definition of instantaneous frequency f_i(t) = (1/2\pi)(\partial\phi(t)/\partial t). The phase difference estimator (PDE) [5] evaluates the instantaneous phase and approximates the differentiation by a filter. A modification of the PDE proposed by Kay [6] uses a "weighted phase difference" estimator. It meets the CRLB for estimating the frequency of a stationary signal at high SNR. However, both methods fail at low SNR or in the presence of large signal variations. Another estimator is the polynomial-phase-modeling-based IFE proposed in [7]–[9]. The idea is to model the phase as a polynomial over each local time interval and then perform maximum likelihood (ML) estimation of the polynomial coefficients. This method works very well on slowly varying signals; however, if the signal is rapidly varying or discontinuous in time, it performs poorly. Finally, there are hidden Markov model (HMM)-based frequency trackers proposed in [10], [11], and [22]. Streit and Barrett [10] and Quinn et al. [11] use an HMM-based algorithm to track the peak of the STFT. However, they are single-HMM based and perform poorly if the modulating signal is nonstationary (e.g., speech). The poor TFR also degrades the performance of the tracker. White [22] proposed a Cartesian HMM (CHMM)-based scheme to track the IF and phase simultaneously. It is one of the best IF estimators known today. However, the CHMM introduces a large number of states, which makes it much more computationally expensive than the proposed scheme.

B. Adaptive Short-Time Fourier Transform-Based Instantaneous Frequency Estimator

We propose a new adaptive short-time Fourier transform (ASTFT)-based IFE that inherits the advantages of the STFT without most of its drawbacks. This is done by using a different window at each time instance to achieve a good TFR. The adaptation rule is a generalized likelihood ratio test (GLRT) based on the STFT and is very robust. In addition, a post-ASTFT peak-tracking algorithm further improves the performance by following the continuous ridge in the time–frequency plane and removing spurious deviations. The algorithm is constructed under a statistical detection and estimation framework and is an approximate maximum-likelihood sequence estimator (MLSE) of IF tracks. It can be implemented efficiently and can process real-time signals with moderately sophisticated hardware. The algorithm also lowers the detection threshold of FM signals over the STFT and the FD by 4–16 dB.

C. Outline of the Paper

In the following sections, we briefly review the STFT and its shortcomings and then develop the theory behind the ASTFT and its application to nonstationary signal analysis and, in particular, IF estimation (Section II).
In Section III, we construct a dynamic programming (DP)-based peak tracker that further improves TFR-based IF estimation. In Section IV, we present the testing methodology and the results on the performance of this algorithm, along with comparisons to other current state-of-the-art methods. We also test the ASTFT-based FM demodulator in a fading environment in Section V.


Fig. 1. Downward chirp window performs well only in the first half of the signal.

II. ADAPTIVE SHORT-TIME FOURIER TRANSFORM (ASTFT)

A. Short-Time Fourier Transform (STFT)

The STFT is a popular method for analyzing nonstationary signals. The STFT of the signal x(t) is defined as

X(t, f) = \int_{-\infty}^{\infty} x(\tau) \, h(\tau - t) \, e^{-j 2\pi f \tau} \, d\tau    (2)

where h(t) should be a lowpass filter, and ||h||_2 = 1. X(t, f) can be interpreted as the correlation between x(\tau) and h(\tau - t) e^{j 2\pi f \tau}. Note that h(\tau - t) e^{j 2\pi f \tau} has its energy concentrated at time t and frequency f. Thus, |X(t, f)|^2 can be viewed as the energy in x(t) at frequency f and time t. Often, one displays the energy at each time–frequency pair, i.e., P(t, f) = |X(t, f)|^2. P(t, f) is known as the spectrogram (SP) of x(t).

It can be shown that there is an uncertainty relationship between the time and frequency resolution of an STFT: a fine frequency resolution leads to a coarse time resolution and vice versa. Thus, there is a tradeoff between time and frequency resolution in choosing the appropriate analysis window. A short window is more appropriate for signals of short duration, whereas a long window with finer frequency resolution is more appropriate for narrowband signals.

The uncertainty principle also indicates a disadvantage of the STFT method for FM demodulation. Since all FM signals have changing IF, there is no single window that will do a good job for the entire signal. Consider the following example of an FM-modulated triangle pulse. Two STFTs of the signal are computed. The first STFT uses a linear chirping window with decreasing frequency (downward chirp window); see Fig. 1. The second uses a linear chirping window with increasing frequency (upward chirp window); see Fig. 2. Neither window does a satisfactory job for the entire signal. However, the STFT could work well if we used the downward chirping window for the first 100 samples and the upward chirping window for the next 100 samples. This idea leads to the concept of an adaptive short-time Fourier transform (ASTFT).

B. Adaptive Short-Time Fourier Transform

As we have seen, poor TF resolution, due primarily to mismatch between the signal and the window, is the major drawback of the STFT. We can overcome this by using different windows at different time instances.


Fig. 2. Upward chirp window performs well only in the second half of the signal.

The window used at each time instance should be matched to the signal characteristics of that instance, thereby improving the TFR and, presumably, reducing the probability of false detection.

1) Definition: Let W be a set of windows {w_1, w_2, ..., w_M}, where ||w_i||_2 = 1 for i = 1, 2, ..., M. This is the set of windows from which the ASTFT algorithm can choose. Then, the ASTFT of a signal x(t) is given by

X_astft(t, f) = \int_{-\infty}^{\infty} x(\tau) \, w_{i(t,f)}(\tau - t) \, e^{-j 2\pi f \tau} \, d\tau    (3)

where i(t, f) is the index function selecting the window used at time t and frequency f, chosen according to some adaptation rule (discussed in the next subsection). The ASTFT is identical to the STFT, except that the analysis window changes with time. Different windows are used at different times so that a good TFR is achieved uniformly over the whole signal.

2) Adaptation Criteria: In order for i(t, f) to change with time and/or frequency, it needs an adaptation rule. We present two adaptation criteria—one based on maximum correlation and the other on a concentration measurement (minimum entropy) method [12], [13]. A Wigner distribution-based method was later proposed by Jones and Boashash [23]; however, this method depends on instantaneous moment estimates from the WD. As discussed earlier, WD-based moment estimates are noise sensitive; therefore, we do not expect this method to work well in the low SNR regions of interest here and, thus, do not consider it further.

Maximum Correlation Criterion: The maximum correlation rule states that the optimal window is the one that maximizes the projection of the signal onto the window. Mathematically, it is

i_MC(t, f) = \arg\max_{1 \le m \le M} \left| \int_{-\infty}^{\infty} x(\tau) \, w_m(\tau - t) \, e^{-j 2\pi f \tau} \, d\tau \right|    (4)
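To make the adaptive transform concrete, here is a minimal discrete-time sketch of (3) with the maximum correlation rule of (4). It assumes a finite window set, a hop of one sample, and K uniformly spaced frequency bins; the function name and default values are illustrative, not from the paper.

```python
import numpy as np

def adaptive_stft(x, windows, K=256):
    """Discrete-time sketch of (3)-(4): at each time n and frequency bin k,
    keep the value produced by whichever candidate window maximizes the
    correlation magnitude (maximum correlation rule)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    X = np.zeros((N, K), dtype=complex)
    best = np.zeros((N, K))
    for w in windows:                              # each w: unit-energy, odd length
        L = len(w) // 2
        for n in range(N):
            lo, hi = max(0, n - L), min(N, n + L + 1)
            seg = np.zeros(2 * L + 1, dtype=complex)
            seg[lo - (n - L):hi - (n - L)] = x[lo:hi]
            l = np.arange(n - L, n + L + 1)
            # correlation of x with w shifted to time n, evaluated at each bin k
            Xw = np.array([np.sum(seg * w * np.exp(-2j * np.pi * k * l / K))
                           for k in range(K)])
            keep = np.abs(Xw) > best[n]
            X[n, keep] = Xw[keep]                  # keep the best window per bin
            best[n] = np.maximum(best[n], np.abs(Xw))
    return X
```

In practice one would restrict the candidate set per frequency bin (the set Omega_k introduced in Section II-C) and evaluate each window with an FFT rather than explicit sums; the brute-force loops above are kept only for clarity.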

Combined with the maximum-likelihood (ML) rule for estimating the instantaneous frequency in white noise, an ASTFT IFE using the maximum correlation criterion becomes a GLRT. In this framework, each hypothesis (frequency bin) is conditioned on an unknown parameter vector \theta. This vector can contain signal parameters such as length, linear chirp rate, and quadratic chirp rate. Conditioned on each hypothesis and the parameter vector, we have

H_0:      X ~ p_\theta(x | H_0)
  ...
H_{K-1}:  X ~ p_\theta(x | H_{K-1})


where K is the number of hypotheses (or frequency bins). The GLRT solution to this problem is to perform an ML estimation of the parameter vector \theta and then compute the likelihood ratio given H_i and \theta = \hat{\theta}. The output is the hypothesis with the largest likelihood ratio. Thus, the GLRT is a two-step ML detector: it first performs an ML estimation of the unknown parameters, and then it performs an ML detection of the hypothesis using the ML estimates. The advantage of this criterion is that it has all the properties of ML estimation, such as robustness.

Concentration Measurement (Minimum Entropy) Criterion: Jones and Parks [14] defined a quantity called the concentration measurement of a TFD as

C_w(t, f) := \frac{\int\!\!\int D^4(\tau - t, \omega - f) \, |X_w(\tau, \omega)|^4 \, d\tau \, d\omega}{\left[ \int\!\!\int D^2(\tau - t, \omega - f) \, |X_w(\tau, \omega)|^2 \, d\tau \, d\omega \right]^2}    (5)

where D(t, f) should be a lowpass weighting function centered at (0, 0), and X_w(t, f) is the STFT of x(t) using window w. The concentration measurement is an approximate measure of the resolution of the TFR; the justification can be found in [14] and [15]. Therefore, we want to maximize the concentration over the allowed window set W. The concentration measurement rule is simply (see Footnote 1)

i_CM(t, f) = \arg\max_{1 \le m \le M} C_{w_m}(t, f)    (6)

3) Window Set: The window set W depends highly on the nature of the signals to be analyzed; no single set will perform well in all applications. Some possible sets are 1) windows with different chirp rates (see Footnote 2 and the sketch below); 2) windows with different lengths; and 3) windows matched to different classes of signals. These are not the only possible choices, however. As long as the signals to be analyzed can be characterized, one can find a set of windows most suitable for them. If the signal characteristics are unknown and the signal-to-noise ratio (SNR) is high, one can use a large set of windows for an initial analysis and then prune the window set using the statistics of window usage. In the low SNR case, a large set of windows is undesirable: since the noise floor can be as high as the signal amplitude, windows that do not match the signal characteristics closely may end up matching the noise, which can lead to false detections or failures to reveal important signal frequency content.

Example 1: We use a simplified version of our ASTFT to analyze the FM-modulated triangle pulse. Let W = {w_1, w_2}, where w_1 is the window used in Fig. 1 and w_2 is the one in Fig. 2. Use

i^*(t) = \arg\max_{1 \le m \le M, \, f} \left| \int x(\tau) \, w_m(\tau - t) \, e^{-j 2\pi f \tau} \, d\tau \right|

as the adaptation rule. This yields the ASTFT of the signal shown in Fig. 3. Note that the TFR is much improved over either Fig. 1 or Fig. 2.

Footnote 1: Another adaptation criterion closely related to the concentration measurement is the Renyi entropy [16]. It is defined as

H = -\frac{1}{2} \log \int\!\!\int |X(t, f)|^3 \, dt \, df

Footnote 2: This is the set we use for IF estimation due to its excellent performance and relatively modest computational cost.
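As a concrete illustration of option 1) above (the chirp-rate set of Footnote 2), here is a minimal sketch that chirps a unit-energy Hamming base window at several rates. The specific length and chirp rates are placeholder values, not the ones the paper selects later in Section IV.

```python
import numpy as np

def chirped_window_set(N=31, chirp_rates=(-0.02, -0.01, 0.0, 0.01, 0.02)):
    """Build unit-energy windows w_m(n) = w0(n) * exp(j * c_m * n^2), where w0
    is a Hamming window on -(N-1)/2 <= n <= (N-1)/2 and c_m is half the chirp
    rate (cf. (14) later in the paper). N must be odd so the window is
    centered on n = 0."""
    n = np.arange(N) - (N - 1) // 2
    w0 = 0.54 + 0.46 * np.cos(2 * np.pi * n / (N - 1))   # symmetric Hamming
    w0 = w0 / np.linalg.norm(w0)                          # unit energy
    return [w0 * np.exp(1j * c * n ** 2) for c in chirp_rates]

windows = chirped_window_set()
assert all(np.isclose(np.linalg.norm(w), 1.0) for w in windows)
```

Because the chirp factor has unit modulus, every window in the set keeps the unit energy of the base window, as required by the definition of W.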

Fig. 3. TFR of the signal using the ASTFT has finer resolution than either of the two STFTs considered above.

C. ASTFT-Based IF Estimator

We are given a sampled (at a sufficiently high rate) version x(n) of a signal having the form

x(t) = \exp\left( j 2\pi \int_{-\infty}^{t} f_i(\tau) \, d\tau \right) + N(t)    (7)

where N(t) is complex white Gaussian noise with zero mean and variance \sigma^2, and f_i(t) is the true instantaneous frequency (the modulating signal in FM). The objective is to estimate the sampled version of f_i(n), i.e., to find \hat{f}_i(n) such that \hat{f}_i(n) \approx f_i(n). A TFR-based approach solves this problem by picking the peak of the TFR of the signal. Mathematically, the estimate is given by

\hat{f}_i(t) = \arg\max_{f} \{ |X_TFR(t, f)| \}    (8)

because at each instance, the frequency with the highest energy density is the most probable instantaneous frequency.

Several practical issues must be addressed in implementing TFR-based IF estimation. First, we have to quantize time and frequency. The input signal becomes an input sequence x(n), and the output values go from continuous to K distinct values corresponding to digital frequencies from 0 to 2\pi. These are the frequency bins in which the IF can lie; we refer to them as the hypotheses (or states) from now on. The ASTFT transformer analyzes the received sequence x(n) and computes its ASTFT, given by

X_astft(n, k) = \sum_{l=-\infty}^{+\infty} x(l) \, w_{i(n,k)}(l - n) \, e^{-j 2\pi (k/K) l}    (9)

for k = 0, 1, ..., K-1, where i(n, k) is the index function determining which window to use at each time n and frequency bin k. This is simply the discrete-time, discrete-frequency version of the ASTFT algorithm given in Section II; therefore, the good TFR property of the adaptive method is retained here. The adaptation rule determines the values of the index function i(n, k); either the maximum correlation rule or the concentration measurement rule can be used.

We apply one additional modification to the adaptation criteria given in Section II. In Section II, we are allowed the use of any window in W at any frequency. At some frequency bins, the entire set may be excessive because those bins admit more limited signal characteristics.


For example, at the maximum (or minimum) frequency, the probability of a chirp window is practically zero since there is no signal at higher frequencies for the chirp window to match. Therefore, we reduce the number of permissible windows at each frequency where possible. As mentioned above, a large window set can be detrimental to IFE performance, so it is desirable to keep the number of candidate windows at each frequency to a minimum; this is possible because at some frequency slots the characteristics of the signal are more predictable than at others. Therefore, let \Omega_k \subseteq W for k = 0, 1, ..., K-1. The maximum correlation criterion becomes

i_MC(n, k) = \arg\max_{w_p \in \Omega_k} \left| \sum_{l=-\infty}^{+\infty} x(l) \, w_p(l - n) \, e^{-j 2\pi (k/K) l} \right|    (10)

and the concentration measurement is

C_w(n, k) := \frac{\sum_{l=-\infty}^{+\infty} \sum_{m=-\infty}^{+\infty} D^4(l - n, m - k) \, |X_w(l, m)|^4}{\left[ \sum_{l=-\infty}^{+\infty} \sum_{m=-\infty}^{+\infty} D^2(l - n, m - k) \, |X_w(l, m)|^2 \right]^2}    (11)

i_CM(n, k) = \arg\max_{w \in \Omega_k} C_w(n, k)    (12)
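A minimal sketch of (11) and (12), assuming the STFT of each candidate window has already been computed on a uniform time-frequency grid and that D is a small, nonnegative, finite-support kernel (an arbitrary illustrative choice; as the next paragraph notes, the paper's simulations use a D that depends only on the time index):

```python
import numpy as np
from scipy.signal import correlate2d

def concentration_map(Xw, D):
    """C_w(n, k) of (11): Xw is the STFT of x using candidate window w
    (time x frequency array); D is a small lowpass kernel centered at (0, 0)."""
    num = correlate2d(np.abs(Xw) ** 4, D ** 4, mode="same")
    den = correlate2d(np.abs(Xw) ** 2, D ** 2, mode="same") ** 2
    return num / np.maximum(den, 1e-12)          # guard against 0/0 off the signal

def concentration_rule(stfts_per_window, D):
    """i_CM(n, k) of (12): for every (n, k), pick the candidate window whose
    STFT is most concentrated around that point."""
    C = np.stack([concentration_map(Xw, D) for Xw in stfts_per_window])
    return np.argmax(C, axis=0)
```

Restricting the stack to the windows in Omega_k per frequency bin, as in (12), is a straightforward masking step omitted here for brevity.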

Note that in practice, D(n, k) has finite support. In our simulation, D(n, k) is independent of the frequency index k and is strictly a function of the time index n. This simplification offers most of the benefit of the concentration measurement with only a slight increase in computational cost over the maximum correlation method. Experiments have shown that the concentration measurement rule performs better for short windows at a slightly higher computational cost.

1) Window Set Selection: To find a suitable window set for IF estimation, assume that the received signal has the form given by (7). We further assume that f_i(t) is continuous. For a small time interval \Delta T around time T, we can expand f_i(t) in the first two terms of its Taylor series

f_i(t) \approx f_i(T) + \dot{f}_i(T) \cdot \Delta T    (13)

when T - \Delta T \le t \le T + \Delta T. The first term is the instantaneous frequency at time T, and the second term represents the contribution due to chirping. Therefore, we choose a set of windows that fits the linear approximation of the instantaneous frequency—a set of linear chirp windows. Since frequency is the derivative of phase, linear chirps are quadratic in time, and these windows take the form

w_i(n) = w(n) \cdot e^{j c_i n^2}    (14)

where c_i is equal to half of the chirp rate, and w(n) is a real-valued window, such as a Hamming or a Gaussian window.

III. PEAK TRACKING

TFR-based IF estimation involves finding the TFR peak at each time instance. As the input SNR decreases, false peaks caused by noise may rise above the true peak. Peak tracking is used to reject these false peaks generated by noise. This is done by exploiting the continuity of most FM signals, e.g., modulated speech: highly discontinuous tracks should be rejected even when they have high amplitudes. We propose an HMM-based peak-tracking scheme similar to that of [10] and [11]. We further improve this approach by introducing a multistate HMM scheme to handle the nonstationarity of FM signals, as well as a new metric that makes the IFE approximately a maximum likelihood sequence estimator (MLSE).


A. Hidden Markov Model (HMM)

An optimal cost function p(f_{n-1}, f_n) should exist that works better than any other function for tracking the TFR peaks of the signal. Since the frequency is quantized to K slots, the problem of finding the optimal cost function is equivalent to finding an optimal functional matrix P characterizing the transition characteristics from one time instance to another. This leads to the HMM approach found in [17] and [18].

The HMM assumes that there is a system with a sequence of state transitions. However, this state sequence is unobservable. The state sequence generates a set of outputs according to a preset rule, and this rule is different for each state (otherwise, there would be no way to make a decision from the observation). The state sequence transits according to a transition probability matrix. The states S, the transition probability matrix P, and the observation-generating rule O together form the HMM (S, P, O) (see Footnote 3). In our FM demodulation problem, the states are the set of frequency slots, and the observation is the STFT of the input signal. In order to formulate this problem in the framework of an HMM, we need to find the transition probability matrix and the observation-generating rules.

In order to find the best path, we use the Viterbi algorithm, which is the DP solution [18] to the problem of finding {a_n} such that Pr[k_1 = a_1, ..., k_N = a_N | o(1), ..., o(N)] is maximized. The Viterbi algorithm is summarized as follows.

1) Initialization:

\Gamma(0, k) = \log\frac{1}{K} \quad \text{and} \quad \Psi(0, k) = k, \qquad k = 0, 1, 2, ..., K-1    (15)

2) Recursion: For 1 \le n \le N-1 and 0 \le k \le K-1,

\Gamma(n, k) = \max_{0 \le i \le K-1} \{ \Gamma(n-1, i) + \log p_{ik} \} + \log p_{o(n)|k}(o|k)    (16)

\Psi(n, k) = \arg\max_{0 \le i \le K-1} \{ \Gamma(n-1, i) + \log p_{ik} \}    (17)

3) Termination:

y(N-1) = \arg\max_{0 \le i \le K-1} \{ \Gamma(N-1, i) \}    (18)

4) Backtracking: For 0 \le n \le N-2,

y(n) = \Psi(n+1, y(n+1))    (19)
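A compact sketch of steps (15)-(19), operating on log transition probabilities and log emission scores that are assumed to be precomputed (how the paper obtains them is described next); the function name is illustrative.

```python
import numpy as np

def viterbi_track(log_emit, log_P):
    """Viterbi peak tracker per (15)-(19).
    log_emit: (N, K) array, log p(o(n) | state k) for each time n and bin k.
    log_P:    (K, K) array, log p_ik of moving from bin i to bin k.
    Returns the most likely state (frequency-bin) sequence y(0..N-1)."""
    N, K = log_emit.shape
    Gamma = np.full((N, K), -np.inf)
    Psi = np.zeros((N, K), dtype=int)
    # initialization (15); the first emission term is also added here,
    # a common convention that does not change the recursion
    Gamma[0] = np.log(1.0 / K) + log_emit[0]
    for n in range(1, N):                             # recursion (16)-(17)
        scores = Gamma[n - 1][:, None] + log_P        # shape (K_prev, K_next)
        Psi[n] = np.argmax(scores, axis=0)
        Gamma[n] = scores[Psi[n], np.arange(K)] + log_emit[n]
    y = np.zeros(N, dtype=int)
    y[-1] = int(np.argmax(Gamma[-1]))                 # termination (18)
    for n in range(N - 2, -1, -1):                    # backtracking (19)
        y[n] = Psi[n + 1, y[n + 1]]
    return y
```

The returned index sequence y(n) maps directly to the IF estimate through the frequency-bin spacing 2*pi/K.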

In order to find the observation-generating rules, we have to analyze the STFT. We consider the STFT using unit-energy Hamming windows, i.e., ||w||_2^2 = \sum_{n=-L}^{L} |w(n)|^2 = 1. Thus, for all n, k, we have

X(n, k) = \sum_{i=n-L}^{n+L} x(i) \, w(i - n) \, e^{-j 2\pi k (i - n + L)/N}

Since the x(n) are complex jointly Gaussian random variables and the X(n, k) are linear combinations of the {x(n)}, the X(n, k) are therefore also complex jointly Gaussian random variables.

E[X(n, k)] = \sum_{i=n-L}^{n+L} E[x(i)] \, w(i - n) \, e^{-j 2\pi k (i - n + L)/N} = 0    (20)

Var[X(n, k)] = \sum_{i=n-L}^{n+L} Var[x(i)] \, \left| w(i - n) \, e^{-j 2\pi k (i - n + L)/N} \right|^2 = Var[x(i)] \sum_{i=n-L}^{n+L} |w(i - n)|^2 = Var[x(i)]    (21)

Footnote 3: There is also a fourth component to an HMM—the initial state probability \Pi = \{\Pi_1, ..., \Pi_K\}. Since it is not critical to the IFE performance, we assume that it is uniform among all K states.
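A quick numerical check of (20) and (21) for the noise-only case; the window length, DFT size, and noise variance below are arbitrary choices for the experiment, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, Nfft, var, trials = 15, 64, 2.0, 20000
n_idx = np.arange(-L, L + 1)
w = 0.54 + 0.46 * np.cos(2 * np.pi * n_idx / (2 * L))   # Hamming, length 2L+1
w /= np.linalg.norm(w)                                    # unit energy

# One STFT bin k of complex white Gaussian noise, evaluated at a single time n
k = 5
phase = np.exp(-2j * np.pi * k * (n_idx + L) / Nfft)
noise = np.sqrt(var / 2) * (rng.standard_normal((trials, 2 * L + 1))
                            + 1j * rng.standard_normal((trials, 2 * L + 1)))
X = noise @ (w * phase)                                   # one draw of X(n, k) per trial

print(np.abs(X.mean()))   # close to 0, consistent with (20)
print(X.var())            # close to var, consistent with (21): Var[X(n,k)] = Var[x(i)]
```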


Moreover, we approximate {X(n, k)}, k = 0, ..., K-1, as a set of independent random variables (since the exact form of the joint distribution is very difficult to derive). Given the observation at time n and state s as o(n) = (|X(n, 0)|, ..., |X(n, K-1)|), its distribution is

p_{o(n)|s}(o | H_k) = p_{o_0(n)|s}(o_0 | H_k) \, p_{o_1(n)|s}(o_1 | H_k) \cdots p_{o_{K-1}(n)|s}(o_{K-1} | H_k)    (22)
                    = \prod_{i=0}^{K-1} p_{o_i(n)|H_k}(o_i | H_k)    (23)
                    = \prod_{i=0}^{K-1} \frac{o_i}{\sigma^2/2} \, e^{-(M_i^2(k) + o_i^2)/(\sigma^2/2)} \, I_0\!\left( \frac{M_i(k) \, o_i}{\sigma^2/2} \right)    (24)

i.e., a product of Rician densities, where M_i(k) is the mean amplitude of bin i under hypothesis H_k and I_0(.) is the zeroth-order modified Bessel function of the first kind.

Note that the ASTFT using the concentration rule results in the same observation-generating rules. The conditional probability of these observations is again given by the equation derived above; however, the mean amplitudes M_i(s) will differ due to the use of a different TFR (see Footnote 4).

Footnote 4: The cases in which different windows are used at different frequencies result in the observations being the maximum of a set of correlated Rician random variables, whose distribution is very difficult to compute.

B. Multistate HMM

The transition probability matrix P assumes stationarity of the modulating signal. However, in many situations (e.g., speech), the modulating signal is also nonstationary. Under these circumstances, the HMM approach may perform poorly due to the large variance of the statistics. To remedy this problem, we propose a multistate HMM. Assume that the modulating signal transits between several states; for example, speech may be either voiced or unvoiced, with very different statistical signal characteristics. In each state, the modulating signal assumes some statistical structure. The multistate HMM approach defines an HMM for each state; a state detector is used to determine the current state of the modulating signal, and the appropriate HMM is used for each time interval. For example, speech is roughly stationary over a 20-25-ms frame. Each frame can be characterized as either a voiced or a silent portion. The voiced portion of speech exhibits high-amplitude periodic components, whereas the silent portion consists mostly of low-amplitude, noise-like signals. Each state can be well characterized by an HMM, and a binary decision device can be used to distinguish between the voiced and silent portions.

IV. RESULTS

We present the results of testing the algorithm on a generic synthetic signal and an FM-modulated speech sample in this section. We pick a set of normalized Hamming windows as the window set. We use six different adaptation rules. We use a single-state HMM for the generic signals and a dual-state HMM for the speech sample.

A. Test Setup

1) Window Set: We select a length-N (N odd) unit-energy Hamming window as the base window and chirp it at different rates to generate W. Thus

w_{0,N}(n) = \gamma \left[ 0.54 + 0.46 \cos\!\left( \frac{2\pi n}{N-1} \right) \right]    (25)

for -((N-1)/2) \le n \le ((N-1)/2), where the normalizing constant is

\gamma = \left[ \sum_{n=-(N-1)/2}^{(N-1)/2} \left( 0.54 + 0.46 \cos\!\left( \frac{2\pi n}{N-1} \right) \right)^2 \right]^{-1/2}    (26)

The other M - 1 (M odd) chirp windows are formed as

w_{m,N}(n) = w_{0,N}(n) \cdot e^{j \beta m n^2}    (27)

for -((N-1)/2) \le n \le ((N-1)/2) and -((M-1)/2) \le m \le ((M-1)/2), where the chirp rate of w_{m,N} is \dot{f}_i = 2\beta m. In our window set, we find \beta numerically such that adjacent windows have 98% correlation, i.e., |w_{m,N}^T w_{m\pm1,N}^*| = 0.98. M is selected to be the maximum number of windows for which no aliasing occurs, a condition that can be expressed as

2\beta \left( \frac{N-1}{2} \right) \left( \frac{M-1}{2} \right) \le \pi    (28)

and it turns out that M \approx N numerically. From now on, we denote W_N = \{w_{m,N}\}_{m=-(M-1)/2}^{(M-1)/2}. The lengths N chosen are 9, 11, 15, 19, 25, and 31 samples.

2) Adaptation Criteria: We consider several different adaptation criteria. Some differ in their window sets; others differ in the adaptation rule they use (i.e., maximum correlation versus concentration measurement). We summarize their characteristics and applicable situations here.

1) Blind Demodulation: No information about the signal is available other than that it is an FM signal with a continuous modulating IF. In this case, \Omega_k = W for k = 0, 1, ..., K-1. The maximum correlation rule is used.
2) Bandlimited Maximum Correlation Rule: Same as 1), except it is also known that there is no signal energy in some of the K frequency bins. In this case, \Omega_k = W if the signal has energy in frequency bin k, and \Omega_k = \emptyset if frequency bin k lies in the stop band.
3) Frequency-Dependent Maximum Correlation Rule: The most frequently used windows at each frequency k are known a priori. We can simply set \Omega_k = {likely used windows at frequency k} \subseteq W.
4) Concentration Measurement Rule: Same as 1), but we use the concentration measurement criterion in this case.
5) Spectrogram with DP Peak Tracker (see Footnote 5): Use only the peak tracker, with an input that is simply the STFT of the signal instead of the ASTFT. \Omega_k = \{w_0\} for k = 0, 1, ..., K-1.
6) Concentration Measurement Rule with DP Peak Tracker: Use the concentration measurement ASTFT output for the DP peak tracker. \Omega_k = W for k = 0, 1, ..., K-1.

Footnote 5: It turns out that the observation rules for the HMM are very difficult to derive using the output of the ASTFT with the maximum correlation rule. Thus, we present two special cases in which we can approximate the observation-generating rule with relatively simple functions.
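For the tracker-based cases 5) and 6), the emission term log p(o(n) | H_k) needed by the Viterbi recursion follows the Rician product form of (24) under the independence approximation. A sketch; the mean-amplitude matrix M and the noise variance are assumed to have been estimated as described under "3) HMM Statistics" later in this section, and the function name is illustrative.

```python
import numpy as np
from scipy.special import i0e   # exponentially scaled I_0, numerically safer

def log_emission(o, M, sigma2):
    """Log of (24): o is the length-K magnitude observation |X(n, .)|,
    M[k, i] is the mean amplitude of bin i under hypothesis H_k, and sigma2
    is the noise variance. Returns a length-K vector of log p(o | H_k)."""
    o = np.maximum(np.asarray(o, dtype=float), 1e-12)   # avoid log(0)
    s = sigma2 / 2.0
    z = M * o[None, :] / s
    # per-bin log term: log(o_i/s) - (M_ki^2 + o_i^2)/s + log I_0(M_ki o_i / s),
    # with log I_0(z) computed as log(i0e(z)) + z for stability
    log_bins = (np.log(o / s)[None, :] - (M ** 2 + o[None, :] ** 2) / s
                + np.log(i0e(z)) + z)
    return log_bins.sum(axis=1)   # independence approximation: sum over bins
```

Stacking these per-time vectors gives the log_emit array consumed by the Viterbi sketch in Section III.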


Fig. 4. True IF law of test signal #1.

Fig. 5. True IF law of the FM-modulated speech (test signal #2).

3) HMM Statistics: The transition probability matrix P is generated by determining statistics from the true signal or from training data. For instance, we compute the STFT of the generic signals with no noise injected and find the true state sequence by locating the frequency slot with the maximum amplitude at each time instance; we then determine the probability of the state transitions. Let P = [p_ij], where p_ij is the probability of going from frequency i at time k to frequency j at time k+1 for all k, and let y(n) = \arg\max_k \{|X(n, k)|\}. Define \hat{\nu} = [\nu_{ij}], where

\nu_{ij} = \sum_{n=1}^{N-1} \delta(y(n) - i) \, \delta(y(n+1) - j)

is the number of times that y(n) goes from state i to state j. Then, P can be estimated by \hat{p}_{ij} = \nu_{ij} / \sum_{j=1}^{K} \nu_{ij}. The mean amplitude set \{M_k(s)\} is a function of the current state s. We estimate these mean amplitudes using an approach similar to our estimation of the transition probabilities.

4) State Detector: For the speech sample, we use a dual-state HMM, which requires a state detector to distinguish the current state at each instance. We formulate this as a binary hypothesis testing problem. First, the speech sample is segmented into 25-ms frames, and a binary decision test is performed on each frame. The two hypotheses are

H_0: y(n) = \exp(j\theta) + N(n)    (29)
H_1: y(n) = \exp(j\theta + j\phi(n)) + N(n)    (30)

where \phi(n) is any continuous function other than \phi(n) \equiv 0, the \exp(j\theta) term represents the random phase, and N(n) is baseband complex WGN. The decision rule is given by

\eta(m) = \frac{1}{M} \left| \sum_{n=Mm}^{M(m+1)-1} y(n) \right| \; \underset{H_1}{\overset{H_0}{\gtrless}} \; \lambda    (31)

This is simply the noncoherent detection rule for a constant signal. (We assume here that the carrier has been accurately recovered and that the FM signal has been demodulated to baseband.) The threshold \lambda should be low enough that it detects the small-amplitude (unvoiced or silent) portion under noisy conditions but high enough that the distortion of the demodulated signal with no injected noise is small. It is found by experiment to be 0.85.

B. Testing Signals

Two signals are used for evaluating the performance of the algorithm. Signal #1 is given by

x_1(n) = \exp[j \cdot 25 \cos(0.03 n)], \qquad n = 0, 1, ..., 9999    (32)

The second signal is a 5-s speech sample, recorded at 8 kHz, modulated to a 24-kHz FM signal. The true IF laws of the two signals are shown in Figs. 4 and 5. Complex additive white Gaussian noise is added to the signal, with the noise variance scaled to yield the desired input SNR.
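A sketch of generating test signal #1 per (32) and adding complex AWGN at a prescribed input SNR. The SNR convention (unit signal power over noise variance) is an assumption; the paper does not spell it out.

```python
import numpy as np

def test_signal_1(snr_db, seed=1):
    """x1(n) = exp(j * 25 * cos(0.03 n)), n = 0..9999, per (32), plus complex
    AWGN scaled to the requested input SNR (signal power is 1 here)."""
    rng = np.random.default_rng(seed)
    n = np.arange(10000)
    x = np.exp(1j * 25 * np.cos(0.03 * n))
    noise_var = 10 ** (-snr_db / 10)            # assumed: SNR = 1 / sigma^2
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(n.size)
                                      + 1j * rng.standard_normal(n.size))
    # True IF (cycles/sample) of (32), useful for scoring the estimator
    f_true = -(25 * 0.03 / (2 * np.pi)) * np.sin(0.03 * n)
    return x + noise, f_true
```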

C. Performance Testing

We obtain a family of input SNR versus output SNR curves by using six different window lengths (9, 11, 15, 19, 25, and 31 samples). The longer windows result in more distortion at high SNR but yield higher output SNR at low input SNR; the shorter windows provide less distortion but perform poorly at low SNR (see Fig. 6). Therefore, the maximum of the family at each SNR is used to characterize the performance of the IFE.
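The paper does not spell out how the output SNR of an IF estimate is computed. A common convention, and purely an assumption here, is the ratio of modulating-signal power to estimation-error power, which is what this small scoring sketch uses:

```python
import numpy as np

def output_snr_db(f_true, f_est):
    """Assumed output-SNR metric: power of the true IF track divided by the
    power of the estimation error, in dB. Any bias or scale alignment between
    the two tracks is left out of this sketch."""
    f_true, f_est = np.asarray(f_true), np.asarray(f_est)
    err = f_est - f_true
    return 10 * np.log10(np.mean(np.abs(f_true) ** 2) / np.mean(np.abs(err) ** 2))
```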


Fig. 6. Family of performance curves using windows of different lengths and their maximum at each SNR.

The primary criterion for ranking the relative performance is the detection threshold. It is roughly defined as the “knee” of the input versus output SNR curve.

Fig. 7. Input versus output SNR using test signal #1 for all six test cases along with FD and STFT.

D. Test Results

The simulation results are shown in Figs. 7 and 8. The performance of the STFT is better than that of the FD by about 15–16 dB. The basic ASTFT configuration (test case #1) improves on the STFT by another 2 dB. The bandlimited constraint (test case #2) and the frequency-dependent window set (test case #3) each improve the performance by approximately another 0.5 dB. The performance of the concentration measurement rule is virtually identical to that of the maximum correlation rule; however, a closer analysis of the data reveals that for the short windows (lengths 9 and 11), the concentration measurement rule outperforms the maximum correlation rule by 1–2 dB. The peak tracking (test case #5) improves the STFT by approximately 4 dB and improves the ASTFT by another 3 dB. Test case #6 has the best performance among all test cases, with a detection threshold of roughly 0 dB. For the second test signal, the lower output SNR is due to lower average output power. We see, once again, that the concentration measurement yields virtually identical results to the maximum correlation rule, except for the short-window cases. The two tracking schemes perform well at lower SNR (