AN INTERFERENCE-FREE REPRESENTATION OF INSTANTANEOUS FREQUENCY OF PERIODIC SIGNALS AND ITS APPLICATION TO F0 EXTRACTION H. Kawahara, T. Irino∗
M. Morise
Faculty of Systems Engineering Wakayama University Wakayama, Wakayama, 640-8510 Japan
College of Information Science and Engineering Ritsumeikan University Kusatsu, Shiga, 525-8577 Japan
ABSTRACT
2. TEMPORALLY STABLE POWER SPECTRAL REPRESENTATION
An interference-free representation of the instantaneous frequency of constituent harmonic components of periodic signals is introduced. The power weighted average instantaneous frequency of a band-pass filter yields this property when the effective passband of the filter covers up to two harmonic components and the two windows used in averaging are separated by a half pitch period. The proposed representation eliminates the abrupt changes found in usual instantaneous frequency representations and is applicable to any periodic signals consisting of multiple harmonic components. An F0 extractor of voiced sounds based on this representation is introduced as an example of prospective applications.
Assume that spectral representation W (ω) of symmetric time windowing function w(t) effectively covers up to two harmonic components of periodic signals. (The symmetry of window shape is not necessary because it is assumed here for simplicity and practical importance.) It is general enough to assume a periodic signal x(t) with period T0 having two harmonic components: x(t) = ejkω0 t + αej(k+1)ω0 t+jβ ,
(1)
Index Terms— instantaneous frequency, power weighted average, F0 extraction, speech analysis, phase VOCODER
where ω0 = 2π/T0 represents the fundamental angular frequency and α and β are arbitrary real numbers. (Arbitrary integer k is set to 0 for simplicity.) Power spectrum P (ω, t) of x(t) yields the following equation:
1. INTRODUCTION
P (ω, t) = |W (ω)|2 + α2 |W (ω − ω0 )|2 + 2αW (ω)W (ω − ω0 ) cos(ω0 t + β),
Instantaneous frequency [1] is a relevant extension of frequency for time-varying signals and potentially has wide applications [2]. Instantaneous frequencies of band-pass filtered signals form the foundation of phase VOCODER [3], and the staircase-like structure found in voiced speech analysis [4] led to various pitch detection algorithms (PDAs) [5]. However, the abrupt changes usually found in instantaneous frequencies of multiple component signals cause trouble in practical applications and hinder their usefulness. The power weighted average of instantaneous frequency is used to alleviate this problem, but its potential importance has not been fully explored. Our proposed method is based on a special attribute of this power weighted averaging when it is applied adaptively to F0. The application of Flanagan’s equation [3] in this averaging reveals that an F0-adaptive procedure enables nearly perfect elimination of these abrupt changes and yields a position independent representation of instantaneous frequency. A similar idea to this F0-adaptive procedure was introduced in our previous articles on an interference-free power spectral representation called TANDEM [6, 7], which also plays a crucial role in this proposal. The next section briefly reviews the TANDEM concept and introduces out proposed method based on it. ∗ Partly supported by Grants-in-Aid for Scientific Research 22650042 and 20346331 by JSPS.
978-1-4577-0539-7/11/$26.00 ©2011 IEEE
5420
(2)
where t represents the position of the windowing function. The third term represents a temporally varying component. Averaging two power spectra calculated at two positions separated by T0 /2 cancels this third term. A temporally stable power spectral representation PT (ω, t), the TANDEM spectrum, is defined by the following equation: „ „ « „ «« 1 T0 T0 P ω, t − + P ω, t + . (3) PT (ω, t) = 2 4 4 Note that this is a special case of the Welch method [8] and yields better power spectral estimates than using a single time window even when no prior information on T0 is available. 3. F0-ADAPTIVE POWER WEIGHTED AVERAGE OF INSTANTANEOUS FREQUENCIES It is physically meaningful to average the instantaneous frequencies weighted by power spectra [9]. By power weighted averaging of instantaneous frequencies calculated at half fundamental period apart, the resulted instantaneous frequency does not have temporal variation due to interference between the neighboring harmonic components. This is our new finding introduced in this section. Let ωi (ω, t) represent instantaneous frequency calculated using a time windowing function positioned at t.
ICASSP 2011
Blackman: 3T0
3.1. Position independence of denominator of averaged instantaneous frequency
ωAi (ω, t1 , t2 ) =
P (ω, t1 )ωi (ω, t1 ) + P (ω, t2 )ωi (ω, t2 ) . P (ω, t1 ) + P (ω, t2 )
(4)
By substituting t − T40 and t + T40 with t1 and t2 , the denominator of Eq. 4 yields PT (ω, t) defined in Eq. 3, which does not have temporally varying components.
900 800 Instantaneous frequency (Hz)
Averaged instantaneous frequency ωAi (ω, t1 , t2 ) calculated using two windows positioned at t1 and t2 is defined by the following equation:
1000
700 600 500 400 300
3.2. Simplification of numerator
200
The temporally varying components in the numerator are also cancelled by substituting Flanagan’s equation into Eq. 4. The denominator of the following form of instantaneous frequency representation (slightly modified version of Flanagan’s equation) is canceled by calculating power weighted instantaneous frequency P (ω, t)ωi (ω, t): i h i h − [S(ω, t)] dS(ω,t) [S(ω, t)] dS(ω,t) dt dt , (5) ωi (ω, t) = P (ω, t)
100
where S(ω, t) represents the short term Fourier transform of periodic signal x(t) windowed at t. Consequently, it is sufficient to prove position independence of the sum of the numerators of Eq. 5 for periodic signal x(t) defined by Eq. 1. 3.3. Position independence of numerator The first term of the numerator of Eq. 5 yields the following: – » dS(ω, t) = [S(ω, t)][Sd (ω, t)] [S(ω, t)] dt = W (ω)Wd (ω) + α cos(ω0 t + β) (W (ω)Wd (ω − ω0 ) + Wd (ω)W (ω − ω0 )) + α2 Wd (ω − ω0 )W (ω − ω0 ) cos2 (ω0 t + β).
200
300
400 500 600 Frequency (Hz)
700
800
900
1000
Fig. 1. Instantaneous frequency of multiple component signal: (thin line) single window and (thick line) proposed method.
4. EFFECTS OF WINDOW PARAMETERS Time windowing functions with finite support are not bounded in the frequency domain. In other words, the assumption in the previous sections does not strictly hold. This deviation from the derivation in the previous section leads to temporal variations. The proposed method was implemented using Matlab and tested under several windowing conditions. In this section, the following test signals are used: M X
cos(2πkf0 t + ϕk ),
(9)
k=1
(6)
(7)
Using sin2 θ + cos2 θ = 1, the denominator of Eq. 5 yields the following: – » – » dS(ω, t) dS(ω, t) − [S(ω, t)] [S(ω, t)] dt dt = W (ω)Wd (ω) + α2 W (ω − ω0 )Wd (ω − ω0 ) + α cos(ω0 t + β) (W (ω)Wd (ω − ω0 ) + Wd (ω)W (ω − ω0 )),
100
xtest (t, σ) = σn(t) + 0.5 +
The second term yields the following: – » dS(ω, t) = −[S(ω, t)][Sd (ω, t)] − [S(ω, t)] dt = α2 W (ω − ω0 )Wd (ω − ω0 ) sin2 (ω0 t + β).
0 0
where ϕk represents the random initial phase (uniform distribution in [0, 2π] region), n(t) represents white Gaussian noise, and σ represents a parameter for controling S/N. 4.1. Example Figure 1 shows an analysis example. The test signal is generated using 44,100 Hz sampling and 40 dB S/N. Fundamental frequency f0 is 200 Hz. A Blackman window with window length 3T0 is used. The thick line represents the extracted instantaneous frequency using the proposed method. In spite of the fact that 22 lines were overlaid, they are virtually the same and look like a single line. The thin lines represent the results using a single time window. Abrupt changes in them are observed between the harmonic components.
(8)
where Sd (ω, t) represents the short term Fourier transform using differentiated windowing function wd (t) = dw(t) The spectral repredt sentation of wd (t) is Wd (ω, t). The third term represents the temporally varying component. By substituting t − T40 and t + T40 with t1 and t2 in Eq. 4, this time varying component is canceled. (ωAi (ω, t − T40 , t + T40 ) is called the TANDEM instantaneous frequency and represented by ωT i (ω, t) afterwards.)
5421
4.2. Performance measure Temporal variation DT2 as a function of window length Tw is evaluated using the following equation: DT2 =
1 2ωW N
Z
ωU ωL
˛ N ˛ X ˛ ωT i (ω, t(k)) − ωT i (ω) ˛2 ˛ ˛ dω, ˛ ˛ ω0 k=1
(10)
Blackman S/N:40 (dB)
2
1
RMS deviation (%)
RMS deviation (%)
1
10
0
10
0
10
10
−1
10
−1
10
Blackman S/N:40 (dB)
2
10
10
1
1.5
2
2.5
3 3.5 4 Window length (*T0)
4.5
5
5.5
6
Fig. 2. Temporal and total variations as function of window length: (thin line) single window and (thick line) proposed method. Solid lines represent temporal variations, and dashed lines represent total variations.
where ωL = ω0 − ωW and ωU = ω0 + ωW define the evaluation frequency band. The averaged TANDEM instantaneous frequency of each frequency is represented as ωT i (ω). 2 as a function of evaluation bandwidth ωW Total variation DW is evaluated using the following equation:
2 DW =
1 2ωW N
Z
ωU ωL
˛ N ˛ X ˛ ωT i (ω, t(k)) − ωT i ˛2 ˛ ˛ dω, ˛ ˛ ω0
(11)
k=1
where ωT i represents the average instantaneous frequency in the time-frequency region used for the evaluation. The only difference between Eqs. 10 and 11 is the average used to calculate the deviations.
0.2
0.25
0.3 0.35 0.4 Evlauation bandwidth (*f0)
0.45
0.5
Fig. 3. Temporal and total variations as function of evaluation bandwidth: (thin line) single window and (thick line) proposed method. Solid lines represent temporal variations, and dashed lines represent total variations.
5. APPLICATION TO F0 EXTRACTION The behavior of the proposed method introduced in the previous section is ideal for F0 extractor design. The fixed-points of the mapping from the center frequency to the instantaneous frequency of a set of band-pass filter outputs provide frequencies of harmonic components [4, 5]. The average interval between these fixed-point instantaneous frequencies provides the initial F0 estimate. 5.1. Fixed-points of mapping and initial estimate A set of fixed-points of mapping from frequency ω to instantaneous frequency ωi (ω) is defined by the following equation: j ˛ ff ˛ dωT i (ω) Λ = ω ˛˛ ωT i (ω) = ω,