
PROPORTIONATE-TYPE NLMS ALGORITHMS BASED ON MAXIMIZATION OF THE JOINT CONDITIONAL PDF FOR THE WEIGHT DEVIATION VECTOR

Kevin T. Wagner

Miloš I. Doroslovački

Naval Research Laboratory Radar Division Washington, DC 20375, USA

The George Washington University Department of Electrical and Computer Engineering Washington, DC 20052, USA

ABSTRACT

In this paper, we present a proportionate-type normalized least mean square algorithm which operates by choosing adaptive gains at each time step in a manner designed to maximize the joint conditional probability that the next-step coefficient estimates reach their optimal values. We show that the performance of the joint maximum conditional probability density function (PDF) one-step algorithm is superior to that of the proportionate normalized least mean square algorithm when operating on a sparse impulse response. We also show that the new algorithm is superior to a previously introduced algorithm which assumed that the conditional PDF could be represented by the product of the marginal conditional PDFs, i.e., that the weight deviations are mutually conditionally independent.

Index Terms — Adaptive filtering, convergence, proportionate-type normalized least mean square (PtNLMS) algorithm, sparse impulse response.

Table I. PtNLMS algorithm with time-varying stepsize matrix.

x(k) = [x(k), x(k-1), \ldots, x(k-L+1)]^T
\hat{y}(k) = x^T(k)\,\hat{w}(k)
e(k) = d(k) - \hat{y}(k)
F[|\hat{w}_l(k)|] = \text{specified by the user}
\gamma_{\min}(k+1) = \rho \max\{\delta_p, F[|\hat{w}_1(k)|], \ldots, F[|\hat{w}_L(k)|]\}
\gamma_l(k+1) = \max\{\gamma_{\min}(k+1), F[|\hat{w}_l(k)|]\}
g_l(k+1) = \frac{\gamma_l(k+1)}{\frac{1}{L}\sum_{i=1}^{L}\gamma_i(k+1)}
G(k+1) = \mathrm{diag}\{g_1(k+1), \ldots, g_L(k+1)\}
\hat{w}(k+1) = \hat{w}(k) + \frac{\beta G(k+1) x(k) e(k)}{x^T(k) G(k+1) x(k) + \delta}

G(k + 1) = diag{g_1(k + 1), . . . , g_L(k + 1)} is the time-varying stepsize control matrix. The constant δ is typically a small positive number used to avoid overflowing when the normalization term x^T(k)G(k + 1)x(k) is very small. Some common examples of the term F[|ŵ_l(k)|] are F[|ŵ_l(k)|] = 1 and F[|ŵ_l(k)|] = |ŵ_l(k)|, which result in the normalized least mean square (NLMS) algorithm and the proportionate NLMS (PNLMS) algorithm, respectively.

1. PTNLMS ALGORITHM FRAMEWORK

A joint conditional probability density function (PDF) constrained maximization algorithm was originally introduced in [1]. This algorithm relied on the unrealistic assumption that the joint conditional PDF could be approximated by the product of the marginal conditional PDFs. We remove this assumption and generate a joint conditional PDF constrained maximization algorithm for the more general case of mutually conditionally dependent weight deviations.

We begin by introducing proportionate-type normalized least mean square (PtNLMS) algorithms. Let us assume there is some input signal denoted as x(k) for time k that excites an unknown system with impulse response w. Let the output of the system be y(k) = w^T x(k), where x(k) = [x(k), x(k−1), . . . , x(k−L+1)]^T and L is the length of the filter. The measured output of the system, d(k), contains zero-mean stationary measurement noise v(k) and is equal to the sum of y(k) and v(k). The impulse response of the system is estimated with the adaptive filter coefficient vector ŵ(k). The output of the adaptive filter is given by ŷ(k) = ŵ^T(k) x(k). The error signal e(k) between the output of the adaptive filter ŷ(k) and d(k) drives the adaptive algorithm. The weight deviation (WD) vector is given by z(k) = w − ŵ(k).

The PtNLMS algorithm is shown in Table I. Here, β is the fixed stepsize parameter. The term F[|ŵ_l(k)|] governs how each coefficient is updated when F[|ŵ_l(k)|] is not small. In the case when F[|ŵ_l(k)|] is small, the quantity γ_min is used to set the minimum gain a coefficient can receive. The constant δ_p is important in the beginning of the algorithm, when all of the coefficients are zero, and together with ρ it prevents the very small coefficients from stalling.
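As an illustration of the Table I recursion, the following is a minimal Python sketch of one PtNLMS iteration. The function name, argument layout, and default parameter values (borrowed from the simulation settings of Section 4) are our own choices for illustration, not part of the paper.

import numpy as np

def ptnlms_update(w_hat, x_vec, d, beta=0.1, rho=0.01, delta=1e-4, delta_p=0.01,
                  F=lambda a: a):
    """One iteration of the PtNLMS recursion of Table I.

    F is the user-specified function applied to |w_hat_l(k)|: the default
    F(a) = a gives PNLMS, while F = lambda a: np.ones_like(a) gives NLMS.
    """
    y_hat = x_vec @ w_hat                        # filter output y_hat(k)
    e = d - y_hat                                # error signal e(k)
    Fw = F(np.abs(w_hat))                        # F[|w_hat_l(k)|]
    gamma_min = rho * max(delta_p, Fw.max())     # gamma_min(k+1)
    gamma = np.maximum(gamma_min, Fw)            # gamma_l(k+1)
    g = gamma / gamma.mean()                     # g_l(k+1); note sum_l g_l = L
    denom = x_vec @ (g * x_vec) + delta          # x^T(k) G(k+1) x(k) + delta
    w_hat = w_hat + beta * g * x_vec * e / denom # coefficient update
    return w_hat, e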



2. RECURSIVE WEIGHT DEVIATION EQUATIONS

Employing the convention that x_i(k) and z_i(k) are the ith components of the vectors x(k) and z(k), respectively, we can express the recursion for the weight deviation in component-wise form as follows:

z_i(k+1) = z_i(k) - \frac{\beta g_i(k+1) x_i(k) \sum_{j=1}^{L} x_j(k) z_j(k)}{\sum_{j=1}^{L} x_j^2(k) g_j(k+1) + \delta} - \frac{\beta g_i(k+1) x_i(k) v(k)}{\sum_{j=1}^{L} x_j^2(k) g_j(k+1) + \delta}.   (1)

Before proceeding we employ the following assumptions:
Assumption I: The input signal is a white Gaussian noise process with zero mean and variance σ_x².
Assumption II: β is sufficiently small that z_i(k) fluctuates much more slowly than x(k), and therefore the two can be considered independent.
Assumption III: The denominator terms, \sum_{j=1}^{L} x_j^2(k) g_j(k+1) + \delta, are assumed to be constant. (The validity of this assumption was discussed in [2].)
Assumption IV: The measurement noise v(k) is white with zero mean and variance σ_v², and it is independent of the input.
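Assumption III is what later permits replacing the stochastic normalization term by the constant σ_x²L + δ, i.e., setting β_0 = β/(σ_x²L + δ). A small Monte Carlo check of how tightly the denominator concentrates for white input follows; this is our own illustration, using an arbitrary admissible gain vector with Σ_i g_i = L, and is not part of the paper.

import numpy as np

# Monte Carlo check of Assumption III: for white input with variance sigma_x^2,
# sum_j x_j^2(k) g_j(k+1) + delta stays close to sigma_x^2 * L + delta.
rng = np.random.default_rng(0)
L, sigma_x, delta, trials = 512, 1.0, 1e-4, 2000

g = rng.uniform(0.0, 2.0, size=L)
g *= L / g.sum()                      # enforce sum_i g_i = L as in the PtNLMS framework

denoms = np.empty(trials)
for t in range(trials):
    x = rng.normal(0.0, sigma_x, size=L)
    denoms[t] = (x**2 * g).sum() + delta

print("nominal value sigma_x^2 * L + delta :", sigma_x**2 * L + delta)
print("empirical mean / std of denominator :", denoms.mean(), denoms.std())
# For large L the relative spread is small, which is why treating the
# denominator (and hence beta_0) as a constant is reasonable.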


At this point we attempt to find the optimal gain at each time step k by optimizing our user-defined criterion with respect to the gain.

3. OPTIMAL GAIN CALCULATION RESULTING FROM MAXIMIZATION OF CONDITIONAL PDF

One method for choosing the optimal gains is to maximize the conditional probability density function that z(k + 1) = 0 given z(k). Additionally we have the constraints that \sum_{i=1}^{L} g_i(k+1) = L and g_i(k+1) \ge 0 for all i and k. For notational simplicity we denote z(k+1) by z^+ and drop the time indexing on all terms; for example, g(k+1) = g, v(k) = v, and x_i(k) = x_i. We state this optimization problem as

\max_{g} f(z^+ = 0 \,|\, z) \quad \text{s.t.} \quad g_i \ge 0 \;\; \forall i, \quad \sum_{i=1}^{L} g_i = L.   (2)

In order to make this calculation we need z^+ in terms of z, which was given by (1). We can apply Assumption III and set \beta_0 = \beta/(\sigma_x^2 L + \delta) \approx \beta/(\sum_{j=1}^{L} x_j^2 g_j + \delta). After making this approximation the recursion for z_i^+ becomes

z_i^+ = z_i - \beta_0 g_i x_i \sum_{j=1}^{L} x_j z_j - \beta_0 g_i x_i v.   (3)

At this point we define the joint conditional PDF f(z^+ | z) as

f(z^+ | z) = \frac{2}{(2\pi)^{\frac{L+1}{2}} \sigma_v \sigma_x^L \beta_0^L \prod_{i=1}^{L} g_i} \exp\!\left( \sum_{i=1}^{L} \frac{z_i^2 - z_i^+ z_i}{\sigma_v^2 \beta_0 g_i} \right) \left[ \frac{\sigma_v^2}{\sigma_x^2} \sum_{i=1}^{L} \frac{(z_i^+ - z_i)^2}{\beta_0^2 g_i^2} + \left( \sum_{i=1}^{L} \frac{z_i^2 - z_i^+ z_i}{\beta_0 g_i} \right)^{\!2} \right]^{-\frac{L-1}{4}} K_{\frac{L-1}{2}}(\chi),   (4)

where

\chi = \frac{1}{\sigma_v} \sqrt{ \frac{1}{\sigma_x^2} \sum_{i=1}^{L} \frac{(z_i^+ - z_i)^2}{\beta_0^2 g_i^2} + \frac{1}{\sigma_v^2} \left( \sum_{i=1}^{L} \frac{z_i^2 - z_i^+ z_i}{\beta_0 g_i} \right)^{\!2} }   (5)

and K_{\frac{L-1}{2}}(\cdot) is the modified Bessel function of the second kind of order (L-1)/2. The details of this derivation can be found in the Appendix.

Our goal is to find the gains g that maximize f(0 | z) under the constraints g_i \ge 0 \; \forall i and \sum_{i=1}^{L} g_i = L. We introduce the following guidelines in order to facilitate the solution of this problem.
1. Let g_i = s_i^2 and maximize with respect to s_i (real) to ensure g_i \ge 0.
2. Introduce the Lagrange multiplier \lambda to incorporate the constraint \sum_{i=1}^{L} g_i = L.
3. Use the natural logarithm, ln(·), to aid in finding the solution.
We begin by defining the Lagrangian as

\Lambda(s, \lambda) = \ln f(0 | z) + \lambda \left( \sum_{i=1}^{L} s_i^2 - L \right).

We have made the following definitions: a = 2/[(2\pi)^{\frac{L+1}{2}} \sigma_v \sigma_x^L \beta_0^L], b_i = z_i^2/(\sigma_x^2 \beta_0^2), and c_i = z_i^2/\beta_0. Incorporating these guidelines and employing the Lagrangian results in the following optimization problem:

\max_{s:\, \sum_{i=1}^{L} s_i^2 = L} \ln f(0 | z) = \max_{(s,\lambda) \in \{\text{stationary points of } \Lambda\}} \Lambda(s, \lambda),   (6)

where

\ln f(0 | z) = \ln a - \sum_{i=1}^{L} \ln s_i^2 + \sum_{i=1}^{L} \frac{c_i}{\sigma_v^2 s_i^2} - \frac{L-1}{4} \ln\!\left[ \sigma_v^2 \sum_{i=1}^{L} \frac{b_i}{s_i^4} + \left( \sum_{i=1}^{L} \frac{c_i}{s_i^2} \right)^{\!2} \right] + \ln K_{\frac{L-1}{2}}(\chi)

and, evaluated at z^+ = 0,

\chi = \sqrt{ \sum_{i=1}^{L} \frac{b_i}{\sigma_v^2 s_i^4} + \left( \sum_{i=1}^{L} \frac{c_i}{\sigma_v^2 s_i^2} \right)^{\!2} }.

To solve this problem we take the derivative of \Lambda(s, \lambda) with respect to s_i and set it to zero. This derivative is given by

\frac{\partial \Lambda(s, \lambda)}{\partial s_i} = -\frac{2}{s_i} - \frac{2 c_i}{\sigma_v^2 s_i^3} + (L-1)\, \frac{ \frac{\sigma_v^2 b_i}{s_i^5} + \frac{c_i}{s_i^3} \sum_{j=1}^{L} \frac{c_j}{s_j^2} }{ \sigma_v^2 \sum_{j=1}^{L} \frac{b_j}{s_j^4} + \left( \sum_{j=1}^{L} \frac{c_j}{s_j^2} \right)^{\!2} } + \frac{1}{K_{\frac{L-1}{2}}(\chi)} \frac{\partial K_{\frac{L-1}{2}}(\chi)}{\partial s_i} + 2 \lambda s_i.   (7)

At this point we simplify this derivative by assuming \chi is large enough that we can make the approximation [3]

-\frac{1}{K_{\frac{L-1}{2}}(\chi)} \frac{\partial K_{\frac{L-1}{2}}(\chi)}{\partial \chi} \approx 1.

Next we set \partial \Lambda(s, \lambda)/\partial s_i = 0, multiply (7) by s_i, and take the sum over i = 1, \ldots, L to find \lambda in terms of s_i, as given by

\lambda = \frac{1}{2L} \left[ L + 1 + \frac{2}{\sigma_v^2} \sum_{i=1}^{L} \frac{c_i}{s_i^2} - 2\chi \right].   (8)

Then we substitute this value of \lambda into (7) and, with some rearrangement of terms, we find that each squared variable satisfies an implicit fixed-point equation,

s_i^2 = \mathrm{R.H.S.}_i(s), \quad i = 1, \ldots, L,   (9)

where the right-hand side is a function of s, z, \beta_0, \sigma_x^2, and \sigma_v^2 obtained from (7) and (8).

3.1. High SNR Algorithm

If we take the limit for \sigma_v^2 \to 0 of (9), this reduces to

s_i^2 = \frac{|z_i|}{\frac{1}{L} \sum_{i=1}^{L} |z_i|}.   (10)

Hence, for large signal-to-noise ratios the optimal gain is proportional to the absolute value of the weight deviation, which agrees nicely with intuition. An alternative approach consists of finding the zeros of (9) using the Newton-Raphson method. As it turns out, in our numerical experiments both approaches result in similar mean square error (MSE) performance.
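Under the high-SNR rule (10), the gain computation itself is a one-liner. The sketch below is our own illustration (in practice |z_i| is replaced by the biased estimate |ẑ_i(k)| described in Section 3.2); it also reproduces the L = 2 example of Figure 3, where z = [1, 2]^T gives g = [2/3, 4/3]^T.

import numpy as np

def high_snr_gains(z_abs, eps=1e-12):
    """Gain allocation (10): g_i = |z_i| / ((1/L) * sum_j |z_j|).

    The gains are nonnegative and sum to L by construction."""
    z_abs = np.asarray(z_abs, dtype=float)
    return z_abs / (z_abs.mean() + eps)

# L = 2 example matching Figure 3: z = [1, 2] gives gains [2/3, 4/3].
print(high_snr_gains([1.0, 2.0]))          # -> [0.6667 1.3333]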

Fig. 1. Algorithm comparison: MSE (dB) versus iteration for the PNLMS, water-filling, conditional PDF maximization [1], high SNR conditional PDF maximization, and Newton-Raphson conditional PDF maximization algorithms (β = 0.1, ρ = 0.01, δ = 10⁻⁴, δ_p = 0.01).

3.2. Joint Conditional PDF Maximization Implementation

The conditional PDF maximization we have derived is not feasible at this stage. We still need to estimate |z_i(k)| = |w_i − ŵ_i(k)|, which requires knowledge of the optimal impulse response. We can make the following approximation to overcome this lack of knowledge, |z_i(k)| ≈ |ẑ_i(k)|, where we replace the weight deviation with an estimate of the mean weight deviation. In order to find ẑ_i(k) we rewrite the error as

e(k) = \sum_{j=1}^{L} x_j(k) z_j(k) + v(k).   (11)

Next we multiply both sides of the equation by x_i(k) and take the expectation (assuming x_i(k) is a white signal). This results in

E\{x_i(k) e(k)\} = \sigma_x^2 E\{z_i(k)\}.   (12)

If we define p_i(k) = x_i(k) e(k), we can estimate \bar{p}_i(k) = E\{p_i(k)\} and then solve for \bar{z}_i(k) = E\{z_i(k)\}. We update our estimate of p_i(k) in the following fashion:

\hat{p}_i(k) = \alpha \hat{p}_i(k-1) + (1-\alpha) p_i(k), \qquad \hat{z}_i(k) = \frac{\hat{p}_i(k)}{\sigma_x^2},   (13)

where 0 ≤ α ≤ 1. Note that the mean of |z_i(k)| is E{|z_i(k)|}, which in turn we approximate with |E{z_i(k)}|. This last approximation introduces a bias. Estimate (13) was used in [4].

Fig. 2. Impulse response (amplitude versus coefficient index).


4. RESULTS

In Figure 1 we compare the MSE of the PNLMS algorithm, the conditional PDF maximization algorithm [1], the water-filling algorithm [5], and two new joint conditional PDF maximization algorithms, one based on the approximate solution when σ_v² → 0 and the other based on the Newton-Raphson numerical solution of (9). The parameters used in these simulations were β = 0.1, ρ = 0.01, δ = 10⁻⁴, δ_p = 0.01, σ_x² = 1, σ_v² = 10⁻⁴, ν = 1000 [5], α = 0.99 [5], and L = 512. The impulse response used is shown in Figure 2. The Newton-Raphson and σ_v² → 0 joint conditional PDF maximization algorithms have nearly identical performance. The new joint conditional PDF maximization algorithms outperform all of the other algorithms with the exception of the water-filling algorithm; however, in steady state the joint conditional PDF maximization algorithms have superior performance relative to the water-filling algorithm. Additionally, we see that the joint conditional PDF maximization algorithm outperforms the conditional PDF algorithm introduced in [1].

In Figure 3 we plot an example of \sum_{i=1}^{L} |\mathrm{R.H.S.} - s_i^2|^2 for the function given by (9), where L = 2. The weight deviation in this example is z = [1, 2]^T. We observe that the minimum of the function practically corresponds to g = [2/3, 4/3]^T, which agrees with the gain calculated using the high SNR algorithm.

Finally, in Figure 4 we compare the PNLMS algorithm to the biased feasible joint conditional PDF maximization algorithm for SNRs of -20 dB, -40 dB, and -60 dB. For small SNR both algorithms have similar convergence curves. As the SNR is increased, the conditional PDF maximization algorithm outperforms the PNLMS algorithm significantly.


5. CONCLUSIONS

We have introduced the joint conditional PDF maximization algorithm, which improves on the conditional PDF maximization algorithm introduced in [1] by removing the unrealistic independence assumption for the weight deviations. Two implementations of the joint conditional PDF maximization algorithm were introduced, and both have superior MSE performance relative to the PNLMS algorithm and to the conditional PDF maximization algorithm of [1], which assumes independence of the weight deviations. The maximization of the joint conditional PDF for zero weight deviations decreases the MSE (which confirms the general intuition) while using a relatively simple calculation procedure for the gain vector.

Fig. 3. Example of the sum of squared equation errors, |R.H.S. − s_1²|² + |R.H.S. − s_2²|², for (9) where L = 2 (z = [1, 2]^T, β = 0.1, δ = 0.01, σ_x² = 1, σ_v² = 10⁻⁴); the annotated optimal gain is [2/3, 4/3].

Fig. 4. PNLMS vs. conditional one-step PDF maximization algorithm as a function of SNR (MSE in dB versus iterations for SNR = −20, −40, and −60 dB; β = 0.1, ρ = 0.01, δ = 10⁻⁴, δ_p = 0.01).

6. APPENDIX

We begin by rearranging (3) as

z_i^+ = z_i - \beta_0 g_i x_i \left( \sum_{j=1}^{L} x_j z_j + v \right).   (14)

Next we define y_i = (z_i - z_i^+)/(\beta_0 g_i) and t = \sum_{j=1}^{L} x_j z_j + v. This yields y_i = x_i t. Note that z is assumed to be a constant vector independent of x and v. We know that the distribution of t is Gaussian since it is a sum of Gaussian random variables, i.e., t \sim N(0, \sigma_t^2), where we define \sigma_t^2 = \sigma_x^2 \sum_{j=1}^{L} z_j^2 + \sigma_v^2. Similarly, x_i is Gaussian with distribution N(0, \sigma_x^2). Let us define the joint PDF of the random vector y = [y_1, y_2, \cdots, y_L]^T as f_y(\epsilon) and the probability density function of t as f_t(\tau). Next we can express the PDF of y in terms of the joint PDF of (x, t) as follows:

f_y(\epsilon) = \int_{-\infty}^{\infty} f_y(\epsilon \,|\, t = \tau)\, f_t(\tau)\, d\tau = \int_{-\infty}^{\infty} \frac{1}{|\tau|^L}\, f_{x,t}(\epsilon/\tau, \tau)\, d\tau.   (15)

As it turns out, (x, t) are jointly Gaussian with PDF given by

f_{x,t}(\xi, \tau) = \frac{1}{[(2\pi)^{L+1} \det(K)]^{1/2}}\, e^{-\frac{1}{2}[\xi, \tau] K^{-1} [\xi, \tau]^T},

where K is the covariance matrix of [x, t]^T and det(K) is the determinant of the matrix K. The determinant of the covariance matrix is given by \det(K) = \sigma_v^2 \sigma_x^{2L}, while the inverse of the covariance matrix can be calculated as

K^{-1} = \begin{bmatrix} \frac{1}{\sigma_x^2} I + \frac{z z^T}{\sigma_v^2} & -\frac{z}{\sigma_v^2} \\ -\frac{z^T}{\sigma_v^2} & \frac{1}{\sigma_v^2} \end{bmatrix},

where I is the identity matrix. With this information we can represent the joint PDF of y as

f_y(\epsilon) = \frac{2\, e^{\frac{z^T \epsilon}{\sigma_v^2}}}{(2\pi)^{\frac{L+1}{2}} \sigma_v \sigma_x^L} \int_{0}^{\infty} \exp\!\left\{ -\frac{1}{2} \left( \left[ \frac{1}{\sigma_x^2} \epsilon^T \epsilon + \frac{1}{\sigma_v^2} (\epsilon^T z)^2 \right] \frac{1}{\tau^2} + \frac{1}{\sigma_v^2} \tau^2 \right) \right\} \frac{d\tau}{\tau^L}.   (16)

At this point, if we substitute \tau = e^u and carry out some algebraic manipulation, we find that

f_y(\epsilon) = \frac{2\, e^{\frac{z^T \epsilon}{\sigma_v^2}}}{(2\pi)^{\frac{L+1}{2}} \sigma_v \sigma_x^L} \left[ \frac{\sigma_v^2}{\sigma_x^2} \epsilon^T \epsilon + (\epsilon^T z)^2 \right]^{-\frac{L-1}{4}} K_{\frac{L-1}{2}}\!\left( \sqrt{ \frac{1}{\sigma_v^2} \left[ \frac{1}{\sigma_x^2} \epsilon^T \epsilon + \frac{1}{\sigma_v^2} (\epsilon^T z)^2 \right] } \right).   (17)

Now, we recall that z_i^+ = z_i - \beta_0 g_i y_i, which yields (4).
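As a numerical sanity check of (17) — our own illustration, not part of the paper, with arbitrary parameter values — the following snippet compares the probability that y falls in a box, computed by quadrature of (17), with a Monte Carlo estimate obtained by simulating y_i = x_i t directly.

import numpy as np
from scipy.special import kv
from scipy import integrate

rng = np.random.default_rng(1)
L, sigma_x, sigma_v = 2, 1.0, 0.7
z = np.array([1.0, 2.0])

def f_y(e1, e2):
    """Joint PDF (17) of y for L = 2."""
    eps = np.array([e1, e2])
    ete = eps @ eps                      # eps^T eps
    etz = eps @ z                        # eps^T z
    arg = np.sqrt(ete / sigma_x**2 + etz**2 / sigma_v**2) / sigma_v
    pref = 2.0 * np.exp(etz / sigma_v**2) / ((2 * np.pi)**((L + 1) / 2)
                                             * sigma_v * sigma_x**L)
    return (pref
            * (sigma_v**2 * ete / sigma_x**2 + etz**2)**(-(L - 1) / 4)
            * kv((L - 1) / 2, arg))

# Probability of the box [0.5, 1.5] x [0.5, 1.5] from (17) ...
p_quad, _ = integrate.dblquad(lambda e2, e1: f_y(e1, e2),
                              0.5, 1.5, lambda e1: 0.5, lambda e1: 1.5)

# ... and from simulation of y = x * t with t = z^T x + v.
n = 200_000
x = rng.normal(0.0, sigma_x, size=(n, L))
v = rng.normal(0.0, sigma_v, size=n)
t = x @ z + v
y = x * t[:, None]
p_mc = np.mean((y[:, 0] > 0.5) & (y[:, 0] < 1.5) &
               (y[:, 1] > 0.5) & (y[:, 1] < 1.5))
print(p_quad, p_mc)   # the two estimates should agree to Monte Carlo accuracy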

7. REFERENCES

[1] K. Wagner and M. Doroslovački, "Proportional-type NLMS Algorithm with Gain Allocation Providing Maximum One-step Conditional PDF for True Weights," in Proc. Conference on Information Sciences and Systems, Baltimore, MD, March 18-20, 2009, pp. 61-66.

[2] M. Doroslovački and H. Deng, "On convergence of proportionate-type NLMS adaptive algorithms," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France, May 2006, vol. 3, pp. 105-108.

[3] M. Abramowitz and I. Stegun, Eds., "Modified Bessel Functions I and K," in Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing, New York: Dover, 1972.

[4] N. Li, Y. Zhang, Y. Hao, and J. A. Chambers, "A new variable step-size NLMS algorithm designed for applications with exponential decay impulse responses," Signal Processing, vol. 88, no. 9, pp. 2346-2349, Sept. 2008.

[5] K. Wagner and M. Doroslovački, "Gain Allocation in Proportionate-type NLMS Algorithms for Fast Decay of Output Error at all Times," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, April 19-24, 2009.