A General Class of Nonlinear Normalized Adaptive Filtering Algorithms

Sudhakar Kalluri and Gonzalo R. Arce
Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware 19716
phone: (302) 831-8030, fax: (302) 831-4316
e-mail: [email protected] and [email protected]

Abstract

The Normalized Least Mean Square (NLMS) algorithm is an important variant of the classical LMS algorithm for adaptive linear filtering. It possesses many advantages over the LMS algorithm, including having a faster convergence and providing for an automatic time-varying choice of the LMS step-size parameter which affects the stability, steady-state mean square error (MSE) and convergence speed of the algorithm. An auxiliary fixed step-size that is often introduced in the NLMS algorithm has the advantage that its stability region (step-size range for algorithm stability) is independent of the signal statistics. In this paper, we generalize the NLMS algorithm by deriving a class of Nonlinear Normalized LMS-type (NLMS-type) algorithms that are applicable to a wide variety of nonlinear filter structures. We obtain a general nonlinear NLMS-type algorithm by choosing an optimal time-varying step-size which minimizes the next-step MSE at each iteration of the general nonlinear LMS-type algorithm. As in the linear case, we introduce a dimensionless auxiliary step-size whose stability range is independent of the signal statistics. The stability region could therefore be determined empirically for any given nonlinear filter type. We present computer simulations of these algorithms for two specific nonlinear filter structures: Volterra filters, and the recently proposed class of myriad filters. These simulations indicate that the NLMS-type algorithms, in general, converge faster than their LMS-type counterparts.

Submitted to the IEEE Transactions on Signal Processing
EDICS Paper Category: SP 2.1.5 (Rank-Order and Median Filters) or SP 2.7.3 (Nonlinear Filters)
Permission to publish this abstract separately is granted.
This research was funded in part by the National Science Foundation under Grant MIP-9530923.
1 Introduction

The Least Mean Square (LMS) algorithm [1] is widely used for adapting the weights of a linear FIR filter that minimizes the mean square error (MSE) between the filter output and a desired signal. Consider an input (observation) vector of $N$ samples, $x = [x_1, x_2, \ldots, x_N]^T$, and a weight vector of $N$ weights, $w = [w_1, w_2, \ldots, w_N]^T$. Denote the linear filter output by $y \equiv y(w, x) = w^T x$. The filtering error, in estimating a desired signal $d$, is then $e \triangleq y - d$. Under the mean square error (MSE) criterion, the optimal filter weights minimize the cost function $J(w) \triangleq E\{e^2\}$, where $E\{\cdot\}$ denotes statistical expectation. In an environment of unknown or time-varying signal statistics, the standard LMS algorithm [1] continually attempts to reduce the MSE by updating the weight vector, at each time instant $n$, as

$w(n+1) = w(n) - \mu\, e(n)\, x(n),$    (1)
where $\mu > 0$ is the so-called step-size of the update. The computational simplicity of the LMS algorithm has made it an attractive choice for several applications in linear signal processing, including noise cancellation, channel equalization, adaptive control, and system identification [1]. However, it suffers from a slow rate of convergence. Further, its implementation requires the choice of an appropriate value for the step-size $\mu$, which affects the stability, steady-state MSE and convergence speed of the algorithm. The stability (convergence) of the LMS algorithm has been extensively studied in the literature [1]. The stability region for mean-square convergence of the LMS algorithm is given by $0 < \mu < 2/\mathrm{trace}(R)$ [1], where $R \triangleq E\{x(n)\, x^T(n)\}$ is the correlation matrix of the input vector $x(n)$. When the input signal statistics are unknown or the environment is nonstationary, it is difficult to choose a step-size that is guaranteed to lie within the stability region.

The so-called Normalized LMS (NLMS) algorithm [1] addresses the problem of step-size design by choosing a time-varying step-size $\mu(n)$ in (1) such that the next-step MSE, $J_{n+1} \triangleq E\{e^2(n+1)\}$, is minimized at each iteration. This algorithm can be developed from several different points of view; we shall focus on the criterion of minimization of the next-step MSE (see [1], Problem 14), since this will be the most convenient interpretation when we later consider the case of a general nonlinear filter. The step-size that minimizes the next-step MSE is given by $\mu(n) = 1/\|x(n)\|^2$, where $\|x(n)\|^2 = \sum_{i=1}^{N} x_i^2(n)$ is the squared Euclidean norm of the input vector $x(n)$. After incorporating an auxiliary step-size $\tilde{\mu} > 0$, the NLMS algorithm is written as

$w(n+1) = w(n) - \frac{\tilde{\mu}}{\|x(n)\|^2}\, e(n)\, x(n).$    (2)
The theoretical bounds on the stability of the NLMS algorithm are given by $0 < \tilde{\mu} < 2$ [1]. A significant advantage here is that, unlike the LMS step-size $\mu$ of (1), the auxiliary step-size $\tilde{\mu}$ is dimensionless and the stability region for $\tilde{\mu}$ is independent of the signal statistics. This allows for an easier step-size design with guaranteed stability (convergence) of the algorithm. Further, the NLMS algorithm has a potentially faster convergence than the LMS algorithm [2]. The NLMS algorithm can also be alternatively interpreted as a modification of the LMS algorithm of (1), where the update term is divided (normalized) by the squared norm $\|x(n)\|^2$ so that the update stays bounded even when the input vector $x(n)$ becomes large in magnitude.

In this paper, we generalize the NLMS algorithm of (2) by deriving a class of nonlinear Normalized LMS-type (NLMS-type) algorithms [3] that are applicable to a wide variety of nonlinear filter structures. Although linear filters are useful in a number of applications, several practical situations exist in which nonlinear processing of the signals involved is essential in order to maintain an acceptable level of performance. Applications of nonlinear models and filtering include polynomial (Volterra) filters used in nonlinear channel equalization and system identification [4, 5, 6], and the class of order statistic filters used in image processing [7, 8]. Several adaptive nonlinear filters have also been developed based on Volterra and order statistic filters [9, 10, 11]. Consider now the case of an arbitrary nonlinear filter whose output is given by $y \equiv y(w, x)$. Under the MSE criterion, the LMS algorithm of (1) can be generalized to yield the following class of nonlinear LMS-type adaptive algorithms (see Section 2):

$w_i(n+1) = w_i(n) - \mu\, e(n)\, \frac{\partial y}{\partial w_i}(n), \quad i = 1, 2, \ldots, N.$    (3)

Note that (3) can be applied to any nonlinear filter for which the derivatives $\frac{\partial y}{\partial w_i}(n)$ exist. The above algorithm inherits the main problem of the LMS algorithm of (1), namely, the difficulty in choosing the step-size $\mu > 0$. Unlike the linear case where step-size bounds are available, no theoretical analysis of the LMS-type algorithm of (3) has been performed to derive the stability range for $\mu$. This is due to the mathematical complexity inherent in most nonlinear filters. Consequently, there is a strong motivation (much more than in the linear case) to develop automatic step-size choices to guarantee stability of the LMS-type algorithm. The class of nonlinear Normalized LMS-type algorithms developed in this paper addresses this problem of step-size design. Just as the linear NLMS algorithm of (2) is developed from the classical LMS algorithm of (1), we obtain a general nonlinear NLMS-type algorithm from the LMS-type algorithm of (3), by choosing a time-varying step-size which minimizes the next-step MSE at each iteration. As in the linear case, we introduce a dimensionless auxiliary step-size whose stability range has the advantage of being independent of the signal statistics. The stability region could therefore be determined empirically for any given nonlinear filter. We show through computer simulations that these NLMS-type algorithms have, in general, a faster convergence than their LMS-type counterparts. As one of our anonymous reviewers pointed out, normalization has been employed as a heuristic method for stability in the areas of numerical analysis and optimization. However, it has never been used in the field of nonlinear adaptive signal processing. We have introduced this valuable tool for the first time into the area of nonlinear signal processing, and also provided a theoretical basis for this technique.

The paper is organized as follows. Section 2 presents LMS-type nonlinear adaptive algorithms. The class of nonlinear Normalized LMS-type (NLMS-type) algorithms is derived in Section 3. In Section 4, we apply the NLMS-type algorithms to two specific nonlinear filter structures: the well-known polynomial (Volterra) filters, and the recently proposed class of myriad filters. The performance of these algorithms is demonstrated through computer simulations presented in Section 5.
2 Nonlinear LMS-type Adaptive Algorithms

In this section, we briefly review the derivation of nonlinear LMS-type adaptive algorithms that have been used in the literature for the optimization of several types of nonlinear filters. Consider a general nonlinear filter with the filter output given by $y \equiv y(w, x)$, where $x$ and $w$ are the $N$-long input and weight vectors, respectively. The optimal filter weights minimize the mean square error (MSE) cost function

$J(w) = E\{e^2\} = E\{(y(w, x) - d)^2\},$    (4)

where $d$ is the desired signal and $e = y - d$ is the filtering error. The necessary conditions for filter optimality are obtained by setting the gradient of the cost function equal to zero:

$\frac{\partial J(w)}{\partial w_i} = 2\, E\left\{ e\, \frac{\partial y}{\partial w_i} \right\} = 0, \quad i = 1, 2, \ldots, N.$    (5)

Due to the nonlinear nature of $y(w, x)$, and consequently of the equations in (5), it is extremely difficult to solve for the optimal weights in closed form. The method of steepest descent is a popular technique which addresses this problem by updating the filter weights using the following equation, in an attempt to continually reduce the MSE cost function:

$w_i(n+1) = w_i(n) - \frac{1}{2}\,\mu\, \frac{\partial J}{\partial w_i}(n), \quad i = 1, 2, \ldots, N,$    (6)

where $w_i(n)$ is the $i$th weight at iteration $n$, $\mu > 0$ is the step-size of the update, and the gradient at the $n$th iteration is given from (5) as

$\frac{\partial J}{\partial w_i}(n) = 2\, E\left\{ e(n)\, \frac{\partial y}{\partial w_i}(n) \right\}, \quad i = 1, 2, \ldots, N.$    (7)

In order to utilize (6), we require a knowledge of the statistics of the signals involved. When the signal statistics are either unknown or rapidly changing (as in a nonstationary environment), we use instantaneous estimates for the gradient $\frac{\partial J}{\partial w_i}(n)$. To this end, removing the expectation operator in (7) and substituting into (6), we obtain the following class of nonlinear LMS-type adaptive algorithms:

$w_i(n+1) = w_i(n) - \mu\, e(n)\, \frac{\partial y}{\partial w_i}(n), \quad i = 1, 2, \ldots, N.$    (8)

Note that for a linear filter ($y = w^T x$), we have $\partial y / \partial w_i = x_i$, and (8) reduces as expected to the LMS algorithm of (1). As mentioned in Section 1, the mathematical complexity inherent in nonlinear filtering has prevented a theoretical analysis of (8) in order to determine the bounds on the step-size $\mu$ for stable operation of the algorithm. There is therefore a strong motivation for the development of automatic step-size choices that guarantee the stability of the LMS-type algorithm. The normalized LMS-type algorithms derived in the following section address this problem by choosing an optimal time-varying step-size $\mu(n)$ at each iteration.
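As an aside not found in the original text, the following Python sketch illustrates one way the LMS-type update (8) could be coded for a generic differentiable filter; the callables `filter_output` and `filter_gradient` are hypothetical placeholders supplied by the user, and this is only a minimal illustration of the recursion, not a prescribed implementation.

```python
import numpy as np

def lms_type_update(w, x, d, mu, filter_output, filter_gradient):
    """One iteration of the nonlinear LMS-type update (8).

    w               -- current weight vector, shape (N,)
    x               -- current input (observation) vector, shape (N,)
    d               -- desired signal sample at this time instant
    mu              -- fixed step-size, mu > 0
    filter_output   -- callable y(w, x) returning the scalar filter output
    filter_gradient -- callable returning dy/dw as an array of shape (N,)
    """
    y = filter_output(w, x)
    e = y - d                        # filtering error e(n) = y(n) - d(n)
    grad_y = filter_gradient(w, x)   # instantaneous gradient of y w.r.t. the weights
    return w - mu * e * grad_y       # steepest-descent step with the expectation dropped


# For a linear filter y = w^T x we have dy/dw = x, so the update
# reduces to the classical LMS recursion (1).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.zeros(4)
    x = rng.standard_normal(4)
    w = lms_type_update(w, x, d=1.0, mu=0.05,
                        filter_output=lambda w, x: w @ x,
                        filter_gradient=lambda w, x: x)
```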
3 Nonlinear Normalized LMS-type (NLMS-type) Algorithms

We derive the class of nonlinear Normalized LMS-type (NLMS-type) algorithms by choosing a time-varying step-size $\mu(n) > 0$ in the LMS-type algorithm of (8). In order to do this, we start by rewriting the steepest descent algorithm of (6), using (5), to obtain

$w_i(n+1) = w_i(n) - \mu\, E\left\{ e(n)\, \frac{\partial y}{\partial w_i}(n) \right\}.$    (9)

Now, at the $n$th iteration, the next-step MSE is defined by

$J_{n+1} \triangleq J(w(n+1)) = E\{e^2(n+1)\},$    (10)

where the next-step filtering error $e(n+1)$ is given by

$e(n+1) = y(n+1) - d(n+1) \equiv y(w(n+1), x(n+1)) - d(n+1).$    (11)

Note that $J_{n+1}$ depends on the next-step weight vector $w(n+1)$, which in turn is a function of $\mu > 0$. We obtain the NLMS-type algorithm from (9) by determining the optimal step-size, denoted by $\mu_o(n)$, that minimizes $J_{n+1}(\mu)$:

$\mu_o(n) \triangleq \arg\min_{\mu > 0}\, J_{n+1}(\mu).$    (12)
As mentioned in Section 1, the criterion of minimization of the next-step MSE is one of the several interpretations of the normalized LMS algorithm of (2) (see [1], Problem 14). We use this criterion here since, out of all the interpretations, it extends most easily to the case of a general nonlinear filter. To determine $\mu_o(n)$, we need an expression for the derivative function $(\partial/\partial\mu)\, J_{n+1}(\mu)$. Referring to (10) and (11), we can use the chain rule to write

$\frac{\partial}{\partial\mu} J_{n+1}(\mu) = \sum_{j=1}^{N} \frac{\partial J_{n+1}(\mu)}{\partial w_j(n+1)}\, \frac{\partial w_j(n+1)}{\partial\mu}.$    (13)

To evaluate the expressions in (13), we first define the following functions for notational convenience (see (7)):

$\gamma_j(n) \triangleq -\frac{1}{2}\, \frac{\partial J}{\partial w_j}(n) = -E\left\{ e(n)\, \frac{\partial y}{\partial w_j}(n) \right\}, \quad j = 1, 2, \ldots, N.$    (14)

We can then rewrite the update in (9) as

$w_i(n+1) = w_i(n) + \mu\, \gamma_i(n), \quad i = 1, 2, \ldots, N.$    (15)

Using (15), we obtain one of the terms to be evaluated in (13) as

$\frac{\partial w_j(n+1)}{\partial\mu} = \gamma_j(n), \quad j = 1, 2, \ldots, N.$    (16)

The other term in (13) can be written, using (14), as

$\frac{\partial J_{n+1}(\mu)}{\partial w_j(n+1)} = \frac{\partial J(w(n+1))}{\partial w_j(n+1)} \equiv \frac{\partial J}{\partial w_j}(n+1) = -2\, \gamma_j(n+1).$    (17)

Returning to the derivative function in (13), we substitute (16) and (17) into (13), to obtain

$\frac{\partial}{\partial\mu} J_{n+1}(\mu) = -2 \sum_{j=1}^{N} \gamma_j(n+1)\, \gamma_j(n).$    (18)

Before simplifying (18) further, we note from (15) that

$\mu = 0 \;\Rightarrow\; w(n+1) = w(n).$    (19)

This leads to the significant observation that $\mu = 0$ corresponds to quantities at time $n$, while $\mu > 0$ corresponds to quantities at time $(n+1)$. Consequently, we notice in (18) that $\gamma_j(n+1)$ depends on $\mu$, while $\gamma_j(n)$ does not. To emphasize this fact, we define the function

$F_j(\mu) \triangleq \gamma_j(n+1) = -E\left\{ e(n+1)\, \frac{\partial y}{\partial w_j}(n+1) \right\}, \quad j = 1, 2, \ldots, N.$    (20)

It follows that

$F_j(0) = \gamma_j(n) = -E\left\{ e(n)\, \frac{\partial y}{\partial w_j}(n) \right\}, \quad j = 1, 2, \ldots, N.$    (21)
Using (20) and (21), we have the following expression for the derivative of $J_{n+1}(\mu)$:

$\frac{\partial}{\partial\mu} J_{n+1}(\mu) = -2 \sum_{j=1}^{N} F_j(\mu)\, F_j(0).$    (22)

Due to the nonlinearity of the quantities in the above equation (see (20) and (21)), it is generally very difficult to simplify (22) further in closed form. We therefore resort to an approximation by employing a first-order Taylor series expansion (linearization) of the functions $F_j(\mu)$ about the point $\mu = 0$, assuming a small step-size $\mu > 0$:

$F_j(\mu) \approx F_j(0) + \mu\, F_j'(0) = F_j(0) + \mu \left[ \frac{\partial}{\partial\mu} F_j(\mu) \right]_{\mu = 0}.$    (23)

Using this approximation in (22), we obtain

$\frac{\partial}{\partial\mu} J_{n+1}(\mu) \approx -2 \left[ \sum_{j=1}^{N} F_j^2(0) + \mu \sum_{j=1}^{N} F_j'(0)\, F_j(0) \right].$    (24)
Notice that this also implies a linearization of the derivative function $(\partial/\partial\mu)\, J_{n+1}(\mu)$. This, in turn, is equivalent to approximating the next-step MSE $J_{n+1}(\mu)$ as a quadratic function of $\mu$. Under these assumptions, the optimal step-size $\mu_o(n)$ of (12) is found by setting the (approximate) derivative of $J_{n+1}(\mu)$ to zero:

$\left. \frac{\partial}{\partial\mu} J_{n+1}(\mu) \right|_{\mu = \mu_o(n)} = 0.$    (25)

In order to see if (25) leads to a minimum (rather than a maximum) of $J_{n+1}(\mu)$, we note from (22) that

$\left. \frac{\partial}{\partial\mu} J_{n+1}(\mu) \right|_{\mu = 0} = -2 \sum_{j=1}^{N} F_j^2(0) < 0.$    (26)

Thus, $J_{n+1}(\mu)$ is (predictably) decreasing at $\mu = 0$. Therefore, it is reasonable to assume that the quadratic approximation of $J_{n+1}(\mu)$ attains its global minimum at some step-size $\mu > 0$. Using (24) in (25), we then obtain the following closed-form, albeit approximate, expression for the optimal step-size:

$\mu_o(n) \approx -\, \frac{\displaystyle \sum_{j=1}^{N} F_j^2(0)}{\displaystyle \sum_{j=1}^{N} F_j'(0)\, F_j(0)},$    (27)

where, from (21),

$F_j(0) = -E\left\{ e(n)\, \frac{\partial y}{\partial w_j}(n) \right\}, \quad j = 1, 2, \ldots, N,$    (28)

is independent of $\mu$, and depends only on the signal statistics at time $n$. We see from (27) that our remaining task is to evaluate $F_j'(0)$; this expression is derived in Appendix A and is given by

$F_j'(0) = -\sum_{k=1}^{N} F_k(0)\, E\left\{ e(n)\, \frac{\partial^2 y}{\partial w_k\, \partial w_j}(n) + \frac{\partial y}{\partial w_k}(n)\, \frac{\partial y}{\partial w_j}(n) \right\}.$    (29)
We can now substitute (29) and (28) into (27) and obtain an expression for the optimal step-size $\mu_o(n)$. Note that (29) and (28) involve statistical expectations $E\{\cdot\}$. These expectations are difficult to obtain in an environment of unknown or time-varying signal statistics. We therefore resort to using instantaneous estimates of these expectations, just as in the derivation of the conventional (linear) LMS algorithm of (1) or of the nonlinear LMS-type algorithm of (8). To this end, removing the expectation operators in (29) and (28), using the resulting expressions in (27), and performing some straightforward simplifications, we obtain the following expression for the optimal step-size:

$\mu_o(n) \approx \left( \frac{1}{1 + \mathcal{E}(n)} \right) \frac{1}{\displaystyle \sum_{j=1}^{N} \left( \frac{\partial y}{\partial w_j}(n) \right)^2},$    (30)

where

$\mathcal{E}(n) \triangleq e(n)\, \frac{\displaystyle \sum_{j=1}^{N} \frac{\partial y}{\partial w_j} \left[ \sum_{k=1}^{N} \frac{\partial y}{\partial w_k}\, \frac{\partial^2 y}{\partial w_k\, \partial w_j} \right]}{\displaystyle \left[ \sum_{j=1}^{N} \left( \frac{\partial y}{\partial w_j} \right)^2 \right]^2}.$    (31)

In order to obtain a more manageable and practically useful expression for $\mu_o(n)$, we make the simplifying assumption that

$|\mathcal{E}(n)| \ll 1.$    (32)

This is a justifiable assumption for two reasons. Firstly, we see from (31) that $\mathcal{E}(n)$ is proportional to the filtering error $e(n)$. Since the error $e(n)$ is continually reduced in magnitude by the adaptive algorithm of (9), $|\mathcal{E}(n)|$ in turn becomes smaller during succeeding iterations, making (32) a progressively better approximation. Secondly, referring to the numerator of $\mathcal{E}(n)$ in (31), we see that for mild nonlinearities in the filter, the approximation in (32) amounts to neglecting the effects of the second-order cross-derivatives $\frac{\partial^2 y}{\partial w_k\, \partial w_j}$ in relation to the first-order derivatives $\frac{\partial y}{\partial w_j}$. Using the assumption of (32) in (30), we finally obtain the following simplified expression for the optimal step-size:

$\mu_o(n) \approx \frac{1}{\displaystyle \sum_{j=1}^{N} \left( \frac{\partial y}{\partial w_j}(n) \right)^2}.$    (33)
After incorporating an auxiliary step-size $\tilde{\mu} > 0$, just as in the conventional (linear) NLMS algorithm of (2), we can then write the time-varying step-size, to be used in the steepest-descent algorithm of (9), as

$\mu(n) = \tilde{\mu}\, \mu_o(n) \approx \frac{\tilde{\mu}}{\displaystyle \sum_{j=1}^{N} \left( \frac{\partial y}{\partial w_j}(n) \right)^2}.$    (34)

Finally, on using instantaneous estimates by removing the expectation operator in the steepest-descent algorithm of (9), we obtain the following class of Nonlinear Normalized LMS-type Adaptive Filtering Algorithms:

$w_i(n+1) = w_i(n) - \frac{\tilde{\mu}}{\displaystyle \sum_{j=1}^{N} \left( \frac{\partial y}{\partial w_j}(n) \right)^2}\, e(n)\, \frac{\partial y}{\partial w_i}(n), \quad i = 1, 2, \ldots, N.$    (35)

This algorithm has several advantages:
- It is applicable to a wide variety of nonlinear filters; in fact, to any nonlinear filter for which the filter output $y$ is an analytic function of each of the filter weights $w_i$ (so that derivatives of all orders exist).

- The auxiliary step-size $\tilde{\mu}$ is dimensionless and the stability region for $\tilde{\mu}$ is independent of the signal statistics. As a result, the stability region could be determined empirically for any particular nonlinear filter of interest.

- This algorithm has a potentially faster convergence than its LMS-type counterpart of (8), as demonstrated by our simulation results in Section 5.

- It can also be interpreted as a modification of the LMS-type algorithm of (8) in which the update term is divided (normalized) by the Euclidean squared norm of the set of values $\frac{\partial y}{\partial w_i}(n),\ i = 1, 2, \ldots, N$, in order to ensure algorithm stability when these values become large in magnitude.
It is important to note the following two approximations used in deriving the nonlinear NLMS-type algorithm of (35):

- Linearization of the function $F_j(\mu)$ defined in (20) about the point $\mu = 0$ (see (23)). This approximation holds good for any nonlinear filter, as long as the time-varying step-size $\mu(n)$ is sufficiently small. Note that this is not, at least directly, a restriction on the auxiliary step-size $\tilde{\mu}$.

- The assumption that $|\mathcal{E}(n)| \ll 1$ (see (31) and (32)). Note that the validity of this approximation has to be investigated on a case-by-case basis for each nonlinear filter of interest.

Consider now the special case of the linear filter, for which we have $y = w^T x$, leading to $\partial y / \partial w_i = x_i,\ i = 1, 2, \ldots, N$. It is then easily seen that (35) reduces predictably to the (linear) NLMS algorithm of (2). A significant point to note here is that we do not require any of the approximations (see (23) and (32)) that were used to obtain (35); the derivation in this case is exact. Indeed, when the filter is linear, the function $F_j(\mu)$ of (20) can be shown to be linear in $\mu$, thus eliminating the need for the linearization approximation. Further, the expression $\mathcal{E}(n)$ of (31) is identically equal to zero for the linear filter, making the approximation (32) unnecessary.
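For illustration only (this sketch is not part of the original paper), the general NLMS-type update (35) can be coded along the following lines in Python; `filter_output` and `filter_gradient` are hypothetical user-supplied callables, and the small constant `eps` is an implementation detail added here to guard against division by zero.

```python
import numpy as np

def nlms_type_update(w, x, d, mu_tilde, filter_output, filter_gradient, eps=1e-12):
    """One iteration of the nonlinear NLMS-type update (35).

    The LMS-type step (8) is divided by the squared Euclidean norm of the
    instantaneous gradient dy/dw, with dimensionless auxiliary step-size
    mu_tilde (the paper recommends values below 2).
    """
    y = filter_output(w, x)
    e = y - d                                     # filtering error e(n)
    grad_y = filter_gradient(w, x)                # dy/dw_j evaluated at w(n)
    norm_sq = float(np.dot(grad_y, grad_y))       # sum_j (dy/dw_j)^2
    return w - (mu_tilde / (norm_sq + eps)) * e * grad_y
```

For the linear filter (grad_y equal to x), this reduces to the classical NLMS recursion (2), as noted in the text above.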
4 Normalized Adaptive Volterra and Myriad Filters

As mentioned in Section 3, the NLMS-type adaptive algorithm of (35) is applicable to any nonlinear filter for which the filter output $y$ is an analytic function of the filter parameters (weights) $w_i,\ i = 1, 2, \ldots, N$, so that its derivatives of all orders exist. In this section, we apply this normalized adaptive algorithm to two specific types of nonlinear filter structures, namely, Volterra and myriad filters. The performance of the resulting algorithms will be investigated through computer simulation examples in Section 5.

4.1 Volterra Filters
The Volterra filter belongs to a particularly simple class of nonlinear filters having the property that the filter output is linear in the filter parameters (weights). Given an $N \times 1$ input (observation) vector $x$, the filter output in this class is given by

$y = h^T z = h^T f(x),$    (36)

where $h$ is an $M \times 1$ vector of filter parameters, and $f : R^N \mapsto R^M$ is a (generally nonlinear) mapping that transforms the $N \times 1$ input vector $x$ into an $M \times 1$ modified observation vector $z$. Included in this class are linear filters, Ll filters based on order statistics [12, 13], permutation filters (which are generalizations of both linear and order statistic filters) [14], and Volterra filters [4], among others. In order to obtain the NLMS-type adaptive filtering algorithm for this class, we first see from (36) that

$\frac{\partial y}{\partial h_i} = z_i, \quad i = 1, 2, \ldots, M.$    (37)

Using this in (8), the LMS-type algorithm to update the $M \times 1$ parameter vector $h$ can be written as

$h(n+1) = h(n) - \mu\, e(n)\, z(n),$    (38)

and the corresponding NLMS-type algorithm can be written from (35) as

$h(n+1) = h(n) - \frac{\tilde{\mu}}{\|z(n)\|^2}\, e(n)\, z(n),$    (39)
where $\|z(n)\|^2 \triangleq \sum_{i=1}^{M} z_i^2(n)$. Notice the simplicity of the above two equations, which are similar to the (linear) LMS and NLMS algorithms of (1) and (2), respectively. However, unlike the linear case where there are $N$ filter parameters (weights), the number of parameters $M$ for a nonlinear filter in this class is typically significantly greater than the number of input samples $N$.

Consider now the special case of the Volterra filter, which has found wide-spread use in nonlinear signal processing [6, 9]. The output of this filter is given by

$y \triangleq \sum_{i=1}^{N} w_1(i)\, x_i + \sum_{i=1}^{N} \sum_{j=1}^{N} w_2(i, j)\, x_i x_j + \ldots = h_1^T z_1 + h_2^T z_2 + \ldots = h^T z,$    (40)

where

$h \triangleq [h_1^T \,|\, h_2^T \,|\, \ldots]^T = [w_1(1), w_1(2), \ldots, w_1(N) \,|\, w_2(1,1), \ldots, w_2(N,N) \,|\, w_3(1,1,1), \ldots]^T$    (41)

and

$z \triangleq [z_1^T \,|\, z_2^T \,|\, \ldots]^T = [x_1, x_2, \ldots, x_N \,|\, x_1^2, x_1 x_2, \ldots, x_N^2 \,|\, \ldots]^T$    (42)

are the filter parameter vector and the modified observation vector, respectively. Thus, the modified observation vector $z$ contains all possible cross-products of the input samples $\{x_i\}$. In practice, we use a truncated Volterra series, obtaining a $p$th-order Volterra filter by using $p$ terms in the series of (40), with a parameter vector $h = [h_1^T \,|\, h_2^T \,|\, \ldots \,|\, h_p^T]^T$ and a modified observation vector $z = [z_1^T \,|\, z_2^T \,|\, \ldots \,|\, z_p^T]^T$. The normalized adaptive Volterra filtering algorithm is easily written down by substituting (40), (41) and (42) into (39), where the quantity $\|z(n)\|^2$ in (39) is given by

$\|z(n)\|^2 = \|z_1(n)\|^2 + \|z_2(n)\|^2 + \ldots = \sum_{i=1}^{N} x_i^2(n) + \sum_{i=1}^{N} \sum_{j=1}^{N} x_i^2(n)\, x_j^2(n) + \ldots$    (43)
As noted before, the algorithm of (39) is similar to the linear NLMS algorithm of (2), since the filter output is linear in the filter parameters. Therefore, it is reasonable to expect the stability region for $\tilde{\mu}$ to be the same as in the linear case, where it is $0 < \tilde{\mu} < 2$ [1]. Our simulations confirm this fact for the case of the second-order Volterra filter (see Section 5).
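As an illustration added here (not taken from the paper), the following Python sketch forms a second-order modified observation vector in the spirit of (42) and applies the normalized update (39). The helper names are hypothetical, and for simplicity all $N^2$ quadratic products are kept rather than the $N(N+1)/2$ distinct products used in the paper's Table 2.

```python
import numpy as np

def volterra2_z(x):
    """Modified observation vector z = [z1 | z2] for a second-order
    Volterra filter: the N input samples followed by all N*N products
    x_i * x_j (no symmetry reduction is attempted in this sketch)."""
    return np.concatenate([x, np.outer(x, x).ravel()])

def volterra2_nlms_update(h, x, d, mu_tilde, eps=1e-12):
    """One iteration of the NLMS-type Volterra update (39).
    h has length N + N^2 for an N-sample observation window."""
    z = volterra2_z(x)
    e = h @ z - d                                  # filtering error e(n)
    return h - (mu_tilde / (z @ z + eps)) * e * z, e

# Usage sketch: identify a (hypothetical) quadratic system from input/output pairs.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N = 5
    H = rng.standard_normal(N + N * N)             # stand-in "unknown" parameter vector
    h = np.zeros_like(H)
    for _ in range(2000):
        x = rng.standard_normal(N)
        d = H @ volterra2_z(x)
        h, _ = volterra2_nlms_update(h, x, d, mu_tilde=1.0)
```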
4.2 Myriad Filters
As a second example of nonlinear NLMS-type adaptive filtering, we consider the class of Weighted Myriad Filters, which have recently been developed for robust signal processing in impulsive environments [15, 16, 17, 18]. These filters have been derived based on the properties of the heavy-tailed class of $\alpha$-stable distributions [19], which accurately model impulsive processes. Nonlinear LMS-type adaptive algorithms have also been derived for the optimization of these filters [17]. Given an input vector $x$ and a vector of real-valued weights $w$, both of length $N$, the weighted myriad filter (WMyF) output is given by

$y \equiv y_K(w, x) \triangleq \arg\min_{\beta} \sum_{i=1}^{N} \log\left[ K^2 + |w_i| \left( \beta - \mathrm{sgn}(w_i)\, x_i \right)^2 \right],$    (44)

where $\mathrm{sgn}(\cdot)$ is the sign function, and $K$ is called the linearity parameter since $y_K$ reduces to the (linear) weighted mean of the input samples as $K \to \infty$: $y_\infty = \sum_{j=1}^{N} w_j x_j \,/\, \sum_{j=1}^{N} w_j$. For finite $K$, however, the filter output depends only on the $N$-long filter parameter vector $h \triangleq w / K^2$:

$y \equiv y(h, x) = \arg\min_{\beta} \sum_{i=1}^{N} \log\left[ 1 + |h_i| \left( \beta - \mathrm{sgn}(h_i)\, x_i \right)^2 \right];$    (45)
the filter can therefore be adaptively optimized by updating the parameter vector $h$. Now, it can be shown [18] that

$\frac{\partial y}{\partial h_i} = -\frac{\xi_i}{\Gamma}, \quad i = 1, 2, \ldots, N,$    (46)

where

$\xi_i = \frac{u_i}{\left( 1 + |h_i|\, u_i^2 \right)^2}, \quad i = 1, 2, \ldots, N,$

and

$\Gamma = \sum_{j=1}^{N} \frac{|h_j| \left( 1 - |h_j|\, u_j^2 \right)}{\left( 1 + |h_j|\, u_j^2 \right)^2},$

with $u_i = \mathrm{sgn}(h_i)\, y - x_i,\ i = 1, 2, \ldots, N$. Substituting (46) into (8) and (35), we obtain the following LMS-type weighted myriad filtering algorithm:

$h_i(n+1) = h_i(n) + \mu\, e(n)\, \frac{\xi_i(n)}{\Gamma(n)}, \quad i = 1, 2, \ldots, N;$    (47)

the corresponding normalized adaptive (NLMS-type) algorithm is given by

$h_i(n+1) = h_i(n) + \tilde{\mu}\, e(n)\, \frac{\Gamma(n)\, \xi_i(n)}{\displaystyle \sum_{j=1}^{N} \xi_j^2(n)}, \quad i = 1, 2, \ldots, N.$    (48)
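As a rough illustration added by the editor (not from the paper), the sketch below computes the weighted myriad output of (45) by a simple grid search over the range of the signed samples and then applies the NLMS-type update (48). The grid search is only a convenient shortcut for demonstration; it is not the minimization procedure prescribed in [15]-[18], and all function names here are placeholders.

```python
import numpy as np

def weighted_myriad(h, x, grid_size=2001):
    """Weighted myriad filter output of (45), located by a coarse grid
    search over the range of the signed samples sgn(h_i) * x_i."""
    samples = np.sign(h) * x
    beta = np.linspace(samples.min(), samples.max(), grid_size)
    # objective G(beta) = sum_i log(1 + |h_i| (beta - sgn(h_i) x_i)^2)
    cost = np.log1p(np.abs(h)[None, :]
                    * (beta[:, None] - samples[None, :]) ** 2).sum(axis=1)
    return beta[np.argmin(cost)]

def myriad_nlms_update(h, x, d, mu_tilde, eps=1e-12):
    """One iteration of the NLMS-type weighted myriad update (48)."""
    y = weighted_myriad(h, x)
    e = y - d
    u = np.sign(h) * y - x                          # u_i = sgn(h_i) y - x_i
    denom = (1.0 + np.abs(h) * u ** 2) ** 2
    xi = u / denom                                  # the quantities xi_i of (46)
    gamma = np.sum(np.abs(h) * (1.0 - np.abs(h) * u ** 2) / denom)
    return h + mu_tilde * e * gamma * xi / (np.sum(xi ** 2) + eps), e
```

With settings along the lines of Example 1 below (a window of length 9, identical initial weights summing to one, and $\tilde{\mu} = 1.0$), this update would simply be applied to successive windows of the noisy record.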
5 Simulation Results

The normalized adaptive algorithms of Section 4 are investigated in this section through two computer simulation examples. In the first example, the Weighted Myriad Filter is applied to the problem of adaptive highpass filtering of a sum of sinusoids in an impulsive noise environment. The second example involves identification of a nonlinear system using measurements of its input and output in a noiseless environment, with the input-output relationship modeled by a truncated Volterra series.
Example 1: The normalized adaptive weighted myriad filtering algorithm of Section 4.2 was used to extract the high-frequency tone from a sum of three sinusoidal signals corrupted by impulsive noise. The observed signal was given by $x(n) = s(n) + v(n)$, where $s(n) = \sum_{k=0}^{2} a_k \sin(2\pi f_k n)$ is the clean sum of sinusoids. Fig. 1 shows a segment of the signal $s(n)$, consisting of sinusoids at digital frequencies $f_0 = 0.008$, $f_1 = 0.012$, and $f_2 = 0.2$, with amplitudes $(a_0, a_1, a_2) = (0.4, 0.3, 0.3)$. The desired signal $d(n)$, also shown in the figure, is the sinusoid at the highest frequency, $f_2$.

Figure 1: Clean sum of sinusoids s(n) (top), and desired highpass component d(n) (bottom).
The additive noise process $v(n)$ was chosen to have a zero-mean symmetric $\alpha$-stable distribution [19] with a characteristic exponent $\alpha = 1.6$ and a dispersion $\gamma = 0.02$. Impulsive noise is well-modeled by the heavy-tailed class of $\alpha$-stable distributions, which includes the Gaussian distribution as the special case $\alpha = 2$. The characteristic exponent $\alpha$ ($0 < \alpha \leq 2$) measures the heaviness of the tails (a smaller $\alpha$ indicates heavier tails), while the dispersion $\gamma$ decides the spread of the distribution around the origin. These distributions have infinite variance for $0 < \alpha < 2$. The dispersion $\gamma$ is analogous to the variance of a distribution; when $\alpha = 2$, $\gamma$ is equal to half the variance of the Gaussian distribution. Fig. 2 shows the additive $\alpha$-stable noise signal $v(n)$ used in our simulations. As the figure shows, the chosen noise process simulates low-level Gaussian-type noise as well as impulsive interference consisting of occasional spikes in the signal.

Figure 2: Additive $\alpha$-stable noise signal v(n) (characteristic exponent $\alpha = 1.6$, dispersion $\gamma = 0.02$).
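For readers who wish to reproduce a signal of this kind, here is an editor-added sketch of the Example 1 signals using SciPy's alpha-stable sampler. The conversion from dispersion to SciPy's scale parameter, scale = dispersion**(1/alpha), assumes the standard symmetric parameterization of [19] and should be verified before reuse; the seed and function name are arbitrary.

```python
import numpy as np
from scipy.stats import levy_stable

def example1_signals(n_samples=1000, alpha=1.6, dispersion=0.02, seed=0):
    """Generate s(n), the desired highpass tone d(n), and the noisy
    observation x(n) = s(n) + v(n) described in Example 1."""
    n = np.arange(n_samples)
    freqs = (0.008, 0.012, 0.2)
    amps = (0.4, 0.3, 0.3)
    s = sum(a * np.sin(2 * np.pi * f * n) for a, f in zip(amps, freqs))
    d = amps[2] * np.sin(2 * np.pi * freqs[2] * n)   # highest-frequency sinusoid
    scale = dispersion ** (1.0 / alpha)              # assumed dispersion-to-scale conversion
    v = levy_stable.rvs(alpha, 0.0, loc=0.0, scale=scale,
                        size=n_samples, random_state=seed)
    return s, d, s + v
```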
16
Weights Weighted Myriad (K = 1:0) wi LMS-type NLMS-type w 0.0249 0.0084 w -0.0945 -0.2118 w -0.1119 -0.2429 w 0.0306 0.0151 w 0.3427 0.9579 w 0.0179 0.0049 w -0.1225 -0.1773 w -0.1011 -0.2284 w 0.0220 0.0099 Table 1: Final weights obtained by the adaptive weighted myriad ltering algorithms. 1 2 3 4 5 6 7 8 9
Table 1 shows the nal weights obtained by the two algorithms (with = 0:05 and ~ = 1:0) using 1000 samples of the noisy observed signal x(n). The nal trained lters, using both the adaptive algorithms, were successful in accurately extracting the high-frequency sinusoidal component. We do not show the lter outputs (using the trained lter weights) here since they are very close to the desired signal. The MSEs achieved by the trained lters, using both the LMS-type and the NLMS-type algorithms, were almost the same (around 0:0022). This allows for a meaningful comparison of the convergence speeds of the two algorithms. Fig. 3 shows the learning curves (MSE as a function of algorithm iterations) for the LMS-type as well as the NLMS-type algorithms. The LMS-type algorithm converges in about 100 iterations to an MSE below 0:02. On the other hand, the NLMS-type algorithm is about ten times faster, converging to the same MSE in just 10 iterations. The gure clearly indicates the dramatic improvement in convergence speed when employing the NLMS-type algorithm. Notice the values of the MSEs in these curves; the NLMS-type algorithm has a lower MSE at each iteration step. This is expected since the NLMS-type algorithm was derived to minimize the next-step MSE at each iteration of the LMS-type algorithm.
Example 2: We apply the normalized adaptive Volterra ltering algorithm of (39) to the problem of adaptive identi cation of a unknown nonlinear system in a noiseless environment. Fig. 4 17
0.8
J(n)
0.6 0.4 0.2 0
10
20
30
40
50 n
60
70
80
90
100
16
18
20
0.25
J(n)
0.2 0.15 0.1 0.05 0
2
4
6
8
10
12
14
n
Figure 3: MSE learning curves for adaptive weighted myriad ltering, LMS-type (top) and NLMS-type (bottom). shows the block diagram representing our simulation experiment. It is assumed that the unknown system can be adequately modeled using a Volterra lter. A known common input signal x(n) is applied both to the unknown nonlinear system and to the adaptive Volterra lter. The desired signal d(n) is the output of the unknown system. The objective of the adaptive algorithms is to minimize the mean squared value of the error e(n) between the desired signal d(n) and the output (estimate) y(n) of the adaptive Volterra lter. In our simulation example, both the unknown system and the adaptive lter were chosen to be second-order (quadratic) Volterra lters with the observation window size N = 5. For an N -long observation vector x, the outputs of these systems are given by d = HT z and y = hT z, where the modi ed observation vector z = [zT j zT ]T is given by (42). Table 2 shows the parameters (weights) Hi of the system to be identi ed, along with the corresponding tap inputs for an observation vector x = [x ; x ; x ; x ; x ]T . The linear and quadratic parts of the system together constitute a parameter vector H of length M = 20. For the input signal x(n), we chose a zero-mean white Gaussian process (i.i.d. samples) with unit variance. The objective of the adaptive algorithms is to train the adaptive Volterra lter of Fig. 4 such 1
1
2
3
18
4
5
2
estimate
Adaptive Volterra Filter
y(n) e(n) error
x(n)
Unknown Nonlinear System
input
d(n) desired
Figure 4: Nonlinear system identi cation using a Volterra lter model. that its parameter vector h converges to the vector H of Table 2. The LMS-type algorithm of (38) and the NLMS-type algorithm of (39) were applied to train the adaptive Volterra lter. Each algorithm was run for 5000 iterations, with the initial lter weights chosen to be all zero: h(0) = 0. This experiment was repeated for 100 independent trials, using a dierent realization of the input signal x(n) for each trial, in order to obtain the average convergence behavior of the algorithms. For the NLMS-type algorithm, the auxiliary step-size used was its default value ~ = 1:0, corresponding to the optimal (automatic) step-size choice at each iteration step. The step-size for the LMStype algorithm, = 1:0 10? , was chosen such that the nal ensemble-averaged mean square error (MSE) was comparable to that of the NLMS-type algorithm. As a secondary experiment, we also determined the stability region of the NLMS-type algorithm for secondorder Volterra ltering. This was done by increasing ~ until the onset of algorithm instability. The step-size bounds turned out to be around 0 < ~ < 2:0, con rming our prediction in Section 4.1. Fig. 5 shows the trajectories of some of the lter weights, averaged over all the 100 independent trials, for both the algorithms. In all cases, the algorithms converged to values that were very close to the weights of the unknown system given in Table 2. As the gure shows, the lter weights converge in about 400-500 iterations using the LMS-type algorithm, while the convergence is much faster (in about 50-60 iterations) with the normalized LMS2
19
0
0.3
−0.2
h1(n)
h4(n)
0.4
0.2
−0.4
0.1
−0.6
0 0
100
200
300 400 iteration, n
500
600
−0.8 0
700
0.4
0.6
0.3
200
300 400 iteration, n
500
600
700
100
200
300 400 iteration, n
500
600
700
h3(n)
h7(n)
0.8
100
0.4
0.2
0.2
0.1
0 0
100
200
300 400 iteration, n
500
600
0 0
700
(i)
(ii)
0
−0.1
h17(n)
0
h10(n)
0.02
−0.02
−0.2
−0.04 100
200
300 400 iteration, n
500
600
−0.4 0
700
0.4
0.6
0.3
0.4
200
300 400 iteration, n
500
600
700
100
200
300 400 iteration, n
500
600
700
0.2
0.2 0 0
100
h20(n)
0.8
13
h (n)
−0.06 0
−0.3
0.1 100
200
300 400 iteration, n
500
600
0 0
700
(iii)
(iv)
Figure 5: Ensemble-averaged tap weight trajectories hi(n) for adaptive Volterra system identi cation using LMS-type (dashed) and NLMS-type (solid) algorithms, (i) h (n) and h (n), (ii) h (n) and h (n), (iii) h (n) and h (n), (iv) h (n) and h (n). 1
3
4
7
10
13
20
17
20
Tap Input Tap Weights Unknown System Vector z Hi x H 0.30 x H -0.50 Linear x H 0.70 x H -0.60 x H 0.20 x H 0.10 xx H 0.25 xx H -0.20 xx H 0.08 xx H -0.03 x H 0.40 xx H -0.30 Quadratic xx H 0.50 xx H 0.06 x H 0.65 xx H -0.35 xx H -0.30 x H 0.45 xx H 0.20 x H 0.15 Table 2: Filter tap inputs and corresponding tap weights of the second-order Volterra nonlinear system to be identi ed, with the input vector x = [x ; x ; x ; x ; x ]T . 1
1
2
2
3
3
4
4
5 2 1
5 6
1
2
7
1
3
8
1
4
9
1
5
10
2 2
11
2
3
12
2
4
13
5
14
2
2 3
3 3
4
2 4 2 5
15
4
16
5
17 18
5
19 20
1
2
3
4
5
type algorithm. This amounts to about a ten-fold increase in speed for most of the weights. Note that the step-size parameters were chosen so that the nal MSEs of the two algorithms are of the same order ( 10? ); this allows for a fair comparison of their convergence speeds from the weight trajectories. The ensemble-averaged MSE learning curves of the two algorithms are shown in Fig. 6. These curves plot the squared error, averaged over all the trials, as a function of algorithm iterations. As these curves indicate, the NLMS-type algorithm converges in about 60 iterations, while the LMS-type algorithm appears to converge in about 300 iterations. In fact, the LMS-type algorithm takes much longer than this to converge; the MSEs of the two algorithms approach the same order of magnitude ( 10? ) only after 5000 iterations. As mentioned previously, a meaningful comparison of the two algorithms is possible from these 31
31
21
6 5
MSE, J(n)
4 3 2 1 0 −1 0
50
100
150 iteration, n
200
250
300
Figure 6: Ensemble-averaged MSE learning curves for adaptive Volterra system identi cation using LMS-type (dashed) and NLMS-type (solid) algorithms. curves since their nal MSEs are of the same order. Thus, the NLMS-type algorithm is seen to converge much faster than the LMS-type algorithm at comparable steady-state MSEs.
6 Conclusion In this paper, we generalized the normalized LMS algorithm (proposed for linear adaptive ltering) by developing a new class of Nonlinear Normalized LMS-type (NLMS-type) adaptive algorithms that are applicable to a wide variety of nonlinear lter structures. These algorithms were derived from the class of nonlinear stochastic gradient (LMS-type) algorithms by choosing an optimal time-varying step-size parameter that minimizes the next-step mean square error (MSE) at each iteration of the adaptive algorithm. By providing for an automatic choice for the step-size, the NLMS-type algorithms eliminate the dicult problem of step-size design that is inherent in all LMS-type algorithms. We illustrated the application of these algorithms through two computer simulation examples: adaptive nonlinear highpass ltering of a sum of sinusoids in impulsive noise using a weighted myriad lter, and adaptive identi cation of a nonlinear system modeled as a second-order Volterra lter. 22
Our experimental results, including ensemble-averaged MSE learning curves and lter weight trajectories, indicate that the NLMS-type algorithms, in general, converge faster than their LMS-type counterparts at comparable steady-state MSEs. These normalized algorithms need to be investigated further by applying them to various nonlinear lters and studying the convergence behavior as well the steady-state performance in each case.
A Derivation of Fj0(0) In this appendix, we derive the expression for Fj0(0) given in (29), that is, we show that
F 0(0) =4 j
@ F () @ j
=0
=?
From (20), we have
N X k=1
y (n) + @y (n) @y (n) : (49) Fk (0) E e(n) @w@ @w @wk @wj k j (
)
2
@y (n + 1) ; j = 1; 2; : : : ; N: Fj () = ? E e(n + 1) @w (50) j Using e(n + 1) = y(n + 1) ? d(n + 1) in (50), we can write @ F () Fj0() = @ j @ @y (n + 1) ? E @y(n + 1) @y (n + 1) = ? E e(n + 1) @ @w @ @w (
)
"
(
#)
(
)
j
4
= A() + B ();
j
(51)
where we have introduced the two terms A() and B () for convenience. Note that y(n+1) y(w(n +1); x(n +1)), and w(n +1) is in turn a function of , while the desired signal d(n +1) is independent of . We can use the chain rule of dierentiation to expand the rst term in (51) as follows:
@ @y (n + 1) A() = ? E e(n + 1) @ @wj N @y (n + 1) @wk (n + 1) @ = ? E e(n + 1) @wk (n + 1) @wj @ k N y (n + 1) (n) = ? E e(n + 1) @w@ @w k k j k (
"
(
X
#)
"
X
=1
(
#
)
)
2
=1
= ?
N X
k=1
E fG (w(n + 1); x(n + 1); d(n + 1))g Fk (0); 1
23
(52)
where we have used (16) and (21), and also introduced a new function G (w(n + 1); x(n + 1); d(n + 1)) for convenience. Similarly, we can expand the second term in (51) as follows: 1
@y (n + 1) B () = ? E @y(n@+ 1) @w j N @y (n + 1) @wk (n + 1) @y (n + 1) = ?E @wk (n + 1) @ @wj k N @y (n + 1) @y (n + 1) (n) = ? E @w k @wj k k (
)
("
X
#
X
=1 (
)
)
=1
= ?
N X
k=1
E fG (w(n + 1); x(n + 1); d(n + 1))g Fk (0); 2
(53)
where we have introduced another new function G (w(n + 1); x(n + 1); d(n + 1)) for convenience. Referring to (51), we see that the quantity desired is 2
Fj0(0) = A(0) + B (0):
(54)
Recall from (19) that = 0 corresponds to w(n +1) = w(n). Using this fact in the de nition of A() given in (52), we can write
A(0) = ? = ?
N X k=1 N X k=1
E fG (w(n + 1); x(n + 1); d(n + 1))gj 1
=0
E fG (w(n); x(n + 1); d(n + 1))g Fk (0): 1
Fk (0) (55)
Now, noting that w(n) is a deterministic quantity in the steepest descent algorithm of (9), and assuming that the processes x(n + 1) and d(n + 1) are strictly stationary, we conclude that E fG (w(n); x(n + 1); d(n + 1))g = E fG (w(n); x(n); d(n))g : (56) 1
1
Notice that the right hand side in the above equation involves quantities at time n only. Using (56) in (55) and the de nition of G (; ; ) given in (52), we nally obtain 1
A(0) = ? = ?
N X k=1 N X k=1
E fG (w(n); x(n); d(n))g Fk (0) 1
y (n) : Fk (0) E e(n) @w@ @w k j (
24
2
)
(57)
Following a similar procedure to derive the quantity B (0) from (53), we obtain
B (0) = ? = ? = ?
N X k=1 N X k=1 N X k=1
E fG (w(n + 1); x(n + 1); d(n + 1))gj 2
=0
Fk (0)
E fG (w(n); x(n); d(n))g Fk (0) 2
@y (n) @y (n) : Fk (0) E @w @wj k )
(
(58)
Using (57) and (58) in (54), we nally have the desired expression for Fj0(0):
F 0(0) j
=?
N X k=1
y (n) + @y (n) @y (n) : Fk (0) E e(n) @w@ @w @wk @wj k j (
2
)
(59)
References

[1] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice Hall, 1996.
[2] M. Rupp, "The behavior of LMS and NLMS algorithms in the presence of spherically invariant processes," IEEE Transactions on Signal Processing, vol. 41, pp. 1149-1160, Mar. 1993.
[3] S. Kalluri and G. R. Arce, "A general class of nonlinear normalized LMS-type adaptive algorithms," in Proc. of the 1998 CISS, Princeton, NJ, Mar. 1998.
[4] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems. New York: Wiley, 1980.
[5] S. Benedetto, E. Biglieri, and V. Castellani, Digital Transmission Theory. Englewood Cliffs, NJ: Prentice Hall, 1987.
[6] T. Koh and E. J. Powers, "Second-order Volterra filtering and its application to nonlinear system identification," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33, pp. 1445-1455, Dec. 1985.
[7] I. Pitas and A. N. Venetsanopoulos, Non-linear Digital Filters - Principles and Applications. Boston, MA: Kluwer, 1990.
[8] I. Pitas and A. N. Venetsanopoulos, "Order statistics in digital image processing," Proceedings of the IEEE, vol. 80, Dec. 1992.
[9] V. J. Mathews, "Adaptive polynomial filters," Signal Processing Magazine, vol. 8, pp. 10-26, July 1991.
[10] I. Pitas and A. N. Venetsanopoulos, "Adaptive filters based on order statistics," IEEE Transactions on Signal Processing, vol. 39, Feb. 1991.
[11] D. W. Griffith and G. R. Arce, "Partially decoupled Volterra filters: Formulation and LMS adaptation," IEEE Transactions on Signal Processing, vol. 45, June 1997.
[12] F. Palmieri and C. G. Boncelet, Jr., "Ll filters - A new class of order statistic filters," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, May 1989.
[13] P. Ghandi and S. A. Kassam, "Design and performance of combination filters," IEEE Transactions on Signal Processing, vol. 39, July 1991.
[14] Y.-T. Kim and G. R. Arce, "Permutation filter lattices: A general order-statistic filtering framework," IEEE Transactions on Signal Processing, vol. 42, Sept. 1994.
[15] J. G. Gonzalez and G. R. Arce, "Weighted myriad filters: A robust filtering framework derived from α-stable distributions," in Proc. of the 1996 IEEE ICASSP, Atlanta, GA, 1996.
[16] J. G. Gonzalez and G. R. Arce, "Weighted myriad filters: A powerful framework for efficient filtering in impulsive environments," IEEE Transactions on Signal Processing. Manuscript in review.
[17] S. Kalluri and G. R. Arce, "Adaptive weighted myriad filter algorithms for robust signal processing in α-stable noise environments," IEEE Transactions on Signal Processing, vol. 46, pp. 322-334, Feb. 1998.
[18] S. Kalluri and G. R. Arce, "Robust frequency-selective filtering using weighted myriad filters admitting real-valued weights," IEEE Transactions on Signal Processing. Submitted for publication.
[19] C. L. Nikias and M. Shao, Signal Processing with Alpha-Stable Distributions and Applications. New York: Wiley, 1995.