TRANSIENT ANALYSIS OF ADAPTIVE FILTERS

Tareq Y. Al-Naffouri(1) and Ali H. Sayed(2)

(1) Electrical Engineering Department, Stanford University, CA 94305
(2) Electrical Engineering Department, University of California, Los Angeles, CA 90095
ABSTRACT
This paper develops a framework for the mean-square analysis of adaptive filters with general data and error nonlinearities. The approach relies on energy conservation arguments and is carried out without restrictions on the probability distribution of the input sequence. In particular, for adaptive filters with diagonal matrix nonlinearities, we provide closed-form expressions for the steady-state performance and necessary and sufficient conditions for stability. We carry out a similar study for long adaptive filters that employ error nonlinearities, relying on a weaker form of the independence assumption. We provide expressions for the steady-state error and bounds on the step-size for stability by exploiting the Cramer-Rao bound of the underlying estimation process.
1. ADAPTIVE FILTERING MODEL

Consider noisy measurements $d(i) = u_i w^o + v(i)$, where $w^o$ denotes an unknown column vector that we wish to estimate, $u_i$ is a row regression vector, and $v(i)$ is measurement noise. Adaptive schemes for estimating $w^o$ rely on recursive updates of the general form

$$w_{i+1} = w_i + \mu H(u_i)\, u_i^T f(e(i)), \quad i \geq 0 \qquad (1)$$

where $w_i$ is the estimate of $w^o$ at time $i$, $\mu$ is the step-size, and

$$e(i) = d(i) - u_i w_i \qquad (2)$$

is the estimation error. The correction term in (1) is usually expressed in a separable form, $H(u_i) u_i^T f(e(i))$, where $f(e(i))$ denotes a scalar error nonlinearity and $H(u_i)$ denotes a data nonlinearity, taken here as a diagonal matrix with nonnegative entries.

[Footnote: Published in Proc. ICASSP, Salt Lake City, Utah, May 2001. This work was partially supported by the National Science Foundation under awards ECS-9820765 and CCR-9732376. The work of T. Y. Al-Naffouri was also partially supported by a fellowship from King Fahd University of Petroleum and Minerals, Saudi Arabia.]
Table 1: Examples for f(e(i)) and H(u_i)

Algorithm     | f[e(i)]
--------------|---------------------------------------------
LMS           | $e(i)$
LMF           | $e^3(i)$
LMF family    | $e^{2k+1}(i)$
LMMN          | $a\,e(i) + b\,e^3(i)$
Sat. nonlin.  | $\int_0^{e(i)} \exp\left(-\frac{z^2}{2\sigma_z^2}\right) dz$
Sign error    | $\mathrm{sign}[e(i)]$

Algorithm       | H(u_i)
----------------|---------------------------------------------
NLMS            | $\frac{1}{\|u_i\|^2}\, I$
$\epsilon$-NLMS | $\frac{1}{\epsilon + \|u_i\|^2}\, I$
sign regressor  | $\mathrm{diag}\left(\frac{\mathrm{sign}(u_{i1})}{u_{i1}}, \ldots, \frac{\mathrm{sign}(u_{iM})}{u_{iM}}\right)$
variable steps  | $\mathrm{diag}(\mu_1, \mu_2, \ldots, \mu_M)$
In this paper, we focus on correction terms that are nonlinear in the data or in the error, but not in both. This class of algorithms is general enough to include the special cases listed in Table 1. Several of these algorithms have already been considered in the literature (see, e.g., [1]-[3] and [6] and the many references therein). The purpose of this article is to provide a framework for performing mean-square analysis of the general class of algorithms (1)-(2) in a unified manner. This is achieved by relying on the energy-conservation approach developed in [4]-[6] and by expanding it to handle both transient and mean-square analyses.
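To make the class of recursions (1)-(2) concrete, the following minimal simulation sketch implements the update with pluggable choices of $f(\cdot)$ and $H(\cdot)$ from Table 1. This is our own illustration, not code from the paper; the step-size, filter length, and the $\epsilon$ regularization value are arbitrary choices.

```python
# Sketch of the general recursion (1)-(2); an epsilon-NLMS data nonlinearity is
# shown, with f(e) = e (one could instead take H = I and f(e) = sign(e), etc.).
import numpy as np

rng = np.random.default_rng(0)
M, N, mu, sigma_v = 8, 5000, 0.05, 0.1
w_o = rng.standard_normal(M)                 # unknown vector w^o
w = np.zeros(M)                              # estimate w_i

f = lambda e: e                              # scalar error nonlinearity (LMS choice)
H = lambda u: np.eye(M) / (1e-3 + u @ u)     # diagonal data nonlinearity (eps-NLMS)

sq_err = []
for i in range(N):
    u = rng.standard_normal(M)               # row regressor u_i
    d = u @ w_o + sigma_v * rng.standard_normal()
    e = d - u @ w                            # estimation error (2)
    w = w + mu * H(u) @ u * f(e)             # update (1)
    sq_err.append(e**2)
print("tail MSE:", np.mean(sq_err[-500:]), "| MSD:", np.sum((w_o - w)**2))
```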
2. ENERGY RELATION
Mean-square analysis of (1)-(2) is best carried out in terms of the normalized regressor $\bar{u}_i = u_i H(u_i)$ and the following error quantities:

$\tilde{w}_i = w^o - w_i$  (weight-error vector)
$e_a^{\Sigma}(i) = \bar{u}_i \Sigma \tilde{w}_i$  (weighted a priori error)
$e_p^{\Sigma}(i) = \bar{u}_i \Sigma \tilde{w}_{i+1}$  (weighted a posteriori error)

where $\Sigma$ denotes a weighting matrix. We reserve special notation for the case $\Sigma = I$: $e_a(i) = e_a^I(i)$ and $e_p(i) = e_p^I(i)$. Using these error quantities, we can rewrite the adaptive algorithm (1)-(2) as

$$\tilde{w}_{i+1} = \tilde{w}_i - \mu\, \bar{u}_i^T f(e(i)) \qquad (3)$$
$$e(i) = e_a(i) + v(i) \qquad (4)$$

We also find it useful to use the compact notation $\|\tilde{w}_i\|^2_{\Sigma} = \tilde{w}_i^T \Sigma \tilde{w}_i$. This notation is convenient because it enables us to transform operations on $\tilde{w}_i$ into operations on the norm subscript, as demonstrated by the following properties. Let $a_1$ and $a_2$ be scalars and $\Sigma_1$ and $\Sigma_2$ be symmetric matrices of size $M$. Then
1) Superposition. $a_1 \|\tilde{w}_i\|^2_{\Sigma_1} + a_2 \|\tilde{w}_i\|^2_{\Sigma_2} = \|\tilde{w}_i\|^2_{a_1 \Sigma_1 + a_2 \Sigma_2}$

2) Polarization. $(\bar{u}_i \Sigma_1 \tilde{w}_i)(\bar{u}_i \Sigma_2 \tilde{w}_i) = \|\tilde{w}_i\|^2_{\Sigma_1 \bar{u}_i^T \bar{u}_i \Sigma_2}$

3) Independence. If $\tilde{w}_i$ and $\bar{u}_i$ are independent,
$$E\|\tilde{w}_i\|^2_{\Sigma_1 \bar{u}_i^T \bar{u}_i \Sigma_2} = E\|\tilde{w}_i\|^2_{\Sigma_1 E[\bar{u}_i^T \bar{u}_i] \Sigma_2}$$

4) Notational convention. Using the vec notation, we shall write $\|\tilde{w}_i\|^2_{\mathrm{vec}(\Sigma_1)} = \|\tilde{w}_i\|^2_{\Sigma_1}$.
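The superposition and polarization properties are deterministic identities and are easy to verify numerically, as in the following sketch of our own (the independence property involves expectations and is omitted here):

```python
# Quick numerical check of weighted-norm properties 1) and 2) above.
import numpy as np

rng = np.random.default_rng(1)
M = 5
w = rng.standard_normal(M)                   # plays the role of w~_i
u = rng.standard_normal(M)                   # plays the role of ubar_i
sym = lambda A: A + A.T                      # symmetric weighting matrices
S1, S2 = sym(rng.standard_normal((M, M))), sym(rng.standard_normal((M, M)))
a1, a2 = 0.7, -1.3
wn = lambda S: w @ S @ w                     # ||w||^2_S = w^T S w

# 1) Superposition
assert np.isclose(a1 * wn(S1) + a2 * wn(S2), wn(a1 * S1 + a2 * S2))
# 2) Polarization
assert np.isclose((u @ S1 @ w) * (u @ S2 @ w), wn(S1 @ np.outer(u, u) @ S2))
print("superposition and polarization verified")
```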
With the above definitions and notation at hand, we proceed to premultiply both sides of (3) by $\bar{u}_i \Sigma = u_i H(u_i) \Sigma$ to get

$$\bar{u}_i \Sigma \tilde{w}_{i+1} = \bar{u}_i \Sigma \tilde{w}_i - \mu f(e(i))\, \bar{u}_i \Sigma \bar{u}_i^T$$

Incorporating the expressions for $\bar{u}_i$, $e_a^{\Sigma}(i)$, and $e_p^{\Sigma}(i)$, and solving for $f(e(i))$, we find that

$$f(e(i)) = \frac{e_a^{\Sigma}(i) - e_p^{\Sigma}(i)}{\mu\, \|\bar{u}_i\|^2_{\Sigma}} \qquad (5)$$

Combining (3) and (5) to eliminate $f(e(i))$, and taking the $\Sigma$-weighted norm of the resulting expression, leads to the energy conservation relation:

$$\|\tilde{w}_{i+1}\|^2_{\Sigma} + \frac{[e_a^{\Sigma}(i)]^2}{\|\bar{u}_i\|^2_{\Sigma}} = \|\tilde{w}_i\|^2_{\Sigma} + \frac{[e_p^{\Sigma}(i)]^2}{\|\bar{u}_i\|^2_{\Sigma}} \qquad (6)$$

This equality relates the weighted energies of the error variables $\{\tilde{w}_i, \tilde{w}_{i+1}, e_a^{\Sigma}(i), e_p^{\Sigma}(i)\}$; it is the weighted version of the energy relation derived in [4]-[6] and used there, and in other related references, to study the performance of adaptive filters from both deterministic and stochastic points of view. The inclusion of the weighting matrix $\Sigma$ allows us to perform both transient and steady-state analyses. Observe that no assumptions or approximations were used to derive (6). This relation will be the starting point for much of the subsequent discussion.
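Because (6) is an exact algebraic identity, it can be confirmed numerically for arbitrary data. The sketch below (our own construction, with arbitrary choices of $\Sigma$, $H$, and the scalar $f(e(i))$) checks it to machine precision:

```python
# Numerical check of the energy relation (6); holds with no assumptions.
import numpy as np

rng = np.random.default_rng(2)
M, mu = 6, 0.05
A = rng.standard_normal((M, M))
S = A @ A.T + np.eye(M)                      # positive-definite weighting Sigma
w_t = rng.standard_normal(M)                 # w~_i
u = rng.standard_normal(M)                   # u_i
H = np.diag(rng.random(M))                   # diagonal data nonlinearity H(u_i)
ub = H @ u                                   # normalized regressor ubar_i (as 1-D array)
f_e = np.tanh(rng.standard_normal())         # any scalar value of f(e(i))

w_t1 = w_t - mu * ub * f_e                   # weight-error recursion (3)
ea = ub @ S @ w_t                            # e_a^Sigma(i)
ep = ub @ S @ w_t1                           # e_p^Sigma(i)
nu = ub @ S @ ub                             # ||ubar_i||^2_Sigma

lhs = w_t1 @ S @ w_t1 + ea**2 / nu
rhs = w_t @ S @ w_t + ep**2 / nu
print(np.isclose(lhs, rhs))                  # True
```

Any choice of the scalar $f(e)$, the diagonal $H$, and the weighting $\Sigma$ gives equality, illustrating that (6) is an identity rather than an approximation.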
3. THE DATA NONLINEARITY CASE

In this section, we assume $f(e(i)) = e(i)$ and proceed to study the mean-square performance of the resulting algorithm. For this purpose, we rely on the following independence assumptions:

AN: The noise $v(i)$ is i.i.d. and independent of the input.
AI: The sequence of regressors $\{u_i\}$ is independent with zero mean and autocorrelation matrix $R$.
Thus note first that (5) becomes

$$e_p^{\Sigma}(i) = e_a^{\Sigma}(i) - \mu\, e(i)\, \|\bar{u}_i\|^2_{\Sigma}$$

Substituting this expression for $e_p^{\Sigma}(i)$ into the energy relation (6), we get

$$\|\tilde{w}_{i+1}\|^2_{\Sigma} = \|\tilde{w}_i\|^2_{\Sigma} - 2\mu\, e_a^{\Sigma}(i)\, e(i) + \mu^2 \|\bar{u}_i\|^2_{\Sigma}\, e^2(i) \qquad (7)$$

By further incorporating (4) and assumption AN, (7) reads under expectation

$$E\|\tilde{w}_{i+1}\|^2_{\Sigma} = E\|\tilde{w}_i\|^2_{\Sigma} - 2\mu\, E[e_a^{\Sigma}(i) e_a(i)] + \mu^2 E[e_a^2(i)\, \|\bar{u}_i\|^2_{\Sigma}] + \mu^2 \sigma_v^2\, E\|\bar{u}_i\|^2_{\Sigma} \qquad (8)$$

Using the weighted-norm properties, we can rewrite the estimation-error expectations in (8) as weighted norms of $\tilde{w}_i$:

$$2 e_a(i)\, e_a^{\Sigma}(i) = 2 \tilde{w}_i^T \bar{u}_i^T \bar{u}_i \Sigma \tilde{w}_i = \|\tilde{w}_i\|^2_{\bar{u}_i^T \bar{u}_i \Sigma + \Sigma \bar{u}_i^T \bar{u}_i} \qquad (9)$$

$$e_a^2(i)\, \|\bar{u}_i\|^2_{\Sigma} = \tilde{w}_i^T \bar{u}_i^T \|\bar{u}_i\|^2_{\Sigma}\, \bar{u}_i \tilde{w}_i = \|\tilde{w}_i\|^2_{\bar{u}_i^T \|\bar{u}_i\|^2_{\Sigma} \bar{u}_i} \qquad (10)$$
Substituting (9)-(10) into (8) and using assumption AI yields

$$E\|\tilde{w}_{i+1}\|^2_{\Sigma} = E\|\tilde{w}_i\|^2_{\Sigma} - \mu\, E\|\tilde{w}_i\|^2_{E[\bar{u}_i^T \bar{u}_i]\Sigma + \Sigma E[\bar{u}_i^T \bar{u}_i]} + \mu^2\, E\|\tilde{w}_i\|^2_{E[\bar{u}_i^T \|\bar{u}_i\|^2_{\Sigma} \bar{u}_i]} + \mu^2 \sigma_v^2\, E\|\bar{u}_i\|^2_{\Sigma}$$
or, more compactly,

$$E\|\tilde{w}_{i+1}\|^2_{\Sigma_{i+1}} = E\|\tilde{w}_i\|^2_{\Sigma_i} + \mu^2 \sigma_v^2\, E\|\bar{u}_i\|^2_{\Sigma_{i+1}} \qquad (11)$$

where a time index $(i+1)$ has been attached to $\Sigma$, and where $\{\Sigma_i, \Sigma_{i+1}\}$ are related via

$$\Sigma_i = \Sigma_{i+1} - \mu\, \Sigma_{i+1} E[\bar{u}_i^T \bar{u}_i] - \mu\, E[\bar{u}_i^T \bar{u}_i]\, \Sigma_{i+1} + \mu^2\, E[\|\bar{u}_i\|^2_{\Sigma_{i+1}}\, \bar{u}_i^T \bar{u}_i] \qquad (12)$$

Relations (11)-(12) (or, equivalently, (14)-(15) below) are the equivalent representations of the energy relation (6) under assumptions AN and AI. They can be used to derive conditions for mean-square stability, as well as expressions for the steady-state mean-square error and mean-square deviation of an adaptive filter. To see this, we start by noting that the recursion for $\Sigma_i$ can be rewritten more compactly, using the vec operation and the Kronecker product notation, as

$$\sigma_i = F \sigma_{i+1} \qquad (13)$$

where
$$F = E\left[(I - \mu\, \bar{u}_i^T \bar{u}_i) \otimes (I - \mu\, \bar{u}_i^T \bar{u}_i)\right] \qquad (14)$$

and $\sigma_i = \mathrm{vec}(\Sigma_i)$. In light of (13), relation (11) becomes

$$E\|\tilde{w}_{i+1}\|^2_{\sigma_{i+1}} = E\|\tilde{w}_i\|^2_{F \sigma_{i+1}} + \mu^2 \sigma_v^2\, E\|\bar{u}_i\|^2_{\sigma_{i+1}} \qquad (15)$$
By inspecting (15), it becomes clear that the recursion is stable if, and only if, the matrix $F$ is stable. Thus let

$$A = I \otimes E[\bar{u}_i^T \bar{u}_i] + E[\bar{u}_i^T \bar{u}_i] \otimes I, \qquad B = E[\bar{u}_i^T \bar{u}_i \otimes \bar{u}_i^T \bar{u}_i]$$

Then, from (14), $F = I - \mu A + \mu^2 B$, and $F$ will be stable if, and only if,

$$0 < \mu < \frac{1}{\lambda_{\max}(A^{-1}B)}$$

which provides the desired condition for mean-square stability. Now, assuming the filter is stable, we have

$$\lim_{i\to\infty} E\|\tilde{w}_{i+1}\|^2_{\Sigma} = \lim_{i\to\infty} E\|\tilde{w}_i\|^2_{\Sigma}$$

Thus, in the limit, and using the change of variables $\sigma' = (I - F)\sigma$, relation (15) takes the form

$$\lim_{i\to\infty} E\|\tilde{w}_i\|^2_{\sigma'} = \mu^2 \sigma_v^2\, E\|\bar{u}_i\|^2_{(I-F)^{-1}\sigma'} \qquad (16)$$

This expression allows us to evaluate the steady-state weight-error energy for any weight $\sigma'$. In particular, we can obtain the mean-square error by choosing $\sigma' = \mathrm{vec}(R)$, and the mean-square deviation by choosing $\sigma' = \mathrm{vec}(I)$, i.e.,

$$\lim_{i\to\infty} E[e_a^2(i)] = \mu^2 \sigma_v^2\, E\|\bar{u}_i\|^2_{(I-F)^{-1}\mathrm{vec}(R)}$$
$$\lim_{i\to\infty} E\|\tilde{w}_i\|^2 = \mu^2 \sigma_v^2\, E\|\bar{u}_i\|^2_{(I-F)^{-1}\mathrm{vec}(I)}$$
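As an illustration of how (14)-(16) can be evaluated in practice, the following sketch estimates the moment matrices by Monte Carlo for Gaussian regressors with $H(u_i) = I$ (so $\bar{u}_i = u_i$), and then computes the step-size bound and the steady-state quantities. The covariance $R$, step-size, and sample sizes are our own arbitrary choices, not values from the paper.

```python
# Evaluating (14)-(16) numerically via Monte Carlo moment estimates.
import numpy as np

rng = np.random.default_rng(3)
M, mu, sigma_v2 = 4, 0.02, 0.01
R = np.diag([1.0, 0.8, 0.5, 0.2])                  # assumed regressor covariance
U = rng.standard_normal((200_000, M)) @ np.linalg.cholesky(R).T

I = np.eye(M)
Ru = (U.T @ U) / len(U)                            # estimate of E[u^T u]
B = sum(np.kron(np.outer(u, u), np.outer(u, u)) for u in U[:20_000]) / 20_000
A = np.kron(I, Ru) + np.kron(Ru, I)
F = np.eye(M * M) - mu * A + mu**2 * B             # (14), expanded

# Mean-square stability: 0 < mu < 1/lambda_max(A^{-1} B)
mu_max = 1.0 / np.max(np.linalg.eigvals(np.linalg.solve(A, B)).real)
rho_F = np.max(np.abs(np.linalg.eigvals(F)))
print(f"step-size bound: {mu_max:.4f}, spectral radius of F: {rho_F:.4f}")

# Steady state via (16): sigma' = vec(R) gives the MSE, sigma' = vec(I) the MSD
def steady_state(Sigma_prime):
    s = np.linalg.solve(np.eye(M * M) - F, Sigma_prime.reshape(-1, order="F"))
    S = s.reshape(M, M, order="F")                 # unvec, column-major as in vec()
    Eu2S = np.mean(((U @ S) * U).sum(axis=1))      # E ||u_i||^2_S
    return mu**2 * sigma_v2 * Eu2S

print("steady-state E[e_a^2]:", steady_state(R))
print("steady-state E||w~||^2:", steady_state(I))
```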
4. THE ERROR NONLINEARITY CASE

In this case, $H(u_i) = I$. However, the analysis is more demanding, and we shall assume that the filter is long enough for the following assumptions to be reasonable:

AG: $e_a(i)$ is Gaussian.
AU: $\|u_i\|^2$ and $f^2(e(i))$ are uncorrelated.

For long adaptive filters, the first assumption is justified by central-limit-theorem arguments, while the latter is a weaker version of the independence assumption (it becomes more accurate as the filter gets longer).
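Assumption AU can be probed empirically. In the sketch below (our own construction, with $f(e) = e$ and a fixed weight-error vector of unit norm), the correlation between $\|u_i\|^2$ and $f^2(e(i))$ decays roughly like $1/\sqrt{M}$ as the filter length grows:

```python
# Empirical look at AU: the correlation shrinks as the filter length M grows.
import numpy as np

rng = np.random.default_rng(4)
K, sigma_v = 100_000, 0.1
for M in (2, 16, 128):
    U = rng.standard_normal((K, M))                 # regressors u_i
    w_t = rng.standard_normal(M) / np.sqrt(M)       # fixed weight-error vector
    e = U @ w_t + sigma_v * rng.standard_normal(K)  # e(i) = e_a(i) + v(i)
    x, y = (U**2).sum(axis=1), e**2                 # ||u_i||^2 and f^2(e(i)), f(e) = e
    print(f"M={M:4d}  corr(||u||^2, f^2(e)) = {np.corrcoef(x, y)[0, 1]:+.3f}")
```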
Thus consider relations (5) and (6) with $\Sigma = I$ and $H(u_i) = I$. By eliminating $e_p(i)$ from both relations, we get a recursion similar to (7) for the error nonlinearity case:

$$\|\tilde{w}_{i+1}\|^2 = \|\tilde{w}_i\|^2 - 2\mu\, f(e(i))\, e_a(i) + \mu^2 \|u_i\|^2 f^2(e(i))$$

Upon taking expectations of both sides,

$$E\|\tilde{w}_{i+1}\|^2 = E\|\tilde{w}_i\|^2 - 2\mu\, E[f(e(i))\, e_a(i)] + \mu^2 E[\|u_i\|^2 f^2(e(i))] \qquad (17)$$
we see that two expectations call for evaluation. Since $e_a(i)$ is Gaussian, we have, by Price's theorem,

$$E[f(e(i))\, e_a(i)] = E[e_a^2(i)]\, E[f'(e(i))] \triangleq E[e_a^2(i)]\, h\!\left(E[e_a^2(i)]\right) \qquad (18)$$
for some function $h(\cdot)$. By assumption AU, we can also write

$$E[\|u_i\|^2 f^2(e(i))] = E[\|u_i\|^2]\, E[f^2(e(i))] = \mathrm{Tr}(R)\, q\!\left(E[e_a^2(i)]\right) \qquad (19)$$

for some function $q(\cdot)$. Notice that in (18) and (19), $E[f'(e(i))]$ and $E[f^2(e(i))]$ depend on $e_a(i)$ through the second moment $E[e_a^2(i)]$ only, since $e_a(i)$ is Gaussian and independent of the noise. Table 2 lists the expressions for the functions $h(\cdot)$ and $q(\cdot)$ for the error nonlinearities of Table 1.
Table 2: $h(\cdot)$ and $q(\cdot)$ for the error nonlinearities of Table 1 and for Gaussian noise ($\sigma_e^2 = E[e_a^2(i)]$)

Algorithm    | $h(\sigma_e^2)$ | $q(\sigma_e^2)$
-------------|-------------------------------------------------------------|--------------------------------------------------------------
LMS          | $1$ | $\sigma_e^2 + \sigma_v^2$
LMF          | $3(\sigma_e^2 + \sigma_v^2)$ | $15(\sigma_e^2 + \sigma_v^2)^3$
LMF family   | $\frac{(2k+2)!}{2^{k+1}(k+1)!}(\sigma_e^2 + \sigma_v^2)^k$ | $\frac{(4k+2)!}{2^{2k+1}(2k+1)!}(\sigma_e^2 + \sigma_v^2)^{2k+1}$
LMMN         | $a + 3b(\sigma_e^2 + \sigma_v^2)$ | $a^2(\sigma_e^2 + \sigma_v^2) + 6ab(\sigma_e^2 + \sigma_v^2)^2 + 15b^2(\sigma_e^2 + \sigma_v^2)^3$
Sat. nonlin. | $\frac{\sigma_z}{\sqrt{\sigma_z^2 + \sigma_v^2 + \sigma_e^2}}$ | $\sigma_z^2 \sin^{-1}\!\left(\frac{\sigma_e^2 + \sigma_v^2}{\sigma_z^2 + \sigma_e^2 + \sigma_v^2}\right)$
Sign error   | $\sqrt{\frac{2}{\pi}}\,\frac{1}{\sqrt{\sigma_e^2 + \sigma_v^2}}$ | $1$
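The entries of Table 2 are moments of a Gaussian variable and are easy to check by Monte Carlo. The sketch below (our own, with arbitrary variances) verifies the LMF row directly and the sign-error $h$ through relation (18):

```python
# Monte Carlo check of two rows of Table 2.
import numpy as np

rng = np.random.default_rng(5)
se2, sv2, K = 0.3, 0.1, 2_000_000
ea = rng.normal(0.0, np.sqrt(se2), K)        # Gaussian a priori error (assumption AG)
v = rng.normal(0.0, np.sqrt(sv2), K)         # independent Gaussian noise
e = ea + v                                   # e(i) = e_a(i) + v(i), variance se2 + sv2

# LMF, f(e) = e^3: h = E f'(e) / 1 = 3(se2+sv2), q = E f^2(e) = 15(se2+sv2)^3
print(np.mean(3 * e**2), "vs h:", 3 * (se2 + sv2))
print(np.mean(e**6), "vs q:", 15 * (se2 + sv2)**3)

# Sign error, via (18): E[f(e) e_a] = se2 * h(se2), with h from Table 2
print(np.mean(np.sign(e) * ea), "vs", se2 * np.sqrt(2 / np.pi) / np.sqrt(se2 + sv2))
```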
To determine the steady-state performance of the algorithms, we note that in steady state, $E\|\tilde{w}_{i+1}\|^2 = E\|\tilde{w}_i\|^2$ as $i \to \infty$. Let $S = \lim_{i\to\infty} E[e_a^2(i)]$. Then (17) leads to

$$S = \frac{\mu}{2}\, \mathrm{Tr}(R)\, \frac{q(S)}{h(S)} \qquad (20)$$

This expression shows that the mean-square error, $S$, is a fixed point of the function $\frac{\mu}{2}\mathrm{Tr}(R)\frac{q(S)}{h(S)}$.
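In simple cases, the fixed point can be found by direct iteration. A minimal sketch (our own parameter choices) for the sign-error algorithm, using the $h$ and $q$ of Table 2:

```python
# Solving the fixed-point equation (20) by simple iteration, sign-error case.
import numpy as np

mu, TrR, sv2 = 0.01, 4.0, 0.01
h = lambda S: np.sqrt(2 / np.pi) / np.sqrt(S + sv2)   # Table 2, sign error
q = lambda S: 1.0

S = 0.0
for _ in range(100):                                  # iterate S <- (mu/2) Tr(R) q(S)/h(S)
    S = 0.5 * mu * TrR * q(S) / h(S)
print("steady-state E[e_a^2] =", S)
```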
For a given error nonlinearity, we can therefore determine $S$ by first determining $h$ and $q$ and then solving for $S$. To study stability, we consider recursion (17) again and note that if $\mu$ is chosen to satisfy, for all $i$,

$$\mu\, E[\|u_i\|^2 f^2(e(i))] \leq 2\, E[f(e(i))\, e_a(i)]$$
then $E\|\tilde{w}_{i+1}\|^2 \leq E\|\tilde{w}_i\|^2$, i.e., the mean-square deviation will be a decreasing, and hence convergent, sequence. Now, by the Cauchy-Schwarz inequality, we have
$$E[\|u_i\|^2 f^2(e(i))] \leq \left(E\|u_i\|^4\right)^{1/2} \left(E[f^4(e(i))]\right)^{1/2} = \left(E\|u_i\|^4\right)^{1/2} p\!\left(E[e_a^2(i)]\right)$$
for some function $p(\cdot)$. Hence, a more conservative condition on $\mu$ for stability is

$$\mu \leq \min_{E[e_a^2(i)]} \frac{2\, E[e_a^2(i)]\, h\!\left(E[e_a^2(i)]\right)}{\left(E\|u_i\|^4\right)^{1/2} p\!\left(E[e_a^2(i)]\right)} \qquad (21)$$

Minimizing (21) over $E[e_a^2(i)]$ can be demanding. Instead, we note that $E[e_a^2(i)]$ is lower-bounded by the Cramer-Rao bound of the underlying estimation process. To obtain an upper bound, we note that if $\mu$ is chosen to satisfy (21), then

$$E\|\tilde{w}_i\|^2 \leq E\|\tilde{w}_{i-1}\|^2 \leq \cdots \leq E\|\tilde{w}_0\|^2$$
Therefore, since $e_a(i)$ is Gaussian, we have

$$E[e_a^2(i)] = \frac{\pi}{2}\left(E|e_a(i)|\right)^2 = \frac{\pi}{2}\left(E|u_i \tilde{w}_i|\right)^2 \leq \frac{\pi}{2}\, E\|u_i\|^2\, E\|\tilde{w}_i\|^2 \leq \frac{\pi}{2}\, \mathrm{Tr}(R)\, E\|\tilde{w}_0\|^2$$

This prompts us to define the feasibility set

$$\Omega = \left\{\nu : \nu \leq \frac{\pi}{2}\, \mathrm{Tr}(R)\, E\|\tilde{w}_0\|^2\right\}$$
By carrying out the minimization in (21) over the set $\Omega$, we get the following condition for stability:

$$\mu \leq \min_{\nu \in \Omega} \frac{2\nu\, h(\nu)}{\left(E\|u_i\|^4\right)^{1/2} p(\nu)} \qquad (22)$$

By reviewing the above stability argument, we see that only the Gaussian assumption AG was used. Explicit bounds on $\mu$ can be obtained by evaluating $h$ and $p$ and carrying out the minimization in (22).
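The minimization in (22) can be carried out numerically on a grid over the feasibility set. The sketch below does this for the sign-error algorithm (for which $p(\nu) = (E f^4)^{1/2} = 1$); the lower limit of the grid stands in for the Cramer-Rao bound, which is problem-dependent, so the value used here is only a placeholder:

```python
# Numerical evaluation of the conservative bound (22), sign-error case.
import numpy as np

rng = np.random.default_rng(6)
M, sv2, Ew0 = 8, 0.01, 1.0                          # length, noise power, E||w~_0||^2
U = rng.standard_normal((200_000, M))               # white Gaussian regressors, R = I
Eu4 = np.mean(((U**2).sum(axis=1))**2)              # Monte Carlo E ||u_i||^4

h = lambda nu: np.sqrt(2 / np.pi) / np.sqrt(nu + sv2)   # Table 2, sign error
p = lambda nu: 1.0                                       # (E f^4)^{1/2} = 1 for f = sign

nu_lo = 1e-3                                        # placeholder for the Cramer-Rao bound
nu_hi = 0.5 * np.pi * M * Ew0                       # upper limit of the feasibility set
nu = np.linspace(nu_lo, nu_hi, 10_000)
mu_bound = np.min(2 * nu * h(nu) / (np.sqrt(Eu4) * p(nu)))
print("conservative step-size bound:", mu_bound)
```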
5. CONCLUSION

In this paper, we presented a unified approach for the transient analysis of adaptive filters. Among other results, we provided conditions for stability and expressions for the steady-state error.
6. REFERENCES

[1] D. L. Duttweiler, "Adaptive filter performance with nonlinearities in the correlation multiplier," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 30, no. 4, pp. 578-586, Aug. 1982.
[2] E. Walach and B. Widrow, "The least mean fourth (LMF) adaptive algorithm and its family," IEEE Transactions on Information Theory, vol. 30, no. 2, pp. 275-283, Aug. 1984.
[3] N. J. Bershad, "Analysis of the normalized LMS algorithm with Gaussian inputs," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, pp. 793-806, 1986.
[4] A. H. Sayed and M. Rupp, "A time-domain feedback analysis of adaptive algorithms via the small gain theorem," Proc. SPIE, vol. 2563, pp. 458-469, San Diego, CA, Jul. 1995.
[5] M. Rupp and A. H. Sayed, "A time-domain feedback analysis of filtered-error adaptive gradient algorithms," IEEE Transactions on Signal Processing, vol. 44, no. 6, pp. 1428-1439, Jun. 1996.
[6] N. R. Yousef and A. H. Sayed, "A unified approach to the steady-state and tracking analyses of adaptive filters," to appear in IEEE Transactions on Signal Processing, vol. 49, no. 2, Feb. 2001. [See also Proc. 4th IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing, vol. 2, pp. 699-703, Antalya, Turkey, June 1999.]