IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 7, JULY 2002


Analysis of Mean-Square Error and Transient Speed of the LMS Adaptive Algorithm

Onkar Dabeer, Student Member, IEEE, and Elias Masry, Fellow, IEEE

Abstract—For the least mean square (LMS) algorithm, we analyze the correlation matrix of the filter coefficient estimation error and the signal estimation error in the transient phase as well as in steady state. We establish the convergence of the second-order statistics as the number of iterations increases, and we derive exact asymptotic expressions for the mean-square errors. In particular, the result for the excess signal estimation error gives conditions under which the LMS algorithm outperforms the Wiener filter with the same number of taps. We also analyze a new measure of transient speed. We do not assume a linear regression model: the desired signal and the data process are allowed to be nonlinearly related. The data is assumed to be an instantaneous transformation of a stationary Markov process satisfying certain ergodic conditions.

Index Terms—Asymptotic error, least mean square (LMS) adaptive algorithm, products of random matrices, transient analysis.

I. INTRODUCTION

Consider jointly stationary random processes $\{X_n\}$ and $\{Y_n\}$, taking values in $\mathbb{R}^N$ and $\mathbb{R}$, respectively. The Wiener filter $W^\star$ is the vector which minimizes $E[(Y_n - W^T X_n)^2]$. It is a solution of the Wiener–Hopf equation $R\,W = p$, where $R := E[X_1 X_1^T]$ and $p := E[X_1 Y_1]$, and if $R$ is invertible, it is given by $W^\star = R^{-1} p$. Note that all vectors are column vectors, and $A^T$ denotes the transpose of $A$. We can write

$$Y_n = W^{\star T} X_n + \xi_n \qquad (1)$$

where the estimation error $\xi_n$ is orthogonal to the data, that is,

$$E[X_n \xi_n] = 0. \qquad (2)$$

In practice, the statistics of the data are seldom known, and $W^\star$ has to be estimated based on a single realization of $\{(X_n, Y_n)\}$. For example, such a problem arises in system identification and channel equalization [1, Introduction]. A common approach is to use stochastic adaptive algorithms, which recursively update an estimate of $W^\star$ as more data becomes available. In this paper, we consider the constant step-size least mean square (LMS) algorithm, which updates an estimate $W_n$ of $W^\star$ using the recursion

$$W_{n+1} = W_n + \mu\, X_{n+1}\bigl(Y_{n+1} - X_{n+1}^T W_n\bigr), \qquad n \ge 0. \qquad (3)$$

Here $\mu > 0$ is the fixed step size and $W_0$ is a deterministic initialization.

Manuscript received May 1, 2000; revised July 1, 2001. This work was supported by the Center for Wireless Communications, University of California, San Diego. The material in this paper was presented in part at CISS-2000, Princeton, NJ, March 2000. The authors are with the Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]; [email protected]). Communicated by U. Madhow, Associate Editor for Detection and Estimation. Publisher Item Identifier S 0018-9448(02)05156-8.
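To make the recursion (3) concrete, the following sketch implements constant step-size LMS in NumPy and compares it with the Wiener solution computed from sample statistics. The data model, number of taps, step size, and all variable names below are illustrative choices and are not taken from the paper.

```python
import numpy as np

def lms(X, Y, mu, w0=None):
    """Constant step-size LMS, cf. (3): W_{n+1} = W_n + mu * X_{n+1} * (Y_{n+1} - X_{n+1}^T W_n)."""
    N = X.shape[1]
    w = np.zeros(N) if w0 is None else w0.astype(float).copy()
    trajectory = np.empty((len(Y), N))
    for n in range(len(Y)):
        err = Y[n] - X[n] @ w          # a priori signal estimation error
        w = w + mu * err * X[n]        # coefficient update
        trajectory[n] = w
    return w, trajectory

# Hypothetical data: 3-tap regressors built from an AR(1) sequence, nonlinear desired
# signal, so (X_n, Y_n) are dependent and nonlinearly related (the setting of this paper).
rng = np.random.default_rng(0)
z = np.zeros(10_000)
for n in range(1, len(z)):
    z[n] = 0.8 * z[n - 1] + rng.standard_normal()
N = 3
X = np.stack([z[n - N:n][::-1] for n in range(N, len(z))])   # X_n = (z_{n-1}, ..., z_{n-N})
Y = z[N:] ** 3                                               # a nonlinear "desired" signal

R = (X.T @ X) / len(Y)            # sample estimate of R = E[X X^T]
p = (X.T @ Y) / len(Y)            # sample estimate of p = E[X Y]
w_wiener = np.linalg.solve(R, p)  # finite-sample Wiener solution W* = R^{-1} p
w_lms, _ = lms(X, Y, mu=0.01)
print(w_wiener, w_lms)
```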

The LMS algorithm, and its variants, have been used in a variety of applications, and many researchers have analyzed them (see, for example, [2]–[22]). To give a flavor of the analytical results known so far, we briefly mention some previous results for dependent data. In the literature, two commonly used performance criteria are the estimation error in the filter coefficients $W_n - W^\star$ (also called the deviation error), and the signal estimation error $Y_n - W_n^T X_n$. In [4], convergence in distribution (as $n \to \infty$) of the deviation error is established for bounded uniformly mixing data. In [6], for $M$-dependent data, it is shown that as $n$ tends to infinity, the mean-square deviation error is bounded by a multiple of the step size. For bounded, purely nondeterministic regressors, the time average of the deviation error is analyzed in [10]. In [12], [23], and [24], for a general class of adaptive algorithms, asymptotic normality of the suitably normalized deviation error is established by letting $\mu \to 0$ and $n \to \infty$ in such a way that $n\mu$ remains bounded. In [13], [21], and [22], for specific examples it is shown that as $n \to \infty$, the signal estimation error of the LMS algorithm can be smaller than that of the Wiener filter, that is, in some cases the LMS algorithm outperforms $W^\star$. Many authors have also analyzed the speed of convergence, and the most recent work is that in [15] and [20]. In Section III, we compare our results with some of the above mentioned results; however, we next compare our contribution to the more recent works in this area.

Our results are closest in spirit to recent results in [14], [17, Theorem 5], and [20]. In [14] and [17], a simple approximation for the error correlation matrix is given when the parameter to be tracked is time-varying. Even when the parameter is time-invariant, which is the case in this paper, the results in [14] and [17] are the most general results known for error analysis in the transient phase. However, [14] and [17] impose restrictive conditions on the relationship between $X_n$ and $Y_n$. In [14], it is assumed that

$$\{X_n\} \text{ and } \{\xi_n\} \text{ are independent} \qquad (4)$$

and

$$\{\xi_n\} \text{ is zero mean and white.} \qquad (5)$$

Condition (4) implies that (1) is a linear regression model. In the general case, $X_n$ and $Y_n$ may be nonlinearly related, and $X_n$ and $\xi_n$ may be dependent. Also, though (2) holds, the sequence $\{\xi_n\}$ may be correlated, and (5) may not hold. In [17, Theorem 5], a different condition, (6), is assumed.


If the data is zero mean Gaussian, then (6) implies that $\{X_n\}$ is independent and identically distributed (i.i.d.) and $\xi_n$ is independent of $X_n$. The conditions (4)–(6) are very restrictive, and they are not satisfied in applications such as channel equalization. In this paper, we analyze the LMS algorithm without putting any strong restriction on the relationship between $X_n$ and $Y_n$. We not only extend the results in [14] and [17], but we also prove additional new results. Specifically, we provide a comprehensive quadratic-mean analysis of the LMS algorithm for dependent data, under conditions considerably weaker than those in the literature.

1) We approximate the correlation matrix of the deviation error $W_n - W^\star$ by a matrix which is specified by a simple recursion depending on the statistics of the data. The approximation error vanishes as $\mu \to 0$. In the general case of dependent data and a nonlinear regression model, the correlation matrix itself does not have a simple recursion. This result extends the result in [14] and [17] by removing (4)–(6).

2) For small $\mu$, we prove the convergence of the algorithm in the following sense: there exists $\mu_0 > 0$ such that for $0 < \mu \le \mu_0$, the limit of the error correlation matrix as $n \to \infty$ exists. To the best of our knowledge, for dependent data, convergence of second-order statistics of the LMS algorithm has not been established before.

3) We show that the limit of the suitably normalized asymptotic error correlation matrix as $\mu \to 0$ exists, and it satisfies the Lyapunov equation given in [23].

4) We study the excess signal estimation error, that is, the amount by which the mean-square signal estimation error of the LMS filter exceeds that of the Wiener filter $W^\star$. We approximate it by a simple expression, which can be computed using a simple recursion. The approximation error is small when $\mu$ is small. This result generalizes a result in [14], where (4) and (5) are assumed.

5) We show that for sufficiently small step size $\mu$, the limit of the excess signal estimation error exists, and we derive the limit in terms of the statistics of the data. This expression is new. In particular, our result shows that under certain conditions, the LMS algorithm gives a smaller signal estimation error than $W^\star$. No previous results explain this phenomenon.

6) We analyze a new measure of transient speed.

We assume $\{(X_n, Y_n)\}$ to be an instantaneous vector transformation of a stationary Markov process satisfying certain ergodic conditions. Our assumptions are satisfied for many examples of practical importance, and they allow applications such as channel equalization, where (4)–(6) are not true (see Section II). We note that $\{(X_n, Y_n)\}$ itself is not Markovian, and it need not even be uniformly mixing (see Section II). Most papers dealing with the convergence of the LMS algorithm establish some form of exponential convergence of products of random matrices. In Lemma 11 of Appendix I, we obtain exponential convergence for products of random matrices using the operator-theoretic framework for Markov processes [25, Ch. 16]. Lemma 11 establishes refinements of some results in [14], which are critical to prove Theorems 1–4 without assuming (4) and (5). Since we do not assume (4) and (5), the analysis in this paper is substantially different from that in [14].

The paper is organized as follows. In Section II, we present our assumptions and provide examples. In Section III, we state and discuss the main results, and we also compare our results with previously known results. In Section IV, we prove the main results using a series of lemmas which are proved in Section V. In Section VI, we present the conclusion. Lemma 11 and a few preliminary results are proved in Appendix I. In Appendix II, we prove related refinements of some results in [14].

II. ASSUMPTIONS AND EXAMPLES

In this section, we first state our assumptions, and then we give detailed explanations and examples.

( ) The underlying process is an aperiodic, $\varphi$-irreducible (see [25]), stationary Markov process, and there exists a measurable function $V$ on its state space such that:

a) a moment condition on $V$ holds;

b) for any measurable function in a suitable class, there exist constants such that the corresponding conditional expectations converge geometrically fast to the unconditional expectation.

( )

a) $X_n$ and $Y_n$ are measurable instantaneous functions of the underlying Markov process.

b) The growth of these functions is restricted in terms of the drift function $V$ and an absolute constant.

c), d) Certain growth and moment conditions on functions of the data hold (see the discussion below).

e) The matrix $R$ is positive definite.

Discussion of ( ): Using the terminology in [25], this assumption states that the underlying process is a $V$-uniformly ergodic Markov process. This also implies uniform ergodicity with respect to a smaller power of $V$. (We have stated the assumption for the larger power instead of the smaller one in order to avoid carrying a square root sign throughout the paper.) Part b) implies that for a class of functions, the conditional expectation converges exponentially fast (as the time lag increases) to the unconditional expectation. Though at first sight this assumption looks difficult to verify, [25, Theorem 16.0.1 and Lemma 15.2.8] give a simple criterion to verify it: find a drift function satisfying a geometric drift inequality. It turns out that in most applications, the drift function can be chosen to be a polynomial, or an exponential function of a power of the state. For example, consider a linear state recursion driven by an i.i.d. sequence. In [14, Sec. 3.1] it is shown that if a suitable moment of the driving noise is finite, the eigenvalues of the state matrix are strictly inside the unit circle, the system is controllable, and the noise has an everywhere positive density, then the assumption is satisfied with a polynomial choice of the drift function. Similarly, if an exponential moment of the driving noise is finite, then the assumption is satisfied with an exponential choice of the drift function.
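For reference, the geometric drift criterion in [25] alluded to above is commonly written in the following form; the symbols $Z_n$, $V$, $C$, $\lambda$, and $b$ used here are the standard ones from [25] and are not necessarily those of the original display:

$$E\bigl[V(Z_{n+1}) \mid Z_n = z\bigr] \;\le\; \lambda\, V(z) + b\,\mathbf{1}_C(z), \qquad 0 < \lambda < 1,\; b < \infty,$$

for all $z$ in the state space, where $\{Z_n\}$ is the underlying Markov chain and $C$ is a petite set. Together with aperiodicity and $\varphi$-irreducibility, such a drift condition yields $V$-uniform ergodicity [25, Ch. 15–16].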


Discussion of ( ) a): This assumption states that the data is an instantaneous transformation of a Markov process. Note that the data itself need not be Markovian. This assumption has been used before in [14] and [23]. We give examples later to show how this assumption arises in practice. From [26, Theorem 4.3], we know that the data process is absolutely regular. Absolute regularity is a popular form of mixing, which is weaker than uniform mixing ($\varphi$-mixing) and stronger than strong mixing ($\alpha$-mixing). The assumption of uniformly mixing data has been used before (see, for example, [4] and [16]). We note that the Markov process is in general not uniformly mixing: from [26, Sec. 4 and Theorem 6.0.2] we know that it is uniformly mixing if and only if a certain uniform condition holds.

Discussion of ( ) b): For part of the data, b) is the same as the corresponding assumption in [14, Sec. 4]. The extra assumption we make for the remaining part is not at all restrictive, and it is dealt with exactly as in [14]. This assumption puts a restriction on the growth of the data transformation with respect to the growth of the drift function. If the drift function is a polynomial, then without any further assumptions this condition requires the data to be bounded. However, for an exponential drift function, the components of the data are permitted to be any polynomial, and even a function with suitably slow exponential growth is allowed. We note that this assumption allows data for which not all moments are finite (see the examples below).

Discussion of ( ) c)–e): Condition c) is satisfied if certain functions of the data are bounded by a multiple of the drift function. This is a very mild condition, especially for an exponential drift function. Under this restriction on the growth, assumption d) is satisfied provided a corresponding moment is finite. Such growth conditions arise because we wish to apply the convergence of the conditional expectation (see ( ) b)) for these functions. The positive definiteness of $R$ is commonly used to guarantee the existence of a unique $W^\star$.

Example 1: Let the underlying process be a Gaussian vector ARMA process. Then, by stacking together a fixed number of consecutive vectors, we can define a Markov state process. As in [27, Example 12.1.5, p. 468], the stacked process can be written in state-space form. Under the assumptions on the driving noise stated in the discussion above, an exponential drift function can be chosen. Hence linear functions of the state are allowed, and in particular the ARMA process itself satisfies our assumptions. Similarly, we can also take the data to be any polynomial transformation, or even a suitable exponential transformation, of the state. In the latter case, not all moments of the data need be finite. Also note that $Y_n$ and $X_n$ are in general nonlinearly related in this construction.
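A minimal sketch of the kind of construction Example 1 describes: a Gaussian ARMA sequence is augmented into a Markov state by stacking, and the data $(X_n, Y_n)$ is then taken to be an instantaneous, possibly nonlinear, transformation of that state. The ARMA order, the coefficients, and the particular transformations below are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5_000
# Hypothetical scalar Gaussian ARMA(2,1): s_n = a1*s_{n-1} + a2*s_{n-2} + e_n + b1*e_{n-1}
a1, a2, b1 = 0.5, 0.2, 0.4
e = rng.standard_normal(T)
s = np.zeros(T)
for n in range(2, T):
    s[n] = a1 * s[n - 1] + a2 * s[n - 2] + e[n] + b1 * e[n - 1]

# Stacking consecutive samples together with the latest innovation gives a Markov state,
# in the spirit of the state-space construction cited from [27].
state = np.stack([s[2:], s[1:-1], e[2:]], axis=1)        # Z_n = (s_n, s_{n-1}, e_n)

# The data (X_n, Y_n) is an instantaneous transformation of Z_n; here we pick a polynomial
# transformation for X_n and a bounded nonlinear desired signal Y_n.
X = np.stack([state[:, 0], state[:, 1], state[:, 0] * state[:, 1]], axis=1)
Y = np.tanh(state[:, 0] + state[:, 1])
```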

Example 2: For the case of channel equalization, the conditions (4)–(6) are not satisfied. We now show that our assumptions are satisfied for this application. Let the channel output be the superposition of two users' filtered symbol streams and additive channel noise. Here the symbol streams are independent information sources of two users experiencing channels with their respective impulse responses, and the channel noise is independent of the information sources. The aim is to estimate the symbols of the first user based on the channel outputs, in the presence of interference due to user 2 and in the presence of channel noise. The information sources are usually modeled as irreducible, aperiodic Markov chains with a finite state space, and the noise is modeled as zero mean, i.i.d. Assuming suitable moments of the noise are finite, it is easy to see that our assumptions are satisfied by choosing an appropriate Markov state and drift function. This example can clearly be extended to any finite number of users, and using the method in Example 1, an additional narrowband interference satisfying a scalar autoregressive moving average (ARMA) model can also be included. More examples can be found in [14, Sec. 3] and [25].

Since some of our results are similar in nature to those in [14] and [17, Theorem 5], we state below the main differences in our assumptions.

• We do not assume (4) and (5), which are assumed in [14]. As indicated in the discussion of our assumptions above, the extra assumptions we make over [14] are mild.

• We do not assume (6), which is assumed in [17, Theorem 5]. Also, the moment restrictions in [17] are more stringent. For example, for i.i.d. data, condition (15) in [17] requires that the density of the data should decay like a Gaussian or faster than a Gaussian. On the other hand, our assumptions allow data which can be an exponential transformation of Gaussian data (see Example 1). In particular, if the data are i.i.d. and the relevant moments are finite, then it is easy to verify that our assumptions are satisfied with a simple choice of the drift function.
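The sketch below generates data of the type described in Example 2: a received sequence containing two users' filtered symbols plus noise, a regressor window of received samples, and a delayed symbol of the first user as the desired signal. The channel taps, window length, delay, and symbol model are hypothetical; the point is only to exhibit a setting in which the Wiener error is neither white nor independent of the regressors.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 20_000
h = np.array([1.0, 0.5, -0.2])          # hypothetical channel of user 1
g = np.array([0.6, 0.3])                # hypothetical interfering channel of user 2
b = rng.choice([-1.0, 1.0], size=T)     # user-1 symbols (i.i.d. here for simplicity)
c = rng.choice([-1.0, 1.0], size=T)     # user-2 symbols
noise = 0.1 * rng.standard_normal(T)
r = np.convolve(b, h)[:T] + np.convolve(c, g)[:T] + noise   # channel output

# Equalizer regressor: a window of channel outputs; desired signal: a delayed user-1 symbol.
L, delay = 5, 2
X = np.stack([r[n - L:n][::-1] for n in range(L, T)])
Y = b[L - 1 - delay: T - 1 - delay]

# Because Y_n is a symbol while X_n mixes symbols, interference, and noise, the Wiener error
# generally contains residual intersymbol interference: it is correlated over time and is
# not independent of the regressors, so conditions (4)-(6) fail in this setting.
```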

III. MAIN RESULTS

Notation: Let $V_n := W_n - W^\star$ denote the deviation error. For a square matrix $A$, $\operatorname{tr}(A)$ denotes the trace, and $\|A\|$ denotes the matrix norm induced by the Euclidean norm. By the phrase “sufficiently small $\mu$,” we mean “there exists $\mu_0 > 0$ such that for $0 < \mu \le \mu_0$.” In many of the expressions, the $O(\cdot)$ symbol denotes a term bounded by a multiple of its argument, and the bound is uniform in other parameters that may be involved in the expressions. For an $m \times n$ matrix $A$ and a $p \times q$ matrix $B$, $A \otimes B$ denotes the Kronecker product, which is defined to be the $mp \times nq$ matrix whose $(i, j)$th block is the $p \times q$ matrix $a_{ij} B$, $i = 1, \ldots, m$, $j = 1, \ldots, n$. For an $m \times n$ matrix $A$, $\operatorname{vec}(A)$ denotes the $mn$-dimensional column vector obtained by stacking the columns of $A$ into a single vector.

A. Analysis of the Mean of the Deviation Error

Under assumptions (4) and (5), it is shown in [28, Ch. 6] and [14, Theorem 3] that $E[V_n]$ converges exponentially fast to zero as $n \to \infty$, that is, the filter coefficient estimate based on the LMS algorithm is asymptotically unbiased. However, this is not always the case when (4) and (5) are violated. Consider the following example.


Example 3: Let the data be built from i.i.d. Gaussian random variables with zero mean and unit variance, and let the desired signal be a suitable function of them. Then, by simple calculations using (3), it follows that the mean of the deviation error does not converge to zero; that is, the filter coefficient estimate based on the LMS algorithm is not asymptotically unbiased.

In the general case, we have the following result.

Theorem 1: Suppose assumptions ( ) and ( ) are satisfied. Then for sufficiently small $\mu$, $E[V_n]$ admits the approximation (7), with an error term that is uniform in $n$. Also, $\lim_{n\to\infty} E[V_n]$ exists for sufficiently small $\mu$. Further, under additional conditions on the data, the error term in (7) can be replaced by a smaller one.

The proof of this result is much simpler than the proofs of the other results in this paper, and the method of proof is also similar to that of the other results. Hence, we do not present the proof of Theorem 1 in this paper. The interested reader is referred to the Ph.D. dissertation [29].

B. Analysis of the Correlation Matrix of the Deviation Error

Theorem 2: Suppose assumptions ( ) and ( ) are satisfied. Then we have the following results.

1) For sufficiently small $\mu$, the correlation matrix $E[V_n V_n^T]$ is approximated, uniformly in $n$, by a matrix satisfying the simple recursion (8); the driving term of this recursion is given by an infinite series that converges absolutely.

2) For sufficiently small $\mu$, $\lim_{n\to\infty} E[V_n V_n^T]$ exists; moreover, the limit of its normalized version as $\mu \to 0$ exists and satisfies the Lyapunov equation.

The proof is given in Section IV-A. Before discussing the above result, we state two corollaries.

Corollary 1: Suppose assumptions ( ) and ( ) are satisfied. Then for sufficiently small step-size $\mu$, the limit of the mean-square deviation error as $n \to \infty$ exists, and it satisfies (9) and (10), where the infinite series converges absolutely.

The proof is given in Section V-H.

Corollary 2: Suppose assumptions ( ) and ( ) are satisfied. Then for sufficiently small $\mu$, the filter estimate $W_n$ can be approximated by a deterministic vector, with an approximation error that is uniform in $n$.

The proof is given in Section V-I.

In the initial phase of the algorithm, the correlation matrix is large compared to $\mu$, while the approximation error in Part 1 of Theorem 2 is of smaller order. When $n$ is large, the correlation matrix is of the order of $\mu$, while the approximation error is of still smaller order. Hence, Part 1 of Theorem 2 implies that for small $\mu$, the matrix defined by the recursion (8) is a good approximation to the correlation matrix in the transient phase as well as in steady state. Note that the recursion (8) is easy to analyze, while for the general case of dependent data and a nonlinear regression model, the correlation matrix itself does not have a simple expression. If we assume (6) (or (4) and (5)), then the driving term of the recursion simplifies. The corresponding simplified form of Part 1 of Theorem 2 is similar to the result obtained by applying [14, Theorem 4] (also see [17, Theorem 5]) to the special case of the LMS algorithm being used to estimate a time-invariant parameter. However, we have a better rate for the error term than in [14] and [17]. Thus, we not only remove assumptions (4)–(6), but we also obtain a better rate for the approximation error.

The literature on the performance analysis of the LMS algorithm, and its variants, can be divided into two categories: that which studies quantities like the asymptotic error by assuming the convergence of the algorithm, and that which first establishes the convergence of the algorithm. Our work is in the spirit of the second category. In Part 2 of Theorem 2, we first establish the existence of the limit of $E[V_n V_n^T]$, and then we study its behavior for small $\mu$. Note that the convergence of $E[V_n V_n^T]$ as $n \to \infty$ does not follow from Part 1 of Theorem 2. To the best of our knowledge, for dependent data, convergence of second-order statistics of the LMS algorithm has not been shown before.

In order to explicitly indicate the dependence of the limit on $\mu$, we write it as a function of $\mu$. In [23, Theorem 15, p. 335], under the assumption that the data is an instantaneous function of a geometrically ergodic Markov process, an asymptotic normality result for general adaptive algorithms is established. (Similar results have also been proven in [24] and [12].) For the case of the LMS algorithm, [23, Theorem 15, p. 335] implies that if we let $\mu \to 0$ and $n \to \infty$ such that $n\mu$ remains bounded, then the suitably normalized deviation error converges in distribution to a normal distribution with zero mean and a covariance matrix satisfying a Lyapunov equation. This weak convergence result rules out the case of practical interest: $\mu$ fixed and $n$ large. Also, this asymptotic distribution result does not imply convergence of the second moment. In contrast, in Theorem 2 we study the second-order statistics of $V_n$.


• In Part 1, we present a result about the behavior of the correlation matrix during the transient phase.

• In Part 2, for $\mu$ fixed, we prove the convergence of the correlation matrix (as $n \to \infty$), and then we study the limit for small $\mu$.

Also, B(ii), B(iii), and B(iv) of [23, p. 335] are assumptions on the sequence of estimates, and these appear to be difficult to verify. On the other hand, all our assumptions are on the data process.

Using [30, eq. (12), p. 20] and the positive definiteness of $R$, it is easy to show that the unique solution to the Lyapunov equation is specified explicitly; in terms of the eigenvectors and the corresponding eigenvalues of $R$, the solution can also be written in diagonalized form.
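A standard vec–Kronecker identity, $\operatorname{vec}(ABC) = (C^T \otimes A)\operatorname{vec}(B)$, turns a Lyapunov equation into a linear system; a computation of this type is presumably what the reference to [30] alludes to. The sketch below solves a generic equation of the form $R\Theta + \Theta R = Q$ in this way. The specific matrices are illustrative, and the exact form of the paper's Lyapunov equation is not reproduced here.

```python
import numpy as np

def solve_lyapunov(R, Q):
    """Solve R @ Theta + Theta @ R = Q using vec(A X B) = (B^T kron A) vec(X)."""
    N = R.shape[0]
    I = np.eye(N)
    # vec(R Theta I) + vec(I Theta R) = (I kron R + R^T kron I) vec(Theta)
    K = np.kron(I, R) + np.kron(R.T, I)
    theta_vec = np.linalg.solve(K, Q.flatten(order="F"))   # column-major stacking = vec
    return theta_vec.reshape((N, N), order="F")

# Illustrative positive-definite R and symmetric right-hand side Q.
rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
R = M @ M.T + 4 * np.eye(4)
Q = np.eye(4)
Theta = solve_lyapunov(R, Q)
print(np.max(np.abs(R @ Theta + Theta @ R - Q)))   # residual should be near machine precision
```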

In [5], under the assumption that the process is $M$-dependent, it is shown that for sufficiently small $\mu$, the asymptotic mean-square deviation error is bounded by a constant multiple of $\mu$. In comparison, Corollary 1 is a stronger result: we show that the limiting value is well defined for sufficiently small $\mu$, we establish the precise rate at which the asymptotic error vanishes as $\mu \to 0$, and we obtain the asymptotic constant in terms of the statistics of the data. Furthermore, we remove the assumption of $M$-dependent data. For small $\mu$, the asymptotic error behaves like $\mu$ times the constant on the right-hand side of (10).

Corollary 2 is a consequence of Theorems 1 and 2. As per this result, for small $\mu$, the filter estimate can be approximated by a deterministic vector. This approximation is meaningful during the initial phase of the algorithm, when the deviation is large compared to $\mu$. A similar result was suggested in [20] without proof.

C. Analysis of the Excess Signal Estimation Error

Theorem 3: Suppose assumptions ( ) and ( ) are satisfied. Then we have the following results.

1) For sufficiently small $\mu$, the excess signal estimation error admits a simple approximation with error terms that are uniform in $n$; the matrices appearing in the approximation are as defined in Theorem 2.

2) For sufficiently small $\mu$, the limit of the excess signal estimation error as $n \to \infty$ exists, and it is given by (11), where the infinite series converges absolutely.


The proof is given in Section IV-B. Part 1 of Theorem 3 provides a simple approximation to the excess signal estimation error. For small $\mu$, the approximation is good in the transient phase as well as in steady state. If we assume (4) and (5), then the driving term in the recursion (8) is replaced by a simpler quantity. This simplified result is similar to that obtained by using [14, Theorem 4], but with a better rate for the error term. Thus, for the case of a time-invariant parameter, we not only remove assumptions (4) and (5) made in [14] without any additional restrictive assumptions, but we also obtain a better rate for the error term.

If we assume (4) and (5), then the limit in (11) simplifies. This simplified result has been established in [15], and under the assumption of i.i.d. Gaussian data it can also be derived from the exact expression given in [8]. In comparison, our result is valid even when $X_n$ and $Y_n$ are nonlinearly related, and the data is dependent. Equation (11) implies that for small $\mu$, the excess error behaves like $\mu$ times the constant on the right-hand side of (11).

For two specific examples, it was shown in [21] that the LMS algorithm asymptotically results in a smaller signal estimation error than the optimal Wiener filter $W^\star$. A similar result was also stated for one-step prediction of an AR process for a normalized LMS algorithm in [13]. The result (11) of Theorem 3 can be used to investigate this phenomenon under general conditions. Let $c$ denote the constant on the right-hand side of (11). If $c$ is negative, then for sufficiently small $\mu$, the excess signal estimation error is negative, that is, the LMS algorithm gives a smaller signal estimation error than the optimal Wiener filter $W^\star$. As explained in [21], the reason for this is that the LMS algorithm, due to its recursive nature, utilizes all the past data, while the Wiener filter only uses the present data. Unlike [21] and [13], which only deal with specific examples, (11) holds for a large class of processes. The constant is always positive if (6) is true, or if (4) and (5) are true. Hence, conventional analysis based on these assumptions does not indicate this phenomenon.

In [31], lower bounds on the estimation error of causal estimators are derived using Kalman filtering. These bounds are of interest to lower-bound the signal estimation error of the LMS algorithm. Bounds based on causal Wiener filters are also given in [22]. Comparison of our results with these lower bounds is not feasible for the general case. However, we give a comparison for the example below for small $\mu$.

Example 4: Let the data be generated from a zero mean, stationary Gaussian autoregressive process, and let the desired signal be a one-step-ahead value of the process. For a single-tap LMS filter, the constant on the right-hand side of (11) is negative when the samples are sufficiently positively correlated. Thus, if the samples are sufficiently positively correlated, then the LMS algorithm outperforms the Wiener filter $W^\star$. For this example, the lower bound in [31, Theorem 2.3] applies, and the smallest signal estimation error is achieved by the causal Wiener filter using two taps. For small $\mu$, the LMS algorithm employing one tap gives a signal estimation error that, for correlation close to one and a suitably chosen step size, approaches this bound. Thus, for correlation close to one, the LMS algorithm with one tap comes close to the optimal predictor, provided the step size is chosen appropriately.
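Example 4's exact construction is not fully reproduced above. The following sketch sets up a related experiment one can run to probe the phenomenon discussed in this subsection, namely whether a one-tap LMS predictor can beat the one-tap Wiener predictor on strongly correlated data. The process order, coefficients, step size, and burn-in length are arbitrary illustrative choices, and the code only compares the two empirical errors rather than asserting an outcome.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200_000
a1, a2 = 1.2, -0.4            # hypothetical stable AR(2) coefficients
x = np.zeros(T)
for n in range(2, T):
    x[n] = a1 * x[n - 1] + a2 * x[n - 2] + rng.standard_normal()

# One-step prediction of x_n from the single regressor x_{n-1}.
X, Y = x[:-1], x[1:]

# One-tap Wiener predictor and its mean-square signal estimation error.
w_star = np.mean(X * Y) / np.mean(X * X)
mse_wiener = np.mean((Y - w_star * X) ** 2)

# One-tap LMS predictor; record a priori squared errors after a burn-in period.
mu, w, burn_in, se = 0.005, 0.0, 10_000, []
for n in range(len(Y)):
    e = Y[n] - w * X[n]
    if n >= burn_in:
        se.append(e * e)
    w += mu * e * X[n]
print(mse_wiener, np.mean(se))   # compare the two mean-square errors
```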


D. Analysis of the Measure of Transient Speed

Consider the measure of transient speed defined as the sum, over all iterations, of the excess of the mean-square deviation error over its steady-state value. This is a measure of how fast $E\|V_n\|^2$ converges to its steady-state value. Since we sum over all $n$, this measure also takes into account the transient phase of the algorithm. The smaller the value of the measure, the faster is the speed of convergence. We have the following result.

Theorem 4: Suppose assumptions ( ) and ( ) are satisfied. Then for sufficiently small $\mu$, the measure of transient speed is finite and it satisfies (12).

The proof is given in Section IV-C.

Expand the initial deviation in the eigenvectors of $R$, where each eigenvector corresponds to an eigenvalue of $R$. Then the right-hand side of (12) can be written in terms of this expansion. Using a heuristic argument, [20] proposes a measure of transient speed (see [20, eq. (8)]). For small $\mu$, the right-hand side of (12) behaves similarly to the measure of [20]. Thus, the analysis of the intuitively appealing measure of speed considered here leads to a rigorous justification of [20, eq. (8)].

Consider the normalized measure of transient speed obtained by normalizing by the size of the initial deviation. The highest value of the corresponding constant (see [30, eq. (1), p. 72]) corresponds to the slowest speed of convergence, and in this case, for small $\mu$, the speed is governed by the smallest eigenvalue of $R$. For the LMS algorithm, [15, Theorem 1] implies that the speed of convergence is inversely proportional to this eigenvalue. Thus, the measure of speed proposed in [15] corresponds to the worst case previously mentioned. Unlike [15], we do not assume (4) and (5). Also, we provide a rigorous proof, while the proof in [15, Theorem 1] is based on the independence assumption.

The constant in (12) depends only on the initial deviation and the matrix $R$. We get the same constant if we assume $X_n$ and $\xi_n$ to be i.i.d. Thus, for small $\mu$, the transient speed is not affected by the presence of $\xi_n$ and correlation amongst the $X_n$'s. Hence, analysis based on the independence assumption [1] leads to the same result. This is in contrast to the results in the previous sections, which, even for infinitesimal $\mu$, depend on the dependence structure of the data.

In [19], the following scheme is suggested for improving the speed of the LMS algorithm: use a transformed regressor in place of $X_n$ in (3), where the transformation is a matrix obtained by throwing away some of the rows of an orthogonal matrix. Using the normalized measure of transient speed defined above, we show in the Ph.D. dissertation [29] that the speed of convergence increases for a matrix obtained from the Karhunen–Loeve transform (a matrix whose rows are eigenvectors of $R$). However, no single orthogonal transformation improves speed for all $R$.

Remark 1: Consider the measure of transient speed which is based on the signal estimation error. Under assumptions ( ) and ( ), it can be shown that this measure behaves analogously. For the corresponding normalized measure of transient speed, the highest value of the constant (see [30, eq. (1), p. 72]) again corresponds to the slowest speed of convergence, and as in the case above, it coincides with the measure of speed proposed in [15].
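The transient-speed measure described verbally above, a sum over iterations of the excess of the mean-square deviation error over its steady-state value, can be estimated from Monte-Carlo runs. The sketch below does this for the LMS recursion. The paper's exact definition involves limits that a finite simulation can only approximate, and the run length, number of runs, and tail fraction here are illustrative assumptions.

```python
import numpy as np

def lms_deviation_sq(X, Y, w_star, mu, w0):
    """Run LMS once and return ||W_n - W*||^2 along the trajectory."""
    w = w0.astype(float).copy()
    dev = np.empty(len(Y))
    for n in range(len(Y)):
        w = w + mu * (Y[n] - X[n] @ w) * X[n]
        dev[n] = np.sum((w - w_star) ** 2)
    return dev

def transient_speed(generate_data, w_star, mu, w0, runs=200, tail=0.2):
    """Monte-Carlo estimate of sum_n ( E||W_n - W*||^2 - steady-state value ).

    `generate_data()` should return one fresh realization (X, Y) of the data process.
    The steady-state value is approximated by the time average over the last `tail`
    fraction of the ensemble-averaged curve.
    """
    X, Y = generate_data()
    acc = lms_deviation_sq(X, Y, w_star, mu, w0)
    for _ in range(runs - 1):
        X, Y = generate_data()
        acc += lms_deviation_sq(X, Y, w_star, mu, w0)
    mean_dev = acc / runs
    steady = mean_dev[int((1 - tail) * len(mean_dev)):].mean()
    return np.sum(mean_dev - steady)
```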

IV. PROOF OF MAIN RESULTS

Notation: To maintain simplicity of expressions, we define a few shorthand quantities. By the product symbol we denote the matrix product over the indicated range of indices; the empty product is to be interpreted as the identity matrix. We also use coordinate column vectors with a $1$ at one position and $0$ elsewhere. We use the Kronecker product of matrices, and Lemma 13 summarizes the properties we need. Many of the basic properties mentioned in Lemma 13 are used frequently, and except for the first few instances, we do not refer to them.

By simple calculations using (1) and (3), we get the recursion

$$V_{n+1} = \bigl(I - \mu X_{n+1} X_{n+1}^T\bigr) V_n + \mu X_{n+1}\,\xi_{n+1}$$

for the deviation error $V_n = W_n - W^\star$. In our calculations we need the expression for $V_n$ obtained by repeated application of the above recursion; it is given in (13), with the remainder defined in (14). We first study this expression through a series of lemmas, and then we prove our main results. Using (13)


where


Note that in the preceding equation, and in the equations that is to be interpreted as follow, . Let the diagonal term ( ) be (15) (16)

(17) (18) Using Lemma 13 i), ii) we can write

(20) (19)

Here is

is

,

is

) be

and let the off-diagonal term (

, and

.

Lemma 1: If assumptions for sufficiently small

and

are satisfied, then (21)

Further, for sufficiently small ,

Lemma 3: If assumptions for sufficiently small

and

and

are satisfied, then

The proof is given in Section V-A. Lemma 2: If assumptions for sufficiently small

Further, for sufficiently small , and

and

are satisfied, then

,

Further, for sufficiently small , exists and ists,

ex-

, The proof is given in Section V-B. We have the following result for the off-diagonal term Lemma 4: If assumptions for sufficiently small

and

.

are satisfied, then

The proof is given in Section V-D. requires more decomposition. From (14) The analysis of and (18)

Further, for sufficiently small , exists, and ists,

The proof is given in Section V-C.

ex-


B. Proof of Theorem 3

A. Proof of Theorem 2 where is an -dimenSuppose we choose th position and sional column vector with at otherwise. Here, and take values from . Then, . Thus, by choosing dif. Also, ferent values of , we can get all the entries in , . Hence, we can apply since , we obtain the result of Lemmas 1–4. Using , where

From the definition of . Hence,

,

, and

, we get

To prove Theorem 3, we derive the contribution of each of these terms separately. Lemma 5: If assumptions for sufficiently small

and

are satisfied, then

Further, for sufficiently small , ists and

ex-

(22) where the infinite series converges absolutely. Let

The proof is given in Section V-E. also contributes to the asymptotic estiThe cross term mation error and we have the following lemma. (23)

where

Lemma 6: If assumptions for sufficiently small

and

are satisfied, then

is as defined in Theorem 2. Therefore, Further, for sufficiently small , ists and

For sufficiently small , and by Lemconstant for some . Hence ma 12 vi), , and it follows that

ex-

where the infinite series converges absolutely. The lemma is proved in Section V-F. Theorem 3 now follows directly from Lemmas 5 and 6. C. Proof of Theorem 4

It is easy to see that satisfies the recursion (8). For suffiare in and ciently small , the eigenvalues of hence

(24) is well-defined. From (8) it follows that

(25) Also from Lemmas 1–4, sufficiently small and Further,

exists for is well-defined. and so

If we choose

, then by Lemma 13 iii), . Now consider

Note that in we do not take the absolute value of and hence we can separately calculate the contribution , , , and to . From Lemmas 1–4 and of . Since Lemma 13 vi), we get that, , we get (26) In Section V, the proof of Lemmas 1–4 yields Corollary 3 which shows that (27)

Hence, from (25) we get that .

satisfies the Lyapunov equation Equation (12) now follows from (26) and (27).


V. PROOF OF LEMMAS


By (59) of Lemma 11, for sufficiently small

Notation: We define and . With these definitions and . For a vector , denotes the th component of . , denotes the th compoFor a vector function . nent function. Indicator function of a set is denoted by and for a matrix , is the corresponding we denote the Banach space of meainduced norm. By such that surable functions

where

,

we get

and (29)

(28) , a fixed vector Note that for the constant function , , and hence, it belongs to . If in is a bounded operator on , then by we denote the . By we denote the operator norm induced by function obtained by the action of the operator on the func. By , a fixed vector in , we mean that tion is acting on the constant function . the operator we refer to a family of operators such that By constant . In many of the inequalities we use “constant” to refer to a constant that does not depend involved in the inequalities. on the parameters This section is organized as follows. Lemma 1 is proved in Section V-A. The proof of Lemma 2 follows the same main steps as that of Lemma 4 but the details are much simpler. Hence, we first prove Lemmas 3 and 4 in Sections V-B and V-C, respectively. Lemma 2 is proved in Section V-D. Lemma 5 is proved in Section V-E and Lemma 6 is proved in Section V-F. Equation (27) is proved in Section V-G and Corollaries 1 and 2 are proved in Sections V-H and V-I, respectively. All the proofs follow the following main theme:

(30) (31) and

are bounded operators on constant

(32)

: By Lemma 11, we are free to choose Contribution of . We choose for reasons which become clear in and by Lemma Corollary 3. Thus, , . 13 iv) it has eigenvalues all the eigenvalues of are in and hence For . Consider for sufficiently small ,

where p. 107]

and

. By [30, eq. (11),

• split the term under consideration into further terms by applying Lemma 11 and deal with each of these terms separately; , identify the • for the term under consideration, say order of the term as a function of , show that exists for sufficiently small , and show exists; that .

• study

as Hence,

Contribution of

We frequently use the preliminary results stated in Lemmas 12 and 13, and except for the first few instances, we do not refer to them.

Now by (28),

: From (30)

a), and (32)

A. Proof of Lemma 1 By Lemma 11,

defined by constant

is a bounded operator on . Also from (19), using stationarity and Lemma 12 vii)

such that

. Hence, Therefore, for sufficiently small . Consider

and

constant constant

as


Contribution of the same steps as

: The analysis of follows exactly above and we get , for sufficiently small , and

constant constant Contribution of

Lemma 1 follows directly from the analysis of . and

,

as : From (34)

, Using the Cauchy–Schwarz inequality for the inner product

B. Proof of Lemma 3 Using stationarity in (20) followed by a change of variable we get

By the definition of

and Lemma 12 iv) we obtain

constant As in Section V-A, we can apply Lemma 11. Using (58) we get , where (33)

constant Hence and exists for sufficiently small ; the series involved converges ab. Consider solutely. Further,

(34)

(35) constant and and fying (32). Contribution of and hence in and Lemma 12 iv)

are bounded operators on

satisconstant

: For

the eigenvalues of is well defined. Using

are

as

Contribution of : The proof for follows exactly the and we get , same steps as exists for sufficiently small , and

Hence using Lemma 3 follows directly from the analysis of .

,

and

C. Proof of Lemma 4 Thus,

and

exists. Consider

given by (21) can be split into two The off-diagonal term , in which the inner sum in (21) is over , terms: , in which the inner sum in (21) is over . Using and we get stationarity in


Substituting

in place of


we get

(42) for Note that unlike in Section V-A, we have chosen convenience. We deal with each of these terms separately. The proof is very long and we split it into three subsections. Lemma 4 then follows directly from Lemmas 7–10 proved in the following subsections. : 1) Analysis of where we have used Lemma 13 ii) and (36) From (40), we can write

where

Now by changing the order of summation and substituting in place of we get,

(43)

(44) (37) Lemma 7: If assumptions for sufficiently small

By similar calculations

and

are satisfied, then

(38) where (39)

Further, for sufficiently small , exists, exists, and

and , in order to study we only Due to similarity of . We provide a proof for , and the proof for need to study follows almost identically. From (37), using Lemma 12 vii) Remark 2: For the

From (58) of Lemma 11 we get, where

term in

we have

Proof: Let

(40)

where are -dimensional column vectors. Note that does not depend on . Substituting in (43) for from (36)

(41)

(45)


In the last step we have used the definition of Kronecker product and we have split each -dimensional vector into vectors of dimension . The argument is independent of and it suffices to consider the th term only. Let

Similarly, term is uniform in . It follows that

where the

where and are uniform in . So , exists (the series involved converges absolutely), and where is uniform in . Now consider

Contribution of

: By Lemma 12 vi) constant

Hence,

From the analysis of

as and it follows that exists for sufficiently small and

Now consider Further

constant

Using Lemma 12 vi)

and hence pend on ). Also,

exists (note that

does not de-

as Contribution of

: Let In the proof of Lemma 1 in Section V-A, we showed that

Hence

. Note that Also

by Lemma 12 ii). Thus, defined by

Further, by (2), follows that

. Using the operators and

. Hence, by (62) of Lemma 11, it The norm of the second term is

By Lemma 12 vi) we know that

By (2), the first term is zero. Also,

constant

and hence


It follows that,

where and


. Hence, . Using the same bounds constant constant

as

Thus, has been dealt with. Repeating the same steps as for and small . Also,

, we get for sufficiently

This completes the proof. Lemma 8: If assumptions and are satisfied, and then for sufficiently small , . Further, . Proof: Now let where are -dimensional column vectors. Following exactly the same steps as in the proof of Lemma 7 and considering only the th term

Since the argument did not depend on , lyzed and the proof of Lemma 8 is complete. 2) Analysis of

has been ana-

:

Lemma 9: If assumptions then for sufficiently small , exists. Further,

and

are satisfied, and and

Proof: From (41)

Using (60) of Lemma 11 with , where

, where

We need the following bound:

For

we have the following bound: constant constant

Hence,

where the constant does not depend on , , . We write

where , vectors. Using (36)

constant constant

(46)

are -dimensional column


Consider the th term

By Lemma 12 iii), above bound, fore, vergence as

In this case

which is finite by Lemma 12 ii) and (46). Hence, constant

and hence, for the is finite for sufficiently small . There. Further, we can use dominated conin (52) if we show that exists almost surely. But

and (47)

where the constant does not depend on , , , . Thus, . Using (60) of Lemma 11 with we get , where (48)

constant

(49) since

a.s. almost surely by

a). Thus,

(50) is assumed only for instead of Remark 3: In [14], . We have stated for (which implies for ). This allows us to apply Lemma 11 on the space as above. If this assumption is not made, then in order to apply . This can be Lemma 11 we need guaranteed only under the restrictive condition of bounded data.

is well-defined almost surely and Applying dominated convergence to (52) as

Thus,

constant

is well-defined. Using constant

By orthogonality (2) and Lemma 12 iii)

Let (47)

. Then by constant

constant

(51) Remark 4: The refinement of [14, Theorem 2], Lemma 11, exists. If instead permits us to show that and [14, Theorem 2] is used, we only get that nothing can be said about the limit.

From (49)

(52) Consider

constant

Consider


where

. So


D. Proof of Lemma 2 Now consider and Lemma 13 i), ii) we get

. Substituting (14) in (16) and using

constant For

(53) where

(54) constant

is , Here Similarly, we have

Similarly, for

is

, and

is

.

(55) where constant (56)

Using these bounds constant

constant

as

Thus, has been dealt with. is similar to that of . We The analysis of exists for sufficiently small , the limit get that , and

Thus, has been dealt with and hence has been dealt with. Since the argument did not depend on , the proof of the lemma is complete. 3) Analysis of : and are satisfied, Lemma 10: If assumptions and then for sufficiently small , exists. Further, and

The proof is very similar to that of Lemma 9 and is not presented here. The interested reader is referred to the Ph.D. dissertation [29].

and , given by (53) and (55), respectively, are similar and they can be treated in the same way. As can be seen from (53), corresponds to the term in (54), (36), and (37), except that is replaced by . Therefore, the main steps in the proof are similar to those in the proof of Lemma 4. But overall, the proof is simpler due to the absence of the summation present in (37). Since the proof does not involve any new ideas, we do not present it here. The interested reader is referred to the Ph.D. dissertation [29]. E. Proof of Lemma 5 Using Lemma 13 i) and the fact that

Note that since by assumptions

, b) and

depends only on

a). By Markovianity

Thus, which is similar to the expression considered in Section IV. In this case, and hence from Lemmas 1–4


By orthogonality (2) zero. Further

and hence the first term is const.

Using , (22), and , . it follows that As shown in the proof of Theorem 2 in Section IV-A, , . Hence, we obtain . that exists for sufFrom Lemmas 1–4, ficiently small . In the proof of Theorem 2 in Section IV-B, we exists and . showed that above and using Using the expression for , we get

By the same argument, is bounded by a constant that does not depend on for sufficiently small . and for Hence, sufficiently small . We now obtain the contribution due to . Using stationarity followed by a change of variable

By Lemma 12 vi),

. Hence,

This completes the proof. F. Proof of Lemma 6 By Markovianity and with

Using (13)

Using (14), we can write

as in case of

above

where

where we are using the notation of Lemma 11. Note that . Hence, by (62) of Lemma 11 We first show that does not contribute to for sufficiently small . Since using Markovianity and then stationarity

,

By (2), the first term is zero. Now where

. By Lemma 12 ii), the function . Also, by [14, Lemma 3], the conditional expec. Hence and tation is a bounded operator on . Thus, we obtain we can apply (61) of Lemma 11 with

Similarly


Hence,

where . With made, . the choice of Hence from Lemma 13 i), iii) we obtain, . Further from the proof of Lemma 1 in Section V-A

It follows that . From the analysis of


exists and that and

, we get that

where we have used Lemma 13 vi). Hence, and

exists for sufficiently small . Further

The desired result now follows from (57). H. Proof of Corollary 1 From Theorem 2 it follows that, where . Multiplying both sides by and . Using taking the trace, and the definition of , it follows that

From Lemma 12 vi) we can write

This completes the proof of the lemma. G. Proof of (27) Corollary 3: Under assumptions

Note that in the last step we have used the fact that the infinite converges absolutely. The desired result follows series in again. by using

and

I. Proof of Corollary 2 Proof: Choosing Lemma 13 iii), from Section IV we get

and then using

Here , , , and are as in Section IV but with defined in this section. Hence,

as

VI. CONCLUSION

and therefore,

From Lemmas 1–4 we get (57) From the proof of Lemma 1 in Section V-A we have

are in For sufficiently small , the eigenvalues of . Hence from (23) we obtain, where is uniform in . Hence, from Theorem 2, we obtain . Using this and (7), it follows that that

In this paper, we analyzed the LMS algorithm in the transient phase as well as in steady state. We provided simple approximations for the estimation error in the filter coefficients and for the excess signal estimation error. For sufficiently small $\mu$, we also proved the convergence of the LMS algorithm in the sense that the second-order statistics attain limiting values as the number of iterations increases to infinity. Further, we analyzed the steady-state errors as $\mu \to 0$. The result for the excess signal estimation error shows that for sufficiently small $\mu$, the LMS algorithm can result in a smaller signal estimation error than the Wiener filter $W^\star$. We also studied a measure of transient speed for small step size $\mu$. Our analysis shows that for small $\mu$, the transient speed does not depend on the signal estimation error and the correlation amongst the $X_n$'s.


Our result can also be used to analyze the scheme suggested in [19] for improving the speed of convergence. Unlike many of the previous works, we do not assume a linear regression model. We also consider dependent data. Our assumptions permit many data processes of practical importance.

APPENDIX I
PRELIMINARY LEMMAS

Proof: Equations (58)–(62) follow directly from Lemma 16, and we only have to verify the assumptions of Lemma 16. First we note that by , is -uniformly ergodic, and hence by [25, Lemma 15.2.9], it is also -uniformly er. Consider godic. For a matrix let

const.

constant

In this section, we first prove an exponential convergence result for certain products of random matrices, which are encountered in the analysis of the LMS algorithm. Lemma 11: Suppose

is a bounded operator on and Further,

,

a) and

, where the operator

constant constant constant

b) hold. Then

. satisfies

constant

From these bounds and b), assumption (64) of Appendix and are bounded operators II follows. The fact that follows from Lemma 14. The extra assumption that the matrices and be symmetric is satisfied as is symmetric. We also need to verify that the eigenvalues of these matrices are strictly negative. This follows from the assumption that is positive definite. We now state a few preliminary results.

(58)

Lemma 12: Assumptions lowing.

and , for

i)

, , is any matrix and for suffiwhere , ciently small , the family of operators on satisfies constant , and the family satisfies . Similarly,

is a bounded operator on

as well as

. .

ii)

(59)

imply the fol-

. .

iii) iv)

.

v) vi) vii) For

, and ,

constant . ,

where

, and

and it satisfies

(60)

Proof: Taking expectation on both sides of a) we obtain i). using

and hence, (61) ii) now follows from , , and the operators and are as described in the notation at the beginning of , then Section V. Also, if

c). Similarly

where

(62)

Hence iii) follows from i) and

d). Similarly

b) and


By i),

by

. Since

d). By

vii) For


symmetric and invertible

c) Properties i)–iv) can be found in [30] and the references therein. v)–vii) are proved in the Ph.D. dissertation [29].

constant

APPENDIX II ANALYSIS OF PRODUCT OF CERTAIN RANDOM MATRICES

which is finite by i). Thus, iv) follows. v) follows because

which is finite by assumption By [25, eq. (16.17), p. 388]

c).

constant

(63)

, where a). By ii), in assumption constant and, hence, By orthogonality (2), and by setting applying (63)

and

is as for some . . Therefore, and

,

constant

In this appendix, we prove Lemma 16, which not only refines [14, Theorem 2] but also establishes a new result. We assume the Markov process to be $V$-uniformly ergodic and consider products of random matrices which are functions of it. The framework in this section is more general than that for the LMS algorithm. In [32] and [33], a multiplicative ergodic theorem has been established for bounded functions of geometrically ergodic Markov processes. Lemma 16 gives a multiplicative ergodic theorem for certain matrices which are in general unbounded functions of an ergodic Markov process. In addition, this result also gives us control over the error term, which is critical for performance analysis of the LMS algorithm. Consider the space and the operator

and the desired result follows. vii) follows from Lemma 14 of Appendix II with where and We only have to verify the assumptions of Lemma 14: and of Appendix II. is part of the hypothesis while has been verified in the proof of Lemma 11 above.

, are such that

matrices of real-valued

constant

Lemma 13: i) ii)

(64)

. Lemma 14: Suppose (64) hold. Then, for any

iii)

where

and

are

-uniformly ergodic and let and for all

(65) Further,

,

is

-dimensional

vectors and

where

Here , measurable functions on

is a bounded operator on

and,

.

denote the eigenvalues of with eigenvectors , . Then has eigenvalues with eigenvectors , . , be the eigenvalues of with v) Let has eigenvalues eigenvectors . Then, with eigenvectors , . vi) For symmetric, invertible and

(66) (67)

iv) Let

Proof: Let

. For simplicity of notation, let . We get


Due to space constraints we keep the proofs short. More detailed proofs are given in the Ph.D. dissertation [29]. Lemma 15: Suppose is -uniformly ergodic and let is symmetric (64) be true. In addition, suppose and has nonzero eigenvalues with corresponding eigen-pro, , , for . Then, jection , such that, the operator has the decomposition , . The operator satisfies constant

The step above follows from the Markov property, inequality follows from the definition of , inequality follows follows by repeatedly applying from (64), and inequality inequality . This completes the proof of (65). Consider

For the operator , the eigenvalues , , the , and the corresponding corresponding eigen-projections eigen-nilpotents satisfy (68) (69) (70) Remark 5: Equations (68) and (70) are improvements of [14, eqs. (14) and (16)] while (69) is the same as [14, eq. (15)].

constant constant Using (64) and the definition of a bounded operator for . bounded operator on Finally, we prove (67). Since

By (65), the above expression is equal to

, it follows that , and hence

is is a

and hence by Markovianity

Proof: The decomposition of the operator follows as in [14]. We now establish (68). From the proof of [14, Theorem 1] has nonzero eigenvalues , we have . We show in what follows that under the additional assumption of symmetric , are semisimple eigenvalues of . Equation (68) then follows from [34, eq. (2.41), p. 82]. To prove that the eigenvalues are semi-simple consider

Since

is symmetric, by spectral decomposition

Using where we have used have shown (67) for it can be proved for

. Thus, we . In the same manner, using induction, .

Reference [14, Theorem 2] establishes an exponential convergence result and thus provides a tool to deal with products of random matrices like those considered in (67). In order to derive a refinement of [14, Theorem 2], we prove a lemma that improves [14, Theorem 1] under an additional assumption. But first we introduce some more operators. The expectation defines a bounded operator on the space. It is shown in the proof of [14, Theorem 1] that it is a projection onto the subspace of constant functions, which is a finite-dimensional subspace. For any deterministic matrix, since expectation and conditional expectation commute, the corresponding identity holds. We use a separate symbol to denote the identity operator on the space. It is assumed that the reader is familiar with the proofs of [14, Theorem 1] and [14, Theorem 2]. For the definitions of eigen-projections, eigen-nilpotents, and semisimple eigenvalues we refer the reader to [34, p. 41].

[34, p. 36]) of tion

and

, it is easy to show that , that is, is the resolvent (see . Using the spectral decomposi-

Further, the range of is finite dimensional for and hence , are eigenvalues of with corresponding eigen-projections (see [34, p. 181]). The corresponding eigen-nilpotent

and hence , are semisimple eigenvalues and the proof of (68) is complete. Equation (69) is the same as [14, eq. (15)]. We now prove (70). Let


where we have used (69). Since and are eigen-projecand tions and eigen-nilpotents, (see [34, p. 39]). Further, the eigen-projections commute with each other and with . Hence,


Finally, we prove (74). Since, . Using and

,

,

we get that (75)

(71) .

Now using (68) and (69) it can be shown that , the desired result follows. Since We now prove the main result of this appendix.

is -uniformly ergodic and let Lemma 16: Suppose is symmetric (64) be true. Also, assume that , , , and its eigenvalues for . Let where is any matrix. Then, for sufficiently small

Using , we obtain, constant easily shown that

, for some

, and . It is shown in [14] that . Using this it can be (76)

Using

, (see [14]),

, (75), (76), and

(72) (73) (74) where

,

and for sufficiently small constant

and

const

Remark 6: Equations (72) and (73) refine [14, Theorem 2]. Equation (74) is new. Proof: From the proof of [14, Theorem 2, p. 595], to prove (72), we only need to show that

By [14, Equation (17)] for and Lemma 15

, spectral decomposition for

where is the algebraic multiplicity of as an eigenvalue , of . Using Lemma 15 and using the assumption that it is shown in [29] that each of the above terms is of order . Equation (73) can be proved in exactly the same way as [14, eq. (19)] but by using Lemma 15 instead of [14, Theorem 1].

This completes the proof.

REFERENCES

[1] S. Haykin, Adaptive Filter Theory, 3rd ed., ser. Information and System Sciences Series. Englewood Cliffs, NJ: Prentice Hall, 1996.
[2] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, "Stationary and nonstationary learning characteristics of the LMS adaptive filter," Proc. IEEE, vol. 64, pp. 1151–1162, Aug. 1976.
[3] R. R. Bitmead and B. D. O. Anderson, "Performance of adaptive estimation algorithms in dependent random environments," IEEE Trans. Automat. Contr., vol. AC-25, pp. 788–794, Aug. 1980.
[4] R. R. Bitmead, "Convergence in distribution of LMS-type adaptive parameter estimates," IEEE Trans. Automat. Contr., vol. AC-28, pp. 54–60, Jan. 1983.
[5] O. Macchi and E. Eweda, "Second-order convergence analysis of stochastic adaptive linear filtering," IEEE Trans. Automat. Contr., vol. AC-28, pp. 76–85, Jan. 1983.
[6] O. Macchi and E. Eweda, "Convergence analysis of self-adaptive equalizers," IEEE Trans. Inform. Theory, vol. IT-30, pp. 161–176, Mar. 1984.
[7] W. Gardner, "Learning characteristics of stochastic gradient-descent algorithms: A general study, analysis and critique," Signal Processing, vol. 6, no. 2, pp. 113–133, Apr. 1984.
[8] A. Feuer and E. Weinstein, "Convergence analysis of LMS filters with uncorrelated Gaussian data," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 222–229, Feb. 1985.
[9] N. J. Bershad and L. Z. Qu, "On the probability density function of the complex scalar LMS adaptive weights," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 43–56, Jan. 1989.
[10] V. Solo, "The limiting behavior of LMS," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1909–1922, Dec. 1989.
[11] V. Solo, "The error variance of LMS with time-varying weights," IEEE Trans. Signal Processing, vol. 40, pp. 803–813, Apr. 1992.
[12] J. Bucklew, T. Kurtz, and W. Sethares, "Weak convergence and local stability properties of fixed step size recursive algorithms," IEEE Trans. Inform. Theory, vol. 39, pp. 966–978, May 1993.
[13] J. J. Fuchs and B. Delyon, "When is adaptive better than optimal?," IEEE Trans. Automat. Contr., vol. 38, pp. 1700–1703, Nov. 1993.
[14] G. V. Moustakides, "Exponential convergence of products of random matrices: Application to adaptive algorithms," Int. J. Adapt. Contr. Signal Processing, vol. 12, pp. 579–597, Nov. 1998.
[15] G. V. Moustakides, "Locally optimum adaptive signal processing algorithms," IEEE Trans. Signal Processing, vol. 46, pp. 3315–3325, Dec. 1998.
[16] L. Guo and L. Ljung, "Performance analysis of general tracking algorithms," IEEE Trans. Automat. Contr., vol. 40, pp. 1388–1402, Aug. 1995.
[17] L. Guo, L. Ljung, and G. Wang, "Necessary and sufficient conditions for stability of LMS," IEEE Trans. Automat. Contr., vol. 42, p. 761, June 1997.
[18] A. H. Sayed and M. Rupp, "Error-energy bounds for adaptive gradient algorithms," IEEE Trans. Signal Processing, vol. 44, pp. 1982–1989, Aug. 1996.


[19] N. Erdol and F. Basbug, "Wavelet transform based adaptive filters: Analysis and new results," IEEE Trans. Signal Processing, vol. 44, pp. 2163–2171, Sept. 1996.
[20] J. Homer, R. R. Bitmead, and I. Mareels, "Quantifying the effects of dimension on the convergence rate of the LMS adaptive FIR estimator," IEEE Trans. Signal Processing, vol. 46, pp. 2611–2615, Oct. 1998.
[21] M. Reuter and J. R. Zeidler, "Nonlinear effects in LMS adaptive equalizers," IEEE Trans. Signal Processing, vol. 47, pp. 1570–1579, June 1999.
[22] K. J. Quirk, L. B. Milstein, and J. R. Zeidler, "A performance bound for the LMS estimator," IEEE Trans. Inform. Theory, vol. 46, pp. 1150–1158, May 2000.
[23] A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximation. New York: Springer-Verlag, 1990.
[24] H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications. New York: Springer-Verlag, 1997.
[25] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability. London, U.K.: Springer-Verlag, 1993.
[26] R. C. Bradley, "Basic properties of strong mixing conditions," in Dependence in Probability and Statistics. Boston, MA: Birkhauser, 1986, pp. 165–191.


[27] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd ed., ser. Springer Series in Statistics. New York: Springer-Verlag, 1991.
[28] O. Macchi, Adaptive Processing—The Least Mean Squares Approach With Applications in Transmission. New York: Wiley, 1995.
[29] O. J. Dabeer, "Convergence analysis of the LMS and the constant modulus algorithms," Ph.D. dissertation, Univ. California, San Diego, 2002.
[30] H. Lütkepohl, Handbook of Matrices. New York: Wiley, 1996.
[31] R. Ravikanth and S. P. Meyn, "Bounds on the achievable performance in the identification and adaptive control of time-varying systems," IEEE Trans. Automat. Contr., vol. 44, pp. 670–682, Apr. 1999.
[32] S. Balaji and S. P. Meyn, "Multiplicative ergodic theorems and large deviations for an irreducible Markov chain," Stochastic Processes Their Applic., vol. 90, no. 1, pp. 123–144, 2000.
[33] I. Kontoyiannis and S. Meyn, "Spectral theory and limit theorems for geometrically ergodic Markov processes," paper, submitted for publication.
[34] T. Kato, "Perturbation theory for linear operators," in Classics in Mathematics. Berlin, Germany: Springer-Verlag, 1995, reprint of 1980 edition.