IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 12, DECEMBER 2011
A Theory on the Convergence Behavior of the Affine Projection Algorithm

Seong-Eun Kim, Jae-Woo Lee, and Woo-Jin Song
Abstract—In this paper, we present a theoretical convergence analysis of the affine projection algorithm (APA) based on energy conservation arguments. Although the APA and its convergence analysis have been widely studied, the dependency of the weight-error vector on past noise is usually neglected for simplicity. To obtain accurate theoretical results for the APA, we here account for the dependency between the weight-error vector and past noise in the mean-square analysis presented by Shin and Sayed in ["Mean-square performance of a family of affine projection algorithms," IEEE Transactions on Signal Processing, vol. 52, no. 1, pp. 90-102, January 2004]. Through this work, we can also theoretically analyze the behavior of the periodic APA, which updates its weights periodically. Simulation results show that the proposed theoretical results closely match the actual behavior of the algorithm.

Index Terms—Adaptive filter, affine projection algorithm, energy conservation, mean-square performance analysis.
Manuscript received April 12, 2011; revised August 07, 2011; accepted August 31, 2011. Date of publication September 19, 2011; date of current version November 16, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Maciej Niedzwiecki. This work was supported by the Brain Korea 21 Project in 2011, Korea. S.-E. Kim is with the Educational Institute of Future Information Technology, Pohang University of Science and Technology (POSTECH), Pohang 790-784, Korea (e-mail: [email protected]). J.-W. Lee and W.-J. Song are with the Division of Electrical and Computer Engineering, Pohang University of Science and Technology (POSTECH), Pohang 790-784, Korea (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2011.2168524

I. INTRODUCTION

The normalized least mean-squares (NLMS) algorithm is one of the most popular adaptive algorithms due to its low computational cost and ease of implementation. However, correlated input signals reduce its convergence speed considerably [1]-[3]. To address this problem, the affine projection algorithm (APA) has been proposed [4]. This algorithm updates the weights based on the last K input vectors in order to improve the convergence speed of LMS-type filters for correlated input signals. The APA can also be viewed as a generalization of the NLMS because the NLMS is the same as a one-dimensional APA (K = 1) [2], [3]. In the APA, the matrix inversion requires a regularization parameter to avoid numerical problems, so the algorithm is also called the regularized APA (R-APA) [3]. The performance of the APA depends especially on the number K of input vectors: with more input vectors involved, the convergence speed improves, but the steady-state error gets worse [3], [5].

At the beginning, the APA was analyzed from the viewpoint of NLMS-type filters such as the binormalized data-reusing LMS (BNDR-LMS) [6], the decorrelating algorithm (DA) [7], and the NLMS with orthogonal correction factors (NLMS-OCF) [8]. The mean-square performance of the BNDR-LMS algorithm was analyzed in [6], which provided theoretical results for only the second-order APA. In [9], the NLMS-OCF was analyzed based on a particular model for the input signal to simplify the analysis of the APA. The convergence analysis of the DA [10] assumes a Gaussian autoregressive input model and a unit step size. In [11], Shin and Sayed provided a unified analysis of the convergence performance of the APA family, including the R-APA, the BNDR-LMS, the NLMS-OCF, and the partial rank algorithm (PRA) [12], which used energy conservation arguments [13]-[16] without making any assumptions about the input signals.

Despite the significance of the unified analysis, it has some shortcomings. First, the dependency of the weight-error vector at iteration i-1 on the current noise vector is neglected in [11] by assuming that the weight-error vector is uncorrelated with past noise. In the APA, however, the weight-error vector at iteration i-1 can be expressed as a linear combination of the noise vectors up to iteration i-1. Thus, the weight-error vector is actually correlated with the current noise vector, although all noise sequences are assumed to be mutually independent. Neglecting this correlation is only valid for the PRA [12], because its weights are updated once every K iterations, so the weight-error vector is expressed with noise vectors prior to iteration i-K+1, none of which overlap the current noise vector. As a result, Shin and Sayed concluded from the conventional analysis that the APA and the PRA have the same theoretical steady-state mean-square deviation (MSD), although the PRA has lower MSD than the APA in reality. In [17], Paul and Ogunfunmi presented the effect of the correlation between the weight error and past noise on the convergence behavior of the APA based on the analysis of [9], assuming Slock's model for the input signal. This work provides an interesting approach for characterizing the dependence between the weight error and past noise.

The purpose of this paper is to analyze the convergence behavior of the affine projection algorithm (APA) more accurately, based on [11], by considering the effect of the correlation between the weight-error vector and past noise using energy conservation arguments; no assumptions on the input signals are required. Our work shows that neglecting this effect reduces the analysis of the APA in [11] to the analysis of the PRA. We also provide a more precise description of the APA's mean-square behavior. Simulations illustrate that the proposed theoretical results match practical results better than those of [11]. From the results, we can also analyze the mean-square performance of the periodic APA (P-APA) [18], which updates its weights once every p iterations (the PRA is the special case p = K). Through simulations, we find that the P-APA has a slower convergence rate and a lower steady-state error as p increases.

Throughout the paper, we adopt the following notation: ||.|| is the Euclidean norm of a vector; (.)^* denotes Hermitian transposition; (.)^T is the transpose of a vector or a matrix; E[.] denotes expectation; \lambda_{max}(.) is the largest eigenvalue of a matrix; and Tr(.) is the trace of a matrix. The symbol I denotes the identity matrix of appropriate dimensions. All vectors are column vectors except for the input vector, which is taken to be a row vector for convenience of notation.

This paper is organized as follows. Section II reviews the convergence analysis of the APA using energy conservation arguments. Section III provides a theoretical mean-square performance analysis of the APA considering the dependency of the weight-error vector on past noise; from this result, we also derive a convergence analysis of the P-APA. Section IV contains simulation results which show that the proposed theoretical results describe the convergence behavior of the APA and the P-APA well. Finally, we present our conclusions in Section V.
II. REVIEW OF CONVERGENCE ANALYSIS FOR THE APA

Consider a system identification problem with a known input signal u_i and a desired signal d(i) that originates from the unknown linear system

    d(i) = u_i w^o + v(i)    (1)

where w^o is an unknown column vector of length M to be identified with an adaptive filter, v(i) corresponds to measurement noise with variance \sigma_v^2, and u_i denotes the row input (regressor) vector

    u_i = [u(i)  u(i-1)  ...  u(i-M+1)].

Let us define w_i as the estimate for w^o at iteration i; the update equation of the APA is then

    w_i = w_{i-1} + \mu U_i^* (\epsilon I + U_i U_i^*)^{-1} e_i    (2)

where

    d_i = [d(i)  d(i-1)  ...  d(i-K+1)]^T,
    U_i = [u_i^T  u_{i-1}^T  ...  u_{i-K+1}^T]^T,
    e_i = d_i - U_i w_{i-1},

\epsilon is a regularization parameter, and \mu is a step size [4]. The choice \epsilon = 0 results in the standard APA. For the periodic APA (P-APA) [18], w_m = w_{i-p} for m = i-1, ..., i-p+1, i.e., the weight vector is updated once every p iterations, where p is a positive integer (1 < p <= K). In particular, if p = K, the P-APA reduces to the PRA [12]. Usually, the APA order K should be less than or equal to M [1]-[3].
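For concreteness, the following is a minimal NumPy sketch of one update of (2); the function name, default parameter values, and calling convention are our own illustrative choices, not from the paper.

import numpy as np

def apa_step(w, U, d, mu=0.5, eps=1e-4):
    # One iteration of (2): w_i = w_{i-1} + mu U_i^* (eps I + U_i U_i^*)^{-1} e_i.
    # U is the K x M matrix with rows u_i, ..., u_{i-K+1};
    # d is the length-K vector [d(i), ..., d(i-K+1)].
    e = d - U @ w                     # a priori error vector e_i = d_i - U_i w_{i-1}
    K = U.shape[0]
    g = np.linalg.solve(eps * np.eye(K) + U @ U.conj().T, e)
    return w + mu * U.conj().T @ g

Setting K = 1 recovers the regularized NLMS, and eps = 0 the standard APA, consistent with the remarks above.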
The update (2) can be rewritten in terms of the weight-error vector w~_i = w^o - w_i as

    \tilde{w}_i = \tilde{w}_{i-1} - \mu U_i^* (\epsilon I + U_i U_i^*)^{-1} e_i .    (3)

A. Weighted Variance Relation

Let e^\Sigma_{p,i} = U_i \Sigma \tilde{w}_i and e^\Sigma_{a,i} = U_i \Sigma \tilde{w}_{i-1}. If we multiply both sides of the above recursion by U_i \Sigma from the left, for any Hermitian positive-definite matrix \Sigma, we find that the a priori and a posteriori weighted estimation errors {e^\Sigma_{a,i}, e^\Sigma_{p,i}} are related via

    e^\Sigma_{p,i} = e^\Sigma_{a,i} - \mu U_i \Sigma U_i^* (\epsilon I + U_i U_i^*)^{-1} e_i .    (4)

Assuming U_i \Sigma U_i^* is invertible, we can rewrite (4) as

    \mu (\epsilon I + U_i U_i^*)^{-1} e_i = (U_i \Sigma U_i^*)^{-1} (e^\Sigma_{a,i} - e^\Sigma_{p,i})

and substitute the result into (3) to get

    \tilde{w}_i + U_i^* (U_i \Sigma U_i^*)^{-1} e^\Sigma_{a,i} = \tilde{w}_{i-1} + U_i^* (U_i \Sigma U_i^*)^{-1} e^\Sigma_{p,i} .    (5)

By evaluating the weighted Euclidean norms on both sides of (5), we find that

    \|\tilde{w}_i\|^2_\Sigma + e^{\Sigma *}_{a,i} (U_i \Sigma U_i^*)^{-1} e^\Sigma_{a,i} = \|\tilde{w}_{i-1}\|^2_\Sigma + e^{\Sigma *}_{p,i} (U_i \Sigma U_i^*)^{-1} e^\Sigma_{p,i}    (6)

where \|\tilde{w}_i\|^2_\Sigma \triangleq \tilde{w}_i^* \Sigma \tilde{w}_i. To represent (6) in terms of the weight-error vector itself, we first substitute (4) into (6) as follows:

    \|\tilde{w}_i\|^2_\Sigma = \|\tilde{w}_{i-1}\|^2_\Sigma + \mu^2 e_i^* (\epsilon I + U_i U_i^*)^{-1} U_i \Sigma U_i^* (\epsilon I + U_i U_i^*)^{-1} e_i
                               - \mu e^{\Sigma *}_{a,i} (\epsilon I + U_i U_i^*)^{-1} e_i - \mu e_i^* (\epsilon I + U_i U_i^*)^{-1} e^\Sigma_{a,i} .    (7)

The error vector e_i can be represented as

    e_i = U_i \tilde{w}_{i-1} + v_i    (8)

where v_i^T = [v(i)  v(i-1)  ...  v(i-K+1)]. By using the relation (8) and taking expectations, (7) can be rewritten as

    E\|\tilde{w}_i\|^2_\Sigma = E\|\tilde{w}_{i-1}\|^2_\Sigma + \mu^2 E[\tilde{w}^*_{i-1} P_i \Sigma P_i \tilde{w}_{i-1}] + \mu^2 E[\tilde{w}^*_{i-1} P_i \Sigma C_i v_i]
                               + \mu^2 E[v_i^* C_i^* \Sigma P_i \tilde{w}_{i-1}] + \mu^2 E[v_i^* C_i^* \Sigma C_i v_i]
                               - \mu E[\tilde{w}^*_{i-1} \Sigma P_i \tilde{w}_{i-1}] - \mu E[\tilde{w}^*_{i-1} \Sigma C_i v_i]
                               - \mu E[\tilde{w}^*_{i-1} P_i \Sigma \tilde{w}_{i-1}] - \mu E[v_i^* C_i^* \Sigma \tilde{w}_{i-1}]    (9)

where P_i \triangleq U_i^* (\epsilon I + U_i U_i^*)^{-1} U_i and C_i \triangleq U_i^* (\epsilon I + U_i U_i^*)^{-1}. For a tractable analysis, the measurement noise is characterized as follows [11].

Assumption I: The noise v(i) is independent and identically distributed (i.i.d.) and statistically independent of the input matrices {U_i}.

We also introduce the weak independence assumption of [11], as follows.

Assumption II: w~_{i-1} is independent of U_i^* (\epsilon I + U_i U_i^*)^{-1} U_i.

By using the above assumptions and neglecting the dependency of w~_{i-1} on the past noise, (9) reduces to

    E\|\tilde{w}_i\|^2_\Sigma = E\|\tilde{w}_{i-1}\|^2_\Sigma + \mu^2 E[\tilde{w}^*_{i-1} P_i \Sigma P_i \tilde{w}_{i-1}] - 2\mu E[\tilde{w}^*_{i-1} P_i \Sigma \tilde{w}_{i-1}] + \mu^2 E[v_i^* C_i^* \Sigma C_i v_i]
                             = E\|\tilde{w}_{i-1}\|^2_{\Sigma'} + \mu^2 E[v_i^* C_i^* \Sigma C_i v_i]    (10)

where \tilde{w}^*_{i-1} \Sigma P_i \tilde{w}_{i-1} = \tilde{w}^*_{i-1} P_i \Sigma \tilde{w}_{i-1} because \Sigma and P_i are Hermitian matrices, and

    \Sigma' = \Sigma - 2\mu E[P_i] \Sigma + \mu^2 E[P_i \Sigma P_i] .    (11)

Let us consider the vec{.} notation, which maps an arbitrary M x M matrix to an M^2 x 1 column vector and, conversely, an arbitrary M^2 x 1 column vector to an M x M matrix, so that \sigma = vec(\Sigma) and \Sigma = vec(\sigma). With this notation, \sigma' is defined as \sigma' = vec(\Sigma'). To represent \sigma' as a function of \sigma, we use the following property: for any matrices {P, \Sigma, Q} of compatible dimensions, it holds that

    vec{P \Sigma Q} = (Q^T \otimes P) vec{\Sigma}    (12)

where A \otimes B denotes the Kronecker product of two matrices A and B.
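The identity (12) is easy to confirm numerically; the following check (our own, using NumPy's column-major vec convention) is included for the reader's convenience.

import numpy as np

rng = np.random.default_rng(0)
M = 4
P, Sigma, Q = (rng.standard_normal((M, M)) for _ in range(3))

vec = lambda A: A.flatten(order='F')          # column-major vectorization
lhs = vec(P @ Sigma @ Q)                      # vec{P Sigma Q}
rhs = np.kron(Q.T, P) @ vec(Sigma)            # (Q^T kron P) vec{Sigma}
assert np.allclose(lhs, rhs)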
Then \sigma' = F \sigma, where F is an M^2 x M^2 matrix defined by

    F \triangleq I - 2\mu (I \otimes E[P_i]) + \mu^2 E[P_i^T \otimes P_i] .    (13)

Then, (10) reduces to

    E\|\tilde{w}_i\|^2_{vec(\sigma)} = E\|\tilde{w}_{i-1}\|^2_{vec(F\sigma)} + \mu^2 \sigma_v^2 (\gamma^T \sigma)    (14)

where \gamma = vec(E[C_i C_i^*]). Based on the result (14), the EMSE and the MSD of the APA were derived in [11]. However, since the weight-error recursion (3) can be rewritten as

    \tilde{w}_i = (I - \mu U_i^* (\epsilon I + U_i U_i^*)^{-1} U_i) \tilde{w}_{i-1} - \mu U_i^* (\epsilon I + U_i U_i^*)^{-1} v_i
                = (I - \mu P_i) \tilde{w}_{i-1} - \mu C_i v_i ,    (15)

the weight-error vector w~_{i-l} is not independent of v_i, i.e., E[w~_{i-l} v_i^*] is nonzero for l = 1, ..., K-1. On the other hand, w~_{i-K} and the earlier weight-error vectors are independent of v_i. Thus, we cannot neglect the dependency of the weight-error vector on the past noise in the APA; the independence assumption is valid only for the PRA.
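This dependency is easy to observe numerically. The following Monte Carlo sketch (our own construction; the system, signal model, and all parameter values are illustrative and not taken from the paper) runs the APA on white input and estimates the cross-correlation between the weight-error vector and the next noise vector. The estimate is markedly nonzero whenever K > 1; it would vanish only if the weights were held for K iterations, as in the PRA.

import numpy as np

rng = np.random.default_rng(1)
M, K, mu, eps, sv = 8, 4, 0.5, 1e-4, 0.1
T, sample_at, trials = 160, 120, 500
w_o = rng.standard_normal(M)
acc = np.zeros((M, K))

def regressors(u, i):
    # K x M matrix U_i with rows u_i, ..., u_{i-K+1}
    return np.array([u[i-k-M+1 : i-k+1][::-1] for k in range(K)])

for _ in range(trials):
    w = np.zeros(M)
    u = rng.standard_normal(T)
    v = sv * rng.standard_normal(T)
    for i in range(M + K, T - 1):
        U = regressors(u, i)
        e = U @ (w_o - w) + v[i-K+1 : i+1][::-1]   # e_i = U_i w~_{i-1} + v_i
        w = w + mu * U.T @ np.linalg.solve(eps * np.eye(K) + U @ U.T, e)
        if i == sample_at:
            # cross-correlation of w~_i with the next noise vector v_{i+1},
            # which shares v(i), ..., v(i-K+2) with noise already used
            acc += np.outer(w_o - w, v[i-K+2 : i+2][::-1])

print(np.linalg.norm(acc / trials))   # markedly nonzero for K > 1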
III. ENHANCED CONVERGENCE ANALYSIS OF APA

To obtain a more accurate performance analysis of the APA, we now take the neglected dependency into account. Performing the recursion (15) K-1 times, we show in Appendix I that

    \tilde{w}_{i-1} = \prod_{l=1}^{K-1} (I - \mu P_{i-l}) \tilde{w}_{i-K} - \mu \sum_{n=1}^{K-1} \prod_{l=1}^{n-1} (I - \mu P_{i-l}) C_{i-n} v_{i-n} .    (16)
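As a sanity check on (16), the following short script (our own; it uses random stand-in matrices rather than true projection matrices P_{i-l}, which suffices because (16) is a purely algebraic identity) verifies that unrolling the recursion (15) K-1 times matches the closed-form expansion.

import numpy as np

rng = np.random.default_rng(2)
M, K, mu = 5, 4, 0.3
P = [rng.standard_normal((M, M)) for _ in range(K)]   # stand-ins for P_{i-l}
C = [rng.standard_normal((M, K)) for _ in range(K)]   # stand-ins for C_{i-l}
v = [rng.standard_normal(K) for _ in range(K)]        # stand-ins for v_{i-l}
w_iK = rng.standard_normal(M)                         # w~_{i-K}

# direct use of (15), stepping from w~_{i-K} up to w~_{i-1}
w = w_iK.copy()
for l in range(K - 1, 0, -1):                         # times i-K+1, ..., i-1
    w = (np.eye(M) - mu * P[l]) @ w - mu * C[l] @ v[l]

# closed-form expansion (16)
prod = np.eye(M)
for l in range(1, K):
    prod = prod @ (np.eye(M) - mu * P[l])
corr = np.zeros(M)
for n in range(1, K):
    A = np.eye(M)
    for l in range(1, n):
        A = A @ (np.eye(M) - mu * P[l])
    corr += A @ (C[n] @ v[n])
assert np.allclose(w, prod @ w_iK - mu * corr)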
We can also rewrite the neglected terms in (9) as

    \mu^2 E[\tilde{w}^*_{i-1} P_i \Sigma C_i v_i] - \mu E[\tilde{w}^*_{i-1} \Sigma C_i v_i] + \mu^2 E[v_i^* C_i^* \Sigma P_i \tilde{w}_{i-1}] - \mu E[v_i^* C_i^* \Sigma \tilde{w}_{i-1}]
        = -2\mu E[\tilde{w}^*_{i-1} (I - \mu P_i) \Sigma C_i v_i] .    (17)

Substituting (16) into (17), the neglected terms can be represented in terms of \sigma as

    -2\mu E[\tilde{w}^*_{i-1} (I - \mu P_i) \Sigma C_i v_i] = 2\mu^2 \sigma_v^2 \sum_{n=1}^{K-1} \tilde{\gamma}_n^T \sigma    (18)

where

    \tilde{\gamma}_n = vec( E[ ( \prod_{l=0}^{n-1} (I - \mu P_{i-l}) C_{i-n} T_n C_i^* )^T ] )

and T_n = E[v_{i-n} v_i^*] / \sigma_v^2. The detailed derivation is shown in Appendix II. Then, (14) is modified to

    E\|\tilde{w}_i\|^2_{vec(\sigma)} = E\|\tilde{w}_{i-1}\|^2_{vec(F\sigma)} + \mu^2 \sigma_v^2 \gamma^T \sigma + 2\mu^2 \sigma_v^2 \sum_{n=1}^{K-1} \tilde{\gamma}_n^T \sigma .    (19)

If we drop the vec(.) notation from the subscripts, (19) is simply rewritten as

    E\|\tilde{w}_i\|^2_\sigma = E\|\tilde{w}_{i-1}\|^2_{F\sigma} + \mu^2 \sigma_v^2 \gamma^T \sigma + 2\mu^2 \sigma_v^2 \sum_{n=1}^{K-1} \tilde{\gamma}_n^T \sigma .    (20)
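To evaluate the refined recursion (20) numerically, only the moments F, gamma, and gamma~_n are needed; in practice they can be estimated by averaging over realizations of U_i. The sketch below (our own; all names are illustrative) iterates the recursion for the MSD, i.e., Sigma = I, assuming the driving constant c = mu^2 sigma_v^2 (gamma + 2 sum_n gamma~_n) has been precomputed. Dropping the correction part of c, i.e., setting c = mu^2 sigma_v^2 gamma, reproduces the recursion (14) of [11].

import numpy as np

def learning_curve(F, c, w0, n_iter):
    # MSD(i) = E||w~_i||^2 from E||w~_i||^2_sigma = E||w~_{i-1}||^2_{F sigma} + c^T sigma,
    # unrolled as MSD(i) = vec(w0 w0^T)^T F^i s + c^T (s + F s + ... + F^{i-1} s),
    # with s = vec(I) selecting the MSD as the weighted norm.
    M = w0.size
    s = np.eye(M).flatten(order='F')             # sigma = vec(I)
    q = np.outer(w0, w0).flatten(order='F')      # vec(w~_0 w~_0^T)
    Fs, acc, msd = s.copy(), np.zeros(M * M), []
    for _ in range(n_iter):
        msd.append(q @ Fs + c @ acc)
        acc = acc + Fs
        Fs = F @ Fs
    return np.array(msd)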
Through the above procedure, we can also obtain the theoretical recursion describing the convergence behavior of the periodic APA (P-APA) [18] with an update interval p (1 < p <= K). The weight-error equation of the P-APA is given by

    \tilde{w}_i = (I - \mu P_i) \tilde{w}_{i-p} - \mu C_i v_i .    (21)

In that case, (16) changes as follows:

    \tilde{w}_{i-p} = \prod_{l=1}^{\lfloor (K-1)/p \rfloor} (I - \mu P_{i-lp}) \tilde{w}_{i-K} - \mu \sum_{n=1}^{\lfloor (K-1)/p \rfloor} \prod_{l=1}^{n-1} (I - \mu P_{i-lp}) C_{i-np} v_{i-np}    (22)

where \lfloor . \rfloor is the floor function, which maps a real number to the largest integer not greater than it. Then, substituting (22) into (17) while replacing w~_{i-1} with w~_{i-p}, we get

    -2\mu E[\tilde{w}^*_{i-p} (I - \mu P_i) \Sigma C_i v_i] = 2\mu^2 \sigma_v^2 \sum_{n=1}^{\lfloor (K-1)/p \rfloor} \tilde{\gamma}_{n,p}^T \sigma    (23)

where

    \tilde{\gamma}_{n,p} = vec( E[ ( \prod_{l=0}^{n-1} (I - \mu P_{i-lp}) C_{i-np} T_{n,p} C_i^* )^T ] )

and T_{n,p} = E[v_{i-np} v_i^*] / \sigma_v^2. Thus, (20) changes to

    E\|\tilde{w}_i\|^2_\sigma = E\|\tilde{w}_{i-p}\|^2_{F\sigma} + \mu^2 \sigma_v^2 \gamma^T \sigma + 2\mu^2 \sigma_v^2 \sum_{n=1}^{\lfloor (K-1)/p \rfloor} \tilde{\gamma}_{n,p}^T \sigma .    (24)
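For reference, the P-APA weight update can be realized by gating the APA update (2); the following sketch reflects our reading of [18] (names and defaults are illustrative), and p = K yields the PRA.

import numpy as np

def papa_step(w, U, d, i, p, mu=0.5, eps=1e-4):
    # P-APA: apply the APA update only at every p-th iteration, so that
    # w_m = w_{i-p} for m = i-1, ..., i-p+1; p = K corresponds to the PRA.
    if i % p != 0:
        return w                                    # hold the weights
    e = d - U @ w                                   # e_i = d_i - U_i w_{i-p}
    K = U.shape[0]
    return w + mu * U.T @ np.linalg.solve(eps * np.eye(K) + U @ U.T, e)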
If p = K, i.e., for the PRA, the recursion (24) reduces to

    E\|\tilde{w}_i\|^2_\sigma = E\|\tilde{w}_{i-K}\|^2_{F\sigma} + \mu^2 \sigma_v^2 (\gamma^T \sigma)    (25)

which is the same as the theoretical recursion (49) in [11] if E\|\tilde{w}_{i-K}\|^2_{F\sigma} is replaced with E\|\tilde{w}_{i-1}\|^2_{F\sigma}.

A. Mean and Mean-Square Stability

Taking expectations of (15) yields

    E[\tilde{w}_i] = (I - \mu E[P_i]) E[\tilde{w}_{i-1}]    (26)

so the convergence in the mean of the APA is guaranteed for any \mu satisfying [11]

    0 < \mu < 2 / \lambda_{max}(E[P_i]) .    (27)

In addition, the matrix F is given in (13), which is the same as that used in [11]. Thus, the range of \mu that guarantees stability of the APA in the mean-square sense is given by [11]

    0 < \mu < min{ 1 / \lambda_{max}(C^{-1} D) , 1 / max{\lambda(H) \in R} }    (28)

where C = 2 (I \otimes E[P_i]), D = E[P_i^T \otimes P_i], and H is the block matrix constructed from C and D that is defined in [11].
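As a practical note, both bounds can be estimated from sampled regressor matrices. The sketch below is our own construction: it evaluates the mean bound (27) and only the first term of (28), under our reading of C and D above, with the moments of P_i estimated by simple averaging.

import numpy as np

def step_size_bounds(Us, eps=1e-4):
    # Us: list of sampled K x M regressor matrices U_i from the input process.
    K, M = Us[0].shape
    EP = np.zeros((M, M))                 # running estimate of E[P_i]
    D = np.zeros((M * M, M * M))          # running estimate of E[P_i^T kron P_i]
    for U in Us:
        P = U.T @ np.linalg.solve(eps * np.eye(K) + U @ U.T, U)
        EP += P / len(Us)
        D += np.kron(P.T, P) / len(Us)
    mean_bound = 2.0 / np.linalg.eigvalsh(EP).max()       # bound (27)
    C = 2.0 * np.kron(np.eye(M), EP)                      # our reading of C
    # first term of (28); keep the largest real part of the spectrum
    ms_bound = 1.0 / np.linalg.eigvals(np.linalg.solve(C, D)).real.max()
    return mean_bound, ms_bound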