Least-Squares Adaptation of Affine Combinations of Multiple Adaptive Filters Luis A. Azpicueta-Ruiz∗ , Marcus Zeller† , Aníbal R. Figueiras-Vidal∗ , and Jerónimo Arenas-García∗ ∗ Department of Signal Theory and Communications Universidad Carlos III de Madrid, 28911 Leganés-Madrid, Spain Email: {azpicueta, arfv, jarenas}@tsc.uc3m.es † Multimedia Communications and Signal Processing University of Erlangen-Nuremberg, Cauerstr. 7, 91058 Erlangen, Germany Email:
[email protected]

Abstract— Adaptive combinations of adaptive filters are gaining popularity as a flexible and versatile solution to improve the performance of adaptive filters. In recent years, combination schemes have focused on two different approaches, convex and affine combinations, developing mainly practical implementations with just two component filters. However, combinations of a higher number of adaptive filters can offer additional advantages, mainly in tracking environments. In this paper, we introduce a practical adaptation scheme for the affine combination of an arbitrary number of filters, including a steady-state analysis in which the proposed rule is compared with the optimal combination. Several experiments, both in tracking and stationary scenarios, demonstrate the appropriate performance of this approach.
I. INTRODUCTION

Adaptive combination of adaptive filters with different properties is becoming a very useful and flexible approach to alleviate the different compromises that condition the operation of adaptive filters [1]–[3]. Variable step-size schemes have traditionally been used for this purpose, but they normally introduce several parameters whose appropriate tuning requires some a priori knowledge about the statistics of the filtering scenario. Recently, algorithms based on combinations of filters have found application in several areas of signal processing, including echo cancellation [4] and distributed estimation [5], among others. The key concept of combination schemes is that the overall filter behaves as well as the best of the contributing filters and, under certain circumstances, even better [1].

Different schemes can be applied to mix the outputs of the component filters, including convex and affine linear combinations. Recently, theoretical comparisons between both kinds of combination schemes, in the case of two component filters, have shown that the affine combination can present some advantages in terms of overall filter performance in certain situations [2], [3]. These results demonstrate that the optimal combiner exhibits a negative mixing weight in a stationary scenario, allowing the affine combination to outperform the best of its components. The relation between affine and convex combinations has also been investigated theoretically in [6], extending the previous results and developing a formulation for the case of a combination of three filters.

Up to now, several adaptation schemes have been proposed for the affine case, with particular focus on the case of just two contributing filters. For instance, in [2] a least-mean-square (LMS) adaptation scheme was proposed, and two different normalized versions of that approach were presented in [3]. We developed a different adaptation scheme for the mixing parameter based on the solution of a least-squares (LS) problem [7]. However, the combination of more than two component filters can provide additional advantages with respect to the combination of just two elements, mainly in tracking situations where, depending on the speed of changes, one of the components can clearly outperform the others. In this paper, a new practical adaptation scheme for the affine combination of multiple filters is presented, which can be considered as a generalization of [7]. It should be noted that convex combinations of multiple filters have already been considered in [10], [11].

The paper is organized as follows: a description of the algorithm is provided in Sec. II, as well as an analysis of the adaptation rule for the combination. Sec. III includes several experiments that show the performance of the proposed scheme both in stationary and tracking scenarios. The last section summarizes the conclusions of our work.

II. LEAST-SQUARES ADAPTATION OF A COMBINATION OF SEVERAL ADAPTIVE FILTERS
A. Algorithm description

Consider an affine combination of K adaptive filters which obtains the overall filter output at time n as

y(n) = \sum_{k=1}^{K} \lambda_k(n) y_k(n),    (1)

where y_k(n), k = 1, ..., K, are the outputs of the component filters and \lambda_k(n) are the mixing parameters, satisfying \sum_{k=1}^{K} \lambda_k(n) = 1, that are adapted at each iteration to optimize the overall performance. In order to incorporate the affine constraint into the scheme, we will simply adapt the first K - 1 weights and consider that \lambda_K(n) = 1 - \sum_{k=1}^{K-1} \lambda_k(n).

For the adaptation of the K - 1 mixing parameters we will minimize the LS cost function

J(n) = \sum_{i=1}^{n} \beta(n, i) e^2(n, i),    (2)
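As an illustration (not part of the paper), the following minimal Python sketch forms the overall output (1) while enforcing the affine constraint by construction, adapting only the first K - 1 weights; all names are ours.

```python
import numpy as np

def combined_output(y_components, lam_free):
    """Overall output y(n) of the affine combination in (1).

    y_components : shape (K,), component filter outputs y_k(n)
    lam_free     : shape (K-1,), freely adapted weights lambda_1 .. lambda_{K-1}
    The K-th weight is fixed by the affine constraint sum_k lambda_k(n) = 1.
    """
    lam = np.append(lam_free, 1.0 - np.sum(lam_free))
    return float(np.dot(lam, y_components))

# Toy usage with K = 3 components; note that individual weights may be
# negative or larger than one, which is precisely what the affine scheme allows.
print(combined_output(np.array([1.0, 0.9, 0.8]), np.array([0.7, 0.4])))
```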
where \beta(n, i) is a temporal weighting window and e(n, i) = d(i) - y(n, i), d(i) being the desired signal at time instant i, and

y(n, i) = \sum_{k=1}^{K-1} \lambda_k(n) y_k(i) + \Big[1 - \sum_{k=1}^{K-1} \lambda_k(n)\Big] y_K(i) = y_K(i) + \sum_{k=1}^{K-1} \lambda_k(n) [y_k(i) - y_K(i)]    (3)

represents the output of the overall filter when the outputs of the constituent filters at time i are combined by means of the weights at time n. Although an exponentially weighted window \beta(n, i) facilitates the formulation as a recursive LS (RLS) problem, allowing savings in terms of computational cost, the use of rectangular windows can provide some advantages, as discussed in [7].
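For concreteness, a small sketch of the two windowing choices just mentioned; the rectangular form matches (18) used later in the experiments, while the exponential forgetting-factor form and its default value are our assumptions of the usual RLS choice.

```python
def beta_rectangular(n, i, L):
    """Rectangular window of length L: uniform weight 1/L over the last L samples."""
    return 1.0 / L if (n - L) < i <= n else 0.0

def beta_exponential(n, i, gamma=0.99):
    """Exponentially weighted window (RLS-style forgetting factor gamma)."""
    return gamma ** (n - i) if i <= n else 0.0

# Weights assigned to the five most recent samples at time n = 100.
n = 100
print([beta_rectangular(n, i, L=5) for i in range(96, 101)])
print([round(beta_exponential(n, i, gamma=0.9), 3) for i in range(96, 101)])
```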
In order to find the values of the mixing parameters that minimize the power of the global error, we take the derivatives of (2) w.r.t. each weight \lambda_m(n), with m = 1, ..., K - 1, obtaining

\frac{\partial J(n)}{\partial \lambda_m(n)} = 2 \sum_{i=1}^{n} \beta(n, i) e(n, i) [y_K(i) - y_m(i)],    (4)

where e(n, i) = \sum_{k=1}^{K-1} \lambda_k(n) [y_K(i) - y_k(i)] + e_K(i), and the error of each contributing filter is denoted by e_k(i) = d(i) - y_k(i). Thus, setting (4) to zero, and after some manipulations, we get

\sum_{i=1}^{n} \sum_{k=1}^{K-1} \beta(n, i) \lambda_k(n) \tilde{y}_k(i) \tilde{y}_m(i) = \sum_{i=1}^{n} \beta(n, i) e_K(i) \tilde{y}_m(i),    (5)

where \tilde{y}_m(i) = y_m(i) - y_K(i).
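To spell out the manipulations between (4) and (5): substituting e(n, i) = e_K(i) - \sum_{k=1}^{K-1} \lambda_k(n) \tilde{y}_k(i) (the expression for e(n, i) above, rewritten with the \tilde{y}_k notation), and noting that the sign of the factor y_K(i) - y_m(i) = -\tilde{y}_m(i) is irrelevant once the derivative is set to zero,

```latex
\begin{aligned}
0 &= \sum_{i=1}^{n} \beta(n,i)\, e(n,i)\, \tilde{y}_m(i)
   = \sum_{i=1}^{n} \beta(n,i)\Big[ e_K(i) - \sum_{k=1}^{K-1} \lambda_k(n)\,\tilde{y}_k(i) \Big] \tilde{y}_m(i),
\end{aligned}
```

and moving the terms in \lambda_k(n) to the left-hand side gives (5).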
Since we have a system of K - 1 equations of the form (5), matrix notation becomes more convenient at this point, yielding

P(n) \lambda(n) = z(n),    (6)

where P(n) is a symmetric matrix of size K - 1 with components

[P(n)]_{k,m} = p_{k,m}(n) = \sum_{i=1}^{n} \beta(n, i) \tilde{y}_k(i) \tilde{y}_m(i),    (7)
with k, m = 1, ..., K - 1; z(n) is a column vector whose mth component is given by

z_m(n) = \sum_{i=1}^{n} \beta(n, i) e_K(i) \tilde{y}_m(i),   m = 1, ..., K - 1,    (8)
and \lambda(n) = [\lambda_1(n), ..., \lambda_{K-1}(n)]^T. Therefore, the solution of the problem is easily obtained from (6) as

\lambda(n) = P^{-1}(n) z(n).    (9)

In the light of (3), this result can be interpreted intuitively, since the combination of filters can be seen as a two-layer adaptive filter [7], [8], with \lambda(n) acting as the weight vector of the second filtering stage. Under such an interpretation, P(n) corresponds to the autocorrelation matrix of the input vector, given by \tilde{y}(n) = [\tilde{y}_1(n), \tilde{y}_2(n), ..., \tilde{y}_{K-1}(n)]^T, and z(n) is the cross-correlation between \tilde{y}(n) and the signal e_K(n) = d(n) - y_K(n). Note that solution (9) is coherent with the scheme presented in [7] for the combination of two components. An important difference, however, is that it requires the inversion of matrix P(n) for K > 2. If the number of contributing filters K is large, the inversion in (9) may entail considerable computational cost. Nevertheless, the matrix inversion lemma can be used to avoid the explicit inversion both for exponential and rectangular windows \beta(n, i). Square-root implementations [9, Ch. 13] can be used, if necessary, to avoid numerical instability. Furthermore, the cost of the matrix inversion is very low for small K.
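A minimal sketch of the batch solution (6)–(9) over a rectangular window; the helper name, the toy data, and the small diagonal-loading safeguard are our additions and not part of the paper. In practice, P(n) and z(n) would be updated recursively (e.g., via the matrix inversion lemma) rather than rebuilt at every iteration, as discussed above.

```python
import numpy as np

def affine_ls_weights(Y, e_K, L):
    """Solve P(n) lambda(n) = z(n), eqs. (6)-(9), with beta(n, i) = 1/L.

    Y   : shape (n, K), component outputs y_k(i) for i = 1..n
    e_K : shape (n,),  errors of the K-th filter, e_K(i) = d(i) - y_K(i)
    L   : rectangular window length
    Returns the K-1 free mixing parameters lambda_1(n) .. lambda_{K-1}(n).
    """
    Yw, ew = Y[-L:], e_K[-L:]              # samples inside the window
    Yt = Yw[:, :-1] - Yw[:, [-1]]          # y~_k(i) = y_k(i) - y_K(i)
    P = (Yt.T @ Yt) / L                    # eq. (7)
    z = (Yt.T @ ew) / L                    # eq. (8)
    # Solving the symmetric system is preferable to forming P^{-1} explicitly;
    # the tiny diagonal loading is only a numerical safeguard.
    return np.linalg.solve(P + 1e-10 * np.eye(P.shape[0]), z)

# Toy usage: K = 3 component outputs, the first one closest to the desired signal.
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 3))
d = Y[:, 0] + 0.1 * rng.standard_normal(500)
lam_free = affine_ls_weights(Y, d - Y[:, 2], L=100)
print(np.append(lam_free, 1.0 - lam_free.sum()))   # full lambda(n), summing to one
```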
B. Analysis of the algorithm

In this section, some analytical insight is given in order to analyze the proposed LS adaptation scheme, comparing it with the optimal combiners in a mean-square error sense. This analysis holds for white and colored input signals, and for stationary and tracking situations. For this, we assume for all filters a common linear regression model d(n) = w_o^T(n) u(n) + e_0(n) that relates d(n) and the input signal u(n), satisfying E{u(n)} = 0 and E{u(n) u^T(n)} = R. The weight vector w_o(n) represents an unknown plant, which is kept fixed for stationary environments, i.e., w_o(n) = w_o. A random-walk model is assumed for the case of nonstationary scenarios, i.e.,

w_o(n) = w_o(n - 1) + q(n),    (10)

where q(n) is an i.i.d. vector with autocorrelation matrix E{q(n) q^T(n)} = Q. The signal e_0(n) is a zero-mean i.i.d. additive noise, uncorrelated with u(n) and with variance \sigma_0^2. Using this model, we can define the a priori errors of the contributing filters as e_{a,k}(n) = e_k(n) - e_0(n), k = 1, ..., K [1].

In order to find the value of the optimum combiners, \lambda_o(n) = [\lambda_{o,1}(n), ..., \lambda_{o,K-1}(n)]^T, we set the derivatives of J(n) = E{e^2(n)} to zero. After some algebraic manipulations we get

\sum_{k=1}^{K-1} \lambda_{o,k}(n) E{[e_{a,m}(n) - e_{a,K}(n)][e_{a,k}(n) - e_{a,K}(n)]} = E{[e_{a,K}(n) - e_{a,m}(n)] e_{a,K}(n)},    (11)

for m = 1, ..., K - 1. Here we have assumed that the value of \lambda_{o,k}(n) is independent of the a priori errors of the component filters, which is more appropriate under steady-state conditions (including tracking and stationary environments [1]). Defining J_{k,m}(n) = E{e_{a,k}(n) e_{a,m}(n)} and \Delta J_{k,m}(n) = J_{k,m}(n) + J_{K,K}(n) - J_{m,K}(n) - J_{k,K}(n), the solution for the optimum combiners is

\lambda_o(n) = P_o^{-1}(n) z_o(n),    (12)

where, omitting the temporal index n for the sake of simplicity,

[P_o]_{k,m} = \Delta J_{k,m},   k, m = 1, ..., K - 1,    (13)

and

z_o = [J_{K,K} - J_{1,K}, ..., J_{K,K} - J_{K-1,K}]^T.    (14)
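As a consistency check (our addition), for K = 2 the matrix P_o and the vector z_o become scalars and (12)–(14) reduce to

```latex
\lambda_{o,1}(n) = \frac{J_{2,2}(n) - J_{1,2}(n)}{J_{1,1}(n) + J_{2,2}(n) - 2\,J_{1,2}(n)},
\qquad
\lambda_{o,2}(n) = 1 - \lambda_{o,1}(n),
```

which matches the form of the optimal mixing parameter studied for affine combinations of two filters [2], [3].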
Once the optimal combiners have been calculated, we take expectations on both sides of (9). Considering that the expectation of the inverse of matrix P(n) can be approximated by the inverse of its expectation [9, pp. 318], and assuming independence, we have

E{\lambda(n)} \approx [E{P(n)}]^{-1} E{z(n)}.    (15)

Then, the expected value of each element of P(n) yields

E{p_{k,m}(n)} = E{\sum_{i=1}^{n} \beta(n, i) \tilde{y}_k(i) \tilde{y}_m(i)}
             = \sum_{i=1}^{n} \beta(n, i) E{[e_{a,k}(i) - e_{a,K}(i)][e_{a,m}(i) - e_{a,K}(i)]}
             = \sum_{i=1}^{n} \beta(n, i) [J_{k,m}(i) + J_{K,K}(i) - J_{k,K}(i) - J_{m,K}(i)],    (16)

and following the same arguments for each element of z(n):

E{z_m(n)} = \sum_{i=1}^{n} \beta(n, i) [J_{K,K}(i) - J_{m,K}(i)].    (17)
Comparing (16) and (17) with the elements of P_o(n) and z_o(n), respectively, it is clear that (15) approximates the value of the optimal combiners given by (12). Of course, under steady-state conditions, the larger the length of the window \beta(n, i), the more accurate the approximation, reducing the misadjustment. However, for transient situations, shorter windows would be preferred, showing a faster adaptation at the expense of an increased steady-state error.
III. EXPERIMENTS
In this section, we study the performance of the presented LS adaptation scheme by means of two sets of experiments in a plant identification setup. The first experiment focuses on the tracking capabilities of the method, while the second studies the convergence and stationary behavior of the combined filter. Both experiments share several common settings: we employ an i.i.d. Gaussian noise with variance \sigma_u^2 = 1 as input signal u(n). The unknown plant w_o(n) consists of N = 32 coefficients whose initial values are randomly generated in the range [-1, 1]. Additive i.i.d. noise e_0(n) is present at the output, giving rise to a signal-to-noise ratio (SNR) of 30 dB. For the sake of simplicity, K = 3 normalized LMS (NLMS) filters with step sizes \mu_1 = 1, \mu_2 = 0.1 and \mu_3 = 0.01 are considered as component filters. Finally, a rectangular weighting window of length L is employed, i.e.,

\beta(n, i) = \begin{cases} 1/L, & n - L < i \le n \\ 0, & i \le n - L \end{cases}    (18)

The influence of the length of \beta(n, i) is considered by selecting two different values, L = 10000 and L' = 1000, in the experiments.
A. Tracking performance

In order to study the tracking performance of the algorithm, we assume that the real plant w_o(n) varies according to the random-walk model shown in (10), where Tr(Q) can be seen as a measure of the speed of changes of the plant. For these experiments, the figure of merit employed is the normalized square deviation (NSD), defined as the ratio of the excess mean-square error (EMSE) of a filter, i.e., EMSE(n) = E{[e(n) - e_0(n)]^2}, to that of the NLMS filter with optimal step size \mu_o. Note that for the random-walk model there exists an optimal NLMS filter whose step size is given by [9, pp. 391]

\mu_o = \frac{1}{2\sigma_0^2 E\{1/\|u(n)\|^2\}} \left[ \sqrt{Tr(Q)^2 + 4\sigma_0^2\,Tr(Q)\,E\{1/\|u(n)\|^2\}} - Tr(Q) \right].    (19)

The steady-state NSD is therefore defined as NSD(\infty) = EMSE(\infty)/EMSE_o(\infty), where EMSE_o(\infty) is the EMSE of the optimal filter.
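The following sketch (ours) reproduces the main ingredients of this tracking setup with the values given in the text; the isotropic choice E{q(n) q^T(n)} = (Tr(Q)/N) I, the way the noise power is set to obtain roughly 30 dB of SNR, and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_iter = 32, 50000                    # plant length and number of iterations
mus = [1.0, 0.1, 0.01]                   # NLMS step sizes mu_1, mu_2, mu_3
sigma_u2, snr_db, tr_Q, eps = 1.0, 30.0, 1e-6, 1e-6

w_o = rng.uniform(-1.0, 1.0, N)          # initial unknown plant
sigma_02 = sigma_u2 * (w_o @ w_o) * 10 ** (-snr_db / 10)  # noise power for ~30 dB SNR
W = np.zeros((len(mus), N))              # weights of the NLMS components
u = np.zeros(N)                          # tapped-delay-line regressor
emse, n_avg = np.zeros(len(mus)), 0

for n in range(n_iter):
    w_o = w_o + rng.normal(0.0, np.sqrt(tr_Q / N), N)   # random walk, eq. (10)
    u = np.roll(u, 1)
    u[0] = rng.normal(0.0, np.sqrt(sigma_u2))           # i.i.d. Gaussian input
    v = rng.normal(0.0, np.sqrt(sigma_02))              # additive noise e_0(n)
    second_half = n >= n_iter // 2
    for k, mu in enumerate(mus):
        e_a = (w_o - W[k]) @ u                          # a priori error e_{a,k}(n)
        W[k] += mu * (e_a + v) * u / (eps + u @ u)      # NLMS update with error e_k(n)
        if second_half:
            emse[k] += e_a ** 2
    if second_half:
        n_avg += 1

print("estimated EMSE of the components (dB):", 10 * np.log10(emse / n_avg))
```

Feeding the per-sample component outputs W[k] @ u and the desired signal into the affine_ls_weights sketch given earlier (with L = 10000 or L' = 1000) produces the combined filter whose NSD is reported in Fig. 1.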
Fig. 1 shows the steady-state NSD of the three NLMS components as well as of their affine combination for varying Tr(Q). We have selected the length of the rectangular window, L = 10000, to be large enough to obtain an accurate estimation of the combination performance with low variance, even when the EMSE of the components is very low. Regarding the upper panel of Fig. 1, it can be seen that the proposed adaptation scheme achieves an appropriate performance, resulting in a combination that behaves as well as the best of its components and, for some values of Tr(Q), even better than any of them [1]. In addition, the benefits of combining more than two filters with different step sizes become evident, since each component clearly outperforms the others for a different range of Tr(Q), and a combination of just two components would obtain a worse performance than the combination of the three filters. The bottom panel of Fig. 1 represents the steady-state values of \lambda_1(\infty) and \lambda_2(\infty). Correlations between the plant estimates of the component filters make the mixing parameters take values outside the range [0, 1], for instance for slow speeds of change. In these situations, affine combinations should be preferred over convex ones.

Fig. 1. Steady-state tracking performance of a combination of three filters for varying Tr(Q), averaged over 25000 iterations (once the algorithms have converged) and over 50 realizations. Top panel: NSD(\infty) of the three NLMS components, of their combination (3-Comb), and of a combination of the filters with \mu_1 and \mu_3 (2-Comb); bottom panel: mixing parameters.
Another experiment has been designed to show the adaptation ability of our scheme. For this, we implement a scenario where the speed of changes of the plant varies suddenly during the experiment.

Fig. 2. Convergence of the proposed scheme after changes in Tr(Q) (the three segments of the experiment correspond to Tr(Q) = 10^{-3}, 10^{-6} and 10^{-9}, with EMSE_o = -15, -37 and -52 dB, respectively). All figures are averages over 1000 realizations. From top to bottom: NSD(n) of the NLMS components and of their combination with L = 10000; evolution of the two mixing parameters using L = 10000; same as the first and second panels, respectively, but using L' = 1000.
The top panel of Fig. 2 shows that the combination performs as well as the best component filter, obtaining a similar NSD(n) for the three values of Tr(Q). However, the combination exhibits a delay in its reconvergence when the value of Tr(Q) changes. This effect is due to the chosen length of \beta(n, i), L = 10000, which allows a very accurate estimation of the steady state of the optimal combiners, but implies a delay in the transfer between component filters. The use of a smaller window, L' = 1000 (two bottom panels of Fig. 2), permits a better transient behavior for the combination; however, this incurs
a degraded steady-state performance due to the increased variance of the error (an increment in the NSD of about 1.5 dB can be observed for n > 70000). The best of both affine combinations, using L and L', can be achieved if their respective outputs are mixed by means of a basic convex combination, with a negligible computational cost increment w.r.t. that of a single affine combination of three filters (for a detailed explanation of this combination scheme, please refer to [1], [8]). Note that considering two affine combinations with different window lengths does not increase the memory requirements either, since both rectangular windows can be implemented using a common circular array and two different pointers. Fig. 3 represents the improved performance of this "upper-layer" combination.

Fig. 3. Time evolution of the tracking performance of the convex combination (named L-L' Comb.) of two affine combinations with L and L' when Tr(Q) is varied (same segments as in Fig. 2), averaged over 1000 realizations.
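A sketch of such an upper-layer mixer following the basic sigmoid-activated convex rule of [1]; the step size, the clipping value and all names are illustrative choices of ours, and the normalized variant of [8] additionally scales the update by a power estimate of the difference between the two outputs.

```python
import numpy as np

class ConvexMixer:
    """Convex combination of two outputs, here the affine combinations with
    window lengths L and L'.  lambda(n) = sigmoid(a(n)) stays in (0, 1) and
    a(n) is adapted by stochastic gradient descent on the combined error."""

    def __init__(self, mu_a=100.0, a_max=4.0):
        self.a, self.mu_a, self.a_max = 0.0, mu_a, a_max

    def mix(self, y_L, y_Lp):
        lam = 1.0 / (1.0 + np.exp(-self.a))
        return lam * y_L + (1.0 - lam) * y_Lp

    def update(self, d, y_L, y_Lp):
        lam = 1.0 / (1.0 + np.exp(-self.a))
        e = d - (lam * y_L + (1.0 - lam) * y_Lp)
        # gradient of e^2 w.r.t. a, including the sigmoid derivative lam*(1 - lam)
        self.a += self.mu_a * e * (y_L - y_Lp) * lam * (1.0 - lam)
        self.a = float(np.clip(self.a, -self.a_max, self.a_max))
```

At each iteration, mix() produces the overall output from the outputs of the two affine combinations, and update() adapts the single mixing parameter from the common desired signal d(n).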
B. Convergence and stationary performance

For these simulations, a stationary plant is assumed, i.e., w_o(n) = w_o, in order to study the convergence and stationary performance of the proposed scheme. An abrupt change in the plant occurs at n = 50000 to show the reconvergence ability of the algorithm. The figure of merit employed to evaluate the performance of the filters (components and combined) is the EMSE. Fig. 4 shows how the combination employing a large window (with length L) achieves an appropriate steady-state performance, for instance during 30000 < n < 50000, obtaining a slightly lower EMSE(n) than that of the slowest filter (with \mu_3 = 0.01), because of the use of an affine scheme. Again, there exists a trade-off in the selection of the length of the window (see the results in Fig. 4 for both L and L', where an increment of about 2 dB in the steady-state error of the combination can be observed when the smallest window L' is selected). This compromise can be alleviated by adaptively mixing the outputs of both affine combinations (bottom panel of Fig. 4).

Fig. 4. Transient behavior of a combination of three filters in a stationary scenario (averaged over 1000 realizations). From top to bottom: EMSE(n) of the NLMS components and of their combination with L = 10000; evolution of the two mixing parameters using L = 10000; same as the first and second panels using L' = 1000. The bottom panel represents the EMSE of the convex combination of the schemes with L and L' (named L-L' Comb.).

IV. CONCLUSION

Adaptive combinations of adaptive filters constitute an attractive and versatile approach to enhance the performance of adaptive filters. In this paper, we have presented a new scheme for the affine combination of an arbitrary number of filters based on the solution of an LS problem, including an analysis of this novel approach. Several experiments demonstrate the advantages of combining more than two filters, mainly in tracking environments, showing a superior behavior with respect to combinations of just two components. In addition, satisfactory results were also achieved in stationary scenarios.

ACKNOWLEDGMENT

This work was partly supported by the Spanish Ministry of Education and Science under grants CICYT TEC-2005-00992 and
CAM S-505/TIC/0223, and by the Deutsche Forschungsgemeinschaft (DFG) under contract numbers KE 890/5-1 and KE 890/6-1.

REFERENCES

[1] J. Arenas-García, A. R. Figueiras-Vidal, and A. H. Sayed, "Mean-square performance of a convex combination of two adaptive filters," IEEE Trans. Signal Process., vol. 54, pp. 1078–1090, Mar. 2006.
[2] N. J. Bershad, J. C. M. Bermudez, and J.-Y. Tourneret, "An affine combination of two LMS adaptive filters-transient mean-square analysis," IEEE Trans. Signal Process., vol. 56, pp. 1853–1864, May 2008.
[3] R. Candido, M. T. M. Silva, and V. H. Nascimento, "Affine combinations of adaptive filters," in Conf. Rec. of the 42nd Asilomar Conf. on Sign., Syst. & Comp., Pacific Grove, CA, Oct. 2008, pp. 236–240.
[4] J. Arenas-García and A. R. Figueiras-Vidal, "Adaptive combination of proportionate filters for sparse echo cancellation," IEEE Trans. Audio, Speech and Language Process., vol. 17, pp. 1087–1098, Aug. 2009.
[5] C. G. Lopes and A. H. Sayed, "Diffusion least-mean-squares over adaptive networks: Formulation and performance analysis," IEEE Trans. Signal Process., vol. 56, pp. 3122–3136, July 2008.
[6] A. T. Erdogan, S. S. Kozat, and A. C. Singer, "Comparison of convex combination and affine combination of adaptive filters," in Proc. ICASSP, Taipei, Apr. 2009, pp. 3089–3092.
[7] L. A. Azpicueta-Ruiz, A. R. Figueiras-Vidal, and J. Arenas-García, "A new least squares adaptation scheme for the affine combination of two adaptive filters," in Proc. MLSP, Cancun, MX, Oct. 2008, pp. 327–332.
[8] L. A. Azpicueta-Ruiz, A. R. Figueiras-Vidal, and J. Arenas-García, "A normalized adaptation scheme for the convex combination of two adaptive filters," in Proc. ICASSP, Las Vegas, NV, Apr. 2008, pp. 3301–3304.
[9] A. H. Sayed, Fundamentals of Adaptive Filtering. New York: Wiley, 2003.
[10] J. Arenas-García, V. Gómez-Verdejo, and A. R. Figueiras-Vidal, "New algorithms for improved adaptive convex combination of LMS transversal filters," IEEE Trans. Instrum. Meas., vol. 54, pp. 2239–2249, Dec. 2005.
[11] S. S. Kozat and A. C. Singer, "Multi-stage adaptive signal processing algorithms," in Proc. 2000 IEEE Sensor Array Multichannel Signal Workshop, Cambridge, MA, Mar. 2000, pp. 380–384.