CONVERGENCE ANALYSIS OF THE AUGMENTED COMPLEX KLMS ALGORITHM WITH PRE-TUNED DICTIONARY

Wei Gao †‡, Jie Chen ⋆, Cédric Richard †, Jose-Carlos M. Bermudez ∗, Jianguo Huang ‡

† Université de Nice Sophia-Antipolis, France
⋆ University of Michigan, Ann Arbor, USA
∗ Federal University of Santa Catarina, Florianópolis, Brazil
‡ Northwestern Polytechnical University, Xi'an, China
This work was partially supported by the National Natural Science Foundation of China (61271415, 61401499).

ABSTRACT

Complex kernel-based adaptive algorithms have recently been introduced for complex-valued nonlinear system identification. These algorithms are built upon the same framework as complex linear adaptive filtering techniques and Wirtinger's calculus in complex reproducing kernel Hilbert spaces. In this paper, we study the convergence behavior of the augmented complex Gaussian KLMS algorithm. Simulation results illustrate the accuracy of the analysis.

Index Terms— Kernel adaptive filtering, complex RKHS, complex Gaussian kernel, non-circular data

1. INTRODUCTION

Single-kernel adaptive filters have been extensively studied over the last decade, and their performance has been investigated experimentally and theoretically on a variety of real-valued nonlinear system identification problems. Typical filtering algorithms in reproducing kernel Hilbert spaces (RKHS) are the KRLS algorithm [1], the sliding-window KRLS algorithm [2], and the quantized KRLS algorithm [3]. The KNLMS algorithm was independently introduced in [4–7]. The KLMS algorithm, proposed in [8, 9], has attracted much attention in recent years because of its simplicity and robustness. An analysis of its convergence behavior with the Gaussian kernel is reported in [10], and a closed-form condition for convergence is introduced in [11]. The stability of this algorithm with ℓ1-norm regularization is studied in [12, 13].

Kernel-based adaptive filtering algorithms for complex data have recently attracted attention since they allow phase information to be processed. This is of importance for applications in communications, radar and sonar. A complexified kernel LMS algorithm and a pure complex kernel LMS algorithm are introduced in [14]. A direct extension of the derivations in [10] is proposed in [15] to analyze the convergence behavior of the complex KLMS algorithm (CKLMS). The augmented CKLMS algorithm (ACKLMS) is presented in [16, 17], and its normalized counterpart is described in [18, 19]. These works show that augmented complex-valued algorithms provide significantly improved performance compared with complex-valued algorithms. Finally, the quaternion KLMS algorithm has been recently introduced in [20] as an extension of complex-valued KLMS algorithms.

The aim of this paper is to analyze the convergence behavior of the ACKLMS algorithm. First, we introduce some definitions and a general framework for pure complex multi-kernel adaptive filtering algorithms. This framework relies on multi-kernel adaptive filters that have previously been derived for use with real-valued data in [21–24]. Then, we derive models for the convergence behavior, in the mean and mean-square sense, of the ACKLMS algorithm with Gaussian kernels. Finally, the accuracy of these models is checked against simulation results.

2. COMPLEX MULTI-KERNEL LMS

2.1. Preliminaries

Consider the complex input/output sequence {(u(n), d(n))}_{n=1}^{N} with u(n) ∈ U and d(n) ∈ ℂ, where U is a compact subset of ℂ^L. The complex input vector can be expressed in the form

u(n) = √(1 − ρ²) u_re(n) + iρ u_im(n) = u_I(n) + i u_Q(n)    (1)

where the subscripts I and Q denote the "in-phase" and "quadrature" components, and i = √−1. The sequence u_re(n) (resp., u_im(n)) is assumed to be zero-mean, independent, and identically distributed according to a real-valued Gaussian distribution. The entries of each input vector u_re(n) (resp., u_im(n)) can, however, be correlated. In addition, the sequences u_re(n) and u_im(n) are assumed to be independent. This implies that E{u(n−i) u^H(n−j)} = 0 for i ≠ j, where the operator (·)^H denotes Hermitian transpose. The circularity of the input data is controlled by the parameter ρ. Setting ρ = √2/2 results in a circular input, while values of ρ approaching 0 or 1 lead to a highly non-circular input.
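To make the role of ρ concrete, the following sketch (ours, not from the paper; the helper name and sample sizes are illustrative assumptions) draws inputs according to model (1) and compares the pseudo-covariance E{u(n)u⊤(n)} with the covariance E{u(n)u^H(n)}: the former essentially vanishes for ρ = √2/2 and is of the same order as the latter when ρ is close to 0 or 1.

```python
import numpy as np

def noncircular_input(L, N, rho, seed=0):
    """Draw N complex input vectors u(n) of length L according to model (1)."""
    rng = np.random.default_rng(seed)
    u_re = rng.standard_normal((N, L))   # real-valued Gaussian "in-phase" source
    u_im = rng.standard_normal((N, L))   # independent "quadrature" source
    return np.sqrt(1.0 - rho**2) * u_re + 1j * rho * u_im

N, L = 100000, 2
for rho in (np.sqrt(2) / 2, 0.1):
    u = noncircular_input(L, N, rho)
    cov = u.T @ u.conj() / N             # covariance  E{u u^H}
    pcov = u.T @ u / N                   # pseudo-covariance  E{u u^T}
    print(rho, np.linalg.norm(pcov) / np.linalg.norm(cov))
# The ratio is close to 0 for rho = sqrt(2)/2 (circular input)
# and close to 1 for rho = 0.1 (highly non-circular input).
```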
Let κ_C : U × U → ℂ be a complex reproducing kernel. We denote by (H, ⟨·,·⟩_H) the induced complex RKHS with its inner product. Complex reproducing kernels include the Szegő kernel, the Bergman kernel, and the so-called pure complex Gaussian kernel. The latter is the extension of the Gaussian kernel to complex arguments. The pure complex Gaussian kernel is defined as follows [25]

κ_C(u, v) = exp( −∑_{ℓ=1}^{L} (u_ℓ − v_ℓ^*)² / 2ξ² )    (2)

with u_ℓ and v_ℓ the ℓ-th entries of u, v ∈ ℂ^L. The parameter ξ > 0 denotes the kernel bandwidth, and (·)^* denotes the conjugate operator. The conjugate of the kernel κ_C(u, v) is defined by

κ_C^⋆(u, v) = exp( −∑_{ℓ=1}^{L} (v_ℓ − u_ℓ^*)² / 2ξ² ).    (3)
Note that (·)^⋆ is defined on kernels and should not be confused with the complex conjugate (·)^*. We shall focus on the above complex Gaussian kernel in the sequel.
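As a minimal illustration (our own sketch; the function names are assumptions, not from the paper), the pure complex Gaussian kernel (2) and its conjugate (3) can be implemented directly. With these definitions, κ_C^⋆(u, v) equals the complex conjugate of κ_C(u, v), which the snippet checks numerically.

```python
import numpy as np

def kappa_c(u, v, xi):
    """Pure complex Gaussian kernel (2): exp(-sum((u_l - conj(v_l))^2) / (2 xi^2))."""
    return np.exp(-np.sum((u - np.conj(v)) ** 2) / (2.0 * xi**2))

def kappa_c_star(u, v, xi):
    """Conjugate kernel (3): exp(-sum((v_l - conj(u_l))^2) / (2 xi^2))."""
    return np.exp(-np.sum((v - np.conj(u)) ** 2) / (2.0 * xi**2))

rng = np.random.default_rng(1)
u = rng.standard_normal(2) + 1j * rng.standard_normal(2)
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
xi = 0.55
# From definitions (2)-(3), the conjugate kernel equals the complex conjugate of kappa_C:
print(np.isclose(kappa_c_star(u, v, xi), np.conj(kappa_c(u, v, xi))))   # True
```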
2.2. A framework for complex multi-kernel algorithms

Let {κ_{C,k}}_{k=1}^{K} be the family of candidate complex kernels, and H_k the RKHS defined by each κ_{C,k}. Consider the space H of multidimensional mappings

Φ : ℂ^L → ℂ^K,  u ↦ Φ(u) = col{ϕ_1(u), …, ϕ_K(u)}    (4)

with ϕ_k ∈ H_k and col{·} the operator that stacks its arguments on top of each other. Let ⟨·,·⟩_H be the inner product in H defined as

⟨Φ, Φ′⟩_H = ∑_{k=1}^{K} ⟨ϕ_k, ϕ′_k⟩_{H_k}.    (5)

The space H equipped with the inner product ⟨·,·⟩_H is a Hilbert space, as (H_k, ⟨·,·⟩_{H_k}) is a complex Hilbert space for all k. We can then define the vector-valued representer of evaluation κ_H(·, u) such that

Φ(u) = [Φ, κ_H(·, u)]    (6)

with κ_H(·, u) = col{κ_{C,1}(·, u), …, κ_{C,K}(·, u)} and [·,·] the entry-wise inner product. This yields the following reproducing property

κ_H(u, v) = [κ_H(·, u), κ_H(·, v)].    (7)
Let Ψ = col{ψ_1, …, ψ_K} be a vector-valued function in the space H, and let ψ = ∑_{k=1}^{K} ψ_k with ψ_k ∈ H_k be the scalar-valued function that sums the entries of Ψ, namely, ψ = 1_K^⊤ Ψ with 1_K the all-one column vector of length K. Given a complex-valued input-output sequence {(d(n), u(n))}_{n=1}^{N}, we aim at estimating a multidimensional function Ψ in H that minimizes the regularized least-squares error

min_{Ψ∈H} J(Ψ) = ∑_{n=1}^{N} | d(n) − 1_K^⊤ Ψ(u(n)) |² + λ ‖1_K^⊤ Ψ‖²_H    (8)

with λ ≥ 0 a regularization constant. By virtue of the generalized multidimensional representer theorem, not presented in this paper due to lack of space, the optimum function Ψ can be written as

Ψ(·) = col{ ∑_{n=1}^{N} α_{n,k}^* κ_{C,k}(·, u(n)) }_{k=1}^{K}.    (9)
For simplicity, without loss of generality, we shall omit the regularization term in problem (8), which can be reformulated as

min_α J(α) = ∑_{n=1}^{N} | d(n) − ∑_{k=1}^{K} α_k^H κ_{C,k}(n) |²    (10)

where α = col{α_1, …, α_K}, with α_k = (α_{1,k}, …, α_{N,k})^⊤, is the unknown weight vector, and κ_{C,k}(n) is the N × 1 kernelized input vector with j-th entry κ_{C,k}(u(j), u(n)). Calculating the directional derivative of J(α) with respect to α by Wirtinger's calculus yields

∂_{α_k} J(α) = −2 ∑_{n=1}^{N} e^*(n) κ_{C,k}(·, u(n))    (11)

where e(n) = d(n) − ∑_{k=1}^{K} α_k^H κ_{C,k}(n). Approximating (11) by its instantaneous estimate ∂_{α_k} J(α) ≈ −2 e^*(n) κ_{C,k}(·, u(n)), we obtain the stochastic gradient descent algorithm

α(n+1) = α(n) + η e^*(n) κ_H(n) = ∑_{i=1}^{n} η e^*(i) κ_H(i)    (12)

with η a positive step size, κ_H(n) = col{κ_{C,k}(n)}_{k=1}^{K} the complex kernelized input vector, and e(n) = d(n) − α^H(n) κ_H(n) the estimation error. Finally, the optimal function is of the form

ψ(·) = ∑_{n=1}^{N} ∑_{k=1}^{K} α_{n,k}^* κ_{C,k}(·, u(n)).    (13)
2.3. Augmented complex kernel LMS (ACKLMS)

In order to overcome the problem of the number n of observations growing without bound in an online context, a fixed-size model is usually adopted:

ψ(·) = ∑_{m=1}^{M} ∑_{k=1}^{K} α_{m,k}^* κ_{C,k}(·, u(ω_m))    (14)

where ω ≜ {κ_H(·, u(ω_m))}_{m=1}^{M} is the so-called dictionary of the filter ψ, and M is its length. Limiting the number of single-kernel filters to K = 2, and setting the two kernels to (2)-(3), the ACKLMS algorithm based on model (14) is given by (see [18] for an introduction to ACKLMS)

d̂(n) = ∑_{m=1}^{M} [ α_{1,m}^*(n) κ_C(u(n), u(ω_m)) + α_{2,m}^*(n) κ_C^⋆(u(n), u(ω_m)) ] = α^H(n) κ_{H,ω}(n).    (15)

The ACKLMS algorithm can be viewed as a complex Gaussian bi-kernel case of the complex multi-kernel algorithm [18, 19]. It can be expected that the ACKLMS algorithm outperforms existing CKLMS algorithms due to the flexibility of complex multi-kernels.

3. ACKLMS PERFORMANCE ANALYSIS

We shall now study the transient and steady-state behavior of the mean-square error of the complex Gaussian bi-kernel LMS algorithm, conditioned on the dictionary ω, that is,

E{ |e(n)|² | ω } = ∫_{U×ℂ} |e(n)|² dρ(u(n), d(n) | ω)    (16)
with e(n) = d(n) − d̂(n) and ρ a Borel probability measure. We shall use the subscript ω for quantities conditioned on the dictionary ω. Given ω, the estimation error at time instant n is given by

e_ω(n) = d(n) − d̂_ω(n)    (17)

with d̂_ω(n) = d̂(n)|ω. Multiplying e_ω(n) by its conjugate and taking the expected value yields the mean-square error (MSE)

J_{MSE,ω} = E{|d(n)|²} − 2 Re{ p_{κd,ω}^H α_ω(n) } + α_ω^H(n) R_{κ,ω} α_ω(n)    (18)

with R_{κ,ω} = E{ κ_{H,ω}(n) κ_{H,ω}^H(n) | ω } the correlation matrix of the input data, and p_{κd,ω} = E{ κ_{H,ω}(n) d^*(n) | ω } the cross-correlation vector between κ_{H,ω}(n) and d(n). As R_{κ,ω} is positive definite, the optimum weight vector is given by

α_{opt,ω} = arg min_{α_ω} J_{MSE,ω}(α_ω) = R_{κ,ω}^{−1} p_{κd,ω}    (19)

and the minimum MSE is

J_{min,ω} = E{|d(n)|²} − p_{κd,ω}^H R_{κ,ω}^{−1} p_{κd,ω}.    (20)
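As a sanity check of (18)-(20), the short sketch below (our own illustration; the sample-average estimators and names are assumptions) estimates R_κ,ω and p_κd,ω from data obtained with a fixed dictionary and evaluates the optimal weight vector (19) and the minimum MSE (20).

```python
import numpy as np

def wiener_solution(kappa, d):
    """kappa: (N, 2M) array whose rows are kappa_{H,w}(n); d: (N,) desired outputs.
    Returns sample estimates of alpha_opt (19) and J_min (20)."""
    N = d.shape[0]
    R = kappa.T @ kappa.conj() / N          # sample estimate of R_{kappa,w} = E{k k^H | w}
    p = kappa.T @ d.conj() / N              # sample estimate of p_{kd,w} = E{k d* | w}
    alpha_opt = np.linalg.solve(R, p)       # optimal weight vector (19)
    J_min = np.mean(np.abs(d) ** 2) - np.real(p.conj() @ alpha_opt)   # minimum MSE (20)
    return alpha_opt, J_min
```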
3.1. Mean weight error analysis

The weight update of the ACKLMS algorithm is given by

α_ω(n+1) = α_ω(n) + η e_ω^*(n) κ_{H,ω}(n).    (21)
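For concreteness, here is a minimal sketch of the ACKLMS recursion (15) and (21) with a pre-tuned (fixed) dictionary. It is our own illustration built directly on the kernel definitions (2)-(3); the function and variable names are assumptions, not the authors' code.

```python
import numpy as np

def acklms(u_seq, d_seq, dictionary, xi=0.55, eta=0.1):
    """Run ACKLMS with a fixed dictionary.
    u_seq: (N, L) complex inputs; d_seq: (N,) desired outputs;
    dictionary: (M, L) pre-tuned dictionary elements u(w_m)."""
    M = dictionary.shape[0]
    alpha = np.zeros(2 * M, dtype=complex)          # stacked weights [alpha_1; alpha_2]
    errors = np.empty(len(d_seq), dtype=complex)
    for n, (u, d) in enumerate(zip(u_seq, d_seq)):
        diff = u - dictionary.conj()                # rows: u(n) - conj(u(w_m))
        k1 = np.exp(-np.sum(diff ** 2, axis=1) / (2 * xi**2))             # kappa_C,   eq. (2)
        k2 = np.exp(-np.sum(np.conj(diff) ** 2, axis=1) / (2 * xi**2))    # kappa_C^*, eq. (3)
        kappa = np.concatenate([k1, k2])            # kernelized input kappa_{H,w}(n), length 2M
        d_hat = alpha.conj() @ kappa                # prediction (15): alpha^H kappa
        e = d - d_hat                               # estimation error
        alpha = alpha + eta * np.conj(e) * kappa    # weight update (21)
        errors[n] = e
    return alpha, errors
```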
Let v_ω(n) be the weight error vector defined as

v_ω(n) = α_ω(n) − α_{opt,ω}.    (22)

The weight error vector update equation is then given by

v_ω(n+1) = v_ω(n) + η e_ω^*(n) κ_{H,ω}(n).    (23)

The error (17) is consequently rewritten as

e_ω(n) = d(n) − κ_{H,ω}^H(n) v_ω(n) − κ_{H,ω}^H(n) α_{opt,ω}.    (24)

Substituting (24) into (23) yields

v_ω(n+1) = v_ω(n) + η [ d^*(n) κ_{H,ω}(n) − κ_{H,ω}^H(n) v_ω(n) κ_{H,ω}(n) − κ_{H,ω}^H(n) α_{opt,ω} κ_{H,ω}(n) ].    (25)

Taking the expected value of (25), using the CMIA hypothesis introduced in [26], and using (19), we get the mean weight error model

E{v_ω(n+1)} = (I − η R_{κ,ω}) E{v_ω(n)}.    (26)

The (i,j)-th entry of the matrix R_{κ,ω} is given by

[R_{κ,ω}]_{i,j} = E{ κ_H(u(n), u(ω_i)) [κ_H(u(n), u(ω_j))]^* }    (27)

with the complex Gaussian bi-kernel κ_H(u(n), u(ω_m)) given by κ_H(u(n), u(ω_m)) = κ_C(u(n), u(ω_m)) for 1 ≤ m ≤ M, and κ_H(u(n), u(ω_m)) = κ_C^⋆(u(n), u(ω_m)) for M+1 ≤ m ≤ 2M.

Let us define a new vector that separates the real and imaginary parts of u(n), namely ũ(n) = col{u_I(n), u_Q(n)} ∈ ℝ^{2L}, and let R_ũ = E{ũ(n) ũ^⊤(n)}. With the Gaussian kernels (2)-(3), the expected value in (27) can be obtained by making use of the moment generating function in [26]. We get

[R_{κ,ω}]_{i,j} = | I + (2/ξ²) H(i,j) R_ũ |^{−1/2} · exp( −(1/2ξ²) [ ∑_{s∈{i,j}} ‖u_I(ω_s)‖² − ∑_{s∈{i,j}} ‖u_Q(ω_s)‖² ] )
    × exp( (i/ξ²) [ δ_i u_I^⊤(ω_i) u_Q(ω_i) − δ_j u_I^⊤(ω_j) u_Q(ω_j) ] ) · exp( (1/2ξ⁴) b^⊤ R_ũ ( I + (2/ξ²) H(i,j) R_ũ )^{−1} b )    (28)

where δ_m is the indicator function

δ_m = 1 for 1 ≤ m ≤ M, and δ_m = −1 for M+1 ≤ m ≤ 2M.    (29)

The definition of H(i,j) in (28) depends on i and j as follows:

H(i,j) = [ I, O ; O, −I ]   for 1 ≤ i, j ≤ M and for M+1 ≤ i, j ≤ 2M,
H(i,j) = [ I, iI ; iI, −I ]   for 1 ≤ i ≤ M and M+1 ≤ j ≤ 2M,
H(i,j) = [ I, −iI ; −iI, −I ]   for 1 ≤ j ≤ M and M+1 ≤ i ≤ 2M.

The vector b in (28) is given by

b = col{ −∑_{s∈{i,j}} u_I(ω_s) + i [ δ_i u_Q(ω_i) − δ_j u_Q(ω_j) ],  −∑_{s∈{i,j}} u_Q(ω_s) + i [ −δ_i u_I(ω_i) + δ_j u_I(ω_j) ] }.    (30)

Equation (26) leads to the following theorem (stated without proof due to lack of space):

Theorem 3.1 (Stability in the mean). Assume that the CMIA introduced in [26] holds. Then, for any initial condition and for a given dictionary ω, the Gaussian ACKLMS algorithm (21) asymptotically converges in the mean if the step size is chosen to satisfy

0 < η < 2 / λ_max(R_{κ,ω})    (31)

where λ_max(·) denotes the maximum eigenvalue of its matrix argument. The entries of R_{κ,ω} are given by (28).

3.2. Mean-square error analysis

Using (24) and the CMIA, the MSE is related to the second-order moments of the weight error vector by [10]

J_{MSE,ω}(n) = J_{min,ω} + trace{ R_{κ,ω} C_{v,ω}(n) }    (32)

where C_{v,ω}(n) = E{ v_ω(n) v_ω^H(n) } is the autocorrelation matrix of the weight error vector v_ω(n), and J_{min,ω} is the minimum MSE given by (20). The analysis of the MSE behavior (32) requires a recursive model for C_{v,ω}(n). Post-multiplying (25) by its Hermitian conjugate, taking the expected value, and using the CMIA, we get the following recursion for sufficiently small step sizes

C_{v,ω}(n+1) ≈ C_{v,ω}(n) − η [ R_{κ,ω} C_{v,ω}(n) + C_{v,ω}(n) R_{κ,ω} ] + η² T_ω(n) + η² R_{κ,ω} J_{min,ω}    (33)

with

T_ω(n) = E{ κ_{H,ω}(n) κ_{H,ω}^H(n) v_ω(n) v_ω^H(n) κ_{H,ω}(n) κ_{H,ω}^H(n) }.    (34)

Evaluating (34) is a significant step in the analysis since κ_{H,ω}(n) is a nonlinear transformation of a quadratic form of u(n). Using the CMIA to determine the (i,j)-th element of T_ω(n) in (34) yields

[T_ω(n)]_{i,j} ≈ ∑_{ℓ=1}^{2M} ∑_{p=1}^{2M} E{ κ_H(u(n), u(ω_i)) [κ_H(u(n), u(ω_j))]^* κ_H(u(n), u(ω_ℓ)) [κ_H(u(n), u(ω_p))]^* } · [C_{v,ω}(n)]_{ℓ,p}.    (35)

This expression can be written as

[T_ω(n)]_{i,j} ≈ trace{ K_ω(i,j) C_{v,ω}(n) }    (36)

where the (ℓ,p)-th entry of the matrix K_ω(i,j) is given by

[K_ω(i,j)]_{ℓ,p} = E{ κ_H(u(n), u(ω_i)) [κ_H(u(n), u(ω_j))]^* κ_H(u(n), u(ω_ℓ)) [κ_H(u(n), u(ω_p))]^* }.    (37)

Similarly, we rewrite (37) in terms of the vector ũ(n) and use the moment generating function [26]. This leads to (38)-(39).
[K_ω(i,j)]_{ℓ,p} = | I + (2/ξ²) L(i,j) R_ũ |^{−1/2} · exp( (i/ξ²) [ δ_i u_I^⊤(ω_i) u_Q(ω_i) − δ_j u_I^⊤(ω_j) u_Q(ω_j) + δ_ℓ u_I^⊤(ω_ℓ) u_Q(ω_ℓ) − δ_p u_I^⊤(ω_p) u_Q(ω_p) ] )
    × exp( −(1/2ξ²) [ ∑_{s∈{i,j,ℓ,p}} ‖u_I(ω_s)‖² − ∑_{s∈{i,j,ℓ,p}} ‖u_Q(ω_s)‖² ] ) · exp( (1/2ξ⁴) f^⊤ R_ũ ( I + (2/ξ²) L(i,j) R_ũ )^{−1} f )    (38)

with

f = col{ −∑_{s∈{i,j,ℓ,p}} u_I(ω_s) + i [ δ_i u_Q(ω_i) − δ_j u_Q(ω_j) + δ_ℓ u_Q(ω_ℓ) − δ_p u_Q(ω_p) ],
         −∑_{s∈{i,j,ℓ,p}} u_Q(ω_s) + i [ −δ_i u_I(ω_i) + δ_j u_I(ω_j) − δ_ℓ u_I(ω_ℓ) + δ_p u_I(ω_p) ] }.    (39)

The definition of L(i,j) in (38) depends on which of the indices i, j, ℓ, p lie in {1, …, M} and which in {M+1, …, 2M}, as follows:

• L(i,j) = [ 2I, O ; O, −2I ] for: 1 ≤ i,j,ℓ,p ≤ M; M+1 ≤ i,j,ℓ,p ≤ 2M; 1 ≤ i,j ≤ M and M+1 ≤ ℓ,p ≤ 2M; 1 ≤ ℓ,p ≤ M and M+1 ≤ i,j ≤ 2M; 1 ≤ i,p ≤ M and M+1 ≤ j,ℓ ≤ 2M; 1 ≤ j,ℓ ≤ M and M+1 ≤ i,p ≤ 2M;
• L(i,j) = [ 2I, iI ; iI, −2I ] for: 1 ≤ i ≤ M and M+1 ≤ j,ℓ,p ≤ 2M; 1 ≤ ℓ ≤ M and M+1 ≤ i,j,p ≤ 2M; 1 ≤ i,j,ℓ ≤ M and M+1 ≤ p ≤ 2M; 1 ≤ i,ℓ,p ≤ M and M+1 ≤ j ≤ 2M;
• L(i,j) = [ 2I, −iI ; −iI, −2I ] for: 1 ≤ j ≤ M and M+1 ≤ i,ℓ,p ≤ 2M; 1 ≤ p ≤ M and M+1 ≤ i,j,ℓ ≤ 2M; 1 ≤ j,ℓ,p ≤ M and M+1 ≤ i ≤ 2M; 1 ≤ i,j,p ≤ M and M+1 ≤ ℓ ≤ 2M;
• L(i,j) = [ 2I, 2iI ; 2iI, −2I ] for: 1 ≤ i,ℓ ≤ M and M+1 ≤ j,p ≤ 2M;
• L(i,j) = [ 2I, −2iI ; −2iI, −2I ] for: 1 ≤ j,p ≤ M and M+1 ≤ i,ℓ ≤ 2M.
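Once R_κ,ω, J_min,ω and the fourth-order moment matrices K_ω(i,j) are available — computed from (28) and (38)-(39), or estimated by sample averaging over the input distribution — the transient model (32)-(33) and (36) can be iterated numerically, with the step-size bound (31) checked up front. The sketch below is our own illustration of that iteration; the function signature and the way the moments are supplied are assumptions.

```python
import numpy as np

def acklms_theoretical_mse(R, K, J_min, eta, v0, n_iter):
    """Iterate the MSE model. R: (2M, 2M) correlation matrix; K: (2M, 2M, 2M, 2M)
    tensor with K[i, j] = K_w(i, j); J_min: minimum MSE; v0: initial weight error."""
    assert eta < 2.0 / np.max(np.linalg.eigvalsh(R)), "step size violates condition (31)"
    C = np.outer(v0, v0.conj())                           # C_{v,w}(0)
    mse = []
    for _ in range(n_iter):
        mse.append(np.real(J_min + np.trace(R @ C)))      # MSE model (32)
        T = np.einsum('ijlp,pl->ij', K, C)                # [T_w]_{ij} = trace{K_w(i,j) C}, eq. (36)
        C = C - eta * (R @ C + C @ R) + eta**2 * T + eta**2 * R * J_min   # recursion (33)
    return np.array(mse)
```

The returned sequence corresponds to the theoretical learning curve J_MSE,ω(n) of (32) for the chosen step size and initial condition.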
3.3. Steady-state behavior

In order to determine the steady state of recursion (33), we rewrite it in lexicographic form. Let vec{·} denote the operator that stacks the columns of a matrix on top of each other. Vectorizing C_{v,ω}(n) and R_{κ,ω} as c_{v,ω}(n) = vec{C_{v,ω}(n)} and r_{κ,ω} = vec{R_{κ,ω}}, we can rewrite (33) as follows

c_{v,ω}(n+1) = G_ω c_{v,ω}(n) + η² J_{min,ω} r_{κ,ω}    (41)

with G_ω = I − η (G_{ω,1} + G_{ω,2}) + η² G_{ω,3}. The matrix G_ω is found by the use of the following definitions:

• I is the identity matrix of dimension 4M² × 4M²;
• G_{ω,1} = I ⊗ R_{κ,ω}, where ⊗ denotes the Kronecker product;
• G_{ω,2} = R_{κ,ω} ⊗ I;
• G_{ω,3} is given by [G_{ω,3}]_{i+2(j−1)M, ℓ+2(p−1)M} = [K_ω(i,j)]_{ℓ,p} with 1 ≤ i, j, ℓ, p ≤ 2M.

Assuming convergence, the closed-form solution of recursion (41) in steady state is given by

c_{v,ω}(∞) = η² J_{min,ω} (I − G_ω)^{−1} r_{κ,ω}.    (42)

From equation (32), the steady-state MSE is finally given by

J_{MSE,ω}(∞) = J_{min,ω} + trace{ R_{κ,ω} C_{v,ω}(∞) }    (43)

where the second term on the right-hand side is the steady-state excess MSE (EMSE).

4. EXPERIMENT

This section provides an example of nonlinear system identification to check the accuracy of the convergence models. We considered the complex-valued input sequence

u(n) = ρ_0 u(n−1) + σ_u √(1 − ρ_0²) w(n)    (44)

with w(n) = √(1 − ρ²) w_re(n) + i ρ w_im(n). The parameter ρ was set to 0.1, which corresponds to a highly non-circular input, and the random variables w_re(n) and w_im(n) were distributed according to zero-mean i.i.d. Gaussian distributions with standard deviation σ_w = 1. Both parameters ρ_0 and σ_u were set to 0.5. The system to be identified was

y(n) = (0.5 − 0.1i) u(n) − (0.3 − 0.2i) u(n−1)
d(n) = y(n) + (1.25 − 1i) y²(n) + (0.35 − 0.2i) y³(n) + z(n)

where z(n) is a complex additive zero-mean Gaussian noise with standard deviation σ_z = 0.1. At each time instant n, the ACKLMS algorithm was updated with the input vector u(n) = [u(n), u(n−1)]^⊤ and the reference signal d(n). The correlation matrix R_ũ is thus given by

R_ũ = σ_u² [ (1−ρ²), (1−ρ²)ρ_0, 0, 0 ; (1−ρ²)ρ_0, (1−ρ²), 0, 0 ; 0, 0, ρ², ρ²ρ_0 ; 0, 0, ρ²ρ_0, ρ² ].    (45)

The pure complex Gaussian kernel bandwidth ξ and the step size η were set to 0.55 and 0.1, respectively. We used the coherence sparsification criterion proposed in [5] with threshold µ_0 = 0.3 to construct a fixed dictionary of length M = 12. All simulation curves were obtained by averaging over 200 Monte Carlo runs. Figure 1 shows that the theoretical curves consistently agree with the Monte Carlo simulations, both during the transient phase and in steady state.

Fig. 1. Simulation results of the ACKLMS algorithm, MSE (dB) versus iteration n: (a) theory vs. Monte Carlo (Monte Carlo MSE, theoretical MSE, minimum MSE, steady-state MSE); (b) steady-state results (theoretical MSE, excess MSE, minimum MSE, steady-state MSE, steady-state EMSE).
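The Monte Carlo part of this experiment can be reproduced along the following lines. This is a sketch under our own assumptions: it reuses the acklms routine sketched in Section 3.1, replaces the coherence-based dictionary construction of [5] with a simple stand-in, and is not the authors' simulation code.

```python
import numpy as np

def run_trial(N=20000, rho=0.1, rho0=0.5, sigma_u=0.5, sigma_z=0.1, seed=0):
    """Generate one realization of the input (44) and of the nonlinear system output d(n)."""
    rng = np.random.default_rng(seed)
    w = np.sqrt(1 - rho**2) * rng.standard_normal(N) + 1j * rho * rng.standard_normal(N)
    u = np.zeros(N, dtype=complex)
    for n in range(1, N):                               # AR(1) input model (44)
        u[n] = rho0 * u[n - 1] + sigma_u * np.sqrt(1 - rho0**2) * w[n]
    up = np.r_[0, u[:-1]]                               # delayed input u(n-1)
    y = (0.5 - 0.1j) * u - (0.3 - 0.2j) * up
    z = sigma_z * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    d = y + (1.25 - 1j) * y**2 + (0.35 - 0.2j) * y**3 + z
    U = np.column_stack([u, up])                        # input vectors u(n) = [u(n), u(n-1)]^T
    return U, d

# Average the squared error of ACKLMS over a few Monte Carlo runs, with a dictionary of
# 12 roughly spaced input vectors as a stand-in for the coherence rule of [5].
U0, _ = run_trial(seed=0)
dictionary = U0[:: len(U0) // 12][:12]
mse = np.zeros(20000)
runs = 10                                               # 200 runs in the paper
for r in range(runs):
    U, d = run_trial(seed=r)
    _, e = acklms(U, d, dictionary, xi=0.55, eta=0.1)   # routine from the Section 3.1 sketch
    mse += np.abs(e) ** 2 / runs
```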
5. CONCLUSION

In this paper, we presented the ACKLMS algorithm within a complex multi-kernel framework. We then derived a theoretical model of the convergence behavior of ACKLMS with a pre-tuned dictionary. In future work, we will study how to use this model to design dictionaries, and to set the step size and the kernel bandwidth, so as to reach a specified MSE level or convergence speed.
6. REFERENCES

[1] Y. Engel, S. Mannor, and R. Meir, "Kernel recursive least squares," IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2275–2285, 2004.
[2] S. Van Vaerenbergh, J. Via, and I. Santamaria, "A sliding-window kernel RLS algorithm and its application to nonlinear channel identification," in Proc. IEEE ICASSP, Toulouse, France, 2006, pp. 789–792.
[3] B. Chen, S. Zhao, P. Zhu, and J. C. Principe, "Quantized kernel recursive least squares algorithm," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 9, pp. 1484–1491, 2013.
[4] P. Honeine, C. Richard, and J.-C. M. Bermudez, "On-line nonlinear sparse approximation of functions," in Proc. IEEE ISIT, Nice, France, 2007, pp. 956–960.
[5] C. Richard, J.-C. M. Bermudez, and P. Honeine, "Online prediction of time series data with kernels," IEEE Trans. Signal Process., vol. 57, no. 3, pp. 1058–1067, 2009.
[6] S. Slavakis and S. Theodoridis, "Sliding window generalized kernel affine projection algorithm using projection mappings," EURASIP J. Adv. Signal Process., vol. 2008:735351, 2008.
[7] W. Liu, P. P. Pokharel, and J. C. Principe, "The kernel least-mean-square algorithm," IEEE Trans. Signal Process., vol. 56, no. 2, pp. 543–554, 2008.
[8] C. Richard, "Filtrage adaptatif non-linéaire par méthodes de gradient stochastique court-terme à noyau," in Proc. GRETSI, Louvain-la-Neuve, Belgium, 2005, pp. 1–4.
[9] W. Liu and J. C. Principe, "Kernel affine projection algorithms," EURASIP J. Adv. Signal Process., vol. 2008:784292, 2008.
[10] W. D. Parreira, J.-C. M. Bermudez, C. Richard, and J.-Y. Tourneret, "Stochastic behavior analysis of the Gaussian kernel-least-mean-square algorithm," IEEE Trans. Signal Process., vol. 60, no. 5, pp. 2208–2222, 2012.
[11] C. Richard and J.-C. M. Bermudez, "Closed-form conditions for convergence of the Gaussian kernel-least-mean-square algorithm," in Proc. Asilomar, Pacific Grove, CA, USA, 2012.
[12] W. Gao, J. Chen, C. Richard, J. Huang, and R. Flamary, "Kernel LMS algorithm with forward-backward splitting for dictionary learning," in Proc. IEEE ICASSP, Vancouver, Canada, 2013, pp. 5735–5739.
[13] W. Gao, J. Chen, C. Richard, and J. Huang, "Online dictionary learning for kernel LMS," IEEE Trans. Signal Process., vol. 62, no. 11, pp. 2765–2777, 2014.
[14] P. Bouboulis and S. Theodoridis, "Extension of Wirtinger's calculus in reproducing kernel Hilbert spaces and the complex kernel LMS," IEEE Trans. Signal Process., vol. 59, no. 3, pp. 964–978, 2011.
[15] T. K. Paul and T. Ogunfunmi, "Study of the convergence behavior of the complex kernel least mean square algorithm," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 9, pp. 1349–1363, 2013.
[16] D. P. Mandic, S. Javidi, S. L. Goh, A. Kuh, and K. Aihara, "Complex-valued prediction of wind profile using augmented complex statistics," Renewable Energy, vol. 34, no. 1, pp. 196–201, 2009.
[17] S. Y. Kung, "Kernel approaches to unsupervised and supervised machine learning," in Advances in Multimedia Information Processing — PCM 2009, pp. 1–32, Springer, 2009.
[18] P. Bouboulis, S. Theodoridis, and M. Mavroforakis, "The augmented complex kernel LMS," IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4962–4967, 2012.
[19] F. A. Tobar, A. Kuh, and D. P. Mandic, "A novel augmented complex valued kernel LMS," in Proc. IEEE SAM, 2012, pp. 473–476.
[20] F. A. Tobar and D. P. Mandic, "The quaternion kernel least squares," in Proc. IEEE ICASSP, 2013, pp. 6128–6132.
[21] M. Yukawa, "Multikernel adaptive filtering," IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4672–4682, 2012.
[22] F. A. Tobar, S.-Y. Kung, and D. P. Mandic, "Multikernel least mean square algorithm," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 265–277, 2014.
[23] W. Gao, C. Richard, J.-C. M. Bermudez, and J. Huang, "Convex combinations of kernel adaptive filters," in Proc. IEEE MLSP, Reims, France, 2014, pp. 1–5.
[24] R. Pokharel, S. Seth, and J. C. Principe, "Mixture kernel least mean square," in Proc. IJCNN, Dallas, USA, 2013, pp. 1–7.
[25] I. Steinwart, D. Hush, and C. Scovel, "An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels," IEEE Trans. Inf. Theory, vol. 52, no. 10, pp. 4635–4643, 2006.
[26] J. Chen, W. Gao, C. Richard, and J.-C. M. Bermudez, "Convergence analysis of kernel LMS algorithm with pre-tuned dictionary," in Proc. IEEE ICASSP, Florence, Italy, 2014, pp. 7243–7247.