Blind Separation of Noncircular Correlated Sources Using Gaussian Entropy Rate

Xi-Lin Li and Tülay Adalı, Fellow, IEEE
Abstract—We introduce a new blind source separation (BSS) algorithm for correlated noncircular sources that uses only second-order statistics and fully takes the correlation structure into account. We propose a parametric entropy rate estimator that uses a widely linear autoregressive (AR) model for the sources, and derive the BSS algorithm by minimizing the mutual information of separated time series. We compare the performance of the new algorithm with competing algorithms and demonstrate its superior separation performance as well as its effectiveness in separation of non-Gaussian sources when the identification conditions are met.

Index Terms—Blind source separation, complex Gaussian sources, independent component analysis, noncircular sources, second-order statistics.
I. INTRODUCTION

In the standard blind source separation (BSS) and independent component analysis (ICA) model, the observed mixture $\mathbf{x}(t)$ is assumed to be generated as $\mathbf{x}(t) = \mathbf{M}\mathbf{s}(t)$, where $t$ is the discrete time index, $\mathbf{M}$ is an invertible $N \times N$ mixing matrix, and $\mathbf{s}(t) = [s_1(t), \ldots, s_N(t)]^T$ is the vector of $N$ zero-mean, independent sources [1]. The aim of BSS is to find a demixing, or separation, matrix $\mathbf{W}$ such that $\mathbf{y}(t) = \mathbf{W}\mathbf{x}(t)$ is an estimate of $\mathbf{s}(t)$ up to arbitrary amplitude and order ambiguities. This paper considers the separation of complex Gaussian sources, which can be noncircular. Thus, only the second-order statistics are used for separation.

Second-order blind identification (SOBI) and its improved version, weights-adjusted SOBI (WASOBI), are the two standard second-order statistics based blind separation algorithms [2], [3]. In SOBI, the mixing matrix $\mathbf{M}$ is identified by approximately jointly diagonalizing several time-delayed covariance matrices $\mathbf{C}(\tau) = E[\mathbf{x}(t)\mathbf{x}^H(t-\tau)]$, where superscript $H$ denotes Hermitian transpose, and $\tau$ is the time delay. Note that for noncircular sources, the time-delayed pseudo covariance matrix $\mathbf{P}(\tau) = E[\mathbf{x}(t)\mathbf{x}^T(t-\tau)]$ can be used for identifying $\mathbf{M}$ as well [4], where superscript $T$ denotes transpose. Especially, the strongly uncorrelating transform (SUT) and the generalized uncorrelating transform (GUT) algorithms identify $\mathbf{M}$ by jointly diagonalizing $\mathbf{C}(0)$ and $\mathbf{P}(0)$ [5]–[7]. Another way to achieve the blind separation of stationary Gaussian sources is to model the sources as autoregressive (AR) processes and to estimate the separation matrix along with the AR parameters by maximizing the likelihood cost function [9], [10].

Except for SUT, the above mentioned blind separation algorithms either explicitly or implicitly assume circular sources, or only consider the separation of real-valued sources. Thus, they are suboptimal in the presence of noncircular sources. Also important to note is that SUT cannot exploit the temporal correlation structure of the sources, and thus can achieve separation of noncircular sources only when they have distinct noncircularity indices.

Manuscript received February 24, 2010; revised June 10, 2010 and October 22, 2010; accepted February 04, 2011. Date of publication February 14, 2011; date of current version May 18, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jean-Christophe Pesquet. This work was supported by the NSF grants NSF-CCF 0635129 and NSF-IIS 0612076. The authors are with the Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD 21250 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2011.2114653
It is also possible to extend a number of second-order source separation algorithms to noncircular signals as in [8]. However, the approach in [8] is deflationary and hence suffers from error accumulation when separating a large number of sources.

In this paper, we consider the blind separation of complex stationary Gaussian sources, which can be circular or noncircular and have distinct correlation or relation functions. In addition, we emphasize simultaneous extraction of all the sources, i.e., we do not adopt a deflationary approach. Although the SOBI algorithm can be readily extended to the separation of noncircular sources by including the time-delayed pseudo covariance matrices for approximate joint diagonalization, it is not motivated by a well defined cost and hence is not necessarily optimal. In this paper, we propose a new simultaneous blind separation algorithm for stationary complex Gaussian sources by assuming a widely linear AR model for the sources, and minimizing the mutual information of the separated time series. We compare the new algorithm with SUT, SOBI, and its extended version introduced in this paper, noncircular SOBI, which includes the time-delayed pseudo covariance matrices for approximate joint diagonalization as well. In our simulations, a considerable performance gain over circular separation algorithms is observed by using the noncircularity information of the sources. In the rest of this paper, all the sources are assumed to be stationary and Gaussian, and they can be circular or noncircular.
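As a concrete illustration of the second-order quantities these algorithms operate on, the following sketch estimates the time-delayed covariance and pseudo covariance matrices from finite data, with sample averages in place of expectations; the function name and the toy mixture are ours, not the paper's:

```python
import numpy as np

def delayed_cov(X, tau):
    """Sample estimates of C(tau) = E[x(t) x^H(t - tau)] and
    P(tau) = E[x(t) x^T(t - tau)] for an N x T mixture array X."""
    T = X.shape[1]
    A, B = X[:, tau:], X[:, :T - tau]
    C = A @ B.conj().T / (T - tau)  # covariance; Hermitian when tau = 0
    P = A @ B.T / (T - tau)         # pseudo covariance; symmetric when tau = 0
    return C, P

# toy noncircular mixture: P(0) is clearly nonzero here
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 1000)) + 0.3j * rng.standard_normal((3, 1000))
C0, P0 = delayed_cov(X, 0)
```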
II. BACKGROUND
A. Complex Gaussian Random Variables and Processes
The entropy of a complex random variable $z = z_R + jz_I$ is defined as $H(z) \triangleq H(z_R, z_I) = -E[\log p(z_R, z_I)]$, where $p(\cdot)$ is the probability density function (pdf), $z_R$ and $z_I$ are the real and imaginary parts of $z$, and $j = \sqrt{-1}$. We assume that $z$ is zero mean. As in [11], $z$ is called second-order circular if $E[z^2] = 0$, and called strictly circular, or simply circular, if $z$ and $z\exp(j\alpha)$ have the same pdf for any real $\alpha$. It is clear that for a complex Gaussian random variable, second-order circularity implies circularity. By regarding a Gaussian random variable $z$ as a real-valued bivariate Gaussian random variable $[z_R, z_I]^T$, it is straightforward to show that

$$H(z) = \log(2\pi e) + 0.5\log\left\{E[z_R^2]\,E[z_I^2] - [E(z_R z_I)]^2\right\}. \quad (1)$$

A zero-mean Gaussian process $\{v(t)\}$ is called circular white if $E[v(t_1)v(t_2)] = 0$ and $E[v(t_1)v^*(t_2)] = \delta(t_1 - t_2)$ [12], where $\delta(\cdot)$ is the Kronecker delta function, and superscript $*$ denotes complex conjugate. Any stationary complex random process $\{u(t)\}$ can be modeled as the output of a widely linear system driven by a circular white noise [12], which is called the innovation process in this paper. For a stationary random process $\{u(t)\}$, its entropy rate can be calculated as $H_r(u) = \lim_{t\to\infty} H[u(t), u(t-1), \ldots, u(1)]/t$. Specially, we have $H_r(u) = H(u)$ when $\{u(t)\}$ is white. A nonparametric entropy rate estimator is proposed in [13]. In this paper, we introduce a new entropy rate estimator, which is parametric, by assuming a widely linear AR model for $\{u(t)\}$.
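Equation (1) is easy to verify numerically. A minimal sketch, assuming sample averages in place of expectations; for a circular $z$ with $E[|z|^2] = 1$ it should return approximately $\log(\pi e) \approx 2.1447$:

```python
import numpy as np

def complex_gaussian_entropy(z):
    """Entropy estimate of a zero-mean complex Gaussian sample via (1):
    H(z) = log(2*pi*e) + 0.5*log(E[zR^2] E[zI^2] - E[zR zI]^2)."""
    zR, zI = z.real, z.imag
    det = np.mean(zR ** 2) * np.mean(zI ** 2) - np.mean(zR * zI) ** 2
    return np.log(2 * np.pi * np.e) + 0.5 * np.log(det)

rng = np.random.default_rng(1)
z = (rng.standard_normal(10 ** 5) + 1j * rng.standard_normal(10 ** 5)) / np.sqrt(2)
print(complex_gaussian_entropy(z), np.log(np.pi * np.e))  # both ~2.1447
```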
B. Complex ICA and BSS

A natural cost function for achieving source separation is the mutual information $I(y_1, \ldots, y_N)$ among the $N$ separated random variables $y_n$, $n = 1, \ldots, N$, written as [14]

$$I(y_1, \ldots, y_N) = \sum_{n=1}^{N} H(y_n) - 2\log|\det(\mathbf{W})| - H(\mathbf{x}) \quad (2)$$

where $H(y_n)$ is the entropy of the $n$th separated source, the entropy of the observations $H(\mathbf{x})$ is a constant with respect to $\mathbf{W}$, and the time index $t$ is suppressed for simplicity. To exploit the temporal dependence of the samples of each source, we can introduce the following cost function [15], [16]:

$$I_r(y_1, \ldots, y_N) = \sum_{n=1}^{N} H_r(y_n) - 2\log|\det(\mathbf{W})| - H_r(\mathbf{x}) \quad (3)$$

where $H_r(y_n)$ is the entropy rate of the $n$th separated source, and the entropy rate of the observations $H_r(\mathbf{x}) = \lim_{t\to\infty} H[\mathbf{x}(1), \ldots, \mathbf{x}(t)]/t$ is a constant with respect to $\mathbf{W}$. The cost given in (3) reduces to (2) when all the sources are independently and identically distributed (i.i.d.). Note that the minimization of $I_r(y_1, \ldots, y_N)$ is equivalent to the minimization of the sum of the entropy rates of the separated sources with the determinant of $\mathbf{W}$ fixed to a nonzero constant, since $H_r(\mathbf{x})$ is a constant independent of $\mathbf{W}$.

In this paper, we study the separation of stationary complex Gaussian sources by minimizing the cost given in (3). The entropy rate $H_r(y_n)$ is estimated by modeling $\{y_n(t)\}$ as a widely linear AR process, and the demixing matrix and the AR coefficients are jointly estimated by minimizing the cost given in (3). We do not consider orthogonal separation, where the mixtures are pre-whitened and the demixing matrix is constrained to be a unitary matrix, because the unitary constraint limits the search space for the optimal demixing matrix and hence may lead to suboptimal performance [14], [17]. In the next section, we first present a parametric entropy rate estimator for complex Gaussian sources.
III. ENTROPY RATE OF STATIONARY NONCIRCULAR AR GAUSSIAN PROCESSES

A. The Innovation Process

It is known that for a stationary Gaussian random process $\{u(t)\}$, its entropy rate is equal to the entropy of the prediction error $w(t) = u(t) - \hat{u}(t \mid t-1, \ldots, 1)$ [19], where $\hat{u}(t \mid t-1, \ldots, 1)$ denotes the optimal prediction of $u(t)$ based on the past samples $u(t-1), \ldots, u(1)$. For noncircular Gaussian sources, this prediction error can be calculated by using a widely linear predictor [8], [12], [18]. Although we can readily calculate this prediction error, it can be noncircular, which complicates the design of the separation algorithm. In this paper, we use the following widely linear filter to estimate the innovation process $\{v(t)\}$:

$$v(t) = \sum_{q=0}^{p-1} a^*(q)\, u(t-q) + \sum_{q=0}^{p-1} a^*(q+p)\, u^*(t-q) = \mathbf{a}^H \mathbf{u}(t) \quad (4)$$

where $p$ is the filter length, $\mathbf{a} = [a(0), \ldots, a(2p-1)]^T \ne \mathbf{0}$ are the widely linear filter coefficients, and $\mathbf{u}(t) = [u(t), \ldots, u(t-p+1), u^*(t), \ldots, u^*(t-p+1)]^T$ is the augmented input vector. For large enough $p$, (4) can approximate any widely linear system. The optimal filter coefficients can be solved for by minimizing $H(v)$ under certain constraints, as shown in Section III-B. Note that the minimum mean-square error (MMSE) criterion and the entropy criterion are not always equivalent when working with noncircular signals. Furthermore, unlike the prediction error, the innovation process defined in (4) can be made circular and white by adjusting the filter coefficients. This greatly simplifies the design of the separation algorithm. Thus, we only use the entropy criterion in the following derivations.
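A direct, loop-based sketch of the widely linear filter (4); the helper name is ours, and clarity is preferred over speed:

```python
import numpy as np

def innovation(u, a, p):
    """Apply (4): v(t) = a^H u(t), where u(t) stacks u(t), ..., u(t-p+1)
    and their conjugates, and a = [a(0), ..., a(2p-1)]^T."""
    T = len(u)
    v = np.zeros(T - p + 1, dtype=complex)
    for t in range(p - 1, T):
        block = u[t - p + 1:t + 1][::-1]              # u(t), ..., u(t-p+1)
        ut = np.concatenate([block, np.conj(block)])  # augmented input vector
        v[t - p + 1] = np.vdot(a, ut)                 # vdot conjugates a: a^H u(t)
    return v
```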
B. Entropy Rate

By mapping all the variables in (4) into the real domain, we have

$$\begin{bmatrix} v_R(t) \\ v_I(t) \end{bmatrix} = \sum_{q=0}^{p-1} \mathbf{A}(q) \begin{bmatrix} u_R(t-q) \\ u_I(t-q) \end{bmatrix} \quad (5)$$
where $v(t) = v_R(t) + jv_I(t)$, $u(t-q) = u_R(t-q) + ju_I(t-q)$, and matrix $\mathbf{A}(q)$ is

$$\mathbf{A}(q) = \begin{bmatrix} a_R(q) + a_R(q+p) & a_I(q) - a_I(q+p) \\ -a_I(q) - a_I(q+p) & a_R(q) - a_R(q+p) \end{bmatrix}.$$

Since for a Gaussian signal model it is impossible to identify the phase part of a linear system given only its observable outputs, we assume that (5) is causal. Then, from (5), we have

$$H[v_R(t), v_I(t), \ldots, v_R(1), v_I(1)] = H[u_R(t), u_I(t), \ldots, u_R(1), u_I(1)] + \log|\det(\mathbf{B})| \quad (6)$$

where the Jacobian matrix

$$\mathbf{B} = \frac{\partial [v_R(t), v_I(t), \ldots, v_R(1), v_I(1)]^T}{\partial [u_R(t), u_I(t), \ldots, u_R(1), u_I(1)]^T}$$

is a block upper triangular matrix with block diagonal element $\mathbf{A}(0)$. Then we can divide both sides of (6) by $t$ and let $t \to \infty$ to obtain

$$H_r(v) = H_r(u) + \log|\det[\mathbf{A}(0)]|.$$

Thus, by constraining $|\det[\mathbf{A}(0)]| = 1$ to make the processes $\{u(t)\}$ and $\{v(t)\}$ carry the same amount of information, and minimizing the entropy of $v(t)$, we can solve for the optimal filter coefficients. Hence, we have the following cost function:

$$\min_{\mathbf{a}} H(v) = \log(2\pi e) + 0.5\log(\sigma_R^2\sigma_I^2 - \sigma^2), \quad \text{s.t. } |a(0)|^2 - |a(p)|^2 = 1 \quad (7)$$

where min and s.t. are the abbreviations for minimize and subject to constraint(s), respectively, $\sigma_R^2 = E[v_R^2]$, $\sigma_I^2 = E[v_I^2]$, $\sigma = E[v_R v_I]$, and we have used (1), $\det[\mathbf{A}(0)] = |a(0)|^2 - |a(p)|^2$, and $H_r(v) = H(v)$. Instead of minimizing the cost function given in (7), we propose to use the following cost:

$$\min_{\mathbf{a}} H'(v) = \log(\pi e) + \log E[|v|^2], \quad \text{s.t. } |a(0)|^2 - |a(p)|^2 = 1 \quad (8)$$

because an optimal solution of (8) is also an optimal solution of (7), and at the same time, (8) has a simpler form than (7). To show this point, we need to demonstrate that $H'(v)$ is a tight upper bound of $H(v)$. First we note that $H'(v)$ is an upper bound of $H(v)$ since

$$H(v) = \log(2\pi e) + 0.5\log(\sigma_R^2\sigma_I^2 - \sigma^2) \le \log(2\pi e) + 0.5\log(\sigma_R^2\sigma_I^2) \le \log(2\pi e) + \log\frac{\sigma_R^2 + \sigma_I^2}{2} = \log(\pi e) + \log E[|v|^2] = H'(v). \quad (9)$$

Next, we note that this bound is achievable by adjusting the widely linear filter coefficients without violating the constraint given in (7). In fact, we can always pre-multiply both sides of (5) with a suitable rotation matrix to make $\sigma = 0$, and then pre-multiply a diagonal matrix $\mathrm{diag}(\beta, 1/\beta)$ on both sides to make $\sigma_R^2 = \sigma_I^2$, where $\beta \ne 0$ is a proper real number. Then all the equalities given in (9) hold without violating the constraint given in (8). Thus, $H'(v)$ is a tight upper bound of $H(v)$, and $H'(v) = H(v)$ when $v(t)$ is second-order circular.

Without loss of generality, we assume that $\det[\mathbf{A}(0)] = 1$, i.e., $|a(0)|^2 - |a(p)|^2 = 1$, and rewrite the problem given in (8) as

$$\min_{\mathbf{a}} H'(v) = \log(\pi e) + \log \mathbf{a}^H \mathbf{R}_u \mathbf{a}, \quad \text{s.t. } \mathbf{a}^H \mathbf{D}_a \mathbf{a} = 1 \quad (10)$$

where $\mathbf{R}_u = E[\mathbf{u}\mathbf{u}^H]$ is the augmented covariance matrix, and $\mathbf{D}_a$ is a diagonal matrix defined as

$$\mathbf{D}_a = \mathrm{diag}\{1, 0, \ldots, 0, -1, 0, \ldots, 0\}$$

where the entries $1$ and $-1$ sit at the positions of $a(0)$ and $a(p)$, respectively. The problem given in (10) can be written as the Lagrangian function

$$L_a(\mathbf{a}, \lambda_{a,1}) = \log(\pi e) + \log \mathbf{a}^H \mathbf{R}_u \mathbf{a} + \lambda_{a,1}\left(\mathbf{a}^H \mathbf{D}_a \mathbf{a} - 1\right)$$

where the Lagrangian multiplier $\lambda_{a,1}$ is a real number. By letting $\partial L_a(\mathbf{a}, \lambda_{a,1})/\partial \mathbf{a}^* = \mathbf{0}$, we obtain

$$\mathbf{R}_u \mathbf{a} + \lambda_{a,2}\, \mathbf{D}_a \mathbf{a} = \mathbf{0} \quad (11)$$

where $\lambda_{a,2} = \lambda_{a,1}\, \mathbf{a}^H \mathbf{R}_u \mathbf{a}$ is a real number. Thus, the optimal $\mathbf{a}$ is a generalized right eigenvector of the matrix pencil $(\mathbf{R}_u, \mathbf{D}_a)$. As $\mathbf{R}_u$ is the augmented covariance matrix, it is nonnegative definite. We consider the two possible cases:

1) $\mathbf{R}_u$ is Strictly Positive Definite: For this case, from (11), we have

$$\mathbf{R}_u^{-1} \mathbf{D}_a \mathbf{a} = \lambda_{a,3}\, \mathbf{a}$$

where $\lambda_{a,3} = -1/\lambda_{a,2}$ is a real number. Thus, the optimal $\mathbf{a}$ is an eigenvector of $\mathbf{R}_u^{-1}\mathbf{D}_a$. To minimize the cost function given in (10), $\mathbf{a}$ is selected to be the eigenvector of $\mathbf{R}_u^{-1}\mathbf{D}_a$ associated with the maximum eigenvalue. It can be calculated as $\mathbf{a} = \mathbf{a}^+/\sqrt{\left||a^+(0)|^2 - |a^+(p)|^2\right|}$, where $\mathbf{a}^+ = [a^+(0), \ldots, a^+(2p-1)]^T$ is the eigenvector of $\mathbf{R}_u^{-1}\mathbf{D}_a$ associated with the maximum eigenvalue. The entropy rate estimate of $\{u(t)\}$ is $\log(\pi e\, \mathbf{a}^H \mathbf{R}_u \mathbf{a})$.

2) $\mathbf{R}_u$ is Nonnegative, but not Strictly Positive Definite: This condition implies $\det(\mathbf{R}_u) = 0$. Thus, $\{u(t)\}$ is a degenerate process as discussed in [12]. A solution for the eigenproblem given in (11) is $(\mathbf{a}, \lambda_{a,2}) = (\mathbf{a}_0, 0)$, where $\mathbf{a}_0$ is an eigenvector of $\mathbf{R}_u$ associated with eigenvalue zero. For this solution, the cost function given in (10) becomes $-\infty$, which should be expected since $\{u(t)\}$ is degenerate, and hence has an entropy rate of $-\infty$. We do not further consider the degenerate processes in the rest of this paper.
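Collecting the steps above, a sketch of the parametric entropy rate estimator for case 1, with sample averages replacing expectations; names and minor numerical details are ours:

```python
import numpy as np

def entropy_rate_wlar(u, p):
    """Entropy rate estimate of a stationary complex Gaussian process via the
    widely linear AR model: solve (11) for the optimal a, then use (10)."""
    T = len(u)
    cols = []
    for t in range(p - 1, T):
        block = u[t - p + 1:t + 1][::-1]
        cols.append(np.concatenate([block, np.conj(block)]))  # augmented u(t)
    U = np.array(cols).T
    Ru = U @ U.conj().T / U.shape[1]                # augmented covariance R_u
    d = np.zeros(2 * p)
    d[0], d[p] = 1.0, -1.0                          # diagonal of D_a
    # case 1: a is the eigenvector of Ru^{-1} D_a with maximum eigenvalue
    lam, V = np.linalg.eig(np.linalg.solve(Ru, np.diag(d)))
    a = V[:, np.argmax(lam.real)]
    a /= np.sqrt(abs(abs(a[0]) ** 2 - abs(a[p]) ** 2))  # normalize a^+
    return np.log(np.pi * np.e) + np.log((a.conj() @ Ru @ a).real)
```

For a circular white unit-variance input, $\mathbf{R}_u$ is the identity, the selected $\mathbf{a}$ is the first standard basis vector, and the estimate reduces to $\log(\pi e)$, the entropy of a circular unit-variance Gaussian variable.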
IV. BLIND SEPARATION OF STATIONARY COMPLEX GAUSSIAN SOURCES

A. The Cost Function

By using the entropy rate estimator given in (8), we can obtain the following cost function from (3):

$$\min J(\mathbf{W}, \mathbf{a}_1, \ldots, \mathbf{a}_N) = N\log(\pi e) + \sum_{n=1}^{N} \log E[|v_n|^2] - 2\log|\det(\mathbf{W})| - H_r(\mathbf{x}), \quad \text{s.t. } |a_n(0)|^2 - |a_n(p)|^2 = 1, \; n = 1, \ldots, N \quad (12)$$

where $y_n(t) = \mathbf{w}_n^H \mathbf{x}(t)$ is the $n$th separated process, $v_n(t) = \sum_{q=0}^{p-1} a_n^*(q)\, y_n(t-q) + \sum_{q=0}^{p-1} a_n^*(p+q)\, y_n^*(t-q)$ is the estimated innovation process of $\{y_n(t)\}$, $\mathbf{a}_n = [a_n(0), \ldots, a_n(2p-1)]^T$ are the widely linear filter coefficients for generating the $n$th innovation process, and $H_r(\mathbf{x})$ is a quantity independent of $\mathbf{W}$ and $\mathbf{a}_n$, $n = 1, \ldots, N$. For simplicity of discussion, we assume that all these widely linear filters have the same filter length.
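A sketch that evaluates (12) up to the constant $H_r(\mathbf{x})$ for a given demixing matrix and filter bank; useful for monitoring convergence. The array layout and names are our assumptions:

```python
import numpy as np

def erm_cost(W, X, A, p):
    """Cost (12) up to H_r(x). W: N x N demixing matrix, X: N x T mixtures,
    A: length-N list of 2p-dimensional widely linear filter coefficients."""
    Y = W @ X
    N, T = Y.shape
    J = N * np.log(np.pi * np.e) - 2 * np.log(abs(np.linalg.det(W)))
    for n in range(N):
        y, a = Y[n], A[n]
        # v_n(t) = sum_q a*(q) y(t-q) + sum_q a*(p+q) y*(t-q)
        v = sum(np.conj(a[q]) * y[p - 1 - q:T - q]
                + np.conj(a[p + q]) * np.conj(y[p - 1 - q:T - q])
                for q in range(p))
        J += np.log(np.mean(np.abs(v) ** 2))  # the log E[|v_n|^2] term
    return J
```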
B. The Algorithm

The proposed BSS algorithm alternatively optimizes $\mathbf{W}$ and $\mathbf{a}_n$, $n = 1, \ldots, N$, until convergence. When $\mathbf{W}$ is kept constant, from (12) we know that the optimal filter coefficients can be obtained as

$$\mathbf{a}_n^{[\mathrm{new}]} = \arg\min_{\mathbf{a}_n} E[|v_n|^2], \quad \text{s.t. } |a_n(0)|^2 - |a_n(p)|^2 = 1, \; n = 1, \ldots, N$$

by using the entropy rate estimator presented in Section III. Thus, we only need to consider the algorithm for optimizing $\mathbf{W}$ in this section.

As in [20], we divide the problem of minimizing $J(\mathbf{W}, \mathbf{a}_1, \ldots, \mathbf{a}_N)$ with respect to $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_N]^H$ into a series of subproblems such that we minimize $J(\mathbf{W}, \mathbf{a}_1, \ldots, \mathbf{a}_N)$ with respect to each of the row vectors $\mathbf{w}_n^H$, $n = 1, \ldots, N$, which is an easier problem to solve. Hence, we first update $\mathbf{w}_n$ while $\mathbf{w}_m$, $m = 1, \ldots, n-1, n+1, \ldots, N$, are kept constant. For this task, we first write the cost function given in (12) as a function of only $\mathbf{w}_n$. As shown in [20], we can rewrite $\det(\mathbf{W})$ as

$$\det(\mathbf{W}) = \gamma_n\, \mathbf{h}_n^H \mathbf{w}_n \quad (13)$$

where $\gamma_n$, which satisfies $|\gamma_n|^2 = \det(\mathbf{W}_n\mathbf{W}_n^H)$, is a term independent of $\mathbf{w}_n$, $\mathbf{W}_n$ is the $(N-1) \times N$ matrix obtained by removing the $n$th row vector of $\mathbf{W}$, i.e., $\mathbf{W}_n = [\mathbf{w}_1, \ldots, \mathbf{w}_{n-1}, \mathbf{w}_{n+1}, \ldots, \mathbf{w}_N]^H$, and $\mathbf{h}_n$ is a vector of unit length that satisfies $\mathbf{W}_n \mathbf{h}_n = \mathbf{0}$. By using (13), we can write (12) as a cost function of only $\mathbf{w}_n$ as

$$J_n(\mathbf{w}_n) = \log E[|v_n|^2] - 2\log|\mathbf{h}_n^H \mathbf{w}_n| + c_1 \quad (14)$$

where $c_1$ is a quantity independent of $\mathbf{w}_n$, and we have used the fact that $\gamma_n$ is independent of $\mathbf{w}_n$. By introducing the following two vector-valued processes:

$$\mathbf{z}_{1,n}(t) = \sum_{q=0}^{p-1} a_n^*(q)\, \mathbf{x}(t-q), \quad \mathbf{z}_{2,n}(t) = \sum_{q=0}^{p-1} a_n^*(p+q)\, \mathbf{x}^*(t-q)$$

we can write the process $v_n(t)$ as

$$v_n(t) = \mathbf{w}_n^H \mathbf{z}_{1,n}(t) + \mathbf{w}_n^T \mathbf{z}_{2,n}(t) = \tilde{\mathbf{w}}_n^H \mathbf{z}_n(t)$$

where $\tilde{\mathbf{w}}_n = [\mathbf{w}_n^T, \mathbf{w}_n^H]^T$ and $\mathbf{z}_n = [\mathbf{z}_{1,n}^T, \mathbf{z}_{2,n}^T]^T$. Thus, we have

$$E[|v_n|^2] = \tilde{\mathbf{w}}_n^H \mathbf{R}_{z,n} \tilde{\mathbf{w}}_n \quad (15)$$

where $\mathbf{R}_{z,n} = E[\mathbf{z}_n \mathbf{z}_n^H]$. Note that any change of the scaling of $\mathbf{w}_n$ does not change the cost function given in (14). To remove this inherent scaling ambiguity, we assume that $\mathbf{w}_n$ is scaled such that $\mathbf{h}_n^H \mathbf{w}_n = c_0$, where $c_0$ can be an arbitrary positive constant. Now, considering this constraint and (15), we can rewrite (14) as the following problem:

$$\min \log \tilde{\mathbf{w}}_n^H \mathbf{R}_{z,n} \tilde{\mathbf{w}}_n, \quad \text{s.t. } \mathbf{h}_n^H \mathbf{w}_n = c_0. \quad (16)$$

The problem given in (16) can be written as the Lagrangian function

$$L_w(\mathbf{w}_n, \lambda_{w,1}) = \log\left[\tilde{\mathbf{w}}_n^H \mathbf{R}_{z,n} \tilde{\mathbf{w}}_n\right] + \lambda_{w,1}\left(\mathbf{h}_n^H \mathbf{w}_n - c_0\right)$$

where the Lagrangian multiplier $\lambda_{w,1}$ is a real number. By letting $\partial L_w(\mathbf{w}_n, \lambda_{w,1})/\partial \mathbf{w}_n^* = \mathbf{0}$, we have

$$\left(\mathbf{R}_{z11} + \mathbf{R}_{z22}^T\right)\mathbf{w}_n + \left(\mathbf{R}_{z12} + \mathbf{R}_{z12}^T\right)\mathbf{w}_n^* = \lambda_{w,2}\, \mathbf{h}_n \mathbf{h}_n^H \mathbf{w}_n \quad (17)$$

where $\mathbf{R}_{z11} = E[\mathbf{z}_{1,n}\mathbf{z}_{1,n}^H]$, $\mathbf{R}_{z12} = E[\mathbf{z}_{1,n}\mathbf{z}_{2,n}^H]$, $\mathbf{R}_{z22} = E[\mathbf{z}_{2,n}\mathbf{z}_{2,n}^H]$, and $\lambda_{w,2} = -0.5\lambda_{w,1}\, \tilde{\mathbf{w}}_n^H \mathbf{R}_{z,n} \tilde{\mathbf{w}}_n / (\mathbf{h}_n^H \mathbf{w}_n)$ is a real number. From (17) we have

$$\begin{bmatrix} \mathbf{R}_{z11} + \mathbf{R}_{z22}^T & \mathbf{R}_{z12} + \mathbf{R}_{z12}^T \\ \left(\mathbf{R}_{z12} + \mathbf{R}_{z12}^T\right)^* & \left(\mathbf{R}_{z11} + \mathbf{R}_{z22}^T\right)^* \end{bmatrix} \tilde{\mathbf{w}}_n = \lambda_{w,2} \begin{bmatrix} \mathbf{h}_n \mathbf{h}_n^H & \mathbf{0} \\ \mathbf{0} & \mathbf{h}_n^* \mathbf{h}_n^T \end{bmatrix} \tilde{\mathbf{w}}_n. \quad (18)$$

From (18) we know that the optimal $\tilde{\mathbf{w}}_n = [\mathbf{w}_n^T, \mathbf{w}_n^H]^T$ is an eigenvector of the matrix

$$\begin{bmatrix} \mathbf{R}_{z11} + \mathbf{R}_{z22}^T & \mathbf{R}_{z12} + \mathbf{R}_{z12}^T \\ \left(\mathbf{R}_{z12} + \mathbf{R}_{z12}^T\right)^* & \left(\mathbf{R}_{z11} + \mathbf{R}_{z22}^T\right)^* \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{h}_n \mathbf{h}_n^H & \mathbf{0} \\ \mathbf{0} & \mathbf{h}_n^* \mathbf{h}_n^T \end{bmatrix}.$$

To minimize the cost function given in (14), $\tilde{\mathbf{w}}_n$ is selected to be the eigenvector of the above matrix associated with the maximum eigenvalue.

The blind separation algorithm alternatively updates the row vectors of $\mathbf{W}$ by solving the eigenproblem given in (18) and the widely linear filter coefficients by using the entropy rate estimator in Section III until convergence. We call the proposed algorithm the entropy rate minimization (ERM) algorithm.
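A schematic sketch of one ERM row update via (18). The scaling of the returned vector and the enforcement of the $[\mathbf{w}^T, \mathbf{w}^{*T}]^T$ structure of the eigenvector are glossed over here; treat this as an outline under those assumptions rather than the paper's exact procedure:

```python
import numpy as np

def erm_update_row(n, W, X, a, p):
    """Solve the eigenproblem (18) for w_n with the other rows of W and the
    filter a fixed. Rows of W are w_m^H; the caller sets W[n] = result.conj()."""
    N, T = X.shape
    # z_{1,n}(t) = sum_q a*(q) x(t-q),  z_{2,n}(t) = sum_q a*(p+q) x*(t-q)
    z1 = sum(np.conj(a[q]) * X[:, p - 1 - q:T - q] for q in range(p))
    z2 = sum(np.conj(a[p + q]) * np.conj(X[:, p - 1 - q:T - q]) for q in range(p))
    L = z1.shape[1]
    R11 = z1 @ z1.conj().T / L
    R12 = z1 @ z2.conj().T / L
    R22 = z2 @ z2.conj().T / L
    Wn = np.delete(W, n, axis=0)                    # W with row n removed
    h = np.linalg.svd(Wn)[2].conj().T[:, -1]        # unit vector with W_n h = 0
    M = np.block([[R11 + R22.T, R12 + R12.T],
                  [np.conj(R12 + R12.T), np.conj(R11 + R22.T)]])
    Hb = np.block([[np.outer(h, h.conj()), np.zeros((N, N))],
                   [np.zeros((N, N)), np.outer(h.conj(), h)]])
    lam, V = np.linalg.eig(np.linalg.solve(M, Hb))  # assumes M is invertible
    wt = V[:, np.argmax(lam.real)]                  # [w_n^T, w_n^{*T}]^T, up to scale
    return wt[:N]
```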
V. NONCIRCULAR SOBI

In this section, we introduce an extended version of SOBI, noncircular SOBI (ncSOBI), which includes the time-delayed pseudo covariance matrices for approximate joint diagonalization, and we use it in the simulation comparisons presented in the next section. Note that many approximate joint diagonalizers, e.g., the ones proposed in [2], [20], [21], are not originally designed for the diagonalization problem arising in ncSOBI. However, they can be readily modified to take into account the diagonalization structure of the time-delayed pseudo covariance matrices. As in [4], we can extend the approximate joint diagonalization cost proposed in [20] as

$$R(\mathbf{W}) = \sum_{k=1}^{K} \left\| \mathrm{off}\left[\mathbf{W}\mathbf{C}(\tau_k)\mathbf{W}^H\right] \right\|_F^2 + \sum_{k=1}^{K} \left\| \mathrm{off}\left[\mathbf{W}\mathbf{P}(\tau_k)\mathbf{W}^T\right] \right\|_F^2 - 2\log|\det(\mathbf{W})| \quad (19)$$

where $\|\mathbf{X}\|_F = \sqrt{\mathrm{tr}(\mathbf{X}\mathbf{X}^H)}$ denotes the Frobenius norm of $\mathbf{X}$, $\mathrm{off}(\mathbf{X}) = \mathbf{X} - \mathrm{diag}(\mathbf{X})$, and $\mathbf{C}(\tau_k)$ and $\mathbf{P}(\tau_k)$ are the covariance and pseudo covariance matrices of the mixtures at time delay $\tau_k$, respectively. Especially, ncSOBI reduces to SUT [6] if only $\mathbf{C}(0)$ and $\mathbf{P}(0)$ are jointly diagonalized. Since $\mathbf{C}(0)$ is Hermitian and $\mathbf{P}(0)$ is symmetric, $\mathbf{C}(0)$ and $\mathbf{P}(0)$ can be exactly jointly diagonalized.¹
By following the proof in [20], it can be shown that the cost function given in (19) is lower bounded if and only if all these covariance and pseudo covariance matrices cannot be exactly jointly diagonalized. In practice, these matrices can be only approximately jointly diagonalized due to the estimation errors. Then the cost given in (19) is lower bounded and thus can be minimized by using an appropriate optimization algorithm. In this paper, the fast approximate joint diagonalization (FAJD) algorithm given in [20] is extended to the minimization of the cost given in (19), and we call the new algorithm extended FAJD. As in FAJD, extended FAJD divides the problem of minimizing $R(\mathbf{W})$ with respect to $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_N]^H$ into a series of subproblems such that $R(\mathbf{W})$ is minimized with respect to each of the row vectors $\mathbf{w}_n^H$, $n = 1, \ldots, N$. Hence, we first update $\mathbf{w}_n$ while $\mathbf{w}_m$, $m = 1, \ldots, n-1, n+1, \ldots, N$, are kept constant. By using (13), we can write $R(\mathbf{W})$ as a function of $\mathbf{w}_n$ as

$$R_n(\mathbf{w}_n) = \mathbf{w}_n^H (\mathbf{Q}_n + \mathbf{P}_n) \mathbf{w}_n - 2\log|\mathbf{h}_n^H \mathbf{w}_n| + c_2 \quad (20)$$

where $c_2$ is a quantity independent of $\mathbf{w}_n$, and

$$\mathbf{Q}_n = \sum_{k=1}^{K} \sum_{m=1, m\ne n}^{N} \left[\mathbf{C}(\tau_k)\mathbf{w}_m\mathbf{w}_m^H\mathbf{C}^H(\tau_k) + \mathbf{C}^H(\tau_k)\mathbf{w}_m\mathbf{w}_m^H\mathbf{C}(\tau_k)\right]$$

$$\mathbf{P}_n = \sum_{k=1}^{K} \sum_{m=1, m\ne n}^{N} \left[\mathbf{P}(\tau_k)\mathbf{w}_m^*\mathbf{w}_m^T\mathbf{P}^H(\tau_k) + \mathbf{P}^T(\tau_k)\mathbf{w}_m^*\mathbf{w}_m^T\mathbf{P}^*(\tau_k)\right].$$

By letting $\partial R_n(\mathbf{w}_n)/\partial \mathbf{w}_n^* = \mathbf{0}$, we can obtain the optimal $\mathbf{w}_n$ as

$$\mathbf{w}_n^{[\mathrm{new}]} = \frac{(\mathbf{Q}_n + \mathbf{P}_n)^{-1}\mathbf{h}_n}{\sqrt{\mathbf{h}_n^H(\mathbf{Q}_n + \mathbf{P}_n)^{-1}\mathbf{h}_n}}. \quad (21)$$

Extended FAJD alternatively updates the row vectors of $\mathbf{W}$ by using (21) until convergence. Note that, even though in our implementation the extended FAJD algorithm is used to jointly diagonalize the covariance and pseudo covariance matrices, it can also be used to jointly diagonalize other types of target matrices, such as cumulant matrices.

¹In general, two complex (pseudo) covariance matrices cannot always be exactly jointly diagonalized [4].
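A sketch of the extended FAJD row update (21); Cs and Ps are lists of estimated time-delayed covariance and pseudo covariance matrices, the rows of W hold $\mathbf{w}_m^H$, and the caller sets row n of W to the conjugate of the returned vector. Names are ours:

```python
import numpy as np

def fajd_update_row(n, W, Cs, Ps):
    """One extended FAJD update: w_n per (21) with Q_n and P_n as defined."""
    N = W.shape[0]
    Q = np.zeros((N, N), dtype=complex)
    P = np.zeros((N, N), dtype=complex)
    for m in range(N):
        if m == n:
            continue
        w = np.conj(W[m])                       # column vector w_m
        for C in Cs:
            Q += C @ np.outer(w, w.conj()) @ C.conj().T \
                 + C.conj().T @ np.outer(w, w.conj()) @ C
        for Pk in Ps:
            P += Pk @ np.outer(w.conj(), w) @ Pk.conj().T \
                 + Pk.T @ np.outer(w.conj(), w) @ np.conj(Pk)
    Wn = np.delete(W, n, axis=0)
    h = np.linalg.svd(Wn)[2].conj().T[:, -1]    # unit vector with W_n h = 0
    g = np.linalg.solve(Q + P, h)               # (Q_n + P_n)^{-1} h_n
    return g / np.sqrt((h.conj() @ g).real)     # normalization of (21)
```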
VI. IDENTIFICATION CONDITIONS

In this section, we discuss the identification conditions for the second-order statistics based separation algorithms considered here. We assume that $\mathbf{x}$ is a linear mixture of $N$ independent sources, and that all the second-order statistics to be used are available. Thus, we only need to consider unitary separation, since exact pre-whitening can be achieved. Hence, we further assume that the mixtures have been pre-whitened. To remove the scaling ambiguity, we assume that the sources are scaled such that they have unit covariance and real nonnegative noncircularity coefficients $\rho_n = E[s_n^2]/E[|s_n|^2]$, $n = 1, \ldots, N$.

In ncSOBI, as $\mathbf{W}$ is a unitary matrix and we can exactly jointly diagonalize the covariance and pseudo covariance matrices, the Hermitian transposes of its rows, i.e., $\mathbf{w}_n$, $n = 1, \ldots, N$, are the eigenvectors of $\mathbf{C}(\tau_k)$ and $\mathbf{P}(\tau_k)\mathbf{P}^H(\tau_k)$. In general, $\mathbf{w}_n$, $n = 1, \ldots, N$, are the eigenvectors of

$$\sum_{k=1}^{K} \alpha_k \mathbf{C}(\tau_k) + \left[\sum_{k=1}^{K} \beta_k \mathbf{P}(\tau_k)\right]\left[\sum_{k=1}^{K} \beta_k \mathbf{P}(\tau_k)\right]^H$$

where $\alpha_k$ is an arbitrary real weight, and $\beta_k$ is an arbitrary complex weight, $k = 1, \ldots, K$. By following the proof in [2], it can be shown that a unique unitary matrix can be identified up to an arbitrary order ambiguity of the row vectors if and only if the vectors $\mathbf{r}_n = [r_n(\tau_1), \ldots, r_n(\tau_K), \tilde{r}_n(\tau_1), \ldots, \tilde{r}_n(\tau_K)]^T$ of different sources are distinct, where $r_n(\tau) = E[s_n(t)s_n^*(t-\tau)]$ and $\tilde{r}_n(\tau) = E[s_n(t)s_n(t-\tau)]$ are the correlation and relation functions of the $n$th source, respectively. For sources with distinct correlation or relation functions, it is always possible to select a set of time-delays such that this identification condition is satisfied. In SUT, we have $\mathbf{r}_n = [r_n(0), \tilde{r}_n(0)]^T = [1, \rho_n]^T$, and hence the sources can be separated by SUT if and only if they have distinct noncircularity indexes [5], [6].

The ERM algorithm and ncSOBI have the same identification conditions. In fact, at the minimum of (12), the separated sources must be strongly uncorrelated up to time-delay $p-1$ such that their mutual information is exactly zero, i.e., $E[s_m(t)s_n^*(t-q)] = 0$ and $E[s_m(t)s_n(t-q)] = 0$ for any $m \ne n$ and $q = 0, \ldots, p-1$, which suggests that $\mathbf{C}(q)$ and $\mathbf{P}(q)$, $q = 0, \ldots, p-1$, are exactly jointly diagonalized. Thus, if the first $p$ correlation or relation functions of different sources are distinct, the separation matrix can be uniquely identified. On the other hand, if there exists a pair of sources such that their first $p$ correlation and relation functions are the same, then from (11), we know that these two sources have the same widely linear filter coefficients. It is clear that such sources cannot be separated by minimizing the cost function given in (12). We would like to point out that, when compared with SOBI, the ERM and ncSOBI algorithms have less constrained identification conditions since they exploit the extra noncircularity information. For example, they can separate sources with identical correlation functions but with distinct relation functions, while SOBI cannot.
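The identifiability test of this section can be checked numerically: compute the vector of correlation and relation function values for each source and verify that the vectors are pairwise distinct. A sketch under the usual sample-average assumption:

```python
import numpy as np

def id_vectors(S, taus):
    """For each source (row of S), return [r(tau_1),...,r(tau_K),
    r~(tau_1),...,r~(tau_K)]; separability requires distinct rows."""
    out = []
    for s in S:
        T = len(s)
        r = [np.mean(s[tau:] * np.conj(s[:T - tau])) for tau in taus]  # correlation
        rt = [np.mean(s[tau:] * s[:T - tau]) for tau in taus]          # relation
        out.append(np.array(r + rt))
    return np.array(out)
```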
Fig. 1. Typical convergence curves of ERM with different filter lengths for the separation of five widely linear AR sources with 1000 samples. At each iteration, the widely linear filter coefficients for all the sources and the demixing matrix are updated once.
VII. SIMULATION RESULTS

In this section, we present several simulation results to study the separation of stationary complex sources by only using second-order statistics. The proposed algorithm is compared with SOBI and its extended version, ncSOBI. The time delays are selected as $\tau_k = k - 1$, $k = 1, \ldots, K$. For all algorithms, the demixing matrix is initialized with the identity matrix. The intersymbol-interference (ISI) index [22], [23]

$$\mathrm{ISI} = \frac{1}{2N(N-1)}\left[\sum_{i=1}^{N}\left(\sum_{j=1}^{N}\frac{|c_{ij}|}{\max_k |c_{ik}|} - 1\right) + \sum_{j=1}^{N}\left(\sum_{i=1}^{N}\frac{|c_{ij}|}{\max_k |c_{kj}|} - 1\right)\right]$$

is used to evaluate the performance of a BSS algorithm, where $c_{ij}$ denotes the $(i,j)$th element of the combined matrix $\mathbf{C} = \mathbf{W}\mathbf{M}$. Note that this performance index (PI) is sensitive to the scaling of the rows of $\mathbf{W}$ and the columns of $\mathbf{M}$. Hence, both $\mathbf{W}$ and $\mathbf{M}$ should be properly normalized such that all the sources and their estimations have the same variance when we calculate this PI.
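A compact implementation of the ISI index above, assuming W and M are already normalized as described:

```python
import numpy as np

def isi(W, M):
    """Normalized ISI of the combined matrix C = W M; it is 0 exactly when
    C is a scaled permutation matrix, i.e., perfect separation."""
    C = np.abs(W @ M)
    N = C.shape[0]
    rows = (C / C.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (C / C.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return (rows.sum() + cols.sum()) / (2 * N * (N - 1))
```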
²The filter coefficients are drawn from the standard bivariate Gaussian distribution, and are accepted if the resultant widely linear AR model is stable.
³Note that the original SOBI algorithm uses an orthogonal joint diagonalizer. In order to make the results of SOBI and ncSOBI comparable, we use the same type of joint diagonalizer for them. No performance loss is observed for SOBI by using FAJD for joint diagonalization.
Fig. 2. Average ISI of ERM, SOBI, ncSOBI and SUT in the separation of widely linear AR Gaussian sources. The number of sources is 5, and the sample size varies from 200 to 10 000. Each simulation point is averaged over 100 independent runs.
Experiment 1: In this experiment, we consider the separation of sources generated by a widely linear AR model driven by circular white Gaussian noise. The widely linear AR order is fixed to be 4, and the filter coefficients are randomly selected.² The sources are mixed through a random mixing matrix, whose elements are randomly drawn from the standard bivariate Gaussian distribution, and then separated by SUT, SOBI,³ ncSOBI, and the proposed ERM algorithm. Fig. 1 shows the typical convergence curves of the proposed algorithm for different filter lengths $p$, starting from the identity matrix initial guess for $\mathbf{W}$. We observe that the proposed algorithm converges very fast, and that the final value of the cost decreases as the filter length $p$ increases when $p$ is underestimated, i.e., $p < 5$. However, if $p$ is not underestimated, no performance gain is observed by increasing $p$. In practice, a possible way to determine $p$ is to assume that the sources are generated by a widely linear AR model and to use information theoretic criteria to estimate the order.

Fig. 2 shows the average PI when the number of sources is fixed to be 5, and the number of samples varies from 200 to 10 000. From Fig. 2, we observe that SUT shows the poorest performance. SOBI outperforms SUT by using the information of several time-delayed covariance matrices, and ncSOBI outperforms SOBI by further exploiting the information of the time-delayed pseudo covariance matrices. Note that by setting $p = 1$ in the proposed algorithm, no information on the temporal structure of the sources is exploited and the proposed algorithm has exactly the same performance as SUT. However, the new algorithm can efficiently exploit the temporal structure of the sources, and considerably outperforms SUT and SOBI when $p$ is chosen greater than 1. Furthermore, its performance is not sensitive to the selection of the filter length $p$ as long as $p$ is not underestimated.

Experiment 2: These second-order statistics based algorithms can be used to separate non-Gaussian sources as well, as long as the identification conditions are satisfied. In this experiment, the sources are generated by the widely linear moving average (MA) model

$$s(t) = \sum_{q=0}^{4} g^*(q)\, v(t-q) + \sum_{q=0}^{4} h^*(q)\, v^*(t-q)$$

driven by non-Gaussian noise $v(t)$, where the filter coefficients are randomly generated, and the real and imaginary parts of the driving noise $v(t)$ are uniformly distributed in $[-1, 1]$. As a result, all sources are non-Gaussian. Fig. 3 shows the average PI when the number of samples is fixed to be 2000, and the number of sources varies from 3 to 13. From Fig. 3 we observe that the proposed algorithm still works well. Again, it outperforms SUT and SOBI when $p$ is chosen larger than 1, and its performance is not sensitive to the selection of $p$ as long as $p$ is large enough. Still, a marginal performance gain is observed by using a larger $p$, which suggests that the true widely linear MA model is better approximated by using a widely linear AR model of higher order.
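For reference, a sketch of the Experiment 2 data generation (widely linear MA sources driven by uniform noise, mixed by a random matrix); seeds and names are illustrative:

```python
import numpy as np

def wl_ma_source(T, order=4, rng=None):
    """One widely linear MA source: s(t) = sum_q g*(q) v(t-q)
    + sum_q h*(q) v*(t-q), with v uniform in [-1, 1] (real and imaginary)."""
    rng = rng if rng is not None else np.random.default_rng()
    g = rng.standard_normal(order + 1) + 1j * rng.standard_normal(order + 1)
    h = rng.standard_normal(order + 1) + 1j * rng.standard_normal(order + 1)
    v = rng.uniform(-1, 1, T + order) + 1j * rng.uniform(-1, 1, T + order)
    return sum(np.conj(g[q]) * v[order - q:T + order - q]
               + np.conj(h[q]) * np.conj(v[order - q:T + order - q])
               for q in range(order + 1))

rng = np.random.default_rng(2)
S = np.array([wl_ma_source(2000, rng=rng) for _ in range(5)])
Mmix = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
X = Mmix @ S   # mixtures to be separated
```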
Fig. 3. Average ISI of ERM, SOBI, ncSOBI and SUT in the separation of widely linear MA non-Gaussian sources. The sample size is 2000, and the number of sources varies from 3 to 13. Each simulation point is averaged over 100 independent runs.
VIII. CONCLUSION We have studied blind separation of stationary complex sources by only using second-order statistics and using the correlation information among samples. Compared with the separation of circular sources, pseudo covariance matrices of noncircular sources provide further information for separation. By modeling the sources as outputs of widely linear autoregressive (AR) systems driven by circular noise, we derive a parametric entropy rate estimator for stationary noncircular Gaussian sources, and develop an efficient blind separation algorithm by minimizing the mutual information of separated time series. We compare the new algorithm with those algorithms based on the approximate joint diagonalization of time-delayed (pseudo) covariance matrices. Simulation results demonstrate the superior performance of the new algorithm. Finally, we would like to note that for the overdetermined case, where we have more observations than sources, it is common to first reduce the dimensionality of the mixtures and then to apply the BSS algorithm on the dimension-reduced mixtures. The traditionally used principal component analysis for the problem as in [24] is suboptimal in the presence of noncircular signals, and hence for the complex case the number of sources can be determined by using the noncircular principal component analysis (ncPCA) method introduced in [25], which explicitly models circular and noncircular sources in a given linear model.
REFERENCES
[1] P. Comon, “Independent component analysis: A new concept?,” Signal Process., vol. 36, no. 3, pp. 287–314, 1994.
[2] A. Belouchrani, K. A. Meraim, J. F. Cardoso, and E. Moulines, “A blind source separation technique based on second order statistics,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, 1997.
[3] A. Yeredor, “Blind separation of Gaussian sources via second-order statistics with asymptotically optimal weighting,” IEEE Signal Process. Lett., vol. 7, no. 7, pp. 197–200, 2000.
[4] W.-J. Zeng, X.-L. Li, X.-D. Zhang, and X. Jiang, “An improved signal-selective direction finding algorithm using second-order cyclic statistics,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 2009, pp. 2141–2144.
[5] L. D. Lathauwer and B. D. Moor, “On the blind separation of noncircular sources,” presented at the Eur. Signal Process. Conf. (EUSIPCO), Toulouse, France, 2002.
[6] J. Eriksson and V. Koivunen, “Complex-valued ICA using second order statistics,” in Proc. IEEE Int. Workshop Mach. Learn. Signal Process. (MLSP), Sao Luis, Brazil, 2004, pp. 183–192.
[7] E. Ollila and V. Koivunen, “Complex ICA using generalized uncorrelating transform,” Signal Process., vol. 39, no. 4, pp. 365–377, Apr. 2009.
[8] S. Javidi, B. Jelfs, and D. P. Mandic, “Blind extraction of noncircular complex signals using a widely linear predictor,” in Proc. IEEE/SP 15th Workshop Statist. Signal Process., 2009, pp. 501–504.
[9] D. T. Pham, “Blind separation of instantaneous mixture of sources via the Gaussian mutual information criterion,” Signal Process., vol. 81, pp. 855–870, 2001.
[10] S. Degerine and A. Zaidi, “Separation of an instantaneous mixture of Gaussian autoregressive sources by the exact maximum likelihood approach,” IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1492–1512, Jun. 2004.
[11] B. Picinbono, “On circularity,” IEEE Trans. Signal Process., vol. 42, no. 12, pp. 3473–3482, Dec. 1994.
[12] B. Picinbono and P. Bondon, “Second-order statistics of complex signals,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 411–420, 1997.
[13] W. Xiong, H. Li, T. Adalı, Y.-O. Li, and V. D. Calhoun, “On entropy rate for the complex domain and its application to i.i.d. sampling,” IEEE Trans. Signal Process., vol. 58, no. 4, pp. 2409–2414, 2010.
[14] T. Adalı, H. Li, M. Novey, and J. F. Cardoso, “Complex ICA using nonlinear functions,” IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4536–4544, Sep. 2008.
[15] X.-L. Li and T. Adalı, “Blind spatiotemporal separation of second and/or higher-order correlated sources by entropy rate minimization,” presented at the IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Dallas, TX, 2010.
[16] G. Gomez-Herrero, K. Rutanen, and K. Egiazarian, “Blind source separation by entropy rate minimization,” IEEE Signal Process. Lett., vol. 17, no. 2, pp. 153–156, 2010.
[17] J. F. Cardoso, “On the performance of orthogonal source separation algorithms,” in Signal Process. VII, Proc. Eur. Assoc. Signal Process., Edinburgh, Scotland, 1994, pp. 776–779.
[18] B. Picinbono, “Wide-sense linear mean square estimation and prediction,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 1995, pp. 2032–2035.
[19] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[20] X.-L. Li and X.-D. Zhang, “Nonorthogonal joint diagonalization free of degenerate solution,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 1803–1814, May 2007.
[21] A. Yeredor, “Non-orthogonal joint diagonalization in the least-squares sense with application in blind source separation,” IEEE Trans. Signal Process., vol. 50, no. 7, pp. 1545–1553, Jul. 2002.
[22] O. Macchi and E. Moreau, “Self-adaptive source separation by direct or recursive networks,” in Proc. Int. Conf. Digital Signal Process., Limassol, Cyprus, 1995, pp. 122–129.
[23] S. Amari, A. Cichocki, and H. H. Yang, “A new learning algorithm for blind signal separation,” in Advances in Neural Information Processing Systems 1995. Boston, MA: MIT Press, 1996, pp. 752–763.
[24] M. Wax and T. Kailath, “Detection of signals by information theoretic criteria,” IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 2, pp. 387–392, 1985.
[25] X.-L. Li, M. Anderson, and T. Adalı, “Principal component analysis for noncircular signals in the presence of circular white Gaussian noise,” in Proc. 2010 Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, 2010.
Adaptive One-Class Support Vector Machine

Vanessa Gómez-Verdejo, Jerónimo Arenas-García, Member, IEEE, Miguel Lázaro-Gredilla, Member, IEEE, and Ángel Navia-Vázquez, Senior Member, IEEE
Abstract—In this correspondence, we derive an online adaptive one-class support vector machine. The machine structure is updated via growing and pruning mechanisms and the weights are updated using the structural risk minimization principles underlying support vector machines. Our approach leads to very compact machines compared to other online kernel methods whose size, unless truncated, grows almost linearly with the number of observed patterns. The proposed method is online in the sense that every pattern is only presented once to the machine and there is no need to store past samples, and adaptive in the sense that it can forget past input patterns and adapt to the new characteristics of the incoming data. Thus, the characterizing properties of our algorithm are compactness, adaptiveness, and real-time processing capabilities, making it especially well-suited to solve online novelty detection problems. Regarding algorithm performance, we have carried out experiments in a time series segmentation problem, obtaining favorable results in both accuracy and model complexity with respect to two existing state-of-the-art methods.

Index Terms—Adaptive methods, one-class SVM, online novelty detection.
I. INTRODUCTION

In recent years, there has been an increasing interest in the application of machine learning techniques to the detection of rare or unseen patterns [1]. Sometimes, the data representing the "normal behavior" of the system are time-varying, and thus adaptive novelty detectors are required. This is the case in applications such as intrusion detection [2], audio and speech segmentation [3], [4], or wireless sensor networks [5].

One-class support vector machines (1-SVMs) [6], [7] have been introduced for novelty detection, systematically achieving good results [3], [4], [8]. The underlying idea of 1-SVMs is to model the support of the distribution, i.e., the input region where "normal" data are expected to lie. Traditional 1-SVMs assume a batch formulation, so they cannot be directly applied to online novelty detection. Instead, it is necessary to derive adaptive schemes that update the model in an online fashion, forgetting the old patterns that no longer represent the current behavior of the system.

Many authors have proposed online versions of margin-based algorithms. The first attempts [9]–[11] iteratively update the model by incorporating every training example that is either misclassified or classified with an insufficient margin, thus indefinitely increasing model complexity. Other techniques [12]–[15] try to solve this drawback by also removing data as they become redundant. This procedure is aimed at incremental learning (a fast alternative to batch SVM), and it is not suited to an adaptive scenario in which the solution has to be able to track changes of the data distribution and to forget the oldest patterns.

Manuscript received July 08, 2010; revised October 13, 2010, December 31, 2010, and March 02, 2011; accepted March 02, 2011. Date of publication March 10, 2011; date of current version May 18, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Konstantinos Slavakis. Their work was supported in part by MEC Grant TEC2008-02473 (Spanish Government). The authors are with the Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Madrid 28911, Spain (e-mail: [email protected]). Digital Object Identifier 10.1109/TSP.2011.2125961