Blind separation of sources using higher-order cumulants

Signal Processing 73 (1999) 267—276

Blind separation of sources using higher-order cumulants

Bin-Chul Ihm, Dong-Jo Park*

Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon 305-701, South Korea

Received 26 October 1997; received in revised form 12 October 1998

Abstract

In this paper, we propose a novel method of blind signal separation using a new performance criterion with nonlinear separating functions, and derive an adaptive algorithm. The criterion consists of nonlinear mappings of the output signals and their cumulants, enforcing independence of the signals. We show through an ODE analysis that our method requires only a weak condition for stable separation. Simulation results are given to verify the validity and advantages of the proposed algorithm. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Cumulant; Blind separation; Weight adaptation

* Corresponding author. Tel.: +82 42 869 3438; fax: +82 42 869 3410; e-mail: [email protected]


1. Introduction

The problem of blind separation of sources arises in many signal processing applications such as communications, array processing, speech analysis and speech recognition. In all these instances, the underlying assumption is that several linear mixtures of unknown, random, zero-mean, statistically independent signals, called sources, are observed; the problem consists of recovering the original signals from their mixtures without a priori information about the mixture coefficients or the sources. Mathematically, the situation is described by the simple and well-known data model

$$ X(t) = A S(t), \qquad t = 1, 2, \ldots, \qquad (1) $$

where $X(\cdot)$ and $S(\cdot)$ are column vectors of sizes $n$ and $m$, respectively, and $A$ is an $n \times m$ matrix. The idea here is that the vector $X(\cdot)$ results from measurements by $n$ sensors receiving contributions from $m$ sources. The matrix $A$ is called the mixing matrix. Our task is to obtain a good estimate $C$ of $A^{-1}$ (a pseudo-inverse when $n \neq m$), called the separating matrix, and to retrieve the sources from the measurement outputs:

$$ Y(t) = C X(t). \qquad (2) $$

However, as shown in [18], there are indeterminacies in this problem, and $A^{-1}$ itself cannot be identified. What we can do is obtain a rescaled and permuted version of the $m$ source signals; that is, $C = D P A^{-1}$, where $D$ and $P$ are a diagonal matrix and a permutation matrix, respectively.

In recent years, since the remarkable work introduced by Jutten and Hérault [14], various approaches have been presented in the literature [1,4-13,15-18]. In [14], the separating matrix is updated by odd nonlinear functions, later called separating functions, in order to achieve independence of the processed output signals $Y(\cdot)$. Therein, the conditions for successful separation depend on the characteristics of the source signals and of the separating functions. For example, if a cubic (i.e., $x^3$) and a linear (i.e., $x$) function are used as the separating functions, sub-Gaussian signals (kurtosis $< 3$) are separable under the assumption of unit source variances [16,17]. The Jutten-Hérault algorithm is inspired by a neuromimetic approach; this line is further followed by Karhunen et al. [15] and Cichocki et al. [7,8,12]. Later, Cardoso [5] and Comon [10] considered the nonadaptive source separation problem and proposed different solutions based on fourth-order statistics of $Y(\cdot)$. In [10], Comon defined the contrast, the square sum of the fourth-order cumulants, as the independence criterion; the contrast was used in other papers [11,6]. In these papers, the stability conditions can never be met if there is more than one Gaussian source signal. On the other hand, based on information-theoretic ideas, Bell et al. proposed a nonlinear distortion of the output $Y(\cdot)$, i.e., $Y(t) = g(X(t))$ [1]. When $g(\cdot)$ is a hyperbolic tangent nonlinearity and the sources have super-Gaussian (kurtosis $> 3$) distributions, the sources can be separated by maximizing the joint entropy of $Y(\cdot)$.

In this paper, we propose a new method for signal separation based on a new criterion for the case of a square mixing matrix $A$, namely $n = m$. Section 2 presents the new algorithm derived from the criterion. Section 3 analyses the stationary points of the proposed performance criterion. Finally, Section 4 gives the results of computer experiments, and Section 5 is devoted to conclusions.
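To make the data model concrete, the following is a minimal numerical sketch (Python/NumPy; all variable names are illustrative and not from the paper) of the mixing model (1) and the separation goal (2), including the scaling/permutation indeterminacy $C = DPA^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two zero-mean independent sources, T samples each (model (1) with n = m = 2).
T = 10000
S = np.vstack([rng.uniform(-1, 1, T),           # sub-Gaussian source
               rng.exponential(1.0, T) - 1.0])  # super-Gaussian source, centered

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])   # unknown mixing matrix
X = A @ S                    # observed mixtures

# Any C = D P A^{-1} is an acceptable separator: it recovers the
# sources only up to scaling (D) and permutation (P).
D = np.diag([2.0, -0.7])
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
C = D @ P @ np.linalg.inv(A)
Y = C @ X                    # rows of Y are rescaled, permuted copies of rows of S
```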

2. New performance criterion and learning algorithm

2.1. Cumulants

In the blind source separation problem, we cannot know the characteristics of the source signals and channels a priori; therefore, our only assumption is that the source signals are statistically independent. So we need to discuss statistical independence before we proceed. Denote by $f_{s_i}(s_i)$ the marginal probability density function of the random variables $s_1, s_2, \ldots, s_n$, and by $f_{s_1 s_2 \cdots s_n}(s_1, s_2, \ldots, s_n)$ their joint probability density function. If $s_1, s_2, \ldots, s_n$ are independent, then the joint probability density function is the product of the marginal probability density functions of the individual variables, i.e.,

$$ f_{s_1 s_2 \cdots s_n}(s_1, s_2, \ldots, s_n) = \prod_{j=1}^{n} f_{s_j}(s_j). \qquad (3) $$

However, since we cannot estimate the probability density function of the signals, we use the cumulants as a measure of statistical independence. The cumulant is defined as follows [3]:

$$ \mathrm{cum}(s_1, s_2, \ldots, s_n) = \sum (-1)^{p-1} (p-1)! \; E\Big[\prod_{j \in I_1} s_j\Big] \cdots E\Big[\prod_{j \in I_p} s_j\Big], \qquad (4) $$

where the summation extends over all partitions $(I_1, I_2, \ldots, I_p)$, $p = 1, 2, \ldots, n$, of $(1, 2, \ldots, n)$. The cumulant has a cumulant generating function, as the moment does. For random signals $s_1, s_2, \ldots, s_n$, the cumulant generating function is defined as follows [3]:

$$ \Psi(v_1, v_2, \ldots, v_n) = \log E\big(e^{\,j \sum_{i=1}^{n} s_i v_i}\big). \qquad (5) $$

$\mathrm{cum}(s_1, s_2, \ldots, s_n)$ is given by the coefficient of $(j)^n v_1 v_2 \cdots v_n$ in the Taylor series expansion of the above cumulant generating function about the origin. If $\{s_1, s_2, \ldots, s_n\}$ are statistically independent, the cumulant generating function is the sum of those of the individual signals:

$$ \Psi(v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} \log E\big(e^{\,j s_i v_i}\big) \qquad (6) $$
$$ = \sum_{i=1}^{n} \Psi_i(v_i). \qquad (7) $$

If we take the Taylor series expansion of both sides of Eq. (7), the right-hand side has no cross-cumulants, though the left-hand side has all joint cross-cumulants. To satisfy the equality, all joint cross-cumulants on the left-hand side of Eq. (7) must vanish. From this point of view, we recognize that if we make all joint cross-cumulants of the signals zero, we can guarantee the statistical independence of the signals.

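As an illustration of definition (4), the sketch below (Python/NumPy; names are illustrative, and it is a simplification assuming zero-mean signals) estimates a fourth-order joint cumulant from samples. For zero-mean variables, the partition formula reduces to the familiar moment combination used here:

```python
import numpy as np

def cum4_zero_mean(s1, s2, s3, s4):
    """Sample estimate of cum(s1, s2, s3, s4) for zero-mean signals.

    For zero-mean variables the partition formula (4) reduces to
    E[s1 s2 s3 s4] - E[s1 s2]E[s3 s4] - E[s1 s3]E[s2 s4] - E[s1 s4]E[s2 s3].
    """
    m = lambda *a: np.mean(np.prod(a, axis=0))
    return (m(s1, s2, s3, s4)
            - m(s1, s2) * m(s3, s4)
            - m(s1, s3) * m(s2, s4)
            - m(s1, s4) * m(s2, s3))

rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, 200000)       # sub-Gaussian: negative fourth cumulant
g = rng.standard_normal(200000)      # Gaussian: fourth cumulant ~ 0
print(cum4_zero_mean(u, u, u, u))    # ~ -2/15 = -0.133
print(cum4_zero_mean(g, g, g, g))    # ~ 0
```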
2.2. New performance criterion

Since, in practice, it is not possible to estimate the cross-cumulants of all orders, we use nonlinear functions that include them in their Taylor series expansions. Consider an infinitely differentiable function $f(\cdot)$ of $s_1$. The Taylor series expansion of $f(s_1)$ about the origin is of the following form:

$$ f(s_1) = f(0) + f'(0) s_1 + \frac{f''(0)}{2!} s_1^2 + \frac{f'''(0)}{3!} s_1^3 + \cdots \qquad (8) $$
$$ = \sum_{i=0}^{\infty} \frac{f^{(i)}(0)}{i!} s_1^i \qquad (9) $$
$$ \triangleq \sum_{i=0}^{\infty} a_i s_1^i. \qquad (10) $$

Similarly, we can expand an infinitely differentiable function $g(\cdot)$ with respect to $s_2$ around the origin to obtain $g(s_2) = \sum_{j=0}^{\infty} b_j s_2^j$. If we take the expectation of $f(s_1) g(s_2)$, we have

$$ E\{f(s_1) g(s_2)\} = E\Big[\sum_{i} \sum_{j} a_i b_j s_1^i s_2^j\Big] \qquad (11) $$
$$ = \sum_{i} \sum_{j} a_i b_j E(s_1^i s_2^j). \qquad (12) $$

The product of the expectations of $f(s_1)$ and $g(s_2)$ is obtained as

$$ E\{f(s_1)\} E\{g(s_2)\} = E\Big[\sum_{i} a_i s_1^i\Big] E\Big[\sum_{j} b_j s_2^j\Big] \qquad (13) $$
$$ = \sum_{i} \sum_{j} a_i b_j E(s_1^i) E(s_2^j). \qquad (14) $$

Therefore,

$$ \mathrm{cum}\big(f(s_1), g(s_2)\big) = E\{f(s_1) g(s_2)\} - E\{f(s_1)\} E\{g(s_2)\} = \sum_{i} \sum_{j} a_i b_j \, \mathrm{cum}(s_1^i, s_2^j). \qquad (15) $$

If $s_1$ and $s_2$ are independent, $\mathrm{cum}(s_1^i, s_2^j)$ is zero for all $i, j$, and hence $\mathrm{cum}(f(s_1), g(s_2))$ is zero. Therefore $\mathrm{cum}(f(s_1), g(s_2)) = 0$ is a necessary condition for $s_1$ and $s_2$ to be independent. For independent random signals $s_1, s_2, \ldots, s_n$, the elements $y_1, y_2, \ldots, y_n$ of $Y = CX = CAS = WS$, where $W = CA$, satisfy the following theorem [9].
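The necessary condition in Eq. (15) is easy to check numerically. A minimal sketch (illustrative Python, using tanh-type nonlinearities of the kind the paper adopts in its later example): for independent $s_1, s_2$ the nonlinear cross-cumulant is near zero, while mixed signals give a clearly nonzero value.

```python
import numpy as np

def nl_cross_cumulant(y1, y2, f=np.tanh, g=lambda x: np.tanh(0.5 * x)):
    # cum(f(y1), g(y2)) = E[f g] - E[f] E[g], cf. Eq. (15)
    return np.mean(f(y1) * g(y2)) - np.mean(f(y1)) * np.mean(g(y2))

rng = np.random.default_rng(2)
s1 = rng.uniform(-1, 1, 100000)
s2 = rng.uniform(-1, 1, 100000)

print(nl_cross_cumulant(s1, s2))              # ~ 0: independent signals
print(nl_cross_cumulant(s1 + 0.5 * s2, s2))   # clearly nonzero: mixed signals
```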


Theorem 1. The following three statements are equivalent:
1. $y_1, y_2, \ldots, y_n$ are pairwise independent.
2. $y_1, y_2, \ldots, y_n$ are mutually independent.
3. $W = DP$, where $D$ is a diagonal matrix and $P$ is a permutation matrix.

If we make all pairs of the $n$ signals independent, then the signals become mutually independent. Based on this and the above necessary condition, we define a new performance criterion [13]. From now on, we abbreviate $f(y_i)$ and $g(y_j)$ as $f_i$ and $g_j$, respectively, for convenience.

Definition 1. A new performance criterion for separation of sources is defined as

$$ J = \| M - \mathrm{diag}\, M \|_F^2, \qquad (16) $$

where

$$ M = \begin{pmatrix} \mathrm{cum}(f_1, g_1) & \mathrm{cum}(f_1, g_2) & \cdots & \mathrm{cum}(f_1, g_n) \\ \mathrm{cum}(f_2, g_1) & \mathrm{cum}(f_2, g_2) & \cdots & \mathrm{cum}(f_2, g_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cum}(f_n, g_1) & \mathrm{cum}(f_n, g_2) & \cdots & \mathrm{cum}(f_n, g_n) \end{pmatrix}, \qquad (17) $$

$$ \mathrm{diag}\, M = \begin{pmatrix} M_{11} & 0 & \cdots & 0 \\ 0 & M_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M_{nn} \end{pmatrix}, $$

and $\| M \|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} M_{ij}^2$, where $\| \cdot \|_F$ is the Frobenius norm.
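A direct sample-based evaluation of the criterion (16)-(17) might look like the following sketch (illustrative Python; `Y` holds one output signal per row, and `f`, `g` are the separating nonlinearities):

```python
import numpy as np

def criterion_J(Y, f=np.tanh, g=lambda x: np.tanh(0.5 * x)):
    """Sample estimate of J = ||M - diag M||_F^2, M_ij = cum(f(y_i), g(y_j))."""
    F, G = f(Y), g(Y)                    # shape (n, T)
    # M_ij = E[f_i g_j] - E[f_i] E[g_j], estimated by time averages
    M = (F @ G.T) / Y.shape[1] - np.outer(F.mean(axis=1), G.mean(axis=1))
    off_diag = M - np.diag(np.diag(M))
    return np.sum(off_diag ** 2)
```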

2.3. Learning algorithm

We reformulate the problem as minimization of the performance criterion, and discuss the learning of a separator matrix that attains a desirable value. Let the separator matrix $C$ in

$$ Y = C X \qquad (18) $$

be

$$ C = \begin{pmatrix} 1 & c_{12} & \cdots & c_{1n} \\ c_{21} & 1 & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & 1 \end{pmatrix}. \qquad (19) $$

In order to minimize the performance criterion, the separator matrix $C$ is updated by the steepest descent method:

$$ C(k+1) = C(k) + \mu \, \Delta C(k), \qquad (20) $$
$$ \Delta C(k) = - \frac{\partial J}{\partial C(k)}. \qquad (21) $$

The performance criterion $J$ is the sum of squares of the elements of $M$ except the diagonal terms:

$$ J = \sum_{i=1}^{n} \sum_{j \neq i}^{n} \big(\mathrm{cum}(f_i, g_j)\big)^2 = \sum_{i=1}^{n} \sum_{j \neq i}^{n} \big(E(f_i g_j) - E(f_i) E(g_j)\big)^2. \qquad (22) $$

By differentiating $J$ partially with respect to an element $c_{ab}$, we obtain

$$ \frac{\partial J}{\partial c_{ab}} = 2 \sum_{i \neq a}^{n} \big(E(f_a g_i) - E(f_a) E(g_i)\big)\big(E(f_a' x_b g_i) - E(f_a' x_b) E(g_i)\big) $$
$$ \quad + 2 \sum_{i \neq a}^{n} \big(E(g_a f_i) - E(g_a) E(f_i)\big)\big(E(g_a' x_b f_i) - E(g_a' x_b) E(f_i)\big) $$
$$ = 2 \sum_{i \neq a}^{n} \big[\mathrm{cum}(f_a, g_i) \cdot \mathrm{cum}(f_a' x_b, g_i) + \mathrm{cum}(g_a, f_i) \cdot \mathrm{cum}(g_a' x_b, f_i)\big]. \qquad (23) $$

That is, the partial derivatives can be defined as

$$ \frac{\partial J}{\partial c_{ab}} \triangleq H_{ab}(C, X, Y). \qquad (24) $$
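A batch version of the update (20)-(24) can be sketched as follows (illustrative Python; expectations are replaced by sample averages over a block of data, the derivative of the nonlinearity is taken numerically for generality, and the diagonal of $C$ is held at 1 as in Eq. (19)):

```python
import numpy as np

def grad_J(C, X, f=np.tanh, g=lambda x: np.tanh(0.5 * x), eps=1e-4):
    """Sample-average gradient H_ab = dJ/dc_ab of Eq. (23)."""
    n, T = X.shape
    Y = C @ X
    F, G = f(Y), g(Y)
    Fp = (f(Y + eps) - f(Y - eps)) / (2 * eps)   # f'(y_a), central difference
    Gp = (g(Y + eps) - g(Y - eps)) / (2 * eps)   # g'(y_a)
    cum = lambda u, v: np.mean(u * v) - np.mean(u) * np.mean(v)
    H = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a == b:
                continue                          # diagonal of C stays 1
            s = 0.0
            for i in range(n):
                if i == a:
                    continue
                s += cum(F[a], G[i]) * cum(Fp[a] * X[b], G[i])
                s += cum(G[a], F[i]) * cum(Gp[a] * X[b], F[i])
            H[a, b] = 2 * s
    return H

def separate(X, mu=0.08, iters=500):
    """Steepest-descent learning of C, cf. Eqs. (20)-(21)."""
    n = X.shape[0]
    C = np.eye(n)                                 # C(0) = I, cf. Section 4
    for _ in range(iters):
        C = C - mu * grad_J(C, X)
        np.fill_diagonal(C, 1.0)
    return C
```

The stochastic version of the paper simply replaces the block averages by running estimates; the batch form above is easier to test.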

Example 1. Let the matrices and functions be

$$ A = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & c_{12} \\ c_{21} & 1 \end{pmatrix}, $$

$f(x) = \tanh(x)$ and $g(x) = \tanh(0.5x)$; $s_1$ and $s_2$ are random signals whose probability density functions are uniform on $[-1, 1]$. Fig. 1 shows the shape of the performance criterion $J$ over the axes $c_{12}$ and $c_{21}$, and trajectories of those elements from different initial points are given in Fig. 2. In this example, the equilibria of $(c_{12}, c_{21})$ are $(-0.5, -0.5)$ and $(-2, -2)$, as shown in the figure. We can see that, after sufficient learning, $(c_{12}, c_{21})$ converges to one of the equilibria. Fig. 3 illustrates the learning result of $CA$. In this experiment, we initialize $C$ with the identity matrix $I$.

Fig. 1. Surface of performance criterion J.
Fig. 2. Trajectories of elements of C.
Fig. 3. Variations of W = CA.
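Using the `separate()` sketch from Section 2.3, Example 1 can be reproduced approximately as follows (illustrative; which equilibrium is reached depends on the initial point and run length):

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.uniform(-1, 1, (2, 20000))       # two uniform sources on [-1, 1]
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
X = A @ S

C = separate(X, mu=0.08, iters=2000)     # separate() from the Section 2.3 sketch
print(C)       # off-diagonals near (-0.5, -0.5) (natural) or (-2, -2) (reverse)
print(C @ A)   # approximately diagonal or anti-diagonal, i.e., of the form D P
```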

2.4. Phantom solutions

If the separator matrix $C$ converges and stays in the steady state, then $\Delta C(k)$ becomes $0$, making $\mathrm{cum}(f_i, g_j) = 0$ for $i \neq j$. However, since $\mathrm{cum}(f_i, g_j) = 0$ is only a necessary condition for independence of $y_i$ and $y_j$, the matrix $C$ may have phantom solutions. Phantom solutions make the performance criterion $J$ zero but cannot separate the measured signals to recover the source signals. For example, assume $n = 2$ and $f(x) = g(x) = e^x - 1$. After sufficient learning, if $\mathrm{cum}(f_1, g_2) = 0$ is achieved, then $E(e^{y_1 + y_2}) = E(e^{y_1}) E(e^{y_2})$ holds. If we denote $W = CA$, the outputs $y_1$ and $y_2$ become $w_{11} s_1 + w_{12} s_2$ and $w_{21} s_1 + w_{22} s_2$, respectively, which leads to

$$ E\big(e^{(w_{11} + w_{21}) s_1}\big) E\big(e^{(w_{12} + w_{22}) s_2}\big) = E\big(e^{w_{11} s_1}\big) E\big(e^{w_{12} s_2}\big) E\big(e^{w_{21} s_1}\big) E\big(e^{w_{22} s_2}\big), $$

resulting in the following solutions:

$$ W = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \quad \begin{pmatrix} 0 & a \\ b & 0 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \qquad (25) $$

The first two of the above solutions are desirable, recovering the source signals, but the last one cannot separate the measured signals, though it makes the performance criterion $J$ attain a minimum value. The latter case happens for source signals whose probability density functions are impulsive.
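In practice one therefore wants to verify that a converged $W = CA$ is of the desirable form $DP$ rather than a phantom. A simple check (illustrative Python; the tolerance is an assumption, not from the paper) is whether each row and column of $W$ has exactly one dominant entry:

```python
import numpy as np

def is_scaled_permutation(W, tol=0.05):
    """True if W is approximately D P: one dominant entry per row and column."""
    Wa = np.abs(W)
    # Entries below tol * (largest entry in their row) are treated as zero.
    mask = Wa >= tol * Wa.max(axis=1, keepdims=True)
    return (mask.sum(axis=0) == 1).all() and (mask.sum(axis=1) == 1).all()

print(is_scaled_permutation(np.array([[0.0, -1.5], [-1.5, 0.0]])))  # True
print(is_scaled_permutation(np.array([[1.0,  0.8], [ 0.5, 1.0]])))  # False: phantom-like
```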


When more than one Gaussian source exists, we may obtain extra solutions that make the cost function minimal and the outputs independent. The proposed cost function has the following physical meaning for independence:

$$ \mathrm{cum}(y_i^p, y_j^q) = 0, \quad i \neq j, \; i, j = 1, 2, \ldots, n, \; p, q = 1, 2, 3, \ldots \qquad (26) $$

In the case of Gaussian sources, it is sufficient that $\mathrm{cum}(y_i, y_j) = 0$, $i \neq j$. The elements of $Y = \{y_1, y_2, \ldots, y_n\}$ are linear combinations of Gaussian signals; therefore, the components are also Gaussian. In this case, if $R_Y = E(YY^T)$ is a diagonal matrix, the $y_i$, $i = 1, 2, \ldots, m$, are independent. Now $R_Y = C A R_S A^T C^T$, where without loss of generality we can assume $R_S = I$; this leads to $CA = DG$, where $G$ is a general orthogonal matrix [4]. This means that by decorrelating the output we can obtain independent components from a mixture of Gaussian sources; however, they may not be the same as the original sources. Only if $G$ is a permutation matrix can the sources be separated.
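The rotation ambiguity for Gaussian sources is easy to demonstrate (illustrative Python): any orthogonal mixture of independent unit-variance Gaussians is again decorrelated with identity covariance, so second-order statistics cannot distinguish it from the unmixed sources.

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.standard_normal((2, 200000))   # independent Gaussians, R_S = I

theta = 0.7                            # any rotation G: orthogonal, not a permutation
G = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = G @ S

print(np.cov(Y))   # ~ identity: decorrelated, yet Y is still a mixture of S
```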

3. Stability analysis

The ordinary differential equation (ODE) method is useful for studying the asymptotic behaviour of an algorithm of the form

$$ \theta_{n+1} = \theta_n - \mu_n \, \psi(\theta_n, \xi_n), \qquad (27) $$

where $\xi_n$ is a stationary sequence of random variables and $\mu_n$ is a sequence of positive numbers [2]. Since our algorithm has the above form, that is, we can replace $\theta$, $\psi$ and $\xi$ by $C$, $H$ and $\{X, Y\}$, respectively, we investigate the stability using the ODE method in the case $n = m = 2$. A stationary point $\theta_*$ verifies $E\psi(\theta_*, \xi) = 0$ and is said to be stable if all the eigenvalues of the matrix

$$ \Gamma = \left. \frac{\partial E\psi(\theta, \xi)}{\partial \theta} \right|_{\theta = \theta_*} \qquad (28) $$

have positive real parts. Since the separating points are equilibria of algorithm (20), $\theta_*$ is one of the separating points of $C$.

For $n = m = 2$, there are two separating points of $C$. The first is called the natural separating point $C_1$ because it restores $s_1$ and $s_2$ in the natural order. The second is the reverse separating point $C_2$, implying the reverse order. If we denote $\theta = [c_{12}, c_{21}]^T$, then from Eqs. (20) and (23), $\Gamma$ at $C_i$ can be described by

$$ \Gamma(C_i) = \begin{pmatrix} \dfrac{\partial^2 J}{\partial c_{12}^2} & \dfrac{\partial^2 J}{\partial c_{12} \, \partial c_{21}} \\ \dfrac{\partial^2 J}{\partial c_{21} \, \partial c_{12}} & \dfrac{\partial^2 J}{\partial c_{21}^2} \end{pmatrix}, \quad i = 1, 2, \qquad (29) $$
$$ \triangleq \begin{pmatrix} a & b \\ b & c \end{pmatrix}. \qquad (30) $$

For stability, $\Gamma(C_i)$ needs eigenvalues with positive real parts, and this can be checked easily. The eigenvalues $\lambda(\Gamma(C_i))$ are the roots of the polynomial equation

$$ \lambda^2 - (a + c)\lambda + ac - b^2 = 0. \qquad (31) $$

We obtain the two eigenvalues

$$ \lambda_{1,2} = \frac{(a + c) \pm \sqrt{(a - c)^2 + 4 b^2}}{2}. $$

To guarantee positive real parts, the following two conditions are sufficient:

$$ a + c > 0, \qquad (32) $$
$$ ac - b^2 > 0. \qquad (33) $$
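The conditions (32)-(33) amount to checking the trace and determinant of the symmetric $2 \times 2$ matrix $\Gamma(C_i)$. A small helper (illustrative Python) makes the test explicit:

```python
import numpy as np

def is_stable(a, b, c):
    """Stability test for Gamma = [[a, b], [b, c]], cf. Eqs. (31)-(33).

    For a symmetric 2x2 matrix the eigenvalues are real, and both are
    positive iff the trace a + c > 0 and the determinant a*c - b**2 > 0.
    """
    lam = np.linalg.eigvalsh(np.array([[a, b], [b, c]]))
    assert (lam > 0).all() == (a + c > 0 and a * c - b**2 > 0)
    return a + c > 0 and a * c - b**2 > 0

print(is_stable(2.0, 0.5, 1.0))   # True: trace 3 > 0, det 1.75 > 0
print(is_stable(2.0, 2.0, 1.0))   # False: det = 2 - 4 < 0
```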

The first condition is satisfied since the $C_i$ are local minima. At the natural separating point $C_1$, the outputs $y_1$ and $y_2$ are scaled versions of $s_1$ and $s_2$, respectively, i.e., $y_i = \alpha_i s_i$ for $i = 1, 2$ for some scaling factors $\alpha_i$. From this fact and the zero-mean assumptions, $a$, $b$ and $c$ are written as

$$ a = 2 a_{22}^2 \big[ E^2(f_1') E^2(g_2 s_2) + E^2(g_1') E^2(f_2 s_2) \big], \qquad (34) $$
$$ b = 2 a_{11} a_{22} \big[ E(f_1 s_1) E(g_2 s_2) E(f_1') E(g_2') + E(g_1 s_1) E(f_2 s_2) E(g_1') E(f_2') \big], \qquad (35) $$
$$ c = 2 a_{11}^2 \big[ E^2(f_2') E^2(g_1 s_1) + E^2(g_2') E^2(f_1 s_1) \big], \qquad (36) $$

where $f_i$ and $g_i$ denote $f(\alpha_i s_i)$ and $g(\alpha_i s_i)$, respectively, for $i = 1, 2$. Through some manipulations, we can obtain the following relation:

$$ ac - b^2 = 4 a_{11}^2 a_{22}^2 (U_1 - V_1)^2 \geq 0, \qquad (37) $$

where $U_1 = E(f_1') E(f_2') E(g_1 s_1) E(g_2 s_2)$ and $V_1 = E(g_1') E(g_2') E(f_1 s_1) E(f_2 s_2)$. Similarly, for the reverse separating point $C_2$, we can obtain, with $y_1 = \alpha_1 s_2$ and $y_2 = \alpha_2 s_1$,

$$ ac - b^2 = 4 a_{12}^2 a_{21}^2 (U_2 - V_2)^2 \geq 0, \qquad (38) $$

where $U_2 = E(f_1') E(f_2') E(g_1 s_2) E(g_2 s_1)$ and $V_2 = E(g_1') E(g_2') E(f_1 s_2) E(f_2 s_1)$. Therefore, the algorithm is locally stable in the vicinity of the separating points unless $U_i = V_i$, as happens for $f(\cdot) = \eta \, g(\cdot)$ with $\eta$ constant.

Some separating functions deserve comment. If $f(x) = x^3$ and $g(x) = x$, as in many algorithms, then $U_1 = 9 E^2(s_1^2) E^2(s_2^2)$ and $V_1 = E(s_1^4) E(s_2^4)$, up to a common scaling factor. This leads to the condition $E(s_1^4) E(s_2^4) \neq 9 E^2(s_1^2) E^2(s_2^2)$ for stability. A similar relation holds for the reverse separating point $C_2$. Therefore, when a cubic and a linear function are used as the separating functions, stable separation can never be achieved if there is more than one Gaussian source. Note that our condition is weaker than the requirements of other works [1,6,16,17], where the stability conditions are described in terms of inequalities. If we employ different nonlinear functions as the separating functions, Eq. (33) holds in the neighbourhood of $C_1$ and $C_2$; therefore, the algorithm with initial points near $C_1$ or $C_2$ converges to $C_1$ or $C_2$, respectively. We leave the stability analysis of the higher-order cases $n, m > 2$ as further work for the reader.
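For the cubic/linear pair $f(x) = x^3$, $g(x) = x$, the stability condition above reduces to comparing fourth and second moments. A quick numerical check (illustrative Python) for a uniform source paired with a Gaussian, and for two Gaussians:

```python
import numpy as np

def cubic_linear_UV(s1, s2):
    """U = 9 E^2(s1^2) E^2(s2^2) vs. V = E(s1^4) E(s2^4); stability needs U != V."""
    U = 9 * np.mean(s1**2)**2 * np.mean(s2**2)**2
    V = np.mean(s1**4) * np.mean(s2**4)
    return U, V

rng = np.random.default_rng(5)
u = rng.uniform(-1, 1, 500000)
g1 = rng.standard_normal(500000)
g2 = rng.standard_normal(500000)

print(cubic_linear_UV(u, g1))    # U != V: a stable separating point exists
print(cubic_linear_UV(g1, g2))   # U ~ V: two Gaussians, never stable
```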

4. Computer simulation results

In the first computer simulation, we applied the algorithm to mixtures of a sub-Gaussian signal and a super-Gaussian signal with the mixing matrix

$$ A = \begin{pmatrix} 1 & 0.31 \\ 0.72 & 1 \end{pmatrix}. \qquad (39) $$

The sub-Gaussian signal $s_1$ is uniformly distributed on $[-1, 1]$ ($\gamma_{s_1} = E(s_1^4) - 3E^2(s_1^2) = -0.8 < 0$) and the super-Gaussian signal $s_2$ is exponentially distributed with parameter 1 ($\gamma_{s_2} = E(s_2^4) - 3E^2(s_2^2) = 6 > 0$). The separating functions $f(x)$ and $g(x)$ are $\tanh(x)$ and $\tanh(0.5x)$, respectively. The step size $\mu$ used for the algorithm is 0.08, and the initial value $C(0)$ of $C$ is $I$. In Fig. 4, we present the variations of the product of $C$ and $A$. For successful separation, $CA$ must have the form of a product of a diagonal matrix $D$ and a permutation matrix $P$, and the result shows that $CA$ indeed converges to such a form. This example is not solvable by the previous algorithms, since they place restrictions on the signals: some techniques are applicable only to sub- or only to super-Gaussian signals, and some methods suit only signals whose kurtoses have a negative sum. Fig. 4 proves the validity of the proposed algorithm.

Fig. 4. Variations of W = CA in the case of a mixture of sub- and super-Gaussian signals.

In the second and third experiments, the algorithm is applied to mixtures of two and of three sampled signals, respectively. The mixing matrix A has 1 as its diagonal elements and 0.5 as its off-diagonal components, and the other settings (f(x), g(x), μ and C(0)) are the same as in the first simulation.
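Under these settings, the first simulation can be sketched as follows (illustrative Python, reusing the `separate()` sketch from Section 2.3; centering the exponential source to zero mean is our assumption, consistent with the zero-mean source model, and exact trajectories depend on step size and run length):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 50000
s1 = rng.uniform(-1, 1, T)            # sub-Gaussian source
s2 = rng.exponential(1.0, T) - 1.0    # super-Gaussian source, centered to zero mean
S = np.vstack([s1, s2])

A = np.array([[1.00, 0.31],
              [0.72, 1.00]])          # mixing matrix of Eq. (39)
X = A @ S

C = separate(X, mu=0.08, iters=2000)  # separate() from the Section 2.3 sketch
print(C @ A)   # should approach D P: a dominant diagonal (or anti-diagonal)
```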


Fig. 5 shows the successful separation of a train horn sound and a bird chirp sound. Fig. 6 shows that the algorithm trains the separating matrix $C$ so that $CA$ has negligible off-diagonal elements in comparison with the diagonal components. Figs. 7 and 8 depict the separation results for three arbitrary source signals and the learning trajectories of the elements of the matrix $CA$.

Fig. 5. Separation result of a train horn sound and a bird chirp sound.
Fig. 6. Variations of W = CA.

Fig. 7. Separation result of three source signals.


Fig. 8. Variations of W = CA with three source signals.

5. Conclusions

In this paper, we proposed a new blind separation algorithm applicable to a wide variety of source signals. The problem appears difficult, since the characteristics of the sources and channels are not known. The only assumptions we have made are that the source signals are mutually independent and that the channel modeling matrix is nonsingular. Many researchers have shown remarkable results on this problem, but their results had the limitation of not being applicable to certain kinds of signals; for example, only sub-Gaussian or only super-Gaussian sources could be separated. By using cumulants together with nonlinear functions, the algorithm based on the newly proposed cost function places few restrictions on the kinds of source signals. As shown in the computer simulations, mixtures of arbitrary signals could be separated successfully, and in particular a case that could not be resolved by previous research has been solved. In future research, the bound on the step size and the steady-state error between CA and DP will be studied. Improvement of the learning speed is currently under investigation, and we expect further results on the cases of dependent source signals, nonlinear channels and additive noise.

References

[1] A.J. Bell, T. Sejnowski, Blind separation and blind deconvolution: An information-theoretic approach, in: Proc. Internat. Conf. Acoust. Speech Signal Process., Detroit, 1995, pp. 3415-3418.
[2] A. Benveniste, M. Métivier, P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer, Berlin, 1990.
[3] D.R. Brillinger, Time Series: Data Analysis and Theory, Holden-Day, San Francisco, CA, 1981.
[4] X.-R. Cao, R.-W. Liu, General approach to blind source separation, IEEE Trans. Signal Process. 44 (3) (March 1996) 562-571.
[5] J.-F. Cardoso, Source separation using higher-order moments, in: Proc. Internat. Conf. Acoust. Speech Signal Processing, Glasgow, May 1989, pp. 2109-2112.
[6] J.-F. Cardoso, B.H. Laheld, Equivariant adaptive source separation, IEEE Trans. Signal Process. 44 (12) (December 1996) 3017-3030.
[7] A. Cichocki, L. Moszczynski, New learning algorithm for blind separation of sources, Electron. Lett. 28 (21) (October 1992) 1986-1987.
[8] A. Cichocki, R. Unbehauen, E. Rummert, Robust learning algorithm for blind separation of signals, Electron. Lett. 30 (17) (August 1994) 1386-1387.
[9] P. Comon, Higher-order separation, application to detection and localization, in: Signal Processing V, Theory and Applications, Proc. Eusipco-90, Barcelona, September 1990, pp. 277-280.
[10] P. Comon, Independent component analysis, A new concept?, Signal Processing 36 (3) (April 1994) 287-314.
[11] N. Delfosse, P. Loubaton, Adaptive blind separation of independent sources: A deflation approach, Signal Processing 45 (1995) 59-83.
[12] S.C. Douglas, A. Cichocki, Neural networks for blind decorrelation of signals, IEEE Trans. Signal Process. 45 (11) (November 1997) 2829-2842.
[13] B.C. Ihm, K.-S. Eom, D.J. Park, New blind separation algorithm using cumulant measure, in: Proc. 3rd Internat. Conf. Signal Processing, Beijing, China, October 1996, pp. 477-480.
[14] C. Jutten, J. Hérault, Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture, Signal Processing 24 (1) (July 1991) 1-10.
[15] J. Karhunen, J. Joutsensalo, Representation and separation of signals using nonlinear PCA type learning, Neural Networks 7 (1) (1994) 113-127.
[16] O. Macchi, E. Moreau, Self-adaptive source separation, Part I: Convergence analysis of a direct linear network controlled by the Hérault-Jutten algorithm, IEEE Trans. Signal Process. 45 (4) (April 1997) 918-926.
[17] E. Sorouchyari, Blind separation of sources, Part III: Stability analysis, Signal Processing 24 (1) (July 1991) 21-29.
[18] L. Tong, R.-W. Liu, V.C. Soon, Y.-F. Huang, Indeterminacy and identifiability of blind identification, IEEE Trans. Circuits and Systems 38 (5) (May 1991) 499-509.