Neural Comput & Applic (2003) 12: 173–177 DOI 10.1007/s00521-003-0379-7
ORIGINAL ARTICLE
Yogesh Singh · C. S. Rai
A simplified approach to independent component analysis
Received: 28 March 2002 / Accepted: 3 July 2003 / Published online: 25 November 2003
© Springer-Verlag London Limited 2003
Abstract Independent Component Analysis (ICA) is one of the fastest-growing fields in the area of neural networks and signal processing. Blind Source Separation (BSS) is one of the applications of ICA: it extracts independent signal components from their observed linear mixtures at an array of sensors. Various statistical techniques based on information-theoretic and algebraic approaches exist for performing ICA. In this paper, ICA is used to separate unknown source signals, with an objective function based on an independence criterion for the signals. Optimisation of this objective function yields a neural algorithm, along with a non-linear function, for signal separation. The performance of the algorithm has been evaluated for artificially generated signals as well as audio signals.

Keywords Blind source separation (BSS) · Edgeworth expansion · Objective function
Y. Singh · C. S. Rai (corresponding author)
School of Information Technology, G.G.S. Indraprastha University, Kashmere Gate, Delhi 110006, India
E-mail: [email protected]

1 Introduction

Independent Component Analysis (ICA) techniques are used to extract independent components from their linear mixtures. Signals originating from unknown sources get mixed with each other before they are received at the sensors, and the original signals must be recovered using only the observations obtained at the sensors. Unsupervised neural algorithms are well suited to implementing these techniques. The following linear model of source separation has been assumed in this paper:

x_i(t) = \sum_{j=1}^{n} a_{ij} s_j(t) + n_i(t), \qquad i = 1, 2, \ldots, m    (1)

where x_i(t) is the linear mixture observed at the ith sensor, s_j(t) is the jth original source signal and n_i(t) is the ith additive noise component. To simplify the problem further, it has been assumed that the source signals are zero mean and statistically independent, and that the number of sources equals the number of sensors. Eq. 1 can be represented in matrix form as

x(t) = A s(t) + n(t)    (2)

where x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T is the observation vector at the sensor outputs, A is an n \times n unknown mixing matrix, s(t) = [s_1(t), s_2(t), \ldots, s_n(t)]^T is the original source vector and n(t) = [n_1(t), n_2(t), \ldots, n_n(t)]^T is the noise vector. The effect of the noise is neglected in the rest of the paper. Figure 1a shows the neural model of the mixing network. To recover the individual components of s, a full-rank n \times n matrix W is to be found iteratively such that

y = W x    (3)

where y(t) = [y_1(t), y_2(t), \ldots, y_n(t)]^T is the output vector. Ideally, W should equal A^{-1} so that y = s; in practice, the output components are made as independent as possible. Figure 1b shows the separating network. Assuming that the original source signals are independent and that at most one of them is Gaussian, the original sources can be recovered from the observation vector up to an uncertainty in scale and permutation [1]. Various algebraic and neural approaches have been proposed for source separation [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. ICA techniques have been used for biomedical signal analysis [13, 14], feature extraction [15, 16], telecommunications [17, 18, 19], audio signal separation [3, 4, 5], separation of seismic signals [20], etc.

Fig. 1 a Mixing network and b separating network

1.1 Objective function

To extract independent components from the output vector, an objective function is required, and its optimisation should yield an adaptive source-separation algorithm.
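As a concrete illustration of the model in Eqs. 2 and 3, the following sketch sets up a noiseless mixture and an initial separating matrix. This is our own illustration, not code from the paper: the number of sources, the particular waveforms and the random initialisation are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3                                   # assumed: 3 sources and 3 sensors
t = np.linspace(0.0, 1.0, 1000)         # 1000 samples over one second
s = np.vstack([
    0.5 * np.sin(500 * t),              # sinusoidal source
    np.sign(np.sin(300 * t)),           # square-wave source
    rng.uniform(-1.0, 1.0, t.size),     # uniformly distributed source
])                                      # zero-mean, mutually independent rows
A = rng.normal(size=(n, n))             # unknown mixing matrix A (Eq. 2)
x = A @ s                               # observations x = As, noise neglected
W = 0.1 * rng.normal(size=(n, n))       # initial full-rank separating matrix
y = W @ x                               # outputs y = Wx (Eq. 3)
```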
When the components of the output vector become independent, its joint probability density function (pdf) factorises into the product of the marginal pdfs as [21]

f(y) = \prod_{i=1}^{n} f_i(y_i)    (4)

where y_i is the ith component of the output vector y. For the model given by Eq. 3, the relationship between the pdf of y parameterised by W, f(y; W), and the pdf of x, f_x(x), can be written as [21]

f(y; W) = \frac{f_x(x)}{|J|}    (5)

where |J| is the determinant of the Jacobian matrix

J = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_n} \end{bmatrix}    (6)

The ijth component of the n \times n matrix W in Eq. 3 is w_{ij}. From Eq. 3 we can write

\frac{\partial y_i}{\partial x_j} = w_{ij}    (7)

Hence, from Eqs. 6 and 7, it is evident that J = W, and Eq. 5 can be expressed as

f(y; W) = \frac{f_x(x)}{|W|}    (8)

To make the output components independent, the natural choice of objective function is the difference between the joint pdf and the product of the marginal pdfs of the output vector. Because the logarithm is a monotonic transform and offers computational simplicity, it is easier to optimise the difference of the logarithms of the two quantities. The logarithmic difference L(W) can be written as

L(W) = \log(f(y; W)) - \sum_{i=1}^{n} \log(f_i(y_i))    (9)

Substituting the value of f(y; W) from Eq. 8 into Eq. 9, we get

L(W) = \log\bigl(\det(W)\bigr)^{-1} + \log(f_x(x)) - \sum_{i=1}^{n} \log f_i(y_i)    (10)

Because the pdf of the input random vector is independent of the parameter vector W, the second term of Eq. 10 can be omitted and the objective function \Psi(W) can be written as

\Psi(W) = -\log|\det(W)| - \sum_{i=1}^{n} \log f_i(y_i)    (11)

The approximations made for the pdf in the following section, together with the omission of the second term of Eq. 10, make the objective function given by Eq. 11 a positive definite quantity. Hence minimisation of \Psi(W) should make the output components independent.

2 Learning rule

To determine the second term of the objective function given by Eq. 11, the probability density function must be expressed in terms of y. The Edgeworth expansion is a suitable choice for representing sub-Gaussian (short-tailed) density functions. The first four terms of this expansion for a standardised (zero-mean, unit-variance) random variable y_i can be written as [22]

f_i(y_i) = \alpha(y_i)\left[1 + \frac{\kappa_{i,3} h_3}{3!} + \frac{\kappa_{i,4} h_4}{4!} + \frac{\kappa_{i,3}^2 h_6}{6!} + \cdots\right]    (12)

where \alpha(y_i) is the Gaussian density with zero mean and unit variance,

\alpha(y_i) = \frac{1}{\sqrt{2\pi}} e^{-y_i^2/2}

\kappa_{i,r} is the rth-order cumulant of y_i and h_r is the rth Hermite polynomial. The contribution of the remaining terms of Eq. 12 is negligible. The third- and fourth-order cumulants are represented in terms of moments as follows:

\kappa_{i,3} = m_{i,3}    (13)

\kappa_{i,4} = m_{i,4} - 3 m_{i,2}^2    (14)

The rth-order moment of y_i can be written as m_{i,r} = E[y_i^r] = E\bigl[\bigl(\sum_{k=1}^{n} w_{ik} x_k\bigr)^r\bigr], where x_k is the kth element of the observation vector x and w_{ik} is the weight connecting the kth component of the input vector to the ith output component. Taking the logarithm of Eq. 12, we get

\log f_i(y_i) = \log \alpha(y_i) + \log\left[1 + \frac{\kappa_{i,3} h_3}{3!} + \frac{\kappa_{i,4} h_4}{4!} + \frac{\kappa_{i,3}^2 h_6}{6!} + \cdots\right]    (15)
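As a concrete aside (our own illustration, not from the paper), the moment-based cumulant formulas of Eqs. 13 and 14 can be estimated from samples by replacing expectations with sample means:

```python
import numpy as np

def cumulants_3_4(y):
    """Sample third- and fourth-order cumulants of a zero-mean signal y."""
    m2 = np.mean(y**2)
    m3 = np.mean(y**3)
    m4 = np.mean(y**4)
    k3 = m3                  # Eq. 13
    k4 = m4 - 3.0 * m2**2    # Eq. 14; reduces to m4 - 3 at unit variance
    return k3, k4
```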
Minimisation of the cost function given by Eq. 11 requires maximisation of \log(f_i(y_i)). Using Eqs. 13 and 14, the cumulants in Eq. 15 can be replaced by the equivalent moments. Representing these moments in terms of the equivalent expectations and then taking the instantaneous value of \log(f_i(y_i)), the following expression is obtained:

\log f_i(y_i) = \log(\alpha_i) - \frac{1}{72}\left[54 y^4 + 33 y^6 + \frac{75}{4} y^8 + \frac{39}{4} y^{10} + \frac{363}{16} y^{12} - \frac{233}{8} y^{14} - \frac{47}{16} y^{16} + \frac{23}{3} y^{18} + \frac{101}{48} y^{20} - \frac{5}{24} y^{22} - \frac{1}{144} y^{24}\right]    (16)

For expanding the second term of Eq. 16, the approximation \log(1 + a) \simeq a - \frac{a^2}{2} has been used. It can easily be verified that the maximum of \log(f_i(y_i)) is approximately determined by the maximum value of the second term of Eq. 16; in other words, the effect of \log(\alpha_i) is negligible compared with the second term. Hence maximisation of \log(f_i(y_i))/\log(\alpha_i) approximately determines the maximum of \log(f_i(y_i)). Eq. 16 can then be written as

\frac{\log(f_i(y_i))}{\log(\alpha_i)} = -\frac{1}{72}\left[54 y^4 + 33 y^6 + \frac{75}{4} y^8 + \frac{39}{4} y^{10} + \frac{363}{16} y^{12} - \frac{233}{8} y^{14} - \frac{47}{16} y^{16} + \frac{23}{3} y^{18} + \frac{101}{48} y^{20} - \frac{5}{24} y^{22} - \frac{1}{144} y^{24}\right]    (17)

Taking the derivative of Eq. 17 with respect to w_{ik} (the weight between x_k and y_i) yields the following expansion:

\frac{\partial}{\partial w_{ik}}\left[\frac{\log(f_i(y_i))}{\log(\alpha_i)}\right] = -\left[3 y_i^3 - \frac{11}{4} y_i^5 - \frac{25}{12} y_i^7 - \frac{65}{48} y_i^9 - \frac{119}{24} y_i^{11} + \frac{1631}{288} y_i^{13} + \frac{47}{72} y_i^{15} - \frac{23}{12} y_i^{17} + \frac{505}{864} y_i^{19} + \frac{55}{864} y_i^{21} - \frac{1}{432} y_i^{23}\right] x_k    (18)

The gradient descent of Eq. 11 can be written as [2]

\frac{\partial \Psi(W)}{\partial W} = -W^{-T} + \Phi(y) x^T    (19)

where W^{-T} is the inverse transpose of the weight matrix W, x^T is the transpose of the observation vector and \Phi(y) = [\Phi(y_1), \Phi(y_2), \ldots, \Phi(y_n)]^T is the component-wise non-linear function. From Eq. 18, the value of \Phi(y_i) can be written as

\Phi(y_i) = 3 y_i^3 - \frac{11}{4} y_i^5 - \frac{25}{12} y_i^7 - \frac{65}{48} y_i^9 - \frac{119}{24} y_i^{11} + \frac{1631}{288} y_i^{13} + \frac{47}{72} y_i^{15} - \frac{23}{12} y_i^{17} + \frac{505}{864} y_i^{19} + \frac{55}{864} y_i^{21} - \frac{1}{432} y_i^{23}    (20)

The stochastic gradient descent algorithm can now be defined as

W(k+1) = W(k) - \mu \frac{\partial \Psi(W)}{\partial W}    (21)

where \mu is the learning constant and k is the time index. Substituting the gradient of the cost function from Eq. 19 into Eq. 21, the following weight update rule is obtained:

W(k+1) = W(k) + \mu(k)\left[W^{-T} - \Phi(y(k)) x^T(k)\right]    (22)

Post-multiplication of the gradient term in Eq. 22 by W^T W transforms the standard gradient into the natural gradient [2]. Owing to the Riemannian geometry of the weight space, the natural gradient of the objective function provides faster and more accurate convergence. The natural gradient algorithm can be expressed as

W(k+1) = W(k) + \mu(k)\left[I - \Phi(y(k)) y^T(k)\right] W(k)    (23)

3 Performance

The performance of the algorithm given by Eq. 23 has been evaluated using the rejection ratio given in [2]:

q = \sum_{i=1}^{m}\left(\sum_{j=1}^{m} \frac{|p_{ij}|}{\max_k |p_{ik}|} - 1\right) + \sum_{j=1}^{m}\left(\sum_{i=1}^{m} \frac{|p_{ij}|}{\max_k |p_{kj}|} - 1\right)    (24)

where P = \{p_{ij}\} = WA is the system matrix. A low value of q indicates good separation quality. Experiments were carried out on artificially generated and real-world signals. The mixing matrix and the initial weight matrix were randomly generated, and the learning rate \mu was fixed at 0.01. In the first setup, the following artificially generated simple communication signals (sub-Gaussian) were considered; a sketch of the corresponding separation experiment follows the list:

1. u1 = 0.5 sin(500t)
2. u2 = sin(1500t)
3. u3 = 0.2 square(300t)
4. u4 = 0.1 sin(700t) cos(70t)
5. u5 = 0.03 sign(sin(800t + 5 cos(50t)))
6. u6 = 0.07 sin(300t) cos(40t)
7. u7 = 0.15 sin(1200t)
8. u8 = 1 - 2 rand(1, 1001)
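The following sketch shows how such an experiment might be assembled from the pieces above: the nonlinearity of Eq. 20, the natural-gradient update of Eq. 23 and the rejection ratio of Eq. 24. It is our own reconstruction under stated assumptions (the initialisation, sweep count and rescaling of the mixtures are ours); note that the high-order polynomial of Eq. 20 is numerically delicate, so the outputs must stay of order one.

```python
import numpy as np

def phi(y):
    # Componentwise nonlinearity of Eq. 20 (sub-Gaussian sources).
    return (3*y**3 - 11/4*y**5 - 25/12*y**7 - 65/48*y**9 - 119/24*y**11
            + 1631/288*y**13 + 47/72*y**15 - 23/12*y**17
            + 505/864*y**19 + 55/864*y**21 - 1/432*y**23)

def separate(x, mu=0.01, sweeps=5, seed=0):
    """Natural-gradient ICA (Eq. 23) on an n-by-T matrix of mixtures x."""
    rng = np.random.default_rng(seed)
    n, T = x.shape
    x = x / np.abs(x).max()              # keep |y| of order one (our assumption)
    W = np.eye(n) + 0.1 * rng.normal(size=(n, n))  # assumed initialisation
    I = np.eye(n)
    for _ in range(sweeps):
        for k in range(T):
            y = W @ x[:, k]
            W = W + mu * (I - np.outer(phi(y), y)) @ W   # Eq. 23
    return W

def rejection_ratio(P):
    """Rejection ratio of Eq. 24 for the system matrix P = W @ A."""
    P = np.abs(P)
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return rows.sum() + cols.sum()
```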
A sample set of 1000 points (one second) of each signal was generated. The estimated value of the rejection ratio was 3.01, and the system matrix converged to the values given in Table 1. Only one significant value in each row and column indicates good separation quality.

Table 1 A system matrix for artificially generated signals

  0.3063   0.3785    1.6915    0.2222  -0.0379  -20.773    0.1059  -0.0124
  0.0221  -0.0888    0.0310    1.2535   1.7066   -0.0279  -0.0023   0.0014
 -2.5762   0.0133    0.0786    0.2912  -0.0003    0.1426   0.0273  -0.0006
  0.0046  -0.0055    0.0656   -0.1094  -0.0002   -0.0467  -8.5815  -0.0092
  0.0027   0.1136   -0.0038  -40.194   -0.0375   -0.1074   0.0093   0.0011
 -0.0193   6.0143   -0.0020   -1.1395   0.0349    0.1291  -0.0200  -0.0014
 -0.0053  -0.0111   -0.1909    0.0826  -0.0105   -0.0660   0.0637   1.3536
 -0.2321   0.1505  -13.844     0.3145  -0.0084    2.3485  -0.5640  -0.0006

Finally, simulations were carried out with audio signals. In general, real-world signals can have either super- or sub-Gaussian characteristics. Kurtosis can be used as a measure of the sub- or super-Gaussianity of a signal: a positive or negative sign of the kurtosis indicates a super-Gaussian or sub-Gaussian probability density function, respectively. Online estimation of the normalised kurtosis of the ith output, K_4(y_i), can be done as follows:

K_4(y_i) = \frac{E[y_i^4]}{\left(E[y_i^2]\right)^2} - 3    (25)

The following non-linear function has been used for separating super-Gaussian signals [3]:

\Phi(y_i) = y_i + \tanh(y_i)    (26)

Depending upon the sign of the kurtosis given by Eq. 25, one of the nonlinearities given by Eq. 20 or Eq. 26 is selected [4]; a sketch of this selection is given below. Three audio signals available with MATLAB (gong.mat, chirp.mat and train.mat) were used for the experiments. The algorithm converged in approximately 3000 iterations, and the resultant value of the rejection ratio, evaluated from the final system matrix, was 0.335. The original, mixed and separated signals are shown in Fig. 2.

Fig. 2 a Original signals, b mixed signals and c separated signals
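A minimal sketch of the kurtosis-based selection follows. It is our own illustration: the batch sample estimate below stands in for the online estimate the paper uses.

```python
import numpy as np

# Coefficients of the Eq. 20 polynomial, keyed by the power of y.
_COEF = {3: 3, 5: -11/4, 7: -25/12, 9: -65/48, 11: -119/24, 13: 1631/288,
         15: 47/72, 17: -23/12, 19: 505/864, 21: 55/864, 23: -1/432}

def phi_sub(y):
    # Eq. 20 nonlinearity for sub-Gaussian outputs.
    return sum(c * y**p for p, c in _COEF.items())

def kurtosis(y):
    # Sample estimate of the normalised kurtosis of Eq. 25.
    return np.mean(y**4) / np.mean(y**2)**2 - 3.0

def nonlinearity(y):
    # Eq. 26 for super-Gaussian outputs (positive kurtosis),
    # the Eq. 20 polynomial for sub-Gaussian ones [4].
    return y + np.tanh(y) if kurtosis(y) > 0 else phi_sub(y)
```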
4 Conclusions

In this paper, an ICA technique for signal separation was presented. Optimisation of the logarithmic difference between the joint pdf and the product of the marginal pdfs yields an iterative algorithm. The performance of iterative ICA techniques depends on the specific nonlinearity used; in this work, a new nonlinearity has been proposed and experimentally validated. The proposed nonlinearity is best suited to sub-Gaussian signals. Since audio signals can also be super-Gaussian in nature, a different nonlinearity has been used for them. Experimental results indicate the suitability of the approach for the separation of both artificially generated and real-world signals.
References

1. Comon P (1994) Independent component analysis, a new concept? Signal Process 36:287–314
2. Amari S-I, Cichocki A, Yang H (1996) A new learning algorithm for blind signal separation. Advances in Neural Information Processing Systems, vol 8, Denver, CO, pp 757–763
3. Bell AJ, Sejnowski TJ (1995) An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7:1129–1159
4. Lee T-W, Girolami M, Sejnowski TJ (1999) Independent component analysis using an extended infomax algorithm for mixed sub-Gaussian and super-Gaussian sources. Neural Computation 11(2):417–441
5. Singh Y, Rai CS (2002) Blind source separation: a unified approach. Neurocomputing 49(1–4):435–438
6. Singh Y, Rai CS (2001) A non-linear function for gradient based BSS. Proc IJCNN 2001, Washington, DC, pp 936–939
7. Cardoso J-F (1998) Super-symmetric decomposition of the fourth-order cumulant tensor. Proc ICASSP, pp 3109–3112
8. Cardoso J-F, Laheld BH (1996) Equivariant adaptive source separation. IEEE Trans Signal Process 44(12):3017–3030
9. Cardoso J-F (1998) Blind signal separation: statistical principles. Proc IEEE 86(10):2009–2025
10. Hyvärinen A, Oja E (1997) A fast fixed-point algorithm for independent component analysis. Neural Computation 9:1483–1492
11. Karhunen J, Oja E, Wang L, Vigário R, Joutsensalo J (1997) A class of neural networks for independent component analysis. IEEE Trans Neural Networks 8:487–504
12. Cichocki A, Unbehauen R, Rummert E (1994) Robust learning algorithm for blind separation of signals. Electron Lett 30(17):1386–1387
13. De Lathauwer L, De Moor B, Vandewalle J (1995) Fetal electrocardiogram extraction by source subspace separation. Proc HOS'95, Spain, pp 134–138
14. Makeig S, Bell AJ, Jung T-P, Sejnowski TJ (1995) Independent component analysis of electroencephalographic data. Advances in Neural Information Processing Systems, vol 8, MIT Press
15. Ziegaus C, Lang EW (1999) Independent component extraction of natural images based on fourth-order cumulants. Proc ICA'99, Aussois, France, pp 115–120
16. Bell AJ, Sejnowski TJ (1996) Edges are the independent components of natural scenes. Advances in Neural Information Processing Systems, vol 9, MIT Press
17. Chaumette E, Comon P, Muller D (1993) ICA-based technique for radiating sources estimation: application to airport surveillance. IEE Proc-F 140(6):395–402
18. Swindlehurst A, Goris M, Ottersten B (1997) Some experiments with array data collected in actual urban and suburban environments. IEEE Workshop on Signal Processing Advances in Wireless Communications, pp 301–304
19. Anand K, Mathew G, Reddy G (1995) Blind separation of multiple co-channel BPSK signals arriving at an antenna array. IEEE Signal Process Lett 2(9):176–178
20. Thirion N, Mars J, Boelle JL (1996) Separation of seismic signals: a new concept based on a blind algorithm. Signal Processing VIII: Theories and Applications, Trieste, Italy, pp 85–88
21. Papoulis A (1991) Probability, random variables, and stochastic processes. McGraw-Hill, New York
22. Stuart A, Ord JK (1991) Kendall's advanced theory of statistics. Edward Arnold, London