Neural Comput & Applic (2003) 12: 173–177 DOI 10.1007/s00521-003-0379-7
ORIGINAL ARTICLE
Yogesh Singh · C. S. Rai
A simplified approach to independent component analysis
Received: 28 March 2002 / Accepted: 3 July 2003 / Published online: 25 November 2003
© Springer-Verlag London Limited 2003
Abstract Independent Component Analysis (ICA) is one of the fastest-growing fields in the area of neural networks and signal processing. Blind Source Separation (BSS) is one of the applications of ICA: it extracts independent signal components from their observed linear mixtures at an array of sensors. Various statistical techniques based on information-theoretic and algebraic approaches exist for performing ICA. In this paper, ICA is used to separate unknown source signals, with an objective function based on an independence criterion for the signals. Optimisation of this objective function yields a neural algorithm, along with a non-linear function, for signal separation. The performance of the algorithm has been evaluated for artificially generated signals as well as audio signals.

Keywords Blind source separation (BSS) · Edgeworth expansion · Objective function
Y. Singh · C. S. Rai (corresponding author)
School of Information Technology, G.G.S. Indraprastha University, Kashmere Gate, Delhi 110006, India
E-mail: [email protected]

1 Introduction

Independent Component Analysis (ICA) techniques are used to extract independent components from their linear mixtures. Signals originating from unknown sources get mixed with each other before they are received at the sensors, and the original signals must be recovered using only the observations obtained at the sensors. Unsupervised neural algorithms are well suited to implementing these techniques. The following linear model of source separation has been assumed in this paper:

x_i(t) = \sum_{j=1}^{n} a_{ij} s_j(t) + n_i(t), \qquad i = 1, 2, \ldots, m    (1)

where x_i(t) is the linear mixture observed at the ith sensor, s_j(t) is the jth original source signal and n_i(t) is the ith additive noise component. To simplify the problem further, it has been assumed that the source signals are zero mean and statistically independent, and that the number of sources equals the number of sensors. Eq. 1 can be represented in matrix form as

x(t) = A s(t) + n(t)    (2)

where x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T is the observation vector at the sensor outputs, A is an n \times n unknown mixing matrix, s(t) = [s_1(t), s_2(t), \ldots, s_n(t)]^T is the original source vector and n(t) = [n_1(t), n_2(t), \ldots, n_n(t)]^T is the noise vector. The effect of the noise is neglected in the rest of the paper. Figure 1a shows the neural model of the mixing network. To recover the individual components of s, a full-rank n \times n matrix W is to be found iteratively such that

y = W x    (3)

where y(t) = [y_1(t), y_2(t), \ldots, y_n(t)]^T is the output vector. Ideally, W should equal A^{-1} so that y = s; in practice, the output components are made as independent as possible. Figure 1b shows the separating network. Assuming that the original source signals are independent and that at most one of them is Gaussian, the original sources can be recovered from the observation vector up to an uncertainty in scale and permutation [1]. Various algebraic and neural approaches have been proposed for source separation [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. ICA techniques have been used for biomedical signal analysis [13, 14], feature extraction [15, 16], telecommunications [17, 18, 19], audio signal separation [3, 4, 5], separation of seismic signals [20], etc.

Fig. 1 a Mixing network and b separating network

1.1 Objective function

To extract independent components from the output vector, an objective function is required, and its optimisation should yield an adaptive source-separation algorithm.
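As a concrete illustration of the model in Eqs. 2 and 3, the following sketch sets up a noiseless mixture and an initial separating matrix. This is our own illustration, not code from the paper: the number of sources, the particular waveforms and the random initialisation are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3                                   # assumed: 3 sources and 3 sensors
t = np.linspace(0.0, 1.0, 1000)         # 1000 samples over one second
s = np.vstack([
    0.5 * np.sin(500 * t),              # sinusoidal source
    np.sign(np.sin(300 * t)),           # square-wave source
    rng.uniform(-1.0, 1.0, t.size),     # uniformly distributed source
])                                      # zero-mean, mutually independent rows
A = rng.normal(size=(n, n))             # unknown mixing matrix A (Eq. 2)
x = A @ s                               # observations x = As, noise neglected
W = 0.1 * rng.normal(size=(n, n))       # initial full-rank separating matrix
y = W @ x                               # outputs y = Wx (Eq. 3)
```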
When the components of the output vector become independent, its joint probability density function (pdf) factorises into the product of the marginal pdfs as [21]

f(y) = \prod_{i=1}^{n} f_i(y_i)    (4)

where y_i is the ith component of the output vector y. For the model given by Eq. 3, the relationship between the pdf of y parameterised by W, f(y; W), and the pdf of x, f_x(x), can be written as [21]

f(y; W) = \frac{f_x(x)}{|J|}    (5)

where |J| is the determinant of the Jacobian matrix

J = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_n} \end{bmatrix}    (6)

The ijth component of the n \times n matrix W in Eq. 3 is w_{ij}. From Eq. 3 we can write

\frac{\partial y_i}{\partial x_j} = w_{ij}    (7)

Hence, from Eqs. 6 and 7, it is evident that J = W, and Eq. 5 can be expressed as

f(y; W) = \frac{f_x(x)}{|W|}    (8)

To make the output components independent, the natural choice of objective function is the difference between the joint pdf and the product of the marginal pdfs of the output vector. Because the logarithm is a monotonic transform and offers computational simplicity, it is easier to optimise the difference of the logarithms of the two quantities. The logarithmic difference L(W) can be written as

L(W) = \log(f(y; W)) - \sum_{i=1}^{n} \log(f_i(y_i))    (9)

Substituting the value of f(y; W) from Eq. 8 into Eq. 9, we get

L(W) = \log\bigl(\det(W)\bigr)^{-1} + \log(f_x(x)) - \sum_{i=1}^{n} \log f_i(y_i)    (10)

Because the pdf of the input random vector is independent of the parameter vector W, the second term of Eq. 10 can be omitted and the objective function \Psi(W) can be written as

\Psi(W) = -\log|\det(W)| - \sum_{i=1}^{n} \log f_i(y_i)    (11)

The approximations made for the pdf in the following section, together with the omission of the second term of Eq. 10, make the objective function given by Eq. 11 a positive definite quantity. Hence minimisation of \Psi(W) should make the output components independent.

2 Learning rule

To determine the second term of the objective function given by Eq. 11, the probability density function must be expressed in terms of y. The Edgeworth expansion is a suitable choice for representing sub-Gaussian (short-tailed) density functions. The first four terms of this expansion for a standardised (zero-mean, unit-variance) random variable y_i can be written as [22]

f_i(y_i) = \alpha(y_i)\left[1 + \frac{\kappa_{i,3} h_3}{3!} + \frac{\kappa_{i,4} h_4}{4!} + \frac{\kappa_{i,3}^2 h_6}{6!} + \cdots\right]    (12)

where \alpha(y_i) is the Gaussian density with zero mean and unit variance,

\alpha(y_i) = \frac{1}{\sqrt{2\pi}} e^{-y_i^2/2}

\kappa_{i,r} is the rth-order cumulant of y_i and h_r is the rth Hermite polynomial. The contribution of the remaining terms of Eq. 12 is negligible. The third- and fourth-order cumulants are represented in terms of moments as follows:

\kappa_{i,3} = m_{i,3}    (13)

\kappa_{i,4} = m_{i,4} - 3 m_{i,2}^2    (14)

The rth-order moment of y_i can be written as m_{i,r} = E[y_i^r] = E\bigl[\bigl(\sum_{k=1}^{n} w_{ik} x_k\bigr)^r\bigr], where x_k is the kth element of the observation vector x and w_{ik} is the weight connecting the kth component of the input vector to the ith output component. Taking the logarithm of Eq. 12, we get

\log f_i(y_i) = \log \alpha(y_i) + \log\left[1 + \frac{\kappa_{i,3} h_3}{3!} + \frac{\kappa_{i,4} h_4}{4!} + \frac{\kappa_{i,3}^2 h_6}{6!} + \cdots\right]    (15)
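As a concrete aside (our own illustration, not from the paper), the moment-based cumulant formulas of Eqs. 13 and 14 can be estimated from samples by replacing expectations with sample means:

```python
import numpy as np

def cumulants_3_4(y):
    """Sample third- and fourth-order cumulants of a zero-mean signal y."""
    m2 = np.mean(y**2)
    m3 = np.mean(y**3)
    m4 = np.mean(y**4)
    k3 = m3                  # Eq. 13
    k4 = m4 - 3.0 * m2**2    # Eq. 14; reduces to m4 - 3 at unit variance
    return k3, k4
```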
Minimisation of the cost function given by Eq. 11 requires maximisation of \log(f_i(y_i)). Using Eqs. 13 and 14, the cumulants in Eq. 15 can be replaced by the equivalent moments. Representing these moments in terms of the equivalent expectations and then taking the instantaneous value of \log(f_i(y_i)), the following expression is obtained:

\log f_i(y_i) = \log(\alpha_i) - \frac{1}{72}\left[54 y^4 + 33 y^6 + \frac{75}{4} y^8 + \frac{39}{4} y^{10} + \frac{363}{16} y^{12} - \frac{233}{8} y^{14} - \frac{47}{16} y^{16} + \frac{23}{3} y^{18} + \frac{101}{48} y^{20} - \frac{5}{24} y^{22} - \frac{1}{144} y^{24}\right]    (16)

For expanding the second term of Eq. 16, the approximation \log(1 + a) \simeq a - \frac{a^2}{2} has been used. It can easily be verified that the maximum of \log(f_i(y_i)) is approximately determined by the maximum value of the second term of Eq. 16; in other words, the effect of \log(\alpha_i) is negligible compared with the second term. Hence maximisation of \log(f_i(y_i))/\log(\alpha_i) approximately determines the maximum of \log(f_i(y_i)). Eq. 16 can then be written as

\frac{\log(f_i(y_i))}{\log(\alpha_i)} = -\frac{1}{72}\left[54 y^4 + 33 y^6 + \frac{75}{4} y^8 + \frac{39}{4} y^{10} + \frac{363}{16} y^{12} - \frac{233}{8} y^{14} - \frac{47}{16} y^{16} + \frac{23}{3} y^{18} + \frac{101}{48} y^{20} - \frac{5}{24} y^{22} - \frac{1}{144} y^{24}\right]    (17)

Taking the derivative of Eq. 17 with respect to w_{ik} (the weight between x_k and y_i) yields the following expansion:

\frac{\partial}{\partial w_{ik}}\left[\frac{\log(f_i(y_i))}{\log(\alpha_i)}\right] = -\left[3 y_i^3 - \frac{11}{4} y_i^5 - \frac{25}{12} y_i^7 - \frac{65}{48} y_i^9 - \frac{119}{24} y_i^{11} + \frac{1631}{288} y_i^{13} + \frac{47}{72} y_i^{15} - \frac{23}{12} y_i^{17} + \frac{505}{864} y_i^{19} + \frac{55}{864} y_i^{21} - \frac{1}{432} y_i^{23}\right] x_k    (18)

The gradient descent of Eq. 11 can be written as [2]

\frac{\partial \Psi(W)}{\partial W} = -W^{-T} + \Phi(y) x^T    (19)

where W^{-T} is the inverse transpose of the weight matrix W, x^T is the transpose of the observation vector and \Phi(y) = [\Phi(y_1), \Phi(y_2), \ldots, \Phi(y_n)]^T is the component-wise non-linear function. From Eq. 18, the value of \Phi(y_i) can be written as

\Phi(y_i) = 3 y_i^3 - \frac{11}{4} y_i^5 - \frac{25}{12} y_i^7 - \frac{65}{48} y_i^9 - \frac{119}{24} y_i^{11} + \frac{1631}{288} y_i^{13} + \frac{47}{72} y_i^{15} - \frac{23}{12} y_i^{17} + \frac{505}{864} y_i^{19} + \frac{55}{864} y_i^{21} - \frac{1}{432} y_i^{23}    (20)

The stochastic gradient descent algorithm can now be defined as

W(k+1) = W(k) - \mu \frac{\partial \Psi(W)}{\partial W}    (21)

where \mu is the learning constant and k is the time index. Substituting the gradient of the cost function from Eq. 19 into Eq. 21, the following weight update rule is obtained:

W(k+1) = W(k) + \mu(k)\left[W^{-T} - \Phi(y(k)) x^T(k)\right]    (22)

Post-multiplication of the gradient term in Eq. 22 by W^T W transforms the standard gradient into the natural gradient [2]. Owing to the Riemannian geometry of the weight space, the natural gradient of the objective function provides faster and more accurate convergence. The natural gradient algorithm can be expressed as

W(k+1) = W(k) + \mu(k)\left[I - \Phi(y(k)) y^T(k)\right] W(k)    (23)

3 Performance

The performance of the algorithm given by Eq. 23 has been evaluated using the rejection ratio given in [2]:

q = \sum_{i=1}^{m}\left(\sum_{j=1}^{m} \frac{|p_{ij}|}{\max_k |p_{ik}|} - 1\right) + \sum_{j=1}^{m}\left(\sum_{i=1}^{m} \frac{|p_{ij}|}{\max_k |p_{kj}|} - 1\right)    (24)

where P = \{p_{ij}\} = WA is the system matrix. A low value of q indicates good separation quality. Experiments were carried out on artificially generated and real-world signals. The mixing matrix and the initial weight matrix were randomly generated, and the learning rate \mu was fixed at 0.01. In the first setup, the following artificially generated simple communication signals (sub-Gaussian) were considered; a sketch of the corresponding separation experiment follows the list:

1. u1 = 0.5 sin(500t)
2. u2 = sin(1500t)
3. u3 = 0.2 square(300t)
4. u4 = 0.1 sin(700t) cos(70t)
5. u5 = 0.03 sign(sin(800t + 5 cos(50t)))
6. u6 = 0.07 sin(300t) cos(40t)
7. u7 = 0.15 sin(1200t)
8. u8 = 1 - 2 rand(1, 1001)
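The following sketch shows how such an experiment might be assembled from the pieces above: the nonlinearity of Eq. 20, the natural-gradient update of Eq. 23 and the rejection ratio of Eq. 24. It is our own reconstruction under stated assumptions (the initialisation, sweep count and rescaling of the mixtures are ours); note that the high-order polynomial of Eq. 20 is numerically delicate, so the outputs must stay of order one.

```python
import numpy as np

def phi(y):
    # Componentwise nonlinearity of Eq. 20 (sub-Gaussian sources).
    return (3*y**3 - 11/4*y**5 - 25/12*y**7 - 65/48*y**9 - 119/24*y**11
            + 1631/288*y**13 + 47/72*y**15 - 23/12*y**17
            + 505/864*y**19 + 55/864*y**21 - 1/432*y**23)

def separate(x, mu=0.01, sweeps=5, seed=0):
    """Natural-gradient ICA (Eq. 23) on an n-by-T matrix of mixtures x."""
    rng = np.random.default_rng(seed)
    n, T = x.shape
    x = x / np.abs(x).max()              # keep |y| of order one (our assumption)
    W = np.eye(n) + 0.1 * rng.normal(size=(n, n))  # assumed initialisation
    I = np.eye(n)
    for _ in range(sweeps):
        for k in range(T):
            y = W @ x[:, k]
            W = W + mu * (I - np.outer(phi(y), y)) @ W   # Eq. 23
    return W

def rejection_ratio(P):
    """Rejection ratio of Eq. 24 for the system matrix P = W @ A."""
    P = np.abs(P)
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return rows.sum() + cols.sum()
```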
A sample set of 1000 points (one second) of each signal was generated. The estimated value of the rejection ratio was 3.01, and the system matrix converged to the values given in Table 1. Only one significant value in each row and column indicates good separation quality.

Table 1 A system matrix for artificially generated signals

  0.3063   0.3785    1.6915    0.2222  -0.0379  -20.773    0.1059  -0.0124
  0.0221  -0.0888    0.0310    1.2535   1.7066   -0.0279  -0.0023   0.0014
 -2.5762   0.0133    0.0786    0.2912  -0.0003    0.1426   0.0273  -0.0006
  0.0046  -0.0055    0.0656   -0.1094  -0.0002   -0.0467  -8.5815  -0.0092
  0.0027   0.1136   -0.0038  -40.194   -0.0375   -0.1074   0.0093   0.0011
 -0.0193   6.0143   -0.0020   -1.1395   0.0349    0.1291  -0.0200  -0.0014
 -0.0053  -0.0111   -0.1909    0.0826  -0.0105   -0.0660   0.0637   1.3536
 -0.2321   0.1505  -13.844     0.3145  -0.0084    2.3485  -0.5640  -0.0006

Finally, simulations were carried out with audio signals. In general, real-world signals can have either super- or sub-Gaussian characteristics. Kurtosis can be used as a measure of the sub- or super-Gaussianity of a signal: a positive or negative sign of the kurtosis indicates a super-Gaussian or sub-Gaussian probability density function, respectively. Online estimation of the normalised kurtosis of the ith output, K_4(y_i), can be done as follows:

K_4(y_i) = \frac{E[y_i^4]}{\left(E[y_i^2]\right)^2} - 3    (25)

The following non-linear function has been used for separating super-Gaussian signals [3]:

\Phi(y_i) = y_i + \tanh(y_i)    (26)

Depending upon the sign of the kurtosis given by Eq. 25, one of the nonlinearities given by Eq. 20 or Eq. 26 is selected [4]; a sketch of this selection is given below. Three audio signals available with MATLAB (gong.mat, chirp.mat and train.mat) were used for the experiments. The algorithm converged in approximately 3000 iterations, and the resultant value of the rejection ratio, evaluated from the final system matrix, was 0.335. The original, mixed and separated signals are shown in Fig. 2.

Fig. 2 a Original signals, b mixed signals and c separated signals
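A minimal sketch of the kurtosis-based selection follows. It is our own illustration: the batch sample estimate below stands in for the online estimate the paper uses.

```python
import numpy as np

# Coefficients of the Eq. 20 polynomial, keyed by the power of y.
_COEF = {3: 3, 5: -11/4, 7: -25/12, 9: -65/48, 11: -119/24, 13: 1631/288,
         15: 47/72, 17: -23/12, 19: 505/864, 21: 55/864, 23: -1/432}

def phi_sub(y):
    # Eq. 20 nonlinearity for sub-Gaussian outputs.
    return sum(c * y**p for p, c in _COEF.items())

def kurtosis(y):
    # Sample estimate of the normalised kurtosis of Eq. 25.
    return np.mean(y**4) / np.mean(y**2)**2 - 3.0

def nonlinearity(y):
    # Eq. 26 for super-Gaussian outputs (positive kurtosis),
    # the Eq. 20 polynomial for sub-Gaussian ones [4].
    return y + np.tanh(y) if kurtosis(y) > 0 else phi_sub(y)
```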
4 Conclusions

In this paper, an ICA technique for signal separation was presented. Optimisation of the logarithmic difference between the joint pdf and the product of the marginal pdfs yields an iterative algorithm. The performance of iterative ICA techniques depends on the specific nonlinearity used; in this work, a new nonlinearity has been proposed and experimentally validated. The proposed nonlinearity is best suited to sub-Gaussian signals. Since audio signals can also be super-Gaussian in nature, a different nonlinearity has been used for them. Experimental results indicate the suitability of the approach for the separation of both artificially generated and real-world signals.
References

1. Comon P (1994) Independent component analysis, a new concept? Signal Process 36:287–314
2. Amari S-I, Cichocki A, Yang H (1996) A new learning algorithm for blind signal separation. Advances in Neural Information Processing Systems, vol 8, Denver, CO, pp 757–763
3. Bell AJ, Sejnowski TJ (1995) An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7:1129–1159
4. Lee T-W, Girolami M, Sejnowski TJ (1999) Independent component analysis using an extended infomax algorithm for mixed sub-Gaussian and super-Gaussian sources. Neural Computation 11(2):417–441
5. Singh Y, Rai CS (2002) Blind source separation: a unified approach. Neurocomputing 49(1–4):435–438
6. Singh Y, Rai CS (2001) A non-linear function for gradient based BSS. Proc IJCNN 2001, Washington, DC, pp 936–939
7. Cardoso J-F (1998) Super-symmetric decomposition of the fourth-order cumulant tensor. Proc ICASSP, pp 3109–3112
8. Cardoso J-F, Laheld BH (1996) Equivariant adaptive source separation. IEEE Trans Signal Process 44(12):3017–3030
9. Cardoso J-F (1998) Blind signal separation: statistical principles. Proc IEEE 86(10):2009–2025
10. Hyvärinen A, Oja E (1997) A fast fixed-point algorithm for independent component analysis. Neural Computation 9:1483–1492
11. Karhunen J, Oja E, Wang L, Vigário R, Joutsensalo J (1997) A class of neural networks for independent component analysis. IEEE Trans Neural Networks 8:487–504
12. Cichocki A, Unbehauen R, Rummert E (1994) Robust learning algorithm for blind separation of signals. Electron Lett 30(17):1386–1387
13. De Lathauwer L, De Moor B, Vandewalle J (1995) Fetal electrocardiogram extraction by source subspace separation. Proc HOS'95, Spain, pp 134–138
14. Makeig S, Bell AJ, Jung T-P, Sejnowski TJ (1995) Independent component analysis of electroencephalographic data. Advances in Neural Information Processing Systems, vol 8, MIT Press
15. Ziegaus C, Lang EW (1999) Independent component extraction of natural images based on fourth-order cumulants. Proc ICA'99, Aussois, France, pp 115–120
16. Bell AJ, Sejnowski TJ (1996) Edges are the independent components of natural scenes. Advances in Neural Information Processing Systems, vol 9, MIT Press
17. Chaumette E, Comon P, Muller D (1993) ICA-based technique for radiating sources estimation: application to airport surveillance. IEE Proc-F 140(6):395–402
18. Swindlehurst A, Goris M, Ottersten B (1997) Some experiments with array data collected in actual urban and suburban environments. IEEE Workshop on Signal Processing Advances in Wireless Communications, pp 301–304
19. Anand K, Mathew G, Reddy G (1995) Blind separation of multiple co-channel BPSK signals arriving at an antenna array. IEEE Signal Process Lett 2(9):176–178
20. Thirion N, Mars J, Boelle JL (1996) Separation of seismic signals: a new concept based on a blind algorithm. Signal Processing VIII: Theories and Applications, Trieste, Italy, pp 85–88
21. Papoulis A (1991) Probability, random variables, and stochastic processes. McGraw-Hill, New York
22. Stuart A, Ord JK (1991) Kendall's advanced theory of statistics. Edward Arnold, London