Signal Processing 84 (2004) 951 – 956
www.elsevier.com/locate/sigpro
Fast communication
Uniqueness of complex and multidimensional independent component analysis

F.J. Theis*

Institute of Biophysics, University of Regensburg, Universitaetsstr. 31, D-93040 Regensburg, Germany

Received 25 September 2003

* Tel.: +49-941-9432924; fax: +49-941-9432479. E-mail addresses: [email protected], [email protected] (F.J. Theis).
Abstract

A complex version of the Darmois–Skitovitch theorem is proved using a multivariate extension of the latter by Ghurye and Olkin. This makes it possible to calculate the indeterminacies of independent component analysis (ICA) with complex variables and coefficients. Furthermore, the multivariate Darmois–Skitovitch theorem is used to show uniqueness of multidimensional ICA, where only groups of sources are mutually independent.
© 2004 Elsevier B.V. All rights reserved.

PACS: 84.40.Ua; 89.70.+c; 07.05.Kf

Keywords: Complex ICA; Multidimensional ICA; Separability

0165-1684/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2004.01.008
1. Introduction

The task of independent component analysis (ICA) is to transform a given random vector into a statistically independent one. ICA can be applied to blind source separation (BSS), where it is furthermore assumed that the given vector has been mixed using a fixed set of independent sources. Good textbook-level introductions to ICA are given in [4,11].

BSS is said to be separable if the mixing structure can be blindly recovered except for obvious indeterminacies. In [5], Comon shows separability of linear real BSS using the Skitovitch–Darmois theorem. He notes that his proof for the real case can also be extended to the complex setting. However, a complex version of the Skitovitch–Darmois theorem is needed, which, to the knowledge of the author, has not yet been shown in the literature. In this work we provide such a theorem, which is then used to prove separability of complex BSS. Separability and uniqueness of BSS are already included in the definition of what is commonly called a 'contrast' [5]. Hence they have been widely studied; in the setting of complex BSS, however, separability has to the knowledge of the author only been shown under the additional assumption of non-zero cumulants of the sources [5,13].

The paper is organized as follows: In the next section, basic terms and notations are introduced. Section 3 states the well-known Skitovitch–Darmois theorem and a multivariate extension thereof; furthermore, a complex version of it is derived. The following Section 4 then introduces the complex linear blind source separation model and shows its separability.
Section 5 finally deals with separability of multidimensional ICA (group ICA).

2. Notation

Let $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$ denote either the real or the complex numbers. For $m, n \in \mathbb{N}$ let $\operatorname{Mat}(m \times n; \mathbb{K})$ be the $\mathbb{K}$-vector space of real, respectively complex, $m \times n$ matrices, and let $\mathrm{Gl}(n; \mathbb{K}) := \{W \in \operatorname{Mat}(n \times n; \mathbb{K}) \mid \det(W) \neq 0\}$ be the general linear group of $\mathbb{K}^n$. $I \in \mathrm{Gl}(n; \mathbb{K})$ denotes the unit matrix. For $z \in \mathbb{C}$ we write $\operatorname{Re}(z)$ for its real and $\operatorname{Im}(z)$ for its imaginary part.

An invertible matrix $L \in \mathrm{Gl}(n; \mathbb{K})$ is said to be a scaling matrix if it is diagonal. We say two matrices $B, C \in \operatorname{Mat}(m \times n; \mathbb{K})$ are ($\mathbb{K}$-)equivalent, $B \sim C$, if $C$ can be written as $C = BPL$ with a scaling matrix $L \in \mathrm{Gl}(n; \mathbb{K})$ and an invertible matrix with unit vectors in each row (permutation matrix) $P \in \mathrm{Gl}(n; \mathbb{K})$. Note that $PL = L'P$ for some scaling matrix $L' \in \mathrm{Gl}(n; \mathbb{K})$, so the order of the permutation and the scaling matrix does not matter for equivalence. Furthermore, if $B \in \mathrm{Gl}(n; \mathbb{K})$ with $B \sim I$, then also $B^{-1} \sim I$, and, more generally, if $BC \sim A$ then $C \sim B^{-1}A$. Thus two matrices are equivalent if and only if they differ by right-multiplication with a matrix having exactly one non-zero entry in each row and each column. If $\mathbb{K} = \mathbb{R}$, the two matrices are the same except for permutation, sign and scaling; if $\mathbb{K} = \mathbb{C}$, they are the same except for permutation, sign, scaling and phase shift.
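As a concrete numerical illustration of this equivalence relation (this sketch is my own addition and not part of the original paper; the helper names and the numpy-based check are assumptions made for illustration), two invertible matrices $B$ and $C$ are equivalent exactly when $B^{-1}C$ has one non-zero entry per row and per column:

```python
import numpy as np

def is_identity_equivalent(M, tol=1e-10):
    """Check M ~ I, i.e. M = PL for a permutation matrix P and an invertible
    diagonal scaling matrix L: exactly one non-zero entry per row and column."""
    mask = np.abs(M) > tol
    return bool(np.all(mask.sum(axis=0) == 1) and np.all(mask.sum(axis=1) == 1))

def are_equivalent(B, C, tol=1e-10):
    """B ~ C iff C = B * (scaled permutation), i.e. B^{-1} C ~ I.
    Both matrices are assumed square and invertible."""
    return is_identity_equivalent(np.linalg.solve(B, C), tol)

# Example: C is B with its columns permuted, rescaled and phase-shifted (K = C).
B = np.array([[1.0 + 0j, 2.0], [3.0, 4.0]])
C = B @ np.array([[0, 2j], [-0.5, 0]])   # permutation times scaling
print(are_equivalent(B, C))              # True
print(are_equivalent(B, B + 1))          # False
```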
3. A multivariate version of the Skitovitch–Darmois theorem

The original Skitovitch–Darmois theorem shows a non-trivial connection between Gaussian distributions and stochastic independence. More precisely, it states that if two linear combinations of non-Gaussian independent random variables are again independent, then each original random variable can appear in only one of the two linear combinations. It has been proved independently by Darmois [6] and Skitovitch [14]; in a more accessible form, the proof can be found in [12]. Separability of linear BSS as shown by Comon [5] is a corollary of this theorem, although recently separability has also been shown without it [17].

Theorem 3.1 (Skitovitch–Darmois theorem). Let $L_1 = \sum_{i=1}^n \alpha_i X_i$ and $L_2 = \sum_{i=1}^n \beta_i X_i$ with $X_1, \ldots, X_n$ independent real random variables and $\alpha_j, \beta_j \in \mathbb{R}$ for $j = 1, \ldots, n$. If $L_1$ and $L_2$ are independent, then all $X_j$ with $\alpha_j \beta_j \neq 0$ are Gaussian.

The converse holds if we assume that $\sum_{j=1}^n \alpha_j \beta_j \operatorname{var}(X_j) = 0$: if all $X_j$ with $\alpha_j \beta_j \neq 0$ are Gaussian and this sum vanishes, then $L_1$ and $L_2$ are independent. This follows because $L_1$ and $L_2$ are then uncorrelated and, with all common variables being normal, also independent.

Theorem 3.2 (Multivariate S–D theorem). Let $L_1 = \sum_{i=1}^n A_i X_i$ and $L_2 = \sum_{i=1}^n B_i X_i$ with mutually independent $k$-dimensional random vectors $X_j$ and invertible matrices $A_j, B_j \in \mathrm{Gl}(k; \mathbb{R})$ for $j = 1, \ldots, n$. If $L_1$ and $L_2$ are mutually independent, then all $X_j$ are Gaussian.

Here, Gaussian (or jointly Gaussian) means that each component of the random vector is Gaussian; obviously, those Gaussians can have non-trivial correlations. This extension of Theorem 3.1 to random vectors was first noted by Skitovitch [15] and shown by Ghurye and Olkin [8]. Zinger gave a different proof of it in his Ph.D. thesis [18]. We need the following corollary:

Corollary 3.3. Let $L_1 = \sum_{i=1}^n A_i X_i$ and $L_2 = \sum_{i=1}^n B_i X_i$ with mutually independent $k$-dimensional random vectors $X_j$ and matrices $A_j, B_j$ either zero or in $\mathrm{Gl}(k; \mathbb{R})$ for $j = 1, \ldots, n$. If $L_1$ and $L_2$ are mutually independent, then all $X_j$ with $A_j B_j \neq 0$ are Gaussian.

Proof. We want to throw out all $X_j$ with $A_j B_j = 0$; then Theorem 3.2 can be applied. Let $j$ be given with $A_j B_j = 0$. Without loss of generality assume that $B_j = 0$. If also $A_j = 0$, then we can simply leave out $X_j$ since it appears in neither $L_1$ nor $L_2$. So assume $A_j \neq 0$. By assumption $X_j$ and $X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n$ are mutually independent, and then so are $X_j$ and $L_2$ because $B_j = 0$. Hence both $-A_j X_j$, $L_2$ and $L_1$, $L_2$ are mutually independent, so the two linear combinations $L_1 - A_j X_j$ and $L_2$ of the $n - 1$ random vectors $X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n$ are also mutually independent. After successive application of this recursion we can
assume that each $A_j$ and $B_j$ is invertible. Applying Theorem 3.2 shows the corollary.
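The following small numpy experiment (my own illustration, not taken from the paper; the fourth-moment statistic is just one convenient, assumed dependence witness) shows the content of Theorem 3.1 at work: two linear combinations sharing uniform, hence non-Gaussian, sources can be made uncorrelated but remain dependent, while the analogous Gaussian construction yields independent combinations.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200_000

# Two independent, zero-mean, non-Gaussian (uniform) random variables.
x1 = rng.uniform(-1, 1, n_samples)
x2 = rng.uniform(-1, 1, n_samples)

l1 = x1 + x2   # both combinations involve x1 and x2 non-trivially
l2 = x1 - x2   # chosen so that l1 and l2 are uncorrelated

def corr_of_squares(a, b):
    """Correlation of a**2 and b**2: approximately zero for independent
    variables, so a clearly non-zero value witnesses dependence."""
    return np.corrcoef(a ** 2, b ** 2)[0, 1]

print(np.corrcoef(l1, l2)[0, 1])  # ~ 0: uncorrelated
print(corr_of_squares(l1, l2))    # clearly non-zero: l1 and l2 are dependent

# With Gaussian sources of equal variance the same combinations are independent,
# in line with the converse direction discussed above.
g1, g2 = rng.normal(size=(2, n_samples))
print(corr_of_squares(g1 + g2, g1 - g2))  # ~ 0
```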
From this, a complex version of the Skitovitch–Darmois theorem can easily be derived:

Corollary 3.4 (Complex S–D theorem). Let $L_1 = \sum_{i=1}^n \alpha_i X_i$ and $L_2 = \sum_{i=1}^n \beta_i X_i$ with $X_1, \ldots, X_n$ independent complex random variables and $\alpha_j, \beta_j \in \mathbb{C}$ for $j = 1, \ldots, n$. If $L_1$ and $L_2$ are independent, then all $X_j$ with $\alpha_j \beta_j \neq 0$ are Gaussian.

Here, a complex random variable is said to be Gaussian if both its real and its imaginary part are Gaussian.

Proof. We can interpret the $n$ independent complex random variables $X_i$ as $n$ two-dimensional real random vectors that are mutually independent. Multiplication by the complex number $\alpha_j$ is either ($\alpha_j \neq 0$) multiplication by the real invertible matrix
$$\begin{pmatrix} \operatorname{Re}(\alpha_j) & -\operatorname{Im}(\alpha_j) \\ \operatorname{Im}(\alpha_j) & \operatorname{Re}(\alpha_j) \end{pmatrix}$$
or ($\alpha_j = 0$) multiplication by the zero matrix, and similarly for $\beta_j$. Applying Corollary 3.3 finishes the proof.

4. Indeterminacies of complex ICA

Given a complex $n$-dimensional random vector $X$, a matrix $W \in \mathrm{Gl}(n; \mathbb{C})$ is called a (complex) ICA of $X$ if $WX$ is independent (as a complex random vector). We will show that $W$ and $V$ are complex ICAs of $X$ if and only if $W^{-1} \sim V^{-1}$, that is, if they differ by right multiplication by a complex scaling and permutation matrix. This is equivalent to calculating the indeterminacies of the complex BSS model: Consider the noiseless complex linear instantaneous blind source separation (BSS) model with as many sources as sensors,
$$X = AS. \qquad (1)$$
Here $S$ is an independent complex-valued $n$-dimensional random vector and $A \in \mathrm{Gl}(n; \mathbb{C})$ an invertible complex matrix. The task of linear BSS is to find $A$ and $S$ given only $X$. An obvious indeterminacy of this problem is that $A$ can be found only up to equivalence, because for a scaling matrix $L$ and a permutation matrix $P$
$$X = ALP \, P^{-1} L^{-1} S$$
and $P^{-1} L^{-1} S$ is also independent. We will show that under mild assumptions on $S$ there are no further indeterminacies of complex BSS.

Various algorithms for solving the complex BSS problem have been proposed [1,2,7,13,16]. We want to note that many cases in which complex BSS is applied can in fact be reduced to real BSS algorithms. This is the case if either the sources or the mixing matrix are real; the latter occurs, for example, after Fourier transformation of signals with time structure. If the sources are real, then the above complex model can be split into two separate real BSS problems:
$$\operatorname{Re}(X) = \operatorname{Re}(A) S, \qquad \operatorname{Im}(X) = \operatorname{Im}(A) S.$$
Solving both of these real BSS equations yields $A = \operatorname{Re}(A) + i \operatorname{Im}(A)$. Of course, $\operatorname{Re}(A)$ and $\operatorname{Im}(A)$ can only be found up to scaling and permutation. By comparing the two recovered source random vectors (using, for example, the mutual information of one component of each vector), we can however assume that the permutation and then also the scaling indeterminacy of both recovered matrices is the same, which allows the algorithm to correctly put $A$ back together. Similarly, separability of this special complex ICA problem can be derived from the well-known separability results in the real case.

If the mixing matrix is known to be real, then again splitting Eq. (1) into real and imaginary parts yields
$$\operatorname{Re}(X) = A \operatorname{Re}(S), \qquad \operatorname{Im}(X) = A \operatorname{Im}(S).$$
$A$ can be found from either equation. If both the real and the imaginary parts of the samples are to be used in order to increase precision, they can simply be concatenated to generate a twice as large sample set mixed by the same matrix $A$. In terms of random vectors this means working in two disjoint copies of the original probability space. Again separability follows.
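As a small sanity check of this splitting argument for a real mixing matrix (a sketch under the assumptions that $A$ is real and $S$ is complex; the Laplacian source distribution and the sample size are arbitrary choices of mine, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 3, 1000
A = rng.normal(size=(n, n))                                    # real mixing matrix
S = rng.laplace(size=(n, T)) + 1j * rng.laplace(size=(n, T))   # complex sources
X = A @ S                                                      # observations, Eq. (1)

# Splitting Eq. (1) into real and imaginary parts gives two real BSS problems
# that share the same mixing matrix A.
assert np.allclose(X.real, A @ S.real)
assert np.allclose(X.imag, A @ S.imag)

# Concatenating both parts yields a single real BSS problem with 2*T samples,
# again mixed by A; any real ICA algorithm could then be run on X_big.
S_big = np.hstack([S.real, S.imag])
X_big = np.hstack([X.real, X.imag])
assert np.allclose(X_big, A @ S_big)
print("real/imaginary splitting consistent")
```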
Theorem 4.1 (Separability of complex linear BSS). Let $A \in \mathrm{Gl}(n; \mathbb{C})$ and $S$ a complex independent random vector. Assume one of the following:
i. $S$ has at most one Gaussian component and the (complex) covariance of $S$ exists.
ii. $S$ has no Gaussian component.
If $AS$ is again independent,¹ then $A$ is equivalent to the identity.

¹ Indeed, we only need that the components of $AS$ are pairwise independent.

Here, the complex covariance of $S$ is defined by
$$\operatorname{Cov}(S) = E\bigl((S - E(S))(S - E(S))^*\bigr),$$
where the asterisk denotes the transposed and complex-conjugated vector.

Comon has shown this for the real case [5]; for the complex case a complex version of the Darmois–Skitovitch theorem is needed, as provided in Section 3. Theorem 4.1 indeed proves separability of the complex linear BSS model, because if $X = AS$ and $W$ is a demixing matrix such that $WX$ is independent, then $WA \sim I$, so $W^{-1} \sim A$ as desired. It also calculates the indeterminacies of complex ICA: if $W$ and $V$ are ICAs of $X$, then both $VX$ and $WV^{-1}(VX)$ are independent, so $WV^{-1} \sim I$ and hence $W^{-1} \sim V^{-1}$, as claimed above.

Proof. Denote $X := AS$. First assume case ii: $S$ has no Gaussian component at all. Then $A = (a_{ij})$ is equivalent to the identity, because if not, there exist $i_1 \neq i_2$ and $j$ with $a_{i_1 j} a_{i_2 j} \neq 0$. Applying Corollary 3.4 to $X_{i_1}$ and $X_{i_2}$ then shows that $S_j$ is Gaussian, in contradiction to assumption ii.

Now assume that the covariance exists and that $S$ has at most one Gaussian component. First we show, using complex decorrelation, that we may assume $A$ to be unitary. Without loss of generality assume that all random vectors are centred. By assumption $\operatorname{Cov}(X)$ is diagonal, so let $D_1$ be diagonal and invertible with $\operatorname{Cov}(X) = D_1^2$; note that $D_1$ is real. Similarly, let $D_2$ be diagonal and invertible with $\operatorname{Cov}(S) = D_2^2$. Set $Y := D_1^{-1} X$ and $T := D_2^{-1} S$, that is, normalize $X$ and $S$ to covariance $I$. Then
$$Y = D_1^{-1} X = D_1^{-1} A S = D_1^{-1} A D_2 \, T,$$
so $T$, $D_1^{-1} A D_2$ and $Y$ satisfy the assumptions, and $D_1^{-1} A D_2$ is unitary because
$$I = \operatorname{Cov}(Y) = E(Y Y^*) = E(D_1^{-1} A D_2 \, T T^* \, D_2 A^* D_1^{-1}) = (D_1^{-1} A D_2)(D_1^{-1} A D_2)^*.$$
If we assume $A \nsim I$, then, using the fact that $A$ is unitary, there exist indices $i_1 \neq i_2$ and $j_1 \neq j_2$ with $a_{i_\mu j_\nu} \neq 0$ for $\mu, \nu \in \{1, 2\}$. By assumption
$$X_{i_1} = a_{i_1 j_1} S_{j_1} + a_{i_1 j_2} S_{j_2} + \cdots, \qquad X_{i_2} = a_{i_2 j_1} S_{j_1} + a_{i_2 j_2} S_{j_2} + \cdots$$
are independent, and in both $X_{i_1}$ and $X_{i_2}$ the variables $S_{j_1}$ and $S_{j_2}$ appear non-trivially, so by the complex Skitovitch–Darmois theorem (Corollary 3.4) $S_{j_1}$ and $S_{j_2}$ are Gaussian, in contradiction to the fact that at most one source is Gaussian.
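The decorrelation step used in the proof can be checked numerically. The sketch below is my own illustration, not part of the paper: it assumes $\operatorname{Cov}(S) = I$ for simplicity and uses a general Hermitian whitening matrix $\operatorname{Cov}(X)^{-1/2}$ in place of the diagonal $D_1^{-1}$; after whitening, the effective mixing matrix is unitary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Random complex mixing matrix (invertible with probability one).
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# Population covariance of X = A S when Cov(S) = I.
C = A @ A.conj().T

# Hermitian inverse square root C^{-1/2} via an eigendecomposition.
w, U = np.linalg.eigh(C)
V = U @ np.diag(w ** -0.5) @ U.conj().T

# The whitened mixing matrix V A is unitary: (V A)(V A)^* = I.
W = V @ A
print(np.allclose(W @ W.conj().T, np.eye(n)))  # True
```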
5. Indeterminacies of multidimensional ICA

In this section, we want to analyze the indeterminacies of so-called multidimensional independent component analysis. The idea of this generalization of ICA is that we do not require full independence of the transform $Y$ but only mutual independence of certain tuples $Y_{i_1}, \ldots, Y_{i_2}$. If the size of all tuples is restricted to one, this reduces to ordinary ICA. In general, of course, the tuples could have different sizes, but for the sake of simplicity we assume that they all have the same length (which then necessarily has to divide the total dimension). Multidimensional ICA was first introduced by Cardoso [3] using geometrical motivations. Hyvärinen and Hoyer then presented a special case of multidimensional ICA which they called independent subspace analysis [9]; there the dependence within a $k$-tuple is modelled explicitly, enabling the authors to propose better algorithms without having to resort to problematic multidimensional density estimation. A different extension of ICA is given by topographic ICA [10], where dependencies between all components are assumed. A special case of multidimensional ICA is complex ICA as presented in the preceding section; here dependence is allowed between real-valued couples of random variables.

Let $k, n \in \mathbb{N}$ such that $k$ divides $n$. We call an $n$-dimensional random vector $Y$ $k$-independent if the $k$-dimensional random vectors
$$(Y_1, \ldots, Y_k)^T, \; (Y_{k+1}, \ldots, Y_{2k})^T, \; \ldots, \; (Y_{n-k+1}, \ldots, Y_n)^T$$
are mutually independent. A matrix $W \in \mathrm{Gl}(n; \mathbb{R})$ is called a $k$-multidimensional ICA of an $n$-dimensional random vector $X$ if $WX$ is $k$-independent. If $k = 1$, this is the same as ordinary ICA.

Obvious indeterminacies are, similar to ordinary ICA, invertible transforms in $\mathrm{Gl}(k; \mathbb{R})$ within each tuple as well as the fact that the order of the independent $k$-tuples is not fixed. So, define for $r, s = 1, \ldots, n/k$ the $(r, s)$ sub-$k$-matrix of $W = (w_{ij})$ to be the $k \times k$ submatrix
$$(w_{ij})_{i = (r-1)k+1, \ldots, rk; \; j = (s-1)k+1, \ldots, sk},$$
that is, the $k \times k$ submatrix of $W$ whose upper left corner lies at position $((r-1)k+1, (s-1)k+1)$. A matrix $L \in \mathrm{Gl}(n; \mathbb{R})$ is said to be a $k$-scaling and permutation matrix if for each $r = 1, \ldots, n/k$ there exists precisely one $s$ such that the $(r, s)$ sub-$k$-matrix of $L$ is non-zero and lies in $\mathrm{Gl}(k; \mathbb{R})$, and if for each $s = 1, \ldots, n/k$ there exists precisely one $r$ with the $(r, s)$ sub-$k$-matrix satisfying the same condition. Hence, if $Y$ is $k$-independent, then $LY$ is also $k$-independent. Two matrices $A$ and $B$ are said to be $k$-equivalent, $A \sim_k B$, if there exists such a $k$-scaling and permutation matrix $L$ with $A = BL$. As stated above, given two matrices $W$ and $V$ with $W^{-1} \sim_k V^{-1}$ such that one of them is a $k$-multidimensional ICA of a given random vector, then so is the other. We will show that there are no further indeterminacies of multidimensional ICA.
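These block-structure definitions translate directly into code. The following numpy sketch (my own addition; the helper names are hypothetical) extracts $(r, s)$ sub-$k$-matrices and checks whether a given matrix is a $k$-scaling and permutation matrix.

```python
import numpy as np

def sub_k_matrix(W, r, s, k):
    """(r, s) sub-k-matrix of W (blocks indexed from 1): the k x k block whose
    upper left corner lies at row (r-1)*k + 1 and column (s-1)*k + 1."""
    return W[(r - 1) * k:r * k, (s - 1) * k:s * k]

def is_k_scaling_permutation(L, k, tol=1e-10):
    """Per block-row and per block-column exactly one sub-k-matrix is non-zero,
    and every non-zero sub-k-matrix is invertible."""
    m = L.shape[0] // k
    nonzero = np.zeros((m, m), dtype=bool)
    for r in range(1, m + 1):
        for s in range(1, m + 1):
            block = sub_k_matrix(L, r, s, k)
            if np.max(np.abs(block)) > tol:
                if np.linalg.matrix_rank(block) < k:
                    return False   # non-zero but singular block
                nonzero[r - 1, s - 1] = True
    return bool(np.all(nonzero.sum(axis=0) == 1) and np.all(nonzero.sum(axis=1) == 1))

# Example with k = 2, n = 4: swap the two 2-tuples and transform each one invertibly.
L = np.zeros((4, 4))
L[0:2, 2:4] = [[1.0, 2.0], [0.0, 1.0]]    # invertible block
L[2:4, 0:2] = [[0.0, -1.0], [3.0, 0.0]]   # invertible block
print(is_k_scaling_permutation(L, 2))     # True
```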
As usual, multidimensional ICA can be used to solve the multidimensional BSS problem
$$X = AS,$$
where $A \in \mathrm{Gl}(n; \mathbb{R})$ and $S$ is a $k$-independent $n$-dimensional random vector. Finding the indeterminacies of multidimensional ICA then shows that $A$ can be found except for $k$-equivalence (separability), because if $X = AS$ and $W$ is a demixing matrix such that $WX$ is $k$-independent, then $WA \sim_k I$, so $W^{-1} \sim_k A$ as desired.

However, for the proof we need one more condition on $A$: we call $A$ $k$-admissible if for each $r, s = 1, \ldots, n/k$ the $(r, s)$ sub-$k$-matrix of $A$ is either invertible or zero. Note that this is not a strong restriction: if we choose $A$ randomly with coefficients drawn from a continuous distribution, then with probability one we get a $k$-admissible matrix, because the non-$k$-admissible matrices form a subset of $\operatorname{Mat}(n \times n; \mathbb{R}) \cong \mathbb{R}^{n^2}$ contained in a submanifold of dimension smaller than $n^2$.

Theorem 5.1 (Separability of multidimensional BSS). Let $A \in \mathrm{Gl}(n; \mathbb{R})$ and $S$ a $k$-independent $n$-dimensional random vector having no Gaussian $k$-tuple $(S_{(r-1)k+1}, \ldots, S_{rk})^T$. Assume that $A$ is $k$-admissible. If $AS$ is again $k$-independent, then $A$ is $k$-equivalent to the identity.

For the case $k = 1$ this is linear BSS separability, because every matrix is 1-admissible.

Proof. Denote $X := AS$. Assume that $A \nsim_k I$. Then there exist indices $r_1 \neq r_2$ and $s$ such that the $(r_1, s)$ and the $(r_2, s)$ sub-$k$-matrices of $A$ are non-zero (hence in $\mathrm{Gl}(k; \mathbb{R})$ by $k$-admissibility). Applying Corollary 3.3 to the two random vectors $(X_{(r_1-1)k+1}, \ldots, X_{r_1 k})^T$ and $(X_{(r_2-1)k+1}, \ldots, X_{r_2 k})^T$ then shows that $(S_{(s-1)k+1}, \ldots, S_{sk})^T$ is Gaussian, which is a contradiction.

Note that we could have used whitening to assume that $A$ is orthogonal; however, there does not seem to be a direct way to exploit this in order to allow one fully Gaussian $k$-tuple, contrary to the complex ICA case, see Theorem 4.1.
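A $k$-admissibility check is equally short. The sketch below (again my own illustration, not from the paper) also hints at the probability-one remark above: a matrix with coefficients drawn from a continuous distribution is, in practice, always $k$-admissible.

```python
import numpy as np

def is_k_admissible(A, k, tol=1e-10):
    """A is k-admissible if every (r, s) sub-k-matrix is either zero or invertible."""
    n = A.shape[0]
    for r in range(0, n, k):
        for s in range(0, n, k):
            block = A[r:r + k, s:s + k]
            if np.max(np.abs(block)) > tol and np.linalg.matrix_rank(block) < k:
                return False   # non-zero but singular block
    return True

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))       # coefficients from a continuous distribution
print(is_k_admissible(A, 2))      # True (with probability one)

A_bad = A.copy()
A_bad[0:2, 0:2] = [[1.0, 2.0], [2.0, 4.0]]   # non-zero but singular sub-2-matrix
print(is_k_admissible(A_bad, 2))             # False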
6. Conclusion

Uniqueness and separability results play a central role in solving BSS problems since they allow algorithms to apply ICA in order to uniquely (except for scaling and permutation) find the original mixing matrices. We have used a multidimensional version of the Skitovitch–Darmois theorem in order to calculate the indeterminacies of complex and of multidimensional ICA. In the multidimensional ICA case an additional restriction was needed in the proof, which could be relaxed if Corollary 3.3 can be extended to allow matrices of arbitrary rank.

Acknowledgements

This research was supported by grants from the DFG² and the BMBF³.

² Graduate college 'nonlinear dynamics'.
³ Project 'ModKog'.

References

[1] A. Back, A. Tsoi, Blind deconvolution of signals using a complex recurrent network, in: Neural Networks for Signal Processing 4, Proceedings of the 1994 IEEE Workshop, 1994, pp. 565–574.
[2] E. Bingham, A. Hyvärinen, A fast fixed-point algorithm for independent component analysis of complex-valued signals, Internat. J. Neural Systems 10 (1) (2000) 1–8.
[3] J. Cardoso, Multidimensional independent component analysis, in: Proceedings of ICASSP '98, Seattle, WA, May 12–15, 1998.
[4] A. Cichocki, S. Amari, Adaptive Blind Signal and Image Processing, Wiley, New York, 2002.
[5] P. Comon, Independent component analysis—a new concept?, Signal Processing 36 (1994) 287–314.
[6] G. Darmois, Analyse générale des liaisons stochastiques, Rev. Inst. Internat. Statist. 21 (1953) 2–8.
[7] S. Fiori, Blind separation of circularly distributed sources by neural extended APEX algorithm, Neurocomputing 34 (2000) 239–252.
[8] S. Ghurye, I. Olkin, A characterization of the multivariate normal distribution, Ann. Math. Statist. 33 (1962) 533–541.
[9] A. Hyvärinen, P. Hoyer, Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces, Neural Computation 12 (7) (2000) 1705–1720.
[10] A. Hyvärinen, P. Hoyer, M. Inki, Topographic independent component analysis, Neural Computation 13 (7) (2001) 1525–1558.
[11] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, New York, 2001.
[12] A. Kagan, Y. Linnik, C. Rao, Characterization Problems in Mathematical Statistics, Wiley, New York, 1973.
[13] E. Moreau, O. Macchi, Higher order contrasts for self-adaptive source separation, Internat. J. Adaptive Control Signal Process. 10 (1) (1996) 19–46.
[14] V. Skitovitch, On a property of the normal distribution, DAN SSSR 89 (1953) 217–219.
[15] V. Skitovitch, Linear forms in independent random variables and the normal distribution law, Izvestiia AN SSSR, Ser. Matem. 18 (1954) 185–200.
[16] P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing 22 (1998) 21–34.
[17] F. Theis, A new concept for separability problems in blind source separation, 2003, submitted for publication; preprint at http://homepages.uni-regensburg.de/~thf11669/publications/preprints/theis03linuniqueness.pdf
[18] A. Zinger, Investigations into analytical statistics and their application to limit theorems of probability theory, Ph.D. Thesis, Leningrad University, 1969.