Identifiability Conditions and Subspace Clustering in Sparse BSS

Pando Georgiev (1), Fabian Theis (2), and Anca Ralescu (1)

(1) Computer Science Department, University of Cincinnati, Cincinnati, OH 45221, USA
{pgeorgie,aralescu}@ececs.uc.edu
(2) Institute of Biophysics, University of Regensburg, D-93040 Regensburg, Germany
[email protected]

Abstract. We give general identifiability conditions on the source matrix in the Blind Signal Separation (BSS) problem. They refine some previously known conditions. We develop a subspace clustering algorithm, a generalization of the k-plane clustering algorithm, which is suitable for separation of sparse mixtures with higher sparsity (i.e. when the number of sensors exceeds by at least 2 the number of non-zero elements in most of the columns of the source matrix). We demonstrate our algorithm by examples in the square and underdetermined cases. The latter confirms the new identifiability conditions, which require fewer hyperplanes in the data for full recovery of the sources and the mixing matrix.
1 Introduction

The goal of Blind Signal Separation (BSS) is to recover underlying source signals from a given set of observations obtained by an unknown linear mixture of the sources. In order to decompose the data set, different assumptions on the sources have to be made. The most common assumption nowadays is statistical independence of the sources, which leads to the field of Independent Component Analysis (ICA); see for instance [2], [6] and the references therein. ICA is very successful in the linear complete case, when as many signals as underlying sources are observed and the mixing matrix is non-singular. In [3] it is shown that the mixing matrix and the sources are then identifiable up to permutation and scaling. In the overcomplete or underdetermined case, fewer observations than sources are given. The mixing matrix can still be recovered [4], but source identifiability does not hold. In order to detect the sources approximately, additional requirements have to be imposed, usually sparsity of the sources. We refer to [7,9,10,11] and the references therein for some recent papers on sparsity and underdetermined ICA (m < n).
2 Blind Source Separation Using Sparseness

Definition 1. A vector v \in R^m is said to be k-sparse if v has at least k zero entries. A matrix S \in R^{m×n} is said to be k-sparse if each of its columns is k-sparse.
The goal of Blind Signal Separation of level k (k-BSS) is to decompose a given m-dimensional random vector X into

X = AS    (1)

with a real m × n matrix A and an n × N-dimensional k-sparse matrix S. S is called the source matrix, X the mixtures and A the mixing matrix. We speak of complete, overcomplete or undercomplete k-BSS if m = n, m < n or m > n, respectively. More generally, when a solution of the above problem does not exist, we formulate the least-squares BSS problem as follows: find a best approximation of X by AS such that S is k-sparse, i.e.

minimize \|X − AS\|    (2)
under the constraints A \in R^{m×n}, S \in R^{n×N} and S is k-sparse.    (3)

In the following we will assume without loss of generality that m \le n: the undercomplete case can be reduced to the complete case by projection of X.
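As a concrete illustration of Definition 1 and of the least-squares objective (2)-(3), the following Python sketch checks whether a candidate source matrix is k-sparse and evaluates the residual \|X − AS\|; the helper names is_k_sparse and bss_residual are ours and not part of any standard package.

```python
import numpy as np

def is_k_sparse(S, k):
    """Definition 1: every column of S must contain at least k zero entries."""
    return bool(np.all((S == 0).sum(axis=0) >= k))

def bss_residual(X, A, S):
    """Least-squares BSS objective (2): Frobenius norm of X - A S."""
    return np.linalg.norm(X - A @ S, ord="fro")
```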
3 Identifiability Conditions

The following identifiability conditions are an extension and a refinement of those in [5]. The extension covers the case of higher sparsity, i.e. when the source matrix is at least (n − m + 2)-sparse. The refinement can be seen, for instance, when n \ge m + 2 and the source matrix is (n − m + 1)-sparse. The provided examples illustrate both cases. Assume that:

(H1) the indices {1, ..., N} are divided into two groups N_1 and N_2 such that
(a) for any index i \in {1, ..., n} there exist k_i \ge m different sets of indices I_{i,j} \subset {1, ..., n}, j = 1, ..., k_i, each of which contains i, such that |I_{i,j}| \le m − 1,
(b) for any set I_{i,j} from (a) there exist m_{i,j} > |I_{i,j}| vectors s_1, ..., s_{m_{i,j}} from the set S_1 := {S(:, j), j \in N_1} such that each of them has zero elements in the places with indices not in I_{i,j}, and the rank of the set {s_1, ..., s_{m_{i,j}}} is full (i.e. equal to |I_{i,j}|),
(c) \bigcap_{j=1}^{k_i} I_{i,j} = {i}.

(H2) Every m vectors from the group {S(:, j), j \in N_2} are linearly independent.

Theorem 1 (Matrix identifiability). Let the assumptions (H1) and (H2) be satisfied. Then, in the set of all matrices with dimension m × n there is a subset A of full measure such that, if X = AS and A \in A, then A is uniquely determined by X up to permutation and scaling of the columns.

Sketch of the proof. Condition (H1)(c) implies that for any column a_i of the mixing matrix there exist detectable subspaces in the data whose intersection is the line spanned by this column. One more subspace is reserved for "confirmation" that this intersection indeed corresponds to a column of the mixing matrix. The intersection is stable under small perturbations of the columns, so "most" matrices exclude false intersections (those not corresponding to a column). A "detectable subspace" here is one that contains at least |I_{i,j}| + 1 nonzero data vectors (columns of X), any |I_{i,j}| of which are linearly independent (this follows from (H1)(b)), so such a subspace can be detected by an algorithm. Condition (H2) ensures that there are no false intersections of subspaces (not corresponding to a column of A).
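Condition (H2) can be verified numerically for a given source matrix by brute force (feasible only for small |N_2|). The sketch below is our own illustration under that assumption; S2 is supposed to hold the columns {S(:, j), j \in N_2}, and the name check_H2 is hypothetical. Verifying (H1) is a combinatorial search over the index sets I_{i,j} and is not attempted here.

```python
import numpy as np
from itertools import combinations

def check_H2(S2, m):
    """(H2): every m columns taken from S2 must be linearly independent.
    S2 has shape n x |N2|; brute force over all m-element column subsets."""
    N2 = S2.shape[1]
    for idx in combinations(range(N2), m):
        if np.linalg.matrix_rank(S2[:, idx]) < m:
            return False
    return True
```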
4 Affine Hyperplane Skeletons of a Finite Set of Points

The solution {(n_i^0, b_i^0)}_{i=1}^k of the minimization problem

minimize f({(n_i, b_i)}_{i=1}^k) = \sum_{j=1}^N \min_{1 \le i \le k} |n_i^T x_j − b_i|^l    (4)
subject to \|n_i\| = 1, b_i \in R, i = 1, ..., k,    (5)

defines the affine hyperplane k^{(l)}-skeleton of X \in R^{m×N}, introduced for l = 1 in [8] and for l = 2 in [1]. It consists of a union of k affine hyperplanes

H_i = {x \in R^m : (n_i^0)^T x = b_i^0}, i = 1, ..., k,

such that the sum of the minimum distances, raised to the power l, from every point x_j to them is minimal. Consider the following minimization problem:

minimize \sum_{j=1}^N \min_{1 \le i \le k} |\tilde{n}_i^T \tilde{x}_j|^l    (6)
subject to \|\tilde{n}_i\| = 1, i = 1, ..., k,    (7)
where \tilde{n}_i = (n_i, b_i) and \tilde{x}_j = (x_j, −1). Its solution \tilde{n}_i^0 defines a hyperplane skeleton in R^{m+1}, consisting of a union of k hyperplanes

\tilde{H}_i = {\tilde{x} \in R^{m+1} : (\tilde{n}_i^0)^T \tilde{x} = 0}, i = 1, ..., k,

such that the sum of the minimum distances, raised to the power l, from every point \tilde{x}_j to them is minimal. The restriction of \tilde{H}_i to R^m gives H_i. The usefulness of working in R^{m+1} can be seen for l = 2: denote by \tilde{X}_i the i-th cluster

\tilde{X}_i = {\tilde{x}_j : \min_{1 \le p \le k} |\tilde{n}_p^T \tilde{x}_j|^2 = |\tilde{n}_i^T \tilde{x}_j|^2}    (8)

for some given {\tilde{n}_i}_{i=1}^k. Then the minimization problem

minimize v_i^T \tilde{X}_i \tilde{X}_i^T v_i    (9)
subject to v_i \in R^{m+1}, \|v_i\| = 1,    (10)

produces a solution {v_i^0}_{i=1}^k (consisting of eigenvectors corresponding to the minimal eigenvalues) for which f in (4) does not increase, i.e. f({v_i^0}_{i=1}^k) \le f({\tilde{n}_i}_{i=1}^k). So we arrive at the following analogue of the classical k-means clustering algorithm.

Hyperplane clustering algorithm (see the pseudocode of the subspace clustering algorithm below in the particular case r_i = 1). Apply the following two steps iteratively until convergence:
1) Cluster assignment: form \tilde{X}_i, i = 1, ..., k, by (8).
2) Cluster update: solve the eigenvalue problem (9), (10).

This kind of analogue of the classical k-means clustering algorithm (working in R^m instead of R^{m+1}) was introduced by Bradley and Mangasarian in [1] and called the k-plane clustering algorithm. Since our hyperplane clustering algorithm is equivalent to the k-plane clustering algorithm, it has the same properties: it terminates after a finite number of steps at a cluster assignment which is locally optimal, i.e. f in (4) cannot be decreased either by reassigning a point to a different cluster plane or by defining a new cluster plane for any of the clusters (see Theorem 3.7 in [1]).

Affine Subspace Skeleton

The solution of the following minimization problem

minimize F({U_i}_{i=1}^k) = \sum_{j=1}^N \min_{1 \le i \le k} \sum_{s=1}^{r_i} |u_{i,s}^T x_j − b_{i,s}|^l    (11)
subject to \|u_{i,s}\| = 1, b_{i,s} \in R, i = 1, ..., k, s = 1, ..., r_i,    (12)
u_{i,p}^T u_{i,q} = 0, p \ne q,    (13)
where U_i = {u_{i,s}}_{s=1}^{r_i}, will be called the affine subspace skeleton. It consists of a union of k affine subspaces

S_i = {x \in R^m : u_{i,s}^T x = b_{i,s}, s = 1, ..., r_i},

each of codimension r_i, i = 1, ..., k, such that the sum of the minimum distances, raised to the power l, from every point x_j to them is minimal. Consider the following minimization problem:

minimize \tilde{F}({\tilde{U}_i}_{i=1}^k) = \sum_{j=1}^N \min_{1 \le i \le k} \sum_{s=1}^{r_i} |\tilde{u}_{i,s}^T \tilde{x}_j|^l    (14)
subject to \|\tilde{u}_{i,s}\| = 1, i = 1, ..., k, s = 1, ..., r_i,    (15)
\tilde{u}_{i,p}^T \tilde{u}_{i,q} = 0, p \ne q,    (16)
where \tilde{U}_i = {\tilde{u}_{i,s}}_{s=1}^{r_i}, \tilde{u}_{i,s} = (u_{i,s}, b_{i,s}) and \tilde{x}_j = (x_j, −1). Its solution \tilde{u}_{i,s}^0 defines a subspace skeleton in R^{m+1}, consisting of a union of k subspaces

\tilde{S}_i = {\tilde{x} \in R^{m+1} : (\tilde{u}_{i,s}^0)^T \tilde{x} = 0, s = 1, ..., r_i}, i = 1, ..., k,

such that the sum of the minimum distances, raised to the power l, from every point \tilde{x}_j to them is minimal. The restriction of \tilde{S}_i to R^m gives S_i (a non-trivial fact). Similarly to the hyperplane clustering algorithm, we can write the following analogue of the classical k-means clustering algorithm for finding the subspace skeleton when l = 2.

Subspace clustering algorithm (see the pseudocode below). It consists of two steps applied alternately until convergence:

1) Cluster assignment: for given k orthonormal families of vectors {\tilde{u}_{i,s}}_{s=1}^{r_i}, i = 1, ..., k, form \tilde{X}_i, i = 1, ..., k, by

\tilde{X}_i = {\tilde{x}_j : \sum_{s=1}^{r_i} |\tilde{u}_{i,s}^T \tilde{x}_j|^2 = \min_{1 \le p \le k} \sum_{s=1}^{r_p} |\tilde{u}_{p,s}^T \tilde{x}_j|^2}.    (17)
2) Cluster update: for every i = 1, ..., k, solve the minimization problem

minimize trace(V_i^T \tilde{X}_i \tilde{X}_i^T V_i)    (18)
under the orthogonality constraints V_i^T V_i = I_{r_i},    (19)

where V_i has dimension (m + 1) × r_i. For any i this produces a solution V_i^0, a matrix whose columns are the eigenvectors corresponding to the r_i smallest eigenvalues of the matrix \tilde{X}_i \tilde{X}_i^T (a non-trivial fact), and \tilde{F} in (14) does not increase, i.e. \tilde{F}({V_i^0}_{i=1}^k) \le \tilde{F}({\tilde{U}_i}_{i=1}^k).

This algorithm terminates after a finite number of steps (similarly to the Bradley–Mangasarian algorithm) at a solution which is locally optimal, i.e. \tilde{F} in (14) cannot be decreased either by reassigning a point to a different cluster subspace or by defining a new cluster subspace for any of the clusters.
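The stopping rule of the pseudocode below monitors the change of \tilde{F} from (14) between two consecutive estimates. A minimal sketch of this evaluation for l = 2, assuming the lifted points \tilde{x}_j are stored as the columns of a matrix and each family \tilde{U}_i as an (m + 1) × r_i matrix with orthonormal columns (the function name is ours):

```python
import numpy as np

def lifted_objective(X_tilde, U):
    """F~ from (14) with l = 2: for each lifted point, the squared norm of its
    projection onto the orthogonal complement of the closest candidate subspace."""
    # one row of distances per cluster, one column per point
    d = np.stack([np.sum((Ui.T @ X_tilde) ** 2, axis=0) for Ui in U])
    return float(np.sum(np.min(d, axis=0)))
```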
Algorithm 1. Subspace clustering algorithm

Data: samples x(1), ..., x(T)
Result: estimated k subspaces \tilde{S}_i \subset R^{m+1}, i = 1, ..., k, given by the orthonormal bases {\tilde{u}_{i,s}}_{s=1}^{r_i} of their orthogonal complements \tilde{S}_i^\perp.

1. Initialize ε > 0 and, randomly, \tilde{u}_{i,s} = (u_{i,s}, b_{i,s}) with \|\tilde{u}_{i,s}\| = 1, i = 1, ..., k, s = 1, ..., r_i.
2. Repeat until the difference of \tilde{F} calculated for two subsequent estimates of {\tilde{U}_i}_{i=1}^k is less than ε:
3. Cluster assignment: for t = 1, ..., T, add \tilde{x}(t) = (x(t), −1) to the cluster \tilde{X}_i (a matrix), where i is chosen to minimize \sum_{s=1}^{r_i} |u_{i,s}^T x(t) − b_{i,s}|^2 (the distance to the subspace S_i).
4. Cluster update: for i = 1, ..., k, calculate the i-th cluster correlation C := \tilde{X}_i \tilde{X}_i^T, choose eigenvectors v_s, s = 1, ..., r_i, of C corresponding to the r_i minimal eigenvalues, and set \tilde{u}_{i,s} ← v_s / \|v_s\|, s = 1, ..., r_i, \tilde{U}_i = {\tilde{u}_{i,s}}_{s=1}^{r_i}.
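A compact NumPy sketch of Algorithm 1 for l = 2 is given below. It is our own illustrative implementation, not the authors' code: the random initialization via QR, the handling of empty clusters, the parameter names (n_iter, tol, seed) and the use of a common codimension r for all subspaces are assumptions made for simplicity.

```python
import numpy as np

def subspace_clustering(X, k, r, n_iter=100, tol=1e-8, seed=0):
    """Fit k affine subspaces of codimension r to the columns of X (m x T) by
    alternating cluster assignment (17) and cluster update (18)-(19) in the
    homogeneous coordinates of R^(m+1)."""
    rng = np.random.default_rng(seed)
    m, T = X.shape
    X_tilde = np.vstack([X, -np.ones((1, T))])      # lifted points (x_j, -1)
    # random orthonormal normal families U[i] of shape (m+1) x r
    U = [np.linalg.qr(rng.standard_normal((m + 1, r)))[0] for _ in range(k)]
    prev = np.inf
    labels = np.zeros(T, dtype=int)
    for _ in range(n_iter):
        # cluster assignment: squared distance of each lifted point to each subspace
        d = np.stack([np.sum((Ui.T @ X_tilde) ** 2, axis=0) for Ui in U])
        labels = np.argmin(d, axis=0)
        obj = float(np.sum(np.min(d, axis=0)))      # current value of F~ in (14)
        if prev - obj < tol:                        # stopping rule of Algorithm 1
            break
        prev = obj
        # cluster update: eigenvectors of the r smallest eigenvalues of X~_i X~_i^T
        for i in range(k):
            Xi = X_tilde[:, labels == i]
            if Xi.shape[1] == 0:
                continue                            # keep old normals for empty clusters
            _, V = np.linalg.eigh(Xi @ Xi.T)        # eigenvalues in ascending order
            U[i] = V[:, :r]
    return U, labels
```

With r = 1 this reduces to the hyperplane clustering algorithm of Section 4.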
Algorithm for Identification of the Mixing Matrix

1) Cluster the columns {X(:, j) : j \in N_1} into k groups H_i, i = 1, ..., k, such that the span of the elements of each group H_i produces an r_i-codimensional subspace and these r_i-codimensional subspaces are different.
2) Calculate any basis of the orthogonal complement of each of these r_i-codimensional subspaces.
3) Find all possible groups such that each of them is composed of elements of at least m of the bases from step 2) and the vectors in each group form a hyperplane. The number of these hyperplanes gives the number of sources n. The normal vectors to these hyperplanes are estimates of the columns of the mixing matrix A (up to permutation and scaling).

In a practical realization, all operations in the above algorithm are performed up to some precision ε > 0.

Source Recovery Algorithm

1. Repeat for j = 1 to N:
2.1. Identify the subspace H_i containing x_j := X(:, j); or, in a practical situation with presence of noise, identify the H_i to which the distance from x_j is minimal and project x_j onto H_i, obtaining \tilde{x}_j.
2.2. If H_i is produced by the linear hull of the column vectors a_{p_1}, ..., a_{p_{m−r_i}}, then find coefficients L_{j,l} such that

\tilde{x}_j = \sum_{l=1}^{m−r_i} L_{j,l} a_{p_l}.

These coefficients are generically uniquely determined (i.e. up to a set of measure zero) for \tilde{x}_j (see [5], Theorem 3).
2.3. Construct the solution s_j = S(:, j): it contains L_{j,l} in place p_l for l = 1, ..., m − r_i, and its other components are zero.

It is clear that the pair (A, S) found in this way is a solution of the least-squares BSS problem defined by (2).
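A hedged sketch of steps 2.2 and 2.3 of the source recovery: given the columns a_{p_1}, ..., a_{p_{m−r_i}} of the estimated mixing matrix that span the identified subspace H_i, the coefficients L_{j,l} can be obtained by least squares, which also performs the projection of x_j onto H_i when noise is present. The helper names below are ours.

```python
import numpy as np

def recover_coefficients(xj, A_cols):
    """Solve x~_j = sum_l L_{j,l} a_{p_l} in the least-squares sense; for noisy
    data this simultaneously projects x_j onto the span of the chosen columns."""
    B = np.column_stack(A_cols)                 # m x (m - r_i) basis of H_i
    L, *_ = np.linalg.lstsq(B, xj, rcond=None)
    return L

def assemble_source_column(n, support, L):
    """Step 2.3: place L_{j,l} at the indices p_l, zeros elsewhere."""
    s = np.zeros(n)
    s[list(support)] = L
    return s
```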
5 Computer Simulation Examples

First example. We artificially created four source signals, sparse of level 2, i.e. each column of the source matrix contains at least 2 zeros (shown in Fig. 1, left). They were mixed with a square nonsingular randomly generated matrix A:

A = [ 0.4621  0.6285  0.5606  0.4399
      0.2002  0.4921  0.5829  0.3058
      0.8138  0.4558  0.2195  0.2762
      0.2899  0.3938  0.5457  0.7979 ]
The mixed signals are shown in Fig. 1 (right) and the sources recovered by our algorithm are shown in Fig. 2 (left). The created signals are superpositions of sinusoids, so they are clearly dependent, and ICA methods could not separate them (lack of space prevents us from showing the results of applying ICA algorithms). The mixing matrix estimated by our subspace clustering algorithm is

M = [ 0.6285  0.4399  0.4621  0.5606
      0.4921  0.3059  0.2002  0.5829
      0.4558  0.2762  0.8138  0.2195
      0.3938  0.7979  0.2899  0.5457 ]

i.e. the columns of A up to permutation.
Second example. We now create six signals with sparsity level 4, i.e. each column of the source matrix has at least 4 zeros (Fig. 2, right). We mixed them with a randomly generated matrix of dimension 3 × 6:

H = [ 0.8256  0.3828  0.4857  0.4053  0.7720  0.2959
      0.2008  0.7021  0.0197  0.5610  0.6182  0.6822
      0.5273  0.6004  0.8739  0.7218  0.1476  0.6686 ]

The mixing matrix estimated by our subspace (hyperplane) clustering algorithm is

M = [ 0.2959  0.4857  0.8256  0.4053  0.3828  0.7720
      0.6822  0.0197  0.2008  0.5610  0.7021  0.6182
      0.6686  0.8739  0.5273  0.7218  0.6004  0.1476 ]
Here the number of hyperplanes present in the data is 9, while the number of all possible hyperplanes (according to the identifiability conditions in [5]), which would be required if all combinations of 4 zeros in the columns of the source matrix were present, is 15.
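For completeness, here is a hedged sketch of how test data of the kind used in the second example might be generated (n = 6 sources, m = 3 mixtures, sparsity level 4). The sinusoid parameters, the sample length and the random mixing matrix are arbitrary choices of ours and do not reproduce the paper's exact signals.

```python
import numpy as np

def random_k_sparse_sources(n, N, k, seed=0):
    """Each column gets at most n - k nonzero entries (i.e. at least k zeros),
    filled with samples of sinusoids whose frequencies depend on the source index."""
    rng = np.random.default_rng(seed)
    S = np.zeros((n, N))
    for j in range(N):
        support = rng.choice(n, size=n - k, replace=False)   # active sources at time j
        S[support, j] = np.sin(0.01 * (support + 1) * j)
    return S

S = random_k_sparse_sources(n=6, N=900, k=4)    # level-4 sparse sources
A = np.random.default_rng(1).random((3, 6))     # random 3 x 6 mixing matrix
X = A @ S                                       # observed mixtures
```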
Fig. 1. Example 1: original source signals (left) and mixed ones (right)
Fig. 2. Left: Example 1 - separated signals. Right: Example 2 - original source signals
Fig. 3. Example 2: mixed signals (left) and separated signals (right)
6 Conclusion

We presented new identifiability conditions for the sparse BSS problem, which allow fewer hyperplanes in the data for full recovery of the original sources and the mixing matrix. A new subspace clustering algorithm, based on eigenvalue decomposition, was designed for subspace detection. The square and the underdetermined cases for signals with higher sparsity are illustrated by examples.
References

1. Bradley, P.S., Mangasarian, O.L.: k-Plane Clustering. J. Global Optim. 16(1), 23–32 (2000)
2. Cichocki, A., Amari, S.: Adaptive Blind Signal and Image Processing. John Wiley, Chichester (2002)
3. Comon, P.: Independent component analysis - a new concept? Signal Processing 36, 287–314 (1994)
4. Eriksson, J., Koivunen, V.: Identifiability and separability of linear ICA models revisited. In: Proc. of ICA 2003, pp. 23–27 (2003)
5. Georgiev, P., Theis, F., Cichocki, A.: Sparse Component Analysis and Blind Source Separation of Underdetermined Mixtures. IEEE Trans. Neural Networks 16(4), 992–996 (2005)
6. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John Wiley & Sons, Chichester (2001)
7. Lee, T.-W., Lewicki, M.S., Girolami, M., Sejnowski, T.J.: Blind source separation of more sources than mixtures using overcomplete representations. IEEE Signal Process. Lett. 6(4), 87–90 (1999)
8. Rubinov, A.M., Soukhoroukova, N., Ugon, J.: Minimization of the sum of minima of convex functions and its application to clustering. In: Jeyakumar, V., Rubinov, A. (eds.) Continuous Optimization: Current Trends and Modern Applications, pp. 409–434. Springer, Heidelberg (2005)
9. Theis, F.J., Lang, E.W., Puntonet, C.G.: A geometric algorithm for overcomplete linear ICA. Neurocomputing 56, 381–398 (2004)
10. Waheed, K., Salem, F.: Algebraic Overcomplete Independent Component Analysis. In: Proc. Int. Conf. ICA 2003, Nara, Japan, pp. 1077–1082 (2003)
11. Zibulevsky, M., Pearlmutter, B.A.: Blind source separation by sparse decomposition in a signal dictionary. Neural Comput. 13(4), 863–882 (2001)