International Journal of Neural Systems, Vol. 10, No. 6 (2000) 483–490
© World Scientific Publishing Company

Letter to the Editor

RIVAL PENALIZED COMPETITIVE LEARNING BASED APPROACH FOR DISCRETE-VALUED SOURCE SEPARATION

YIU-MING CHEUNG* and LEI XU†
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, PRC
* E-mail: [email protected]
† E-mail: [email protected]

Received 6 July 2000
Revised 16 October 2000
Accepted 8 November 2000

This paper presents an approach based on the Rival Penalized Competitive Learning (RPCL) rule for discrete-valued source separation. In this approach, we first build a connection between the number of sources and the number of clusters formed by the observations. Then, we use the RPCL rule to automatically find the correct number of clusters, from which the source number is determined. Moreover, we tune the de-mixing matrix based on the cluster centers instead of the observations themselves, whereby the noise interference is considerably reduced. The experiments show that this new approach not only quickly and automatically determines the number of sources, but is also insensitive to noise in performing blind source separation.
1. Introduction
Due to attractive applications in wireless communication,5 image processing4 and so on, blind source separation (BSS) problems have recently received wide attention in the signal processing and neural networks literature. A BSS problem can be formulated as an independent component analysis (ICA) problem: suppose k channels of non-Gaussian source signals are sampled at discrete time t, denoted as s_t = [s_t^{(1)}, s_t^{(2)}, ..., s_t^{(k)}]^T with 0 ≤ t ≤ N. The sources are instantaneously and linearly mixed by an unknown d × k mixing matrix A and observed as x_t = [x_t^{(1)}, x_t^{(2)}, ..., x_t^{(d)}]^T:

    x_t = A s_t + e_t ,                                         (1)

with

    E(e_t) = 0 ,                                                (2)

where e_t is zero-mean white noise with covariance E(e_t e_t^T) = Σ. The objective of an ICA approach is to recover the s_t's, up to an arbitrary constant scale and permutation of indices, from the observations x_t by finding a de-mixing process F such that

    y_t = F(x_t) = P Λ s_t ,                                    (3)

where P is a permutation matrix, Λ is a diagonal matrix, and y_t is called the recovered signal of s_t. In particular, when the noise is negligible, determining a de-mixing process F in Eq. (3) is equivalent to finding a k × d matrix W such that

    y_t = W x_t = W A s_t = P Λ s_t .                           (4)

In the past, many ICA approaches based on different theories and methodologies have been proposed to learn W. Typical examples include Information-maximization (INFORMAX),2 Minimum Mutual Information (MMI)1 and the Learned Parametric Mixture (LPM) approach.12,13 These existing approaches need to pre-assign the source number k, which, however, is often unknown in advance in practice. Furthermore, many experiments have shown that their BSS performance deteriorates when the noise is substantial. To tackle these problems, the Bayesian Kullback Ying-Yang Dependence Reduction theory (BKYY-DR) has recently been proposed in Refs. 8–10. The experiments in Ref. 9 have demonstrated that BKYY-DR works well on binary sources: it not only determines the number of sources via Eq. (21) of Ref. 9, but also recovers the source signals well in noisy situations. However, the implementation of BKYY-DR requires optimizing a Kullback divergence function, which incurs considerable computing costs. Alternatively, Ref. 6 has proposed a new way to solve the BSS problem through unsupervised clustering analysis, based on the fact that the observations in Eq. (1) form clusters, as shown in Fig. 1, when the sources are discrete-valued. In this paper, we further develop that idea and propose an approach called the Rival Penalized Competitive Learning Based Discrete-valued Source Separator (RPCL-DSS), which has two new features in contrast to the original idea in Ref. 6:

1. The RPCL-DSS approach can quickly and automatically determine the number k of sources.
2. The RPCL-DSS approach filters out the noise by learning the de-mixing matrix based on the cluster centers (also called local means) instead of the observations themselves. Consequently, it is considerably robust in performing BSS in a noisy environment.

We have demonstrated the performance of the RPCL-DSS algorithm on Bernoulli source signals. It is found that the RPCL-DSS approach not only quickly finds the correct source number, but is also insensitive to noise interference.

Fig. 1. The 2-dimensional mixture of three binary sources.

2. A Relationship Between Source Number and Cluster Number

There is a relationship between the source number and the number of clusters, as follows.

Theorem 1.
Suppose each component of s_t takes one of n possible discrete values at any time step t. Given x_t = A s_t + e_t, where e_t is noise satisfying Eq. (2), and A(s_{t1} − s_{t2}) ≠ 0 for all s_{t1} ≠ s_{t2}, the correct number of sources is k if and only if there are n^k clusters.

Proof.

• Necessary condition. Considering s_t = [s_t^{(1)}, s_t^{(2)}, ..., s_t^{(k)}]^T with each component s_t^{(j)} taking n possible values, we have n^k different source vectors, denoted as s̄_i with i = 1, ..., n^k. We denote the set of the s̄_i's as Q; that is, s_t ∈ Q for all s_t. Since A(s_{t1} − s_{t2}) ≠ 0 with s_{t1} ≠ s_{t2} for all s_{t1} and s_{t2}, which implies A s̄_i ≠ A s̄_j for all i ≠ j, we also have n^k different vectors

    c_i = A s̄_i ,   with s̄_i ∈ Q, i = 1, 2, ..., n^k.

By Eq. (1), we then have

    x_t = A s_t + e_t = c_l + e_t ,                             (5)

with c_l = A s_t. In Eq. (5), the noise e_t makes each observation x_t distributed around the point c_l, and the noise level only influences the degree of overlap among the clusters. Consequently, we have n^k clusters.
• Sufficient condition. Assume the exact number of sources is K. According to the necessary condition above, we should then have n^K clusters. Since there are n^k clusters, n^K = n^k, whereby K = k.

In Theorem 1, we do not place any constraint on the rank of the mixing matrix A. That is, the result stated in the theorem holds regardless of whether the rank of A is smaller than k or not. Furthermore, Theorem 1 gives us two hints:

1. The number of sources can be estimated by determining the number of clusters formed by the observations x_t.
2. Since the local means c_i reveal all the useful information available for recovering the s_t's, it is enough to learn the de-mixing matrix W based on these {c_i}_{i=1}^{n^k}, whereby the learning of W is immune to the noise.
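As a concrete illustration of the counting argument in Theorem 1 (this sketch is not part of the original paper; the random mixing matrix and all names are illustrative), the following Python snippet enumerates the n^k binary source vectors and verifies that a generic A maps them to n^k distinct cluster centers c_i = A s̄_i:

```python
import itertools
import numpy as np

# Illustrative check of Theorem 1 for binary sources (n = 2): the n^k source
# vectors map to n^k distinct cluster centers c_i = A s_i, provided
# A (s1 - s2) != 0 for every pair of distinct source vectors.
rng = np.random.default_rng(0)
n, k, d = 2, 3, 2                                # binary sources, k sources, d-dim observations
A = rng.uniform(-1.0, 1.0, size=(d, k))          # a hypothetical mixing matrix

sources = list(itertools.product(range(n), repeat=k))               # all n^k source vectors
centers = {tuple(np.round(A @ np.array(s, dtype=float), 10)) for s in sources}
print(len(sources), len(centers))                # both equal n^k = 8 for a generic A
```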
3. Overview of the RPCL Algorithm

The RPCL is an unsupervised learning rule for clustering analysis. Without pre-assigning the number of clusters, the RPCL randomly places m (m ≥ k) seed points {q_j}_{j=1}^m in the observation space containing the observation set {x_t}_{t=0}^N. For each observation x_t, the RPCL not only modifies the seed point of the winner to adapt to x_t, but also penalizes the rival (i.e., the second winner) with a smaller learning rate. Numerous experiments in Refs. 14, 3 and 11 have shown that the RPCL can quickly complete an appropriate clustering and automatically find the correct number of clusters by driving the extra seed points far away from the observation set D = {x_t}_{t=0}^N. In the literature, there are a variety of RPCL rules. Here, we show a classical one14 only, whose implementation can be summarized as follows.

Step 1. Randomly take an observation x_t, and let the indicator

    u_j = 1,    if j = c = arg min_i γ_i ||x_t − q_i|| ,
    u_j = −1,   if j = r = arg min_{i ≠ c} γ_i ||x_t − q_i|| ,            (6)
    u_j = 0,    otherwise,

where γ_i is the relative winning frequency of the seed point q_i on the observations taken so far.

Step 2. Update the seed point q_j by

    q_j^{new} = q_j^{old} + Δq_j ,                                        (7)

with

    Δq_j = α_c (x_t − q_j),    if u_j = 1 ,
    Δq_j = −α_r (x_t − q_j),   if u_j = −1 ,                              (8)
    Δq_j = 0,                  otherwise,

where 0 < α_r ≪ α_c < 1, with α_c and α_r being the learning rates of the winner and rival seed points, respectively.

Step 3. Repeat Steps 1 and 2 until the extra seed points are driven far away from the observation set D, or simply stop the iterations when the indicators remain unchanged for all x_t ∈ D.
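For concreteness, a minimal Python sketch of this classical RPCL update (Eqs. (6)–(8)) is given below. It is an illustration rather than the authors' implementation; the initialization, epoch count and stopping rule are simplifying assumptions.

```python
import numpy as np

def rpcl(X, m, alpha_c=0.1, alpha_r=0.01, epochs=50, seed=0):
    """Sketch of the classical RPCL rule of Ref. 14 (Eqs. (6)-(8)).

    X: (N, d) array of observations; m: number of seed points.
    Returns the seed points q and their relative winning frequencies gamma.
    """
    rng = np.random.default_rng(seed)
    q = X[rng.choice(len(X), size=m, replace=False)].astype(float)  # seed points
    wins = np.ones(m)                              # win counts defining gamma_i

    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            gamma = wins / wins.sum()              # relative winning frequencies
            dist = gamma * np.linalg.norm(x - q, axis=1)
            c = int(np.argmin(dist))               # winner, u_c = 1
            dist_r = dist.copy()
            dist_r[c] = np.inf
            r = int(np.argmin(dist_r))             # rival (second winner), u_r = -1
            q[c] += alpha_c * (x - q[c])           # reward the winner
            q[r] -= alpha_r * (x - q[r])           # penalize the rival
            wins[c] += 1
    return q, wins / wins.sum()
```

Seed points that rarely win are pushed away from the data; in Phase 1 of the RPCL-DSS procedure below, the number of seed points remaining within the range of D gives the cluster count.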
4. Rival Penalized Competitive Learning Based Discrete Source Separation

The RPCL-DSS approach consists of two phases:

1. Find the observation clusters via the Rival Penalized Competitive Learning (RPCL) rule described in Sec. 3, whereby the number of sources is determined according to Theorem 1.
2. Calculate the local mean of each cluster, through which the de-mixing matrix W is learned.

The detailed implementation procedure is given below.

Phase 1. Determination of the source number

Step 1. Randomly assign m seed points {q_j}_{j=1}^m, with m chosen arbitrarily but large enough that m ≥ n^k.

Step 2. Randomly take an observation x_t from the data set D, and learn the seed points by Eqs. (6)–(8).
Step 3. After the RPCL procedure, denote the set of observations around q_j as C_j, and let the estimated source number be

    k_g = log_n m̃ ,                                            (9)

where m̃ ≤ m is the number of seed points remaining within the range of D.

Phase 2. Learning of the de-mixing matrix W

Step 1. Calculate the local means of the resulting n^{k_g} clusters by

    c_i = (1/N_i) Σ_{t=0}^{N} p_i(x_t) x_t ,   i = 1, ..., n^{k_g} ,      (10)

with

    N_i = Σ_{t=0}^{N} p_i(x_t) ,   p_i(x_t) = 1 if x_t ∈ C_i and 0 otherwise,      (11)

where N_i is the number of observations in C_i, and p_i(x_t) indicates whether x_t belongs to C_i.
Fig. 2. The results obtained by the RPCL procedure for different pairs of observation dimension d and source number k: (a)–(b) (d, k) = (2, 3); (c)–(d) (d, k) = (3, 3); (e)–(f) (d, k) = (3, 2). In each case, 2k + 2 seed points marked by "+" are randomly initialized, as shown in sub-figures (a), (c) and (e). The final positions of the seed points are shown in sub-figures (b), (d) and (f), with two extra seed points driven far away from the data set after the RPCL procedure.
Step 2. Randomly take an observation x_t, and let

    c_x = arg min_{c_i} ||x_t − c_i|| .                         (12)

We output the recovered signal y_t by

    y_t = arg min_{s̄_i} ||s̄_i − W c_x|| ,   1 ≤ i ≤ n^{k_g} ,   (13)

and update W by

    W^{new} = W^{old} + η (y_t − W c_x) c_x^T ,                  (14)
where η is a small positive learning rate.
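A minimal Python sketch of this Phase 2 learning loop is given below. It is illustrative only: the candidate source vectors s̄_i (all n-ary vectors), the use of the Phase 1 local means as given inputs, and the fixed epoch count are simplifying assumptions rather than the authors' implementation.

```python
import itertools
import numpy as np

def rpcl_dss_phase2(X, centers, n=2, eta=0.1, epochs=50, seed=0):
    """Illustrative Phase 2 of RPCL-DSS (Eqs. (12)-(14)).

    X: (N, d) observations; centers: (n**k_g, d) local means c_i from Phase 1.
    Returns the learned de-mixing matrix W of size k_g x d.
    """
    rng = np.random.default_rng(seed)
    k_g = int(round(np.log(len(centers)) / np.log(n)))              # Eq. (9)
    # Candidate discrete source vectors s_bar (here: all n-ary vectors).
    s_bar = np.array(list(itertools.product(range(n), repeat=k_g)), dtype=float)
    W = np.eye(k_g, X.shape[1])                      # identity initialization, as in Sec. 5

    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            c_x = centers[np.argmin(np.linalg.norm(x - centers, axis=1))]   # Eq. (12)
            y = s_bar[np.argmin(np.linalg.norm(s_bar - W @ c_x, axis=1))]   # Eq. (13)
            W += eta * np.outer(y - W @ c_x, c_x)                           # Eq. (14)
    return W
```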
5. Experimental Demonstrations

In the experiments, we fixed the learning rates α_c = 0.1, α_r = 0.01 and η = 0.1 for simplicity, although smoother convergence of the parameters can be obtained with specific methods for changing the rates.7 We used Bernoulli source signals with success probability 0.5. The sample size of each source was set at 200. Furthermore, we let the noise e_t be Gaussian white noise with covariance Σ = 0.01 I, where I is the identity matrix.

• Experiment 1. We justified Theorem 1 by determining the appropriate source number via the RPCL procedure. We considered three cases, letting the pairs of observation dimension d and source number k be (2, 3), (3, 3) and (3, 2), respectively. Furthermore, we arbitrarily selected the number of seed points to be m = 2k + 2 in all tested cases. Figure 2 shows the experimental results. It can be seen that the cluster number has been correctly determined by the RPCL procedure. That is, the true source number k in the above three cases can be successfully obtained via Eq. (9), regardless of whether d = k or not.

• Experiment 2. We examined the performance of the RPCL-DSS algorithm on separating three source signals with the mixing matrix

    A = [ 1.0  0.4  0.6
          0.5  1.0  0.3
          0.2  0.7  1.0 ] .                                      (15)

The de-mixing matrix W was initialized at the identity matrix.
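For reference, the following sketch (illustrative, not the authors' code) generates data of the kind used in Experiment 2: three Bernoulli sources with success probability 0.5, N = 200 samples, the mixing matrix A of Eq. (15), and Gaussian white noise with covariance Σ = 0.01 I.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 200, 3
A = np.array([[1.0, 0.4, 0.6],
              [0.5, 1.0, 0.3],
              [0.2, 0.7, 1.0]])                                   # mixing matrix of Eq. (15)

S = rng.binomial(1, 0.5, size=(N, k)).astype(float)               # Bernoulli sources, p = 0.5
E = rng.multivariate_normal(np.zeros(k), 0.01 * np.eye(k), size=N)  # noise with Sigma = 0.01 I
X = S @ A.T + E                                                   # observations x_t = A s_t + e_t
```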
An epoch is completed when all data points used to learn W are scanned once. As shown in Fig. 3(a), the learning curve of W converges after 50 epochs, with a total of 2^3 × 50 = 400 scanned data points (since the number of local means is 8). After convergence, a snapshot of V = W × A is

    V = [  0.9690   0.0115   0.0043
           0.0352   1.0056  −0.0099
           0.0051  −0.0148   0.9952 ] ,

which implies that an appropriate W has been found. Figure 3(b) shows the performance of the RPCL-DSS algorithm measured by the average Hamming distance, and Fig. 3(c) gives a sliding window of the BSS results, where the original sources have been totally recovered without any error.
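The average Hamming distance used as the error measure in Figs. 3(b) and 4(a) can be computed as sketched below; the exact convention is an assumption (recovered and true binary sequences are compared element-wise, with any permutation of the sources already resolved).

```python
import numpy as np

def average_hamming_distance(Y, S):
    """Fraction of differing bits between recovered signals Y and sources S,
    both (N, k) arrays with entries in {0, 1}; source permutation assumed resolved."""
    return float(np.mean(Y.astype(int) != S.astype(int)))
```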
In contrast, Fig. 4 shows the performance of RPCL-DSS with W learned on the whole set of observation signals. It is found that W still tends to converge after 50 epochs, but it needs to scan 200 × 50 = 10,000 data points, which is 25 times as many as in the former case.
Fig. 3. The performance of the RPCL-DSS algorithm on separating three Bernoulli source signals with the de-mixing matrix W learned on the local means: (a) the learning curve of W; (b) the average Hamming distance error curve; (c) a sliding window showing the original sources s_t (Row 1), the mixed signals x_t (Row 2) and the recovered signals y_t (Row 3).
Fig. 4. The performance of the RPCL-DSS algorithm on separating three Bernoulli source signals with the de-mixing matrix W learned on the whole set of observation signals: (a) the average Hamming distance error curve; (b) the learning curve of W.
Furthermore, the error curve exhibits some fluctuations, as shown in Fig. 4(a). This implies that the robustness of the algorithm is indeed enhanced when W is learned on the local means instead of the observations themselves.
6. Concluding Remarks

We have presented an approach for discrete-valued source separation in a noisy environment. This approach not only automatically finds the correct number of sources, but also learns the de-mixing matrix W based on the local means instead of the whole observation data set. Consequently, the proposed method considerably reduces the computing cost of learning W and performs BSS robustly, as demonstrated in the accompanying experiments.

This paper gives one specific implementation of the RPCL-DSS approach, using the classical RPCL procedure,14 which assigns an observation x_t to a cluster based on the least-square-error criterion. Hence, in principle, this RPCL is only suitable for the analysis of ball-shaped clusters, i.e., a noise covariance Σ = aI with a being a positive constant. For a more general covariance Σ of the noise e_t, an enhanced variant of the RPCL, such as the RPCL Type B procedure in Ref. 11, should be used instead.

Acknowledgments

The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong SAR (Project No. CUHK4297/98E).

References
1. S. I. Amari, A. Cichocki and H. H. Yang 1996, "A new learning algorithm for blind separation of sources," in Advances in Neural Information Processing 8, eds. D. S. Touretzky, M. C. Mozer and M. E. Hasselmo (MIT Press, Cambridge, MA), pp. 757–763.
2. A. J. Bell and T. J. Sejnowski 1995, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation 7, 1129–1159.
3. Y. M. Cheung 1997, Investigations on number selection for finite mixture models and clustering analysis, M.Phil. Thesis, Department of Computer Science, The Chinese University of Hong Kong, Hong Kong, PRC.
4. J. Karhunen, A. Hyvärinen, R. Vigário, J. Hurri and E. Oja 1997, "Application of neural blind separation to signal and image processing," Proc. 1997 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP'97), Munich, Germany, pp. 131–134.
5. T. W. Lee, A. J. Bell and R. Orglmeister 1997, "Blind source separation of real world signals," Proc. IEEE Int. Conf. on Neural Networks (ICNN'97), Houston, USA, June 9–12.
6. P. Pajunen 1997, "A competitive learning algorithm for separating binary sources," Proc. European Symposium on Artificial Neural Networks (ESANN'97), Bruges, Belgium, pp. 255–260.
7. H. Robbins and S. Monro 1951, "A stochastic approximation method," Ann. Math. Stat. 22, 400–407.
8. L. Xu 1997, "Bayesian Ying-Yang system and theory as a unified statistical learning approach (II): From unsupervised learning to supervised learning and temporal modeling and (III): Models and algorithms for dependence reduction, data dimension reduction, ICA and supervised learning," Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective (TANC'97), eds. K. W. Wong, I. King and D. Y. Yeung (Springer), pp. 25–60.
9. L. Xu 1998, "Bayesian Kullback Ying-Yang dependence reduction theory," Neurocomputing (special issue on Independence and Artificial Neural Networks) 22(1–3), 81–112.
10. L. Xu 1998, "Bayesian Ying-Yang dependence reduction theory and blind source separation on instantaneous mixture," Proc. International ICSC Workshop on Independence & Artificial Neural Networks (I&ANN'98), Spain, pp. 45–51.
11. L. Xu 1998, "Rival penalized competitive learning, finite mixture, and multisets clustering," Proc. 1998 IEEE Int. Joint Conf. Neural Networks 3, 2525–2530.
12. L. Xu, C. C. Cheung, H. H. Yang and S. I. Amari 1997, "Independent component analysis by the information-theoretic approach with mixture of density," Proc. 1997 IEEE Int. Conf. Neural Networks (IEEE-INNS IJCNN'97), June 9–12, Houston, TX, USA, Vol. III, pp. 1821–1826.
13. L. Xu, C. C. Cheung, J. Ruan and S. I. Amari 1997, "Nonlinear and separation capability: Further justification for the ICA algorithm with a learned mixture of parametric densities," invited special session on blind signal separation, Proc. 1997 European Symposium on Artificial Neural Networks, Bruges, April 16–18, pp. 291–296.
14. L. Xu, A. Krzyżak and E. Oja 1993, "Rival penalized competitive learning for clustering analysis, RBF nets and curve detection," IEEE Transactions on Neural Networks 4, 636–648. (A preliminary version appeared in Proc. 1992 Int. Joint Conf. Neural Networks 2, 665–670, 1992.)