IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 6, AUGUST 2009
Blind Source Separation Based on Cumulants With Time and Frequency Non-Properties
Tiemin Mei, Fuliang Yin, and Jun Wang, Fellow, IEEE
Abstract—This paper presents new results on blind separation of instantaneously mixed independent sources based on high-order statistics together with their time and frequency non-properties (i.e., the non-stationarity and non-whiteness of sources). Separation criteria of mixtures are established on a set of cumulants at different time instants using the non-stationarity of sources and/or time-delayed cumulants using the non-whiteness of sources. It is shown that cumulants at different time instants and time-delayed cumulants can be used as criteria for blind source separation (BSS). Furthermore, it is proved that the cumulant-based separation criteria are directly related to the separability conditions. Batch-data and online learning rules are developed based on the joint diagonalization of symmetric fourth-order cumulant matrices, and the learning rules are further simplified to correlation-based BSS algorithms. In addition, an initialization strategy is proposed for improving the convergence of the learning rules. Simulation results are given to demonstrate the validity and performance of the algorithms.
Index Terms—Blind source separation (BSS), cumulant, non-Gaussianity, non-stationarity, non-whiteness, separability condition, separation criterion.
I. INTRODUCTION
Blind source separation (BSS) aims to recover a set of unknown source signals from observations that are mixtures of those sources, where neither the source signals nor the mixing channels are known a priori. BSS has been an active research topic since the seminal work of Hérault and Jutten because of its potential applications in many fields, such as audio processing and biomedical signal processing [1]–[4]. In the past decade, the BSS problem has been extensively explored. In general, BSS algorithms for instantaneous mixtures in the literature can be categorized into two main classes: the algorithms based on information theory (including the maximum likelihood approach) and those based directly on cumulants (including correlation). For BSS to be possible, source signals are usually assumed to be mutually independent, and, except for some special cases, they usually have at least one of the following three non-properties: non-Gaussianity (statistical non-property), non-stationarity (time non-property), and non-whiteness (frequency non-property).
The algorithms based on information theory include the maximum-likelihood algorithms, the mutual information minimization algorithms, and the infomax algorithms [2]–[4], [6]–[10]. The common feature of these algorithms is that they need the non-Gaussianity assumption of sources. The independence of the output of the separation system is achieved by minimizing the mutual information or maximizing the entropy of the output signals. These approaches seldom exploit the time or frequency non-properties of source signals (i.e., the non-whiteness and non-stationarity).
The algorithms based directly on cumulants include the second-order statistic (SOS)-based algorithms and the high-order statistic (HOS)-based algorithms. In these algorithms, the minimization of cross-cumulants or the maximization of absolute autocumulants is usually used, instead of independence itself, to achieve source separation. The SOS-based algorithms exploit the time and/or frequency non-properties of sources. In other words, in SOS-based algorithms, source signals must be non-stationary and/or non-white [11]–[18]. Typical SOS-based algorithms are AMUSE [11], SOBI [12], GED [13], and those in [14]–[18]. The advantages of SOS-based algorithms are the simplicity of computation and the capability of separating Gaussian source signals.
HOS plays an important role in BSS. In fact, HOS was used for BSS earlier than any other method. HOS-based algorithms utilize the non-Gaussianity of source signals. There should be no more than one Gaussian signal among the sources.
Manuscript received August 26, 2008; revised March 05, 2009. Current version published June 26, 2009. This work was supported in part by the National Natural Science Foundation of China under Grants 60372082 and 60772161, in part by the Specialized Research Fund for the Doctoral Program of Higher Education of China (200801410015), and in part by the Hong Kong Research Grant Council under Grant CUHK4203/04E. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Sen M. Kuo.
T. Mei is with the School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116023, China, on leave from the School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110168, China (e-mail: [email protected]).
F. Yin is with the School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116023, China (e-mail: [email protected]).
J. Wang is with the Department of Mechanical & Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong (e-mail: [email protected]).
Digital Object Identifier 10.1109/TASL.2009.2019924
The joint approximate diagonalization of eigen-matrices (JADE) by Cardoso [19], Zarzoso and Delfosse’s algorithms, which are based on pre-whitening and rotation angle estimation [20], [21], and the algorithms proposed in [22]–[26] and [35] are all based on high-order cumulants. The common features of these high-order cumulant-based algorithms are that pre-whitening is usually a necessary step and that the time and/or frequency non-properties, such as the time-delayed cumulants and the cumulants at different time instants, are not sufficiently considered. Müller [27] and Georgiev [28] considered time-delayed cumulants for the separation of noisy mixtures. This suggests that the time and frequency non-properties of sources can be exploited in HOS-based BSS algorithms just as they are in SOS-based algorithms. Although some researchers have made some efforts using the
1558-7916/$25.00 © 2009 IEEE Authorized licensed use limited to: Chinese University of Hong Kong. Downloaded on July 1, 2009 at 03:07 from IEEE Xplore. Restrictions apply.
HOS non-stationarity of sources [36], further investigation is deemed necessary and rewarding.
In this paper, we present an approach to the separation criteria of instantaneous mixtures based on a set of cumulants at different time instants (using the time non-property of sources) and/or time-delayed cumulants (using the frequency non-property of sources). The non-properties were systematically studied by Cardoso; however, they were limited to second-order statistics [5]. In this paper, we generalize these non-properties to high-order statistics. It will be shown that cumulants of any order can be used to construct a sufficient criterion for BSS. Furthermore, we find that the cumulant-based separation criteria are directly related to the separability conditions imposed on the sources. An iterative learning algorithm is developed based on the joint diagonalization of the symmetric fourth-order cumulant matrices. Simulation results are given to demonstrate the performance of the algorithm.
The rest of the paper is organized as follows. In Section II, the mixing and separating models are introduced and the cumulant-based theory of BSS is formulated. The learning algorithm based on the joint diagonalization of symmetric fourth-order cumulant matrices is presented in Section III. Simulation results are given in Section IV. Finally, conclusions are drawn in Section V.
II. PRELIMINARIES
Throughout the paper, we will use the notation $\operatorname{cum}\big(x_1(t), x_2(t+\tau_1), \ldots, x_r(t+\tau_{r-1})\big)$ to denote the $r$th-order cross cumulant of real-valued stochastic processes $x_1(t), x_2(t), \ldots$, and $x_r(t)$, where $t$ is a time instant and $\tau_1, \ldots, \tau_{r-1}$ are time delays. The $r$th-order autocumulant of $x(t)$ is obtained when all the processes coincide with $x(t)$. Superscript $T$ denotes the matrix transposition operation.
A. Mixing and Separating Models
We assume that $N$ zero-mean, real-valued, and statistically independent source signals $s_1(t), \ldots, s_N(t)$ are mixed by an unknown full column-rank and time-invariant mixing matrix $\mathbf{A}$. The task of BSS is to recover the source signals from the observed mixtures $x_1(t), \ldots, x_M(t)$.
According to cumulant properties [29], the independence of sources implies mathematically that, for any given time instant $t$ and time delays $\tau_1, \ldots, \tau_{r-1}$, if at least one of the source indices $i_k$ is different from the others, then the following equation will hold:
$$\operatorname{cum}\big(s_{i_1}(t), s_{i_2}(t+\tau_1), \ldots, s_{i_r}(t+\tau_{r-1})\big) = 0. \tag{1}$$
This is a very strict constraint imposed on the sources for BSS, because it requires that all cross-cumulants must be zero. For instance, if the sources are of Gaussian distribution, we need only to check whether their second-order cross-cumulants are zero or not, but if the sources are speech signals whose probability distributions are nearly Laplacian, mutual independence implies that (1) holds for every order $r \ge 2$. We will show in this section that the independence assumption for BSS can be relaxed with the aid of the time and/or frequency non-properties of sources. In other words, the independence assumption can be replaced by some simpler assumptions.
We assume that the $M \times N$ instantaneous mixing matrix $\mathbf{A}$ is time invariant and full column rank; then the mixing model can be expressed as follows:
$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) \tag{2}$$
where $\mathbf{x}(t) = [x_1(t), \ldots, x_M(t)]^T$ and $\mathbf{s}(t) = [s_1(t), \ldots, s_N(t)]^T$; in addition, $M \ge N$ [34]. In practice, the number of sources is usually unknown, so we first estimate $N$ with the singular value decomposition of the correlation matrix of the observations as described in [12]; then the first $N$ observations are used for source separation. For convenience of theoretical analysis, observation noise is not considered here.
By denoting $\mathbf{W}$ as the separation matrix of the separation system and $\mathbf{y}(t)$ as the output, we have
$$\mathbf{y}(t) = \mathbf{W}\mathbf{x}(t) = \mathbf{W}\mathbf{A}\,\mathbf{s}(t) = \mathbf{G}\,\mathbf{s}(t) \tag{3}$$
where $\mathbf{G} = \mathbf{W}\mathbf{A}$ is called the global matrix. Equation (3) can be rewritten in its component form as
$$y_i(t) = \sum_{j=1}^{N} g_{ij}\, s_j(t) \tag{4}$$
where $g_{ij}$ is an entry of matrix $\mathbf{G}$.
If the separation matrix $\mathbf{W}$ is determined such that the global matrix $\mathbf{G} = \mathbf{D}\mathbf{P}$ is the multiplication of a diagonal matrix $\mathbf{D}$ and a permutation matrix $\mathbf{P}$, then the outputs of the signal separation system are the source signals scaled by indeterminate constants (the diagonal elements of $\mathbf{D}$) and in indeterminate order (determined by $\mathbf{P}$). The indeterminacies of $\mathbf{D}$ and $\mathbf{P}$ are unavoidable in blind source separation if no additional information is available.
B. Separation Criteria Based on Cumulants
In this subsection, we establish the BSS separability conditions to which the sources are subject for BSS to be possible, and the separation criterion on which BSS is based. Usually, the separability condition is simply stated as the independence property. However, we will point out that the independence condition can be relaxed if some other non-properties are taken into account in BSS.
Let $C^{p,q}_{y_i y_j}(t,\tau)$ denote the cumulant in which $y_i(t)$ appears $p$ times and $y_j(t+\tau)$ appears $q$ times, and let $C^{p,q}_{s_k s_k}(t,\tau)$ denote the corresponding autocumulant of source $s_k$. Using the properties of cumulants and the independence assumption of sources, the $(p+q)$th-order pairwise cross-cumulant of $y_i(t)$ and $y_j(t+\tau)$ is given as follows:
$$C^{p,q}_{y_i y_j}(t,\tau) = \sum_{k=1}^{N} g_{ik}^{p}\, g_{jk}^{q}\, C^{p,q}_{s_k s_k}(t,\tau) \tag{5}$$
for $i, j = 1, \ldots, N$.
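The mixing and separating models above can be illustrated with a small numerical sketch. It assumes a hypothetical 2×2 mixing matrix and two hand-built separating matrices; the helper names `matmul` and `is_scaled_permutation` are introduced here for illustration only and do not come from the paper. The sketch checks whether the global matrix (the product of the separating and mixing matrices) is a scaled permutation, which is the situation in which the outputs equal the sources up to scaling and ordering.

```python
# Minimal sketch of the instantaneous mixing and separating models.
# All matrices below are illustrative assumptions, not values from the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_scaled_permutation(g, tol=1e-9):
    """True if g has exactly one nonzero entry in every row and every column."""
    n = len(g)
    rows_ok = all(sum(abs(v) > tol for v in row) == 1 for row in g)
    cols_ok = all(sum(abs(g[i][j]) > tol for i in range(n)) == 1
                  for j in range(n))
    return rows_ok and cols_ok

# Hypothetical full-rank 2x2 mixing matrix A.
A = [[1.0, 0.5],
     [0.3, 1.0]]

# Closed-form inverse of the 2x2 matrix A.
det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det_a, -A[0][1] / det_a],
         [-A[1][0] / det_a, A[0][0] / det_a]]

# A valid separating matrix: rows of the inverse, swapped and rescaled.
W_good = [[2.0 * v for v in A_inv[1]],    # output 1 = 2 * source 2
          [-0.5 * v for v in A_inv[0]]]   # output 2 = -0.5 * source 1

# A poor separating matrix: the identity leaves the mixtures untouched.
W_bad = [[1.0, 0.0],
         [0.0, 1.0]]

G_good = matmul(W_good, A)  # global matrix G = W A
G_bad = matmul(W_bad, A)

print(is_scaled_permutation(G_good))
print(is_scaled_permutation(G_bad))
```

The first global matrix has one nonzero entry per row and column, so the scaling and ordering of the recovered sources remain undetermined, as discussed above; the second one does not separate the sources.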
MEI et al.: BSS BASED ON CUMULANTS WITH TIME AND FREQUENCY NON-PROPERTIES
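In (5), only the following part of the zero cross-cumulants condition in (1) is utilized:
$$\operatorname{cum}\big(s_{k_1}(t), \ldots, s_{k_p}(t), s_{l_1}(t+\tau), \ldots, s_{l_q}(t+\tau)\big) = 0 \tag{6}$$
where the subscripts $k$'s and $l$'s are not identical.
For BSS, we want to know when and how the global matrix will/can satisfy $\mathbf{G} = \mathbf{D}\mathbf{P}$, so we consider the situation in which the $(p+q)$th-order cross-cumulants of the output of the separation system are forced to zero. Here $C^{p,q}_{y_i y_j}(t,\tau)$ denotes the $(p+q)$th-order cross-cumulant of $y_i(t)$ and $y_j(t+\tau)$ in (5), and $C^{p,q}_{s_k s_k}(t,\tau)$ denotes the corresponding autocumulant of source $s_k$. If the $(p+q)$th-order cumulants of the sources are of the same sign, with $p$ and $q$ being even numbers, then $C^{p,q}_{y_i y_j}(t,\tau) = 0$ for $i \ne j$ is a sufficient BSS condition. This is just a generalization of [35]. The situation is more complicated if the cumulants of the sources are of different signs and $p$ and $q$ are odd numbers; we discuss this in the following paragraphs.
Let us consider $N$ different time instants $t_1, \ldots, t_N$ and $N$ different time delays $\tau_1, \ldots, \tau_N$ in (5); a group of cross-cumulant zero-forcing conditions is given as follows:
$$C^{p,q}_{y_i y_j}(t_l,\tau_l) = 0, \quad l = 1, \ldots, N;\ i \ne j. \tag{7}$$
According to (5), for given $i \ne j$, the $N$ equations in (7) (which correspond to the index $l = 1, \ldots, N$) can be rewritten in matrix form as
$$\mathbf{C}_s\, \mathbf{g}_{ij} = \mathbf{0} \tag{8}$$
where $\mathbf{C}_s$ is written without its arguments for convenience; $\mathbf{C}_s$ is an $N \times N$ matrix with $C^{p,q}_{s_k s_k}(t_l,\tau_l)$ being its entry in the $l$th row and $k$th column, and we name it the characteristic cumulant matrix of sources, which plays an important role in BSS; we denote $\mathbf{g}_{ij} = [g_{i1}^{p} g_{j1}^{q}, \ldots, g_{iN}^{p} g_{jN}^{q}]^T$, which is an $N$-vector.
For a given pair $(i, j)$, (8) is a system of linear homogeneous equations with the characteristic cumulant matrix $\mathbf{C}_s$ and $\mathbf{g}_{ij}$ being the coefficient matrix and variable vector, respectively. In fact, for all pairs $(i, j)$ with $i \ne j$, the equations in (8) form systems of linear homogeneous equations with the characteristic cumulant matrix being their common coefficient matrix. According to Cramer’s rule, if the characteristic cumulant matrix satisfies the following condition:
$$\det(\mathbf{C}_s) \ne 0 \tag{9}$$
then there must be
$$\mathbf{g}_{ij} = \mathbf{0} \tag{10}$$
for given $i \ne j$, where $\det(\cdot)$ is the determinant operator. Because of the special form of vector $\mathbf{g}_{ij}$, (10) means that there is at most one nonzero entry in each row or column of the global matrix $\mathbf{G}$; this implies that the signals are separated. Condition (9) is an extra constraint on the sources in addition to the basic constraint (6).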
We now summarize the above results as follows. At the beginning of Section II, we assumed that sources are statistically independent, but in this subsection, we find that for given $p$ and $q$, if zero-mean sources satisfy conditions (6) and (9), then (7) is a sufficient condition for BSS. Thus, the conditions (6) and (9) are the separability conditions of sources under $(p+q)$th-order cumulants. Obviously, the separability conditions (6) and (9) are simpler than the independence condition; in addition, they are directly related to the separation criterion (7).
For separability condition (9), some comments are necessary. If all time delays are zero in the characteristic cumulant matrix, then it consists of the autocumulants of the sources at completely different time instants. So if (9) still holds, it implies on the one hand that the sources must be $(p+q)$th-order non-stationary, and on the other hand that their $(p+q)$th-order cumulants change with time differently. We call this non-stationarity property of sources the time non-property. For the special case $p = q = 1$, this time non-property, which is the second-order non-stationarity, was used in [15] and [17].
If the characteristic cumulant matrix is a function of the time delays only, it implies that the sources are $(p+q)$th-order stationary, and the matrix consists of the time-delayed autocumulants of sources only. If (9) still holds, it implies that, on the one hand, the sources are $(p+q)$th-order non-white; on the other hand, their $(p+q)$th-order autocumulants change with time delays differently, i.e., the trans-profiles of the polyspectra of sources are of different shapes. We call this non-whiteness property of sources the frequency non-property. This frequency non-property has been used in [12] for the special case $p = q = 1$, where it was not stated as clearly as (9).
If the time instants and time delays are set to arbitrary values and (9) still holds, it demands that the sources have both time and frequency non-properties. Such signals are also called non-independent and identically distributed (non-i.i.d.) processes.
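To make the time non-property concrete, the following sketch considers the simplest second-order case with zero delay, where the entries of the characteristic cumulant matrix reduce to source variances measured at different time instants. It is only an illustration under stated assumptions: the Gaussian test signals, the block length, and the helper names `block_variance`, `characteristic_matrix`, and `det2` are all hypothetical and chosen for this example. The determinant is clearly nonzero when the two variance profiles evolve differently over time, and close to zero when both sources are stationary.

```python
import random

def block_variance(samples):
    """Second-order autocumulant of a block, assuming a zero-mean signal."""
    return sum(v * v for v in samples) / len(samples)

def characteristic_matrix(sources, block_len):
    """Rows index time blocks, columns index sources; entries are variances."""
    n_blocks = len(sources)  # as many time instants as there are sources
    return [[block_variance(s[b * block_len:(b + 1) * block_len])
             for s in sources]
            for b in range(n_blocks)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

rng = random.Random(0)
L = 5000  # block length (assumption)

# Source 1 changes its standard deviation from 1 to 3 between the two blocks.
s1 = [rng.gauss(0, 1) for _ in range(L)] + [rng.gauss(0, 3) for _ in range(L)]
# Source 2 keeps a standard deviation of 2 in both blocks.
s2 = [rng.gauss(0, 2) for _ in range(2 * L)]
# Source 3 is also stationary, so its variance profile is proportional to s2's.
s3 = [rng.gauss(0, 1) for _ in range(2 * L)]

# Different variance profiles: the determinant is far from zero.
det_nonstationary = det2(characteristic_matrix([s1, s2], L))
# Proportional variance profiles: the determinant is close to zero.
det_stationary = det2(characteristic_matrix([s3, s2], L))

print(det_nonstationary, det_stationary)
```

With the values assumed here, the ideal determinants are 1·4 − 4·9 = −32 for the non-stationary pair and 0 for the stationary pair; the sample estimates fluctuate around those values.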
Equation (9) describes the non-properties of non-i.i.d. sources, so we may call it the non-property separability condition. The above derivation and analysis lead to the following theorem.
Theorem 1: For zero-mean and non-i.i.d. stochastic signals, if the separability conditions (6) and (9) hold, then (7) is a sufficient criterion for BSS.
Remark 1: We give some considerations on the relationship between the separability conditions of sources and the separation criterion for BSS. First, for HOS-based source separation, sources are usually assumed to be independent, and the process of BSS is in fact the detection of the independence of the output of the separation system [3], [19]; precisely speaking, however, the separability conditions (6) and (9) are not as strict as the independence condition. With the non-properties of sources, the independence assumption is reduced to the zero cross-cumulant condition of specified order as stated in (6). For instance, if we set $p = q = 1$, then the separability condition (6) is nothing but the uncorrelatedness condition. Meanwhile, the non-property separability condition (9) implies that the sources should have power spectral densities of different shapes (the non-whiteness property) or that the sources are non-stationary stochastic processes. This uncorrelatedness condition together with the non-properties is much simpler than the independence condition. If we set $p$ and $q$ to different values, we obtain different separability conditions, and these separability conditions lead to different separation criteria. Second, we notice from (6) that the bigger $p$ and $q$ are, the stricter the separability conditions imposed on the sources. So from the point of view of BSS, lower-order cumulants are preferable to higher-order ones, because they impose fewer separability constraints on the source signals. Third, the separation criterion of sources is closely related to the separability conditions. As pointed out above, different separation criteria require different separability conditions of the sources.
Remark 2: As stated in [5], if we closely examine all available BSS algorithms, we find that almost all of them utilize at least one of the three non-properties in addition to the independence assumption: non-Gaussianity, non-whiteness, and non-stationarity. The non-Gaussianity can only be detected with HOS, but the non-whiteness and non-stationarity can be detected with both SOS and HOS. Thus, as far as SOS-based BSS algorithms are concerned, only the non-whiteness and non-stationarity can be exploited; examples can be seen in [12], [15], and [16]. In contrast, all three non-properties can be exploited in HOS-based BSS algorithms, but in fact, only the non-Gaussianity has been taken into account in HOS-based BSS approaches so far. For example, Comon’s approaches [3], JADE [19], Fast fixed-point ICA [23], and KMA [33] are essentially the maximization of the non-Gaussianity of the output of the separation system. Theorem 1 treats the other two non-properties: non-whiteness and non-stationarity.
The separation criterion (7) is equivalent to the joint diagonalization of cumulant matrices, i.e., the joint diagonalization of the following matrices:
$$\mathbf{C}^{p,q}_{y}(t_l,\tau_l) = \big[\,C^{p,q}_{y_i y_j}(t_l,\tau_l)\,\big], \quad l = 1, \ldots, N \tag{11}$$
where $C^{p,q}_{y_i y_j}(t_l,\tau_l)$, the $(p+q)$th-order cross-cumulant of $y_i(t_l)$ and $y_j(t_l+\tau_l)$, is the $i$th row and $j$th column entry of the $l$th matrix $\mathbf{C}^{p,q}_{y}(t_l,\tau_l)$.
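As a concrete illustration of the matrices in (11), the sketch below estimates one such matrix for the fourth-order case with zero delay, whose entries are cum(y_i, y_i, y_j, y_j). It is a hedged example rather than the authors' implementation: the function name `cum22_matrix`, the uniform test sources, and the 2×2 mixture are assumptions made for illustration. The estimator relies on the standard identity for zero-mean variables, cum(a, a, b, b) = E[a²b²] − E[a²]E[b²] − 2E[ab]².

```python
import random

def cum22_matrix(signals):
    """Estimate cum(y_i, y_i, y_j, y_j) at zero delay for zero-mean signals.

    Uses the zero-mean identity
    cum(a, a, b, b) = E[a^2 b^2] - E[a^2] E[b^2] - 2 E[a b]^2.
    """
    n = len(signals)
    length = len(signals[0])

    def mean(values):
        return sum(values) / length

    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a, b = signals[i], signals[j]
            m22 = mean([x * x * y * y for x, y in zip(a, b)])
            maa = mean([x * x for x in a])
            mbb = mean([y * y for y in b])
            mab = mean([x * y for x, y in zip(a, b)])
            c[i][j] = m22 - maa * mbb - 2.0 * mab * mab
    return c

rng = random.Random(1)
T = 20000  # number of samples (assumption)

# Two independent zero-mean uniform sources (illustrative assumption).
s = [[rng.uniform(-1, 1) for _ in range(T)] for _ in range(2)]

# A hypothetical mixture in which each output contains both sources.
y = [[s[0][t] + 0.8 * s[1][t] for t in range(T)],
     [0.6 * s[0][t] - s[1][t] for t in range(T)]]

C_sources = cum22_matrix(s)  # nearly diagonal: the sources are independent
C_mixed = cum22_matrix(y)    # clearly non-diagonal: the outputs are mixed

print(C_sources[0][1], C_mixed[0][1])
```

For independent sources the off-diagonal estimate is close to zero, whereas for the mixed outputs it is not, which is why driving the off-diagonal entries of such matrices to zero can serve as a separation criterion.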
It is easy to prove that, for $p = q$, the cumulant matrices in (11) are symmetric matrices according to (5), which is a useful property for BSS algorithm design. Different $p$ and $q$ lead to different algorithms. For instance, if we set $p = q = 1$ in (11), then the joint diagonalization of matrices (11) is the generalization of the algorithm proposed in [12]. In the next section, we will study the case where $p = q = 2$.
III. LEARNING RULES
In this section, we study a special case, i.e., $p = q = 2$. When we select the joint diagonalization of the symmetric fourth-order cumulant matrices as the separation criterion, we have admitted the fact that conditions (6) and (9) automatically hold for the case where $p = q = 2$.
A. Basic Learning Rules
To utilize the non-stationarity and non-whiteness of sources, we assume that the sources are non-i.i.d. processes. Let
$$\tilde{\mathbf{C}}_{l} = \mathbf{C}^{2,2}_{y}(t_l,\tau_l) + \lambda \mathbf{I} \tag{12}$$
where $\mathbf{C}^{2,2}_{y}(t_l,\tau_l)$ is the symmetric fourth-order cumulant matrix of the output; $t_l$ and $\tau_l$ represent the $l$th time instant and time delay, respectively; $\mathbf{I}$ is the identity matrix; and $\lambda$ is a sufficiently big constant to guarantee that the symmetric matrices $\tilde{\mathbf{C}}_{l}$ are positive definite, i.e., $\lambda + e_{\min} > 0$, where $e_{\min}$ is the minimum eigenvalue of matrix $\mathbf{C}^{2,2}_{y}(t_l,\tau_l)$. If we set all time delays to zero but use different time instants, this implies that we exploit the non-stationarity of sources; if we use a single time instant but different time delays, this means that we use the non-whiteness of sources. In general, for different points in the $(t, \tau)$ 2-D space, both the non-stationarity and the non-whiteness are used for source separation.
The matrices $\tilde{\mathbf{C}}_{l}$ and $\mathbf{C}^{2,2}_{y}(t_l,\tau_l)$ have the same off-diagonal entries; therefore, the joint diagonalization of the former is equivalent to that of the latter, so we establish the following joint optimization problem according to Hadamard’s inequality [30]:
(13)
where the objectives involve a diagonal matrix composed of the diagonal entries of the matrix concerned. Hadamard’s inequality says that $\det(\mathbf{R}) \le \prod_{i=1}^{N} r_{ii}$ holds for an $N \times N$ symmetric and positive definite matrix $\mathbf{R}$ with diagonal entries $r_{ii}$, and equality holds if and only if $\mathbf{R}$ is diagonal. So the joint optimization of the objectives in (13) is equivalent to the joint diagonalization of the matrices in (12). Hadamard’s inequality has been frequently used in decorrelation-based BSS approaches [15], [17], but it has never been used in cumulant-based BSS methods.
For simplicity, we need the following definitions of two operations between matrices and third-order tensors of compatible dimensions:
(14)
(15)
where, as far as the entries of third-order tensors are concerned, the first index is the row index, the second one is the column index, and the third one is the layer index.
Utilizing the definition in (14), the conventional gradient of the objective in (13) is derived as follows:
(16)
where the third-order cumulant tensors are specified by their $i$th row, $j$th column, and $k$th layer entries.
If we consider only the non-stationarity of sources and let all time delays be zero, then (16) can be simplified as follows:
(17)
where the cumulant matrix is specified by its $i$th row and $j$th column entry, and the third-order cumulant tensor by its $i$th row, $j$th column, and $k$th layer entry.
Based on (17), the steepest descent learning rule for non-stationary sources is as follows:
(18)
where the step-size coefficient is the learning rate parameter.
If the natural or relative gradient [31], [32] is used, we obtain the following learning rule with the aid of definition (15):
(19)
where the cumulant matrix and the third-order cumulant tensor are those of the output of the separation system.
Learning rule (19) is a batch-data algorithm. In implementation, the whole data set is divided into blocks of size $M$. Learning rule (19) is applied to these batches in turn to achieve the joint diagonalization of these cumulant matrices.
Learning rule (19) can also be implemented sample by sample. We get the following online learning rule:
(20)
where the cumulant matrix and the third-order cumulant tensor are evaluated at the current time instant. When (20) has converged after a certain time instant, it is equivalent to the joint optimization of the corresponding objectives at the subsequent time instants.
Both the batch-data algorithm (19) and the online learning algorithm (20) have the equivariant property [32]. This property implies that the convergence behavior of the global matrix is irrelevant to the mixing matrix. However, in practice, the convergence behavior is closely related to the initialization value of the separation matrix, so we propose the following adaptive initialization strategy.
From the first input samples of the mixtures, a matrix is formed whose singular value decomposition (SVD) is as follows:
(21)
where the factors are a unitary matrix and a diagonal matrix. The initialization value of the separation matrix is set to be
(22)
In fact, this initialization matrix is the so-called pre-whitening matrix in BSS, but it plays a different role from the pre-whitening usually used in BSS. This initialization strategy makes the separation matrix approximately the product of the true separation matrix and an orthogonal matrix [12]. It is well known that an $N \times N$ orthogonal matrix can be determined by $N(N-1)/2$ Givens rotation angles. Therefore, the initialization procedure can effectively reduce the indeterminacy of the separation matrix, which improves the convergence property of BSS approaches. Simulation results in Section IV show that this initialization can improve the convergence behavior significantly.
The algorithms in (19) and (20) seem very complicated; they need estimates of the cumulant matrix and the third-order cumulant tensor. A close examination shows that there are some intrinsic relationships between the cumulant matrices and the third-order cumulant tensor. The entries of the two cumulant matrices are subsets of the entries of the third-order cumulant tensor. This means that the $j$th column of one matrix is built up with the diagonal entries of the $j$th layer of the third-order cumulant tensor. Similarly, the $j$th column of the other matrix is the $j$th column of the $j$th layer of the third-order cumulant tensor. Moreover, the third-order cumulant tensor can be expressed as
(23)
where the right-hand side involves the third-order moment tensor of the output, the correlation matrix of the output of the separation system at time instant $t$, and the correlation matrix of the mixtures at time instant $t$; a column subscript on a matrix represents the corresponding column vector of that matrix; the output of the operator $\operatorname{diag}(\cdot)$ depends on its operand, similar to that defined in MATLAB: applied to a matrix, it returns a column vector of the diagonal entries of the matrix, and applied to a vector, it returns a diagonal matrix with the elements of the vector as its diagonal entries. So we need only to compute the third-order moment tensor and the correlation matrix. This will be more efficient than the direct computation of (19) and (20).
Now we give some considerations about the selection of the constant parameter in (12). When the initial value of the separation matrix is set as (22), the output of the separation system is approximately decorrelated. So the cumulant matrix can be approximately expressed as follows:
(24)
where a new stochastic process is defined from the squares and the variances of the outputs at time $t$. This newly defined stochastic process has zero mean, and its correlation matrix at time $t$ is symmetric and positive definite. To simplify the analysis, we make an additional assumption on the output variances; then each eigenvalue of the cumulant matrix is determined by the corresponding eigenvalue of this correlation matrix. So in order for the matrix in (12) to be positive definite, the constant parameter must exceed the resulting bound.
B. Simplified Learning Rules
The fourth-order symmetric cumulant matrix can be expressed as follows:
(25)
where the expression uses the Hadamard product operator and the quantities defined in (24). On the one hand, we have proved that the joint diagonalization of the cumulant matrices will lead to BSS. On the other hand, if the signals are separated, then the correlation matrix of the output must be diagonal and the second term in (25) must be diagonal too; this leads directly to the diagonalization of the first term in (25). So for simplicity, only the second term in (25) is exploited to establish a more efficient algorithm as follows.
Let the matrices to be diagonalized be formed from the second term in (25) plus a constant multiple of the identity matrix, where the constant is sufficiently big to ensure that they are positive definite. Then the joint optimization of the following functions (26) leads exactly to the natural gradient-based algorithms (27) and (28):
(26)
The batch-data learning rule is
(27)
The online learning rule is
(28)
Consequently, the batch-data and online learning rules can be implemented by only computing the correlation matrices of the output of the separation system rather than the fourth-order cumulants themselves. This makes the algorithms computationally very efficient. For the online algorithm (28), the correlation matrix is estimated adaptively by using
(29)
where the weighting coefficient is a moving smooth parameter.
Compared with the other cumulant-based approaches, the present method is completely based on optimization, which lends itself to the improvement of separation performance. Even if the sources cannot completely satisfy the separability conditions, the mismatch has relatively little effect on the results of BSS approaches using optimization. In contrast, a mismatch of the separability conditions of sources has a relatively more negative effect on those approaches that partly or completely employ the eigenvalue decomposition technique, such as Comon’s [3], JADE [19], AMUSE [11], and GED [13]. Moreover, as the approach can also be implemented in a sample-by-sample way, it is suitable for real-time signal processing.
IV. EXPERIMENTAL RESULTS
Comparisons between different versions of the new algorithms and between the algorithms and the other typical HOS-based BSS algorithms are made based on computational experiments for the batch-data and online learning algorithms (27) and (28). These results show that the new algorithms are valid and of high separation performance.
Three speech signals are used in these experiments. The length of the data is 24 000. The sampling frequency is 8 kHz. The error index (EI) defined in [9] is used to evaluate the performance of the algorithms:
(30)
EI is a function of parameters such as the learning rate, the moving smooth parameter, the constant parameter, and the block size, so these parameters can be determined with intensive experiments so as to maximize EI.
A. Separation Results
The three speech signals are mixed with an arbitrarily chosen mixing matrix. The three source signals and three mixtures are shown in Fig. 1(a) and (b), respectively. Fig. 1(c) shows the separation result of the batch-data algorithm (27) for fixed values of the constant parameter, the block size, and the learning rate. After separation, the error index of this result, expressed in dB, is almost independent of the mixing matrix because of the equivariant property of the algorithm. The online learning
algorithm (28) yields similar results. Experiments with several other groups of data also gave similar results.
Fig. 1. (a) Sources, (b) mixtures, and (c) separated results of the batch-data algorithm.
B. Performance Comparison Between Original and Simplified Algorithms
In Section III, the original algorithms (19) and (20) are simplified to algorithms (27) and (28) by keeping only the correlation-matrix-related term of the symmetric fourth-order cumulant matrix. Simulations show that the separation performance of the simplified version is greatly improved compared with the original algorithms. This result also supports our view that lower-order cumulants are preferable to higher-order ones in BSS. Comparison results are listed in Table I. For each algorithm, 100 randomly generated mixing matrices are used, and separation is performed with the batch-data algorithms (19) and (27), respectively.
TABLE I COMPARISON BETWEEN THE ORIGINAL AND SIMPLIFIED ALGORITHMS
C. Transient Characteristics of the Algorithms
The data used in Subsection IV-A were used to study the transient behaviors of algorithms (27) and (28).
Fig. 2. Transients of the global matrix G in (a) and EI in (b) of the batch-data algorithm (27). Constant parameter: 10; block size M = 1000; learning rate: 0.01; EI = 43.84 dB.
Authorized licensed use limited to: Chinese University of Hong Kong. Downloaded on July 1, 2009 at 03:07 from IEEE Xplore. Restrictions apply.
1106
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 6, AUGUST 2009
Fig. 4. Transients of the global matrix where : and : in f y : .
= 03 = 0 00001
Fig. 3. Transients of the global matrix ; learning algorithm (28).
= 10
= 31
G in (a) and EI in (b) of the online
= 0:99; = 0:0001; EI = 46:36 dB.
TABLE II COMPARISON WITH JADE AND COMON’S ALGORITHMS
The converging process of global matrix and error index of (27) are, respectively, shown in Fig. 2(a) and (b). The transient global matrix shows clearly that the separation matrix converged to a BSS solution. and For the online learning algorithm (28), we set , the learning rate is set to be . Fig. 3 illustrates the learning curves of the global matrix (a) and the dynamic error index curve (b). The averaged error index of the dB. last 2000 samples after convergence is From Figs. 2 and 3, we see that the learning process converges smoothly to a stable state when the sources are separated. apparently starts from a very In addition, the global matrix good condition because that we have adopted the initialization strategy (22), which makes the initial value of global matrix almost independent of the mixing matrix. This is just a reflection of the equivariant property.
Fig. 5. Transients of the global matrix G of Choi et al.’s algorithm, where the learning rate is 0.00005.
D. Comparison With Other Algorithms
Six speech signals are used for comparison in this subsection. These speech signals are combined into five groups, each containing three speech signals. Each group of source signals is mixed with a randomly generated mixing matrix. We compare the batch-data and online algorithms (27) and (28) with Comon’s and JADE algorithms [3], [19]. Comon’s and JADE algorithms are based on fourth-order cumulants and exploit the non-Gaussian property of sources. Both approaches rely on an eigenvalue decomposition technique, which negatively affects their separation performance. Each algorithm was run 100 times on each group of source signals, each time with a randomly generated mixing matrix (every entry of the mixing matrix is drawn from a Gaussian distribution). The averages of the EIs are listed in Table II. The results show that the proposed algorithms (27) and (28) have better separation performance than Comon’s and JADE algorithms, although they are not as computationally efficient. Considering the transient behavior, we can compare the online algorithm (28) with Amari–Cichocki’s natural gradient algorithm [6] and Choi et al.’s decorrelation-based algorithm [17].
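The Monte Carlo protocol described above (random Gaussian mixing matrices, error index averaged over trials) can be sketched as follows. The sketch is illustrative only: it does not implement algorithms (27) or (28), Comon's algorithm, or JADE. As a stand-in it uses a simple second-order method (whitening followed by eigendecomposition of one symmetrized time-delayed correlation matrix, in the spirit of [11] and [14]), synthetic FIR-filtered Gaussian noise instead of speech, and an assumed Amari-type index in dB instead of the paper's EI. All function names and parameter values are hypothetical.

```python
import numpy as np


def performance_index_db(G):
    """Amari-type index of a global matrix in dB (assumed metric)."""
    G = np.abs(G)
    n = G.shape[0]
    rows = (G.sum(axis=1) / G.max(axis=1) - 1.0).sum()
    cols = (G.sum(axis=0) / G.max(axis=0) - 1.0).sum()
    return 10.0 * np.log10((rows + cols) / (2.0 * n * (n - 1)))


def coloured_sources(rng, n_samples):
    """Three independent sources with different spectra (FIR-filtered noise)."""
    filters = [np.ones(4), np.array([1.0, -1.0]), np.array([1.0])]
    rows = []
    for h in filters:
        white = rng.standard_normal(n_samples + len(h) - 1)
        rows.append(np.convolve(white, h, mode="valid"))
    return np.vstack(rows)


def second_order_separation(x, lag=1):
    """Stand-in separator: whiten, then diagonalize one delayed correlation."""
    x = x - x.mean(axis=1, keepdims=True)
    t = x.shape[1]
    d, e = np.linalg.eigh(x @ x.T / t)       # zero-lag correlation matrix
    whitener = (e / np.sqrt(d)).T            # D^{-1/2} E^T
    z = whitener @ x
    c = z[:, lag:] @ z[:, :-lag].T / (t - lag)
    _, u = np.linalg.eigh((c + c.T) / 2.0)   # symmetrized delayed correlation
    return u.T @ whitener                    # separation matrix W


def monte_carlo(n_trials=20, n_samples=20000, seed=0):
    """Average the index over trials with random Gaussian mixing matrices."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        s = coloured_sources(rng, n_samples)
        a = rng.standard_normal((3, 3))      # every entry is Gaussian
        w = second_order_separation(a @ s)
        scores.append(performance_index_db(w @ a))
    return float(np.mean(scores))


mean_index_db = monte_carlo()
print(f"mean index over trials: {mean_index_db:.1f} dB")
```

Because the stand-in method whitens the mixtures first, its global matrix is essentially independent of the mixing matrix, so the variation between trials comes mainly from the source realizations.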
TABLE III COMPARISON OF INITIALIZATION STRATEGIES
The convergence property of the online algorithm (28) is better than that of Amari–Cichocki’s and Choi et al.’s algorithms (see Figs. 3(a), 4, and 5). The signals separated by (28) have almost no distortion, whereas those separated by Amari–Cichocki’s algorithm are distorted. Although the distortion of the sources separated by Choi et al.’s algorithm is not very serious over a short time interval, the global matrix of Choi et al.’s algorithm diverges.
E. Investigation on the Initialization Strategies of Separation Matrix
The novel initialization strategy (22) is compared with a rational and frequently used initialization strategy for BSS. For each initialization strategy, 200 randomly generated mixing matrices are used. Separation is performed with algorithm (28). The first 5000 samples of the mixtures are used to initialize the separation matrix using strategy (22). All 200 sets of mixtures are successfully separated with our new initialization strategy. In contrast, only 64 out of 200 sets of mixtures are successfully separated with the frequently used initialization strategy. The percentages of successful separation cases are listed in Table III. These experimental results show that the initialization strategy (22) is superior.
V. CONCLUSION
Time and frequency non-properties are very important information that makes one signal different from another. In this paper, we utilize the time and frequency non-properties of sources through cumulant matrices (i.e., the cumulant matrices at different time instants and/or cumulant matrices of different time delays) for blind source separation. We show that cumulants of any order, together with their time and/or frequency non-properties, can be used to establish a separation criterion. Instead of using the independence assumption of sources, we propose a relaxed separability condition which is directly related to the separation criterion. The derivations show that the higher the order of the cumulants used in source separation, the stronger the separability conditions imposed on the sources.
This suggests that lower-order cumulants, such as cross-correlations, are preferable in source separation. Based on the theoretical analysis, cumulant-based algorithms are proposed and eventually simplified to correlation-based algorithms. Simulation results show that the new algorithms perform better than the other cumulant-based algorithms.
REFERENCES
[1] J. Hérault, C. Jutten, and B. Ans, “Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé,” Proc. Xème Colloque GRETSI, pp. 1017–1022, 1985.
[2] J.-F. Cardoso, “Infomax and maximum likelihood for blind source separation,” IEEE Signal Process. Lett., vol. 4, no. 4, pp. 112–114, Apr. 1997.
[3] P. Comon, “Independent component analysis, a new concept?,” Signal Process., vol. 36, pp. 287–314, 1994.
[4] J.-F. Cardoso, “Blind signal separation: Statistical principles,” Proc. IEEE, vol. 86, no. 10, pp. 2009–2025, Oct. 1998.
[5] J.-F. Cardoso, “The three easy routes to independent component analysis; contrasts and geometry,” in Proc. ICA 2001 Workshop, 2001, pp. 1–6.
[6] S. Amari and A. Cichocki, “Adaptive blind signal processing-neural network approaches,” Proc. IEEE, vol. 86, no. 10, pp. 2026–2048, 1998.
[7] H. H. Yang, “Serial updating rule for blind separation derived from the method of scoring,” IEEE Trans. Signal Process., vol. 47, no. 8, pp. 2279–2285, Aug. 1999.
[8] N. Vlassis and Y. Motomura, “Efficient source adaptivity in independent component analysis,” IEEE Trans. Neural Netw., vol. 12, no. 3, pp. 559–566, May 2001.
[9] A. Cichocki et al., “Neural networks for blind separation with unknown number of sources,” Neurocomputing, vol. 24, pp. 55–93, 1999.
[10] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind convolution,” Neural Comput., vol. 7, pp. 1129–1159, 1995.
[11] L. Tong, R. Liu, V. C. Soon, and Y.-F. Huang, “Indeterminacy and identifiability of blind identification,” IEEE Trans. Circuits Syst., vol. 38, no. 5, pp. 499–509, 1991.
[12] A. Belouchrani and K. Abed-Meraim, “A blind source separation technique using second-order statistics,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–443, Feb. 1997.
[13] C. Chang and Z. Ding, “A matrix-pencil approach to blind separation of colored nonstationary signals,” IEEE Trans. Signal Process., vol. 48, no. 3, pp. 900–907, Mar. 2000.
[14] L. Molgedey and H. G. Schuster, “Separation of a mixture of independent signals using time delayed correlations,” Phys. Rev. Lett., vol. 72, no. 23, pp. 3634–3637, 1994.
[15] K. Matsuoka, M. Ohya, and M. Kawamoto, “A neural net for blind separation of nonstationary signals,” Neural Netw., vol. 8, no. 3, pp. 411–419, 1995.
[16] D.-T. Pham and J.-F. Cardoso, “Blind separation of instantaneous mixtures of nonstationary sources,” IEEE Trans. Signal Process., vol. 49, no. 9, pp. 1837–1848, 2001.
[17] S. Choi, A. Cichocki, and S. Amari, “Equivariant nonstationary source separation,” Neural Netw., vol. 15, pp. 121–130, 2002.
[18] F. Yin, T. Mei, and J. Wang, “Blind source separation based on decorrelation and nonstationarity,” IEEE Trans. Circuits Syst. I, vol. 54, no. 5, pp. 1150–1158, 2007.
[19] J. F. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” Radar and Signal Process., IEE Proc. F, vol. 140, no. 6, pp. 362–370, Dec. 1993.
[20] V. Zarzoso and A. K. Nandi, “Blind separation of independent sources for virtually any source probability density function,” IEEE Trans. Signal Process., vol. 47, no. 9, pp. 2419–2432, Sep. 1999.
[21] N. Delfosse and P. Loubaton, “Adaptive separation of independent sources: A deflation approach,” Proc. IEEE ICASSP, vol. IV, pp. 41–44, 1994.
[22] C. B. Papadias, “Globally convergent blind source separation based on a multiuser kurtosis maximization criterion,” IEEE Trans. Signal Process., vol. 48, no. 12, pp. 3508–3519, Dec. 2000.
[23] A. Hyvärinen and E. Oja, “A fast fixed-point algorithm for independent component analysis,” Neural Comput., vol. 9, pp. 1483–1492, 1997.
[24] E. Moreau, “A generalization of joint-diagonalization criteria for source separation,” IEEE Trans. Signal Process., vol. 49, no. 3, pp. 530–541, Mar. 2001.
[25] J.-C. Pesquet and E. Moreau, “Cumulant-based independence measures for linear mixtures,” IEEE Trans. Inf. Theory, vol. 47, no. 5, pp. 1947–1956, May 2001.
[26] L. De Lathauwer, B. De Moor, and J. Vandewalle, “Independent component analysis and (simultaneous) third-order tensor diagonalization,” IEEE Trans. Signal Process., vol. 49, no. 10, pp. 2262–2271, 2001.
[27] K.-R. Müller, P. Philips, and A. Ziehe, “JADE TD: Combining higher order statistics and temporal information for blind source separation (with noise),” in Proc. ICA 1999 Workshop, 1999, pp. 87–92.
[28] P. Georgiev and A. Cichocki, “Robust independent component analysis via time-delayed cumulant functions,” IEICE Trans. Fundamentals, vol. E86-A, no. 3, pp. 573–580, 2003.
[29] C. L. Nikias and A. P. Petropulu, Higher-Order Spectra Analysis: A Nonlinear Signal Processing Framework. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[30] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.
[31] S. Amari, “Natural gradient works efficiently in learning,” Neural Comput., vol. 10, pp. 251–276, 1998.
[32] J. F. Cardoso and B. H. Laheld, “Equivariant adaptive source separation,” IEEE Trans. Signal Process., vol. 44, no. 12, pp. 3017–3030, Dec. 1996.
[33] Z. Ding and T. Nguyen, “Stationary points of a kurtosis maximization algorithm for blind signal separation and antenna beamforming,” IEEE Trans. Signal Process., vol. 48, no. 6, pp. 1587–1596, Dec. 2000.
[34] X. R. Cao and R. W. Liu, “General approach to blind source separation,” IEEE Trans. Signal Process., vol. 44, no. 3, pp. 562–570, 1996.
[35] A. Mansour and N. Ohnishi, “Multichannel blind separation of sources algorithm based on cross-cumulant and the Levenberg–Marquardt method,” IEEE Trans. Signal Process., vol. 47, no. 11, pp. 3172–3179, Nov. 1999.
[36] A. Hyvärinen, “Blind source separation by nonstationarity of variance: A cumulant-based approach,” IEEE Trans. Neural Netw., vol. 12, no. 6, pp. 1471–1474, Nov. 2001.
Tiemin Mei received the B.S. degree in physics from Zhongshan University, Guangzhou, China, in 1986, the M.S. degree in biophysics from China Medical University, Shenyang, in 1991, and the Ph.D. degree in signal and information processing from Dalian University of Technology, Dalian, China, in 2006. He has been with the Institute for Signal Processing, University of Lübeck, Lübeck, Germany, since 2007. He was a Visiting Fellow with the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, Australia, from 2004 to 2005. He has been a Member of Academic Staff at Shenyang Ligong University since 1996. His current research interests include stochastic signal processing and speech and image processing.
Fuliang Yin was born in Fushun city, Liaoning Province, China, in 1962. He received the B.S. degree in electronic engineering and the M.S. degree in communications and electronic systems from Dalian University of Technology (DUT), Dalian, China, in 1984 and 1987, respectively. He joined the Department of Electronic Engineering, DUT, as a Lecturer in 1987 and became an Associate Professor in 1991. He has been a Professor at DUT since 1994, and the Dean of the School of Electronic and Information Engineering of DUT since 2000. His research interests include digital signal processing, speech processing, image processing and pattern recognition, digital communication, and integrated circuit design.
Jun Wang (S’89–M’90–SM’93–F’07) received the B.S. degree in electrical engineering and the M.S. degree in systems engineering from Dalian University of Technology, Dalian, China, in 1982 and 1985, respectively, and the Ph.D. degree in systems engineering from Case Western Reserve University, Cleveland, OH, in 1991. He is a Professor in the Department of Mechanical and Automation Engineering, Chinese University of Hong Kong. Prior to this position, he held various academic positions at Dalian University of Technology, Case Western Reserve University, and the University of North Dakota. He also held various short-term visiting positions at USAF Armstrong Laboratory (1995), RIKEN Brain Science Institute (2001), Université Catholique de Louvain (2001), Chinese Academy of Sciences (2002), and Huazhong University of Science and Technology (2006–2007). He has also held a Cheung Kong Chair Professorship in computer science and engineering at Shanghai Jiao Tong University since 2008. His current research interests include neural networks and their applications. Prof. Wang has been an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS since 1999, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B since 2003, and a member of the Editorial Advisory Board of the International Journal of Neural Systems since 2006. He also served as an Associate Editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C (2002–2005), and a Guest Editor of the special issues of European Journal of Operational Research (1996), International Journal of Neural Systems (2007), and Neurocomputing (2008). He was an organizer of several international conferences, serving as the General Chair of the 13th International Conference on Neural Information Processing (2006) and the 2008 IEEE World Congress on Computational Intelligence held in Hong Kong. He served as the President of Asia Pacific Neural Network Assembly in 2006.