1150
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 54, NO. 5, MAY 2007
Blind-Source Separation Based on Decorrelation and Nonstationarity

Fuliang Yin, Tiemin Mei, and Jun Wang, Fellow, IEEE
Abstract—In this paper, discrete-time blind-source separation (BSS) of instantaneous mixtures is studied. Decorrelation-based sufficient criteria for BSS of stationary and nonstationary sources are derived based on nonstationarity and nonwhiteness. A gradient algorithm is proposed based on these criteria. A batch-data algorithm and an on-line algorithm are developed based on the corollaries of the BSS criteria. These algorithms are especially useful for the separation of nonstationary sources. They are robust to additive white noises if the time-delayed decorrelation and the nonstationarity of the sources are considered simultaneously in the algorithms. Experimental results show the effectiveness and performance of the proposed algorithms.

Index Terms—Blind-source separation (BSS), decorrelation, natural gradient, nonstationary processes, second-order statistics (SOS), stationary processes.
I. INTRODUCTION

BLIND-source separation (BSS) is to recover a set of signal sources from observations that are mixtures of the signal sources via unknown transmission channels [1]. BSS is a challenging problem, as neither the signal sources nor the mixing channels are known a priori. The only assumption that can be employed is that the signal sources are statistically independent of each other. BSS has been an active research area in recent years due to its potential applications in many areas, such as audio processing, telecommunication, and biomedical signal processing; e.g., [2], [3].

Manuscript received July 4, 2004; revised November 27, 2004. This work was supported by the National Natural Science Foundation of China under Grant 60372082 and Grant 60172073, by the Trans-Century Training Program Foundation for the Talents by the Ministry of Education of China, and by the Research Grants Council of Hong Kong under CUHK4165/04E. This paper was recommended by Associate Editor C.-T. Lin. F. Yin is with the School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116023, China (e-mail: [email protected]). T. Mei is with the School of Information Science and Engineering, Shenyang Institute of Technology, Shenyang 110168, China (e-mail: meitiemin@163.com). J. Wang is with the Department of Mechanical & Automation Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong (e-mail: jwang@mae.cuhk.edu.hk). Digital Object Identifier 10.1109/TCSI.2007.895510

In the last decade, many BSS techniques have been developed, e.g., algorithms based on information theory and those based on higher order statistics (HOS) or second-order statistics (SOS). The algorithms based on information theory and on HOS utilize the higher order statistics of the source signals explicitly or implicitly [1], [4]–[14]. In theory, the effectiveness of the algorithms based on information/maximum-likelihood theory, such as [8] and [11], depends on the correct estimation of the sources' probability distributions (super- or sub-Gaussian). This is because the activation functions of such separating networks are derived from the probability density functions of the sources. Although the separation performance is not very sensitive to the activation functions, it is necessary to treat the sources differently according to their super- or sub-Gaussian distribution properties [8]. Several adaptive estimation algorithms have been proposed for this purpose, but these measures make the algorithms complex [8], [13]. The HOS-based algorithms do not have such a problem: they treat super- and sub-Gaussian sources identically. However, they cannot be used if there is more than one Gaussian source in the mixtures [15].

The SOS-based algorithms, by making use of the nonstationarity or the nonwhiteness of the sources (or both), can overcome the aforementioned disadvantages. Tong and Liu studied SOS-based algorithms as early as 1990. They presented the separability conditions and the algorithm for multiple unknown source extraction (AMUSE) based on SOS for both correlated and uncorrelated stationary sources [16], [17]. It is shown in [16], [17] that two correlation matrices with different time delays are sufficient for the separation of the sources if some stronger conditions are imposed on the sources, but for the most general cases, a larger number of correlation matrices with different time delays are needed for the separation of mixed stationary colored sources. However, AMUSE is not effective when the sources involved are white. Molgedey and Schuster proposed a decorrelation algorithm from the viewpoint of neural networks [18]. Belouchrani et al. proposed the second-order blind identification (SOBI) algorithm based on the joint diagonalization of a set of covariance matrices of stationary processes, but the number of covariance matrices involved is uncertain [19]. These algorithms exploit only the time-delayed correlation information; the nonstationarity of the sources is not considered.
It is known that the nonstationarity of sources is very useful information for source separation. Matsuoka et al. proposed an algorithm suitable for nonstationary processes based on neural networks; it is essentially a decorrelation-based algorithm [20]. Inspired by this work [20], Choi et al. presented an improved algorithm suitable for the separation of nonstationary processes based on feedforward and feedback neural networks [21]. These algorithms exploit only the nonstationarity of sources, so they will degrade if there are stationary noises in the observations. They also proposed several anti-noise algorithms based on the time-delayed correlations of sources [22], [23]. Chang et al. proposed the generalized eigendecomposition (GED) algorithm based on a correlation matrix pencil. For this algorithm, prewhitening is not necessary and only the time-delayed correlation matrices are used; as a result, the algorithm is robust to noise [24].
1549-8328/$25.00 © 2007 IEEE
In addition to instantaneous mixture separation, some recent results on convolutive mixture separation can be found in, e.g., [25]–[27] and the references therein.

In this paper, sufficient separation criteria based on decorrelation for stationary and nonstationary sources are proposed in a general sense. The nonstationarity and the nonwhiteness of sources are considered simultaneously to establish the separation criteria and algorithms. Batch-data and on-line algorithms are proposed under these criteria. The main advantages of these algorithms are their simplicity of implementation, their independence of the probability distributions of the sources, and their robustness against noise. Numerical experimental results are also given to demonstrate the effectiveness and characteristics of the new algorithms.

The remainder of this paper is organized as follows. In Section II, the mixing and separation models are first defined, and then the sufficient separation criteria are presented based on SOS by utilizing the nonstationarity and nonwhiteness of sources for both stationary and nonstationary random signals. The corresponding algorithms are developed in Section III. Some simulation results are given to demonstrate the validity of the algorithms in Section IV. Conclusions are summarized in Section V.
II. PROBLEM FORMULATION

A. Mixing and Separation Models

Assume that n zero-mean and real-valued source signals s(t) = [s_1(t), ..., s_n(t)]^T are mixed by an unknown nonsingular and time-invariant mixing matrix A. The task of BSS is to recover the sources from the m observations x(t) = [x_1(t), ..., x_m(t)]^T. In this paper, we consider only the case m = n. The mixing process can be expressed mathematically as

x(t) = A s(t) + v(t)    (1)

where v(t) is the noise vector, the superscript T denotes the matrix transpose operator, and the variable t takes discrete values in {0, 1, 2, ...}.

Generally, feedforward and feedback linear networks are used for the separation of instantaneous mixtures [21]. In this paper, the algorithm is based on the feedforward neural network. Let W be the separation matrix. The separation process can be expressed mathematically as

y(t) = W x(t)    (2)

where y(t) is the output vector of the separation system. Substituting (1) into (2), we obtain

y(t) = W A s(t)    (3)

where the noise is not considered here, but it will be reconsidered in the development of the separation algorithms. Defining the global matrix as G = WA, we then have

y(t) = G s(t).    (4)

If we can determine the separation matrix W which makes the global matrix G the multiplication of a diagonal matrix D and a permutation matrix P, then the outputs of the separation system are the source signals scaled by indeterminate constants (the diagonal elements of D) and in an indeterminate order (determined by P). The indeterminacies of D and P are inevitable in BSS.

B. Separability Conditions and Decorrelation Criteria

The separability conditions and the decorrelation criteria have been discussed extensively; e.g., [16], [17], [19]–[21], [24]. However, there is not a unified framework coping with both stationary and nonstationary sources. In this subsection, based on nonstationarity and nonwhiteness, we give a set of separability conditions and decorrelation criteria for both stationary and nonstationary sources.

Theorem 1: For zero-mean and real-valued nonstationary random signals s_1(t), ..., s_n(t), define R_s(t_k, τ_k) = E[s(t_k) s^T(t_k + τ_k)] and let Λ be the matrix whose k-th column is the vector of the diagonal entries of R_s(t_k, τ_k). If there exist at least n different points (t_k, τ_k), k = 1, ..., n, in the 2-D space (t, τ) such that

R_s(t_k, τ_k) = diag[R_s(t_k, τ_k)], k = 1, ..., n    (5)

det(Λ) ≠ 0    (6)

then the sources are separable based on the decorrelation of the output of the separation system (2) at the n different points (t_k, τ_k); i.e.,

R_y(t_k, τ_k) = diag[R_y(t_k, τ_k)], k = 1, ..., n    (7)

where R_y(t_k, τ_k) = E[y(t_k) y^T(t_k + τ_k)]; the operator diag[·] rearranges the diagonal elements of a matrix into a diagonal matrix; det(·) is the determinant of a matrix; E[·] is the expectation operator. Equations (5) and (6) are the source separability conditions, whereas (7) is the source separation criterion. Obviously, the separability conditions (5) and (6) are not equal to the uncorrelatedness condition; they put much weaker constraints on the sources.

Proof: From (4), the correlation matrices of the outputs of the separation system at the n points (t_k, τ_k) in the 2-D space are as follows:

R_y(t_k, τ_k) = G R_s(t_k, τ_k) G^T.    (8)

By taking into account condition (5), it is obvious that R_y(t_k, τ_k) is symmetric. Under the decorrelation condition (7), and from (8), we obtain that for i ≠ j

Σ_{l=1}^{n} g_il g_jl λ_l(t_k, τ_k) = 0, k = 1, ..., n    (9)

where g_il is the (i, l) entry of G and λ_l(t_k, τ_k) is the l-th diagonal entry of R_s(t_k, τ_k). Let h_ij = [g_i1 g_j1, ..., g_in g_jn]^T; then (9) is rewritten in matrix form as

Λ^T h_ij = 0, i ≠ j.    (10)
According to linear algebra and taking into account the second separability condition (6), (10) implies

g_il g_jl = 0, i ≠ j; l = 1, ..., n.    (11)

Equation (11) implies that any row or column of the global matrix G can have at most one nonzero entry. That is, G is the multiplication of a diagonal matrix D and a permutation matrix P. So we can conclude that the decorrelation of the outputs of the separation system at the n different points (t_k, τ_k) in the 2-D space (t, τ) means the separation of the mixed nonstationary/nonwhite sources under the separability conditions (5) and (6).

From Theorem 1, we have the following corollary.

Corollary 1: For uncorrelated nonstationary sources and a given time delay τ, if the autocorrelations E[s_i(t) s_i(t + τ)] (i = 1, ..., n) are not constant with respect to time, then the sources are separable by the decorrelation criterion.

The special case τ = 0 has been studied by Matsuoka in [20]. Let τ_k = τ for all k in matrix Λ; taking into account that the autocorrelations E[s_i(t) s_i(t + τ)] (i = 1, ..., n) are not constant with respect to time, there must exist time instants t_1, ..., t_n such that det(Λ) ≠ 0. According to Theorem 1, nonstationary sources are separable by the decorrelation criterion.

Theorem 2: For zero-mean and real-valued stationary but nonwhite random signals s_1(t), ..., s_n(t), if there exist at least n different time delays τ_1, ..., τ_n such that

R_s(τ_k) = diag[R_s(τ_k)], k = 1, ..., n    (12)

det(Λ) ≠ 0    (13)

then the sources are separable by decorrelation of the output of the separation system; that is

R_y(τ_k) = diag[R_y(τ_k)], k = 1, ..., n    (14)

where R_s(τ_k) = E[s(t) s^T(t + τ_k)], R_y(τ_k) = E[y(t) y^T(t + τ_k)], and Λ is the matrix whose k-th column is the vector of the diagonal entries of R_s(τ_k). Conditions (12) and (13) are the separability conditions, whereas condition (14) is the separation criterion. The proof is similar to that of Theorem 1.

Remark 1: It is well known that, in the best case, two correlation matrices are sufficient for the separation of n sources. This conclusion can be seen in [17] and [24]. However, in the worst case, a pair of correlation matrices may separate only one source from the others, so a further correlation matrix is needed; if such an unfortunate case occurs one after another, then all of the correlation matrices R_s(τ_1), ..., R_s(τ_n) are needed for the complete separation of the n sources. It is easy to check that Λ is still a full-rank matrix in this worst case.

Remark 2: Theorems 1 and 2 point out the least constraints on sources for BSS by decorrelation. This does not mean that these constraints are still necessary if another separation criterion is used.

Remark 3: The nonstationarity of sources is as important as the nonwhiteness of sources, because nonstationarity itself is enough for source separation according to Corollary 1.

From Theorem 2, we have the following corollaries.

Corollary 2: If the signal sources are stationary stochastic processes whose autocorrelations vanish beyond a maximum lag q with q < n (the source number), then they are inseparable by means of decorrelation. This is because the nonzero autocorrelation values of such a stochastic process lie in the interval (−q, q) and the autocorrelation is symmetric, so fewer than n time delays satisfying condition (13) are available.

Corollary 3: If two or more sources are stationary second-order white noises, whose autocorrelations are zero for any nonzero time delay, then they are inseparable by using the decorrelation criterion. This is because det(Λ) = 0 when two or more sources are stationary second-order white noises.

Corollary 4: If there are stationary second-order white noises in the sources, the colored sources meet the separability conditions in Theorem 2, and the time delays satisfy τ_k ≠ 0, then the colored sources can be separated by the decorrelation criterion, but they cannot be separated from the white noises. In theory, the second-order stationary white noises do not affect the separation of the colored sources, because the estimation of the correlation matrices at τ_k ≠ 0 cannot be affected by the white noises. This is the basis on which we design noise-robust separation algorithms.

III. ALGORITHM DEVELOPMENT

In this section, we design the corresponding algorithms for the separation of both stationary and nonstationary stochastic sources.
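The separability conditions above can be illustrated numerically. The following Python sketch is ours and is not taken from the paper: the AR(1) source models, the lags, the mixing matrix, and the noise level are all illustrative choices. It checks that the matrix Λ of lagged autocorrelations is nonsingular for three colored sources with distinct spectra, and that additive white noise leaves the time-delayed correlations of the mixtures essentially unchanged, which is the empirical counterpart of Corollary 4.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 200_000

def ar1(a, T):
    """Unit-variance AR(1) process with pole a (a colored source)."""
    e = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = e[0]
    for k in range(1, T):
        x[k] = a * x[k - 1] + e[k]
    return x / x.std()

# Three mutually independent colored sources with distinct spectra.
s = np.vstack([ar1(0.9, T), ar1(0.5, T), ar1(-0.7, T)])

def corr(z, tau):
    """Sample lagged correlation matrix E[z(t) z(t + tau)^T]."""
    return z[:, : T - tau] @ z[:, tau:].T / (T - tau)

# Condition (13): the matrix whose k-th column holds the diagonal of
# R_s(tau_k) must be nonsingular for n = 3 distinct delays.
Lam = np.column_stack([np.diag(corr(s, tau)) for tau in (1, 2, 3)])
print(abs(np.linalg.det(Lam)))          # clearly nonzero

# White observation noise leaves lagged (tau != 0) correlations of the
# mixtures essentially unchanged, up to finite-sample fluctuation.
A = rng.standard_normal((n, n))         # illustrative mixing matrix
x_clean = A @ s
x_noisy = x_clean + 0.5 * rng.standard_normal((n, T))
print(np.max(np.abs(corr(x_clean, 2) - corr(x_noisy, 2))))   # small
```

With stationary AR(1) sources, the lag-τ autocorrelations are a_i^τ, so the columns of Λ are Vandermonde-like and the determinant is nonzero whenever the poles differ.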
By utilizing the nonstationary properties of the signals, we present two simplified but effective algorithms for nonstationary processes.

A. Gradient-Based Algorithm for Both Stationary and Nonstationary Sources

The theorems in Section II show that the decorrelation-based source separation problem is equivalent to the joint diagonalization of the cross-correlation matrices of the outputs of the separation system. The joint diagonalization is equivalent to a joint optimization problem in terms of Hadamard's inequality. For the sake of completeness, Hadamard's inequality is stated as follows.
Hadamard's Inequality [30]: If matrix C = (c_ij) is symmetric and positive semidefinite, then

det(C) ≤ ∏_{i=1}^{n} c_ii.    (15)

Furthermore, when C is positive definite, the equality holds if and only if C is diagonal.

For a positive definite matrix C, we define a function [20], [21] as follows:

f(C) = Σ_{i=1}^{n} log c_ii − log det(C).    (16)

According to Hadamard's inequality (15), f(C) reaches its minimum if and only if C is a diagonal matrix.

The correlation matrices R_y(t_k, τ_k), k = 1, ..., n, are symmetric but generally not positive definite. We cannot solve the optimization problem directly from f(R_y(t_k, τ_k)). But according to matrix theory, though R_y(t_k, τ_k) is not positive definite, we can define

R̄_y(t_k, τ_k) = R_y(t_k, τ_k) + cI    (17)

where c is a positive constant and I is the identity matrix. R̄_y(t_k, τ_k) is symmetric and positive definite provided that c is big enough; that is, c > |λ_min|, where λ_min is the smallest eigenvalue of R_y(t_k, τ_k). In practice, it is unnecessary to estimate the eigenvalues of R_y(t_k, τ_k); one can simply set c to be sufficiently large.

If there is a separation matrix W such that the matrices R_y(t_k, τ_k) are jointly diagonalized, then the matrices R̄_y(t_k, τ_k) must be jointly diagonalized too. Let λ̄_i(t_k, τ_k) denote the i-th diagonal element of R̄_y(t_k, τ_k). By virtue of (16), the objective functions of the optimization problem can be constructed as follows:

J_k(W) = f(R̄_y(t_k, τ_k)) = Σ_{i=1}^{n} log λ̄_i(t_k, τ_k) − log det R̄_y(t_k, τ_k), k = 1, ..., n.    (18)

The joint diagonalization of the matrices R̄_y(t_k, τ_k) means that there is a separation matrix W such that the functions J_k(W) reach their minima simultaneously for k = 1, ..., n. The gradient of J_k(W) is calculated as follows:

∂J_k/∂W = ∂[Σ_{i=1}^{n} log λ̄_i(t_k, τ_k)]/∂W − ∂[log det R̄_y(t_k, τ_k)]/∂W.    (19)

From (17) and (2), R̄_y(t_k, τ_k) = W R_x(t_k, τ_k) W^T + cI, where R_x(t_k, τ_k) = E[x(t_k) x^T(t_k + τ_k)] is symmetric at the points (t_k, τ_k) by virtue of condition (5). Since λ̄_i(t_k, τ_k) = w_i R_x(t_k, τ_k) w_i^T + c, with w_i the i-th row of W, the first part of (19) is

∂[Σ_{i=1}^{n} log λ̄_i(t_k, τ_k)]/∂W = 2 Λ̄^{-1}(t_k, τ_k) W R_x(t_k, τ_k)    (25)

where Λ̄(t_k, τ_k) = diag[R̄_y(t_k, τ_k)]. In terms of the differential formula d log det(M) = tr(M^{-1} dM) and the relation dR̄_y = dW R_x W^T + W R_x dW^T, and taking into account the fact that R̄_y(t_k, τ_k) is symmetric, the second part of (19) is

∂[log det R̄_y(t_k, τ_k)]/∂W = 2 R̄_y^{-1}(t_k, τ_k) W R_x(t_k, τ_k).    (32)

Substituting (25) and (32) into (19), the gradient of the objective function is

∂J_k/∂W = 2 [Λ̄^{-1}(t_k, τ_k) − R̄_y^{-1}(t_k, τ_k)] W R_x(t_k, τ_k).    (33)

We obtain the following gradient-based learning rule:

W(l + 1) = W(l) − η Σ_{k=1}^{n} [Λ̄^{-1}(t_k, τ_k) − R̄_y^{-1}(t_k, τ_k)] W(l) R_x(t_k, τ_k)    (34)

where η > 0 is the learning rate and l denotes the iteration index. If the natural or relative gradient is exploited [28], [29], we have the following learning rule:

W(l + 1) = W(l) − η Σ_{k=1}^{n} [Λ̄^{-1}(t_k, τ_k) − R̄_y^{-1}(t_k, τ_k)] R_y(t_k, τ_k) W(l)    (35)

where the natural gradient is obtained by using (∂J_k/∂W) W^T W and R_y(t_k, τ_k) = W R_x(t_k, τ_k) W^T. Equations (34) and (35) are the learning rules of BSS derived based on the decorrelation criterion. They can be implemented in turn. Obviously, (35) has the equivariant property [29].
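A minimal numerical sketch of a natural-gradient decorrelation update in the spirit of (35) follows. The two AR(1) source spectra, the mixing matrix, the step size, and the shift constant c are illustrative choices of ours; exact lagged correlation matrices are used, so the joint-diagonality residual can be driven essentially to zero.

```python
import numpy as np

# Exact lagged source correlations for two unit-variance AR(1) sources
# with poles 0.8 and -0.6: diag(E[s(t) s(t+tau)^T]) = diag(a1^tau, a2^tau).
lams = {tau: np.diag([0.8 ** tau, (-0.6) ** tau]) for tau in (0, 1, 2)}
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                  # illustrative mixing matrix
Rx = {tau: A @ lam @ A.T for tau, lam in lams.items()}

W, eta, c = np.eye(2), 0.01, 2.0            # c: shift constant as in (17)
for _ in range(20_000):
    dW = np.zeros_like(W)
    for tau, R in Rx.items():
        Cb = W @ R @ W.T + c * np.eye(2)    # shifted output correlation
        # Multiplicative (equivariant) step toward joint diagonality.
        dW += (np.eye(2) - Cb / np.diag(Cb)[:, None]) @ W
    W += eta * dW

G = W @ A                                   # global matrix, approaches D * P
resid = max(np.abs(W @ R @ W.T - np.diag(np.diag(W @ R @ W.T))).max()
            for R in Rx.values())
print(np.round(G, 3), resid)
```

At the fixed point all shifted output correlations are diagonal, which by Theorem 2 (the lag-0 and lag-1 autocorrelation columns of Λ are linearly independent here) forces G to be a scaled permutation.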
B. Simplified Algorithms Utilizing Nonstationary Properties

1) Batch-Data Learning Algorithm: If the sources are nonstationary and uncorrelated with each other, we obtain the following batch-data algorithm in terms of (35). The input data x(t) with length T (i.e., t = 0, 1, ..., T − 1) is divided into K batches. The length of each batch is M, T = KM. The m-th batch is denoted by X_m(τ) = [x((m − 1)M + τ), ..., x(mM − 1 + τ)], m = 1, ..., K, where τ is the time delay; X_m(τ) is an n-by-M matrix. R_x(m, τ) is estimated as follows:

R̂_x(m, τ) = (1/M) X_m(0) X_m^T(τ)    (36)

and R̄_x(m, τ) is estimated with the input data as

R̄_x(m, τ) = R̂_x(m, τ) + cI    (37)

where c is a positive constant and I is the identity matrix.

The batch-data algorithm is as follows:

W(l + 1) = W(l) + η [I − Λ̄^{-1}(m_l, τ) R̄_y(m_l, τ)] W(l), m_l = mod(l, K) + 1    (38)

where l denotes the l-th iteration; R̄_y(m_l, τ) = W(l) R̄_x(m_l, τ) W^T(l); Λ̄(m_l, τ) = diag[R̄_y(m_l, τ)]; and mod(l, K) returns the remainder of l/K. Generally, τ = 0, 1, or 2. During the iteration, the algorithm (38) exploits the different data batches in turn through the operator mod(l, K). As stated in Section II, if τ ≠ 0, the batch-data algorithm (38) is robust to additive white noise. This is because the effect of the additive white noise on the estimation of R̂_x(m, τ) can be alleviated by setting τ ≠ 0.

2) On-Line Learning Algorithm: According to Corollary 1, if the sources are nonstationary and uncorrelated with each other, then for a given time delay τ, R̄_y(t, τ) can be adaptively estimated, and we obtain the following on-line learning rule:

W(t + 1) = W(t) + η [I − Λ̄^{-1}(t, τ) R̄_y(t, τ)] W(t)    (39)

where Λ̄(t, τ) = diag[R̄_y(t, τ)]; the time delay τ is generally set to be 0, 1, or 2. The on-line learning rule (39) is valid because, when convergence is reached after some time t_0, R̄_y(t, τ) is diagonal for all t > t_0, so we can always find at least n different points (t_k, τ) which make Theorem 1 hold; that is, the n correlation matrices are jointly diagonalized.

Obviously, similar to the batch-data algorithm, if τ ≠ 0, the learning rule (39) is robust against additive white noise, because the effect of the additive white noise on the estimation of R̄_y(t, τ) will be lessened.

If we set τ = 0, then the learning rule (39) can be further simplified. When τ = 0, R_y(t, 0) is positive definite except for some special cases [31], so the definition (17) is unnecessary. We denote R_y(t, 0) by R_y(t) and replace R̄_y(t, τ) with R_y(t) in (39); then we obtain the following simplified learning rule for nonstationary processes:

W(t + 1) = W(t) + η [I − Λ^{-1}(t) R_y(t)] W(t)    (40)

where Λ(t) = diag[R_y(t)] and I is the identity matrix. Learning rule (40) is a special case of (39). It can also be seen in [21] and [32], where it was derived in different ways.

C. Implementation Considerations

Both the batch-data and the on-line learning algorithms are of the equivariant property. That is to say, the convergence behavior of the global matrix G does not depend at all on the mixing matrix A [29]. But as far as the above algorithms are concerned, the convergence behavior relies closely on the initial value of the separation matrix W. We propose the following adaptive initialization scheme for both the batch-data and the on-line algorithms.

For the first input data X_1(0), let R̂_x = (1/M) X_1(0) X_1^T(0). The singular value decomposition (SVD) of R̂_x is as follows:

R̂_x = U Σ V^T    (41)

where U and V are unitary matrices and Σ is a diagonal matrix. Let D = U Σ^{1/2}; the initial value of the separation matrix is set to

W(0) = D†    (42)

where the superscript † denotes the Moore–Penrose inverse operator.
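The batch-data procedure, together with the SVD-based initialization (41)-(42), can be sketched as follows. The nonstationary test sources, the batch geometry, the noise level, and the step size are our own illustrative choices, not the paper's experimental setup. The time delay is set to τ = 1, so the additive white observation noise contributes only finite-sample fluctuation to the batch correlation estimates.

```python
import numpy as np

rng = np.random.default_rng(7)
n, K, M = 2, 12, 2500                 # K batches of length M
T = K * M
t = np.arange(T)

def ar1(a, T):
    """Unit-variance AR(1) noise with pole a."""
    e = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = e[0]
    for k in range(1, T):
        x[k] = a * x[k - 1] + e[k]
    return x / x.std()

# Nonstationary colored sources: AR(1) noise with different slow
# envelopes, so each batch supplies a different lag-1 correlation matrix.
s = np.vstack([(1.2 + np.sin(2 * np.pi * t / T)) * ar1(0.8, T),
               (1.2 + np.cos(2 * np.pi * t / T)) * ar1(0.5, T)])
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])                        # illustrative mixing matrix
x = A @ s + 0.4 * rng.standard_normal((n, T))     # white observation noise

tau, c, eta = 1, 0.1, 0.02                        # tau != 0: noise-robust

# SVD-based initialization (41)-(42): prewhitening of the first batch.
R0 = x[:, :M] @ x[:, :M].T / M
U, sv, _ = np.linalg.svd(R0)
W = np.linalg.pinv(U @ np.diag(np.sqrt(sv)))

for l in range(12_000):
    m = l % K                                     # cycle batches: mod(l, K)
    X0 = x[:, m * M: (m + 1) * M - tau]
    Xt = x[:, m * M + tau: (m + 1) * M]
    Rb = X0 @ Xt.T / (M - tau)                    # lag-tau estimate, cf. (36)
    Rb = 0.5 * (Rb + Rb.T) + c * np.eye(n)        # symmetrize and shift (37)
    C = W @ Rb @ W.T
    W += eta * (np.eye(n) - C / np.diag(C)[:, None]) @ W   # rule (38)

G = W @ A                                         # approaches D * P
print(np.round(G, 3))
```

Setting τ = 0 in the same loop yields the noise-sensitive variant, which is the contrast drawn in Section II.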
The initialization matrix (42) is nothing but the prewhitening matrix of the first input data [19]. This measure can improve the convergence behavior dramatically. For the on-line algorithm (39), the correlation matrix is estimated adaptively by

R̂_y(t, τ) = α R̂_y(t − 1, τ) + (1 − α) y(t) y^T(t + τ)    (43)

where α (0 < α < 1) is the moving smoothing parameter.

IV. EXPERIMENTAL RESULTS
In this section, some examples are presented to demonstrate the validity and performance of the proposed algorithms. To evaluate the performance of our algorithms, we calculate the error index (EI) with the following formula [11]:
Fig. 1. Transient of global matrix given by the batch-data algorithm: noise-free case, where c = 0, τ = 0, η = 0.002, M = 3150, and EI = −43.59 dB.
EI = Σ_{i=1}^{n} ( Σ_{j=1}^{n} |g_ij| / max_k |g_ik| − 1 ) + Σ_{j=1}^{n} ( Σ_{i=1}^{n} |g_ij| / max_k |g_kj| − 1 )    (44)

where g_ij is the (i, j) entry of the global matrix G; the EI is reported in decibels as 10 log10(EI). The noisy mixture model is as follows:

x(t) = A s(t) + v(t)    (45)

where v(t) is the white noise vector. The noises are uncorrelated with the sources; whether the components of v(t) are correlated to each other or not is unimportant. The measure of the signal-to-noise ratio (SNR) of the observation x(t) is calculated with the following formula:

SNR = 10 log10 ( Σ_i E[(Σ_j a_ij s_j(t))²] / Σ_i E[v_i²(t)] )    (46)
where a_ij is the entry of the mixing matrix A.

Three speech signals are used in the experiments. The sampling frequency is 16 kHz and the length of the mixture data is 30 000. The mixing matrix A is randomly computer-produced, and the noise is computer-generated Gaussian white noise.

A. Transient of Global Matrix

The transients of the global matrix are calculated with both the batch-data (38) and on-line (39) algorithms. Noise-free and noisy observations are used in these experiments. These results show that both the batch-data and the on-line algorithms are robust to additive white noises if we set the time delay τ ≠ 0.

1) Batch-Data Algorithm: For the noise-free case, we set c = 0, the data batch size M = 3150, the time delay τ = 0, and the learning rate η = 0.002. After convergence (5000 iterations), the EI of the global matrix is −43.59 dB. The transient of the global matrix is shown in Fig. 1, where each curve corresponds to one element of the global matrix G.
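The error index used throughout this section can be implemented compactly. The helper below is our illustrative rendering of the cross-talk index (44); it vanishes exactly when the global matrix is a scaled permutation.

```python
import numpy as np

def error_index(G):
    """Cross-talk error index of a global matrix G = W A (cf. (44)).

    Zero iff G is a scaled permutation; reported as 10*log10(EI) in dB.
    """
    P = np.abs(np.asarray(G, dtype=float))
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return row.sum() + col.sum()

# A scaled permutation gives EI = 0; cross-talk increases the index.
perfect = np.array([[0.0, 2.0], [-3.0, 0.0]])
print(error_index(perfect))        # 0.0
print(error_index(perfect + 0.1))  # positive: residual cross-talk
```

The index is invariant to the scaling and permutation indeterminacies of BSS, which is why it is a fair comparison measure across algorithms.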
Fig. 2. Transient of global matrix given by the batch-data algorithm: noisy case (observation's SNR is 17.19 dB), where c = 0.24, τ = 1, η = 0.002, M = 3150, and EI = −32.13 dB.
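The on-line experiments can be reproduced in outline with the simplified rule (40) driven by the moving-average correlation estimate (43). The modulated test signals, the mixing matrix, the smoothing factor, and the learning rate below are illustrative choices of ours, not the paper's speech data.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 2, 60_000
t = np.arange(T)

# Nonstationary test signals: white noise with slowly varying envelopes,
# so the output covariance changes over time (Corollary 1 applies).
s = np.vstack([1.0 + 0.9 * np.sin(2 * np.pi * 5 * t / T),
               1.0 + 0.9 * np.cos(2 * np.pi * 3 * t / T)]) \
    * rng.standard_normal((n, T))
A = np.array([[1.0, 0.7],
              [0.3, 1.0]])                  # illustrative mixing matrix
x = A @ s

W = np.eye(n)
R = np.eye(n)                               # running estimate of E[y y^T]
alpha, eta = 0.999, 5e-4
for k in range(T):
    y = W @ x[:, k]
    R = alpha * R + (1 - alpha) * np.outer(y, y)           # smoothing (43)
    W += eta * (np.eye(n) - R / np.diag(R)[:, None]) @ W   # rule (40)

G = W @ A                                   # should approach D * P
print(np.round(G, 3))
```

A single pass over the data suffices here because the envelopes complete several cycles, supplying many distinct covariance "snapshots" for the equivariant update.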
For the noisy case, the SNR of the observation is 17.19 dB. The parameters are the same as those used in the noise-free case except that c = 0.24 and τ = 1. After convergence (5000 iterations), the EI of the global matrix is −32.13 dB. The transient of the global matrix is shown in Fig. 2.

2) On-Line Algorithm: For the on-line algorithm, both noise-free and noisy observations are used in the experiments. The transients of the global matrix are shown in Figs. 3 and 4, respectively. Proper parameters for the on-line algorithm make it possible to achieve a good performance in the separation of both noise-free and noisy observations.

B. Effect of Parameter c on the Performance

In this subsection, we investigate how the parameter c affects the performance of the algorithms. Noise-free observations are used in these experiments.

1) Batch-Data Algorithm: If the observations are of high SNR, when the time delay is set to τ = 0 or 1, the performance of the batch-data algorithm is almost not affected by
Fig. 3. Transient of global matrix given by the on-line algorithm: noise-free case, where c = 0.008, τ = 0, η = 0.0001, α = 0.999, and EI = −40.58 dB.
Fig. 4. Transient of global matrix given by the on-line algorithm: noisy case (observation's SNR is 17.18 dB), where c = 0.005, τ = 1, η = 0.0002, α = 0.999, and EI = −37.18 dB.
c in a large range. But if τ = 2 or 3, then the performance declines slowly with increasing c. The relationship between the EI and the parameter c is shown in Fig. 5.

2) On-Line Algorithm: For the on-line algorithm, the EI is calculated from the global matrix averaged over the last 2000 samples after the convergence of the algorithm. The relationship between the EI and the parameter c shows a very attractive property: there is an optimum parameter c in terms of the EI. For the noise-free case, as shown in Fig. 6, with a properly chosen time delay, the optimal c gives an increment of EI of more than 10 dB compared with the case c = 0.

C. Effect of Observation Noise on the Performance

In theory, it is obvious that the estimation of time-delayed statistics will not be affected by additive white noise; this property has been used to obtain noise-robust separation algorithms. Both the batch-data and on-line algorithms have this property if we set τ ≠ 0. But in practice, this is not always the case, because all of the calculations are performed on a finite data set. Although noise effects cannot be removed completely
Fig. 5. Relationship between the EI and the parameter c: the batch-data algorithm (noise-free case), where τ = [0, 1, 2, 3], η = [0.002, 0.002, 0.002, 0.002], batch size M = 3150, and 20 000 iterations.
Fig. 6. Relationship between the EI and the parameter c: the on-line algorithm, where τ = [0, 1, 2, 3], η = [0.0001, 0.0002, 0.0002, 0.0003], and α = 0.999.
in the separation of noise-contaminated observations, they can be alleviated by setting τ ≠ 0. Fig. 7 gives the relationship between the SNR of the observation and the separation performance.

For comparison, we present the relationships between the EI and the SNR of observations calculated with SOBI with robust orthogonalization (SOBI-RO) [33], the second-order nonstationary source-separation algorithm (SONS) [22], AMUSE [17], and the joint approximate diagonalization of eigenmatrices with time delay (JADE-TD) [34] algorithms in Fig. 7. These algorithms all perform well for noisy mixture separation except AMUSE. Fig. 7 shows that the performance of the batch-data algorithm is very close to that of SOBI-RO; on the other hand, the performance of the on-line algorithm is close to that of SONS and better than that of JADE-TD and AMUSE.

Simulations show that the separation performance is almost not affected by the mixing matrix if there is no observation noise; even when the condition number of the mixing matrix is high, both batch-data
and on-line algorithms give very good results. But if there are observation noises, the separation performance is not only affected by the observation noise but also depends closely on the mixing matrix. This is the limitation. Fig. 8 gives the relationship between the EI and the condition number of the mixing matrix when the SNR of the observation is 15 dB. The mixing matrices are generated randomly. The result given in Fig. 8 is very close to that of the SOBI-RO algorithm.

Fig. 7. Relationship between separation performance EI and the SNR of observations: comparison with other algorithms.

Fig. 8. Separation performance depends closely on the condition number of the mixing matrix when there is noise in the observations (mixture's SNR = 15 dB).

V. CONCLUSION

In a general sense, we prove that the decorrelation of the output of the separation system can be used to construct sufficient conditions for the separation of stationary and nonstationary random signals based on some simple assumptions. We propose batch-data and on-line algorithms based on the decorrelation of the output of the separation system and the nonstationarity of the sources. These algorithms are robust to additive white noise when the time-delayed decorrelation and the nonstationarity are considered jointly. Simulation results show that these algorithms are effective.

REFERENCES
[1] C. Jutten and J. Herault, "Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture," Signal Process., vol. 24, pp. 1–10, 1991.
[2] A. Cichocki, S. L. Shishkin, T. Musha, Z. Leonowicz, T. Asada, and T. Kurachi, "EEG filtering based on blind-source separation (BSS) for early detection of Alzheimer's disease," Clinical Neurophysiol., vol. 116, pp. 729–737, 2005.
[3] Q. Lin, F. Yin, T. Mei, and H. Liang, "A blind-source separation based method for speech encryption," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 6, pp. 1320–1328, Jun. 2006.
[4] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," Proc. Inst. Elect. Eng. F, vol. 140, no. 6, pp. 362–370, 1993.
[5] P. Comon, "Independent component analysis, a new concept?," Signal Process., vol. 36, pp. 287–314, 1994.
[6] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Comput., vol. 7, pp. 1129–1159, 1995.
[7] J.-F. Cardoso, "Blind signal separation: Statistical principles," Proc. IEEE, vol. 86, no. 10, pp. 2009–2025, Oct. 1998.
[8] S. Amari and A. Cichocki, "Adaptive blind signal processing—Neural network approaches," Proc. IEEE, vol. 86, no. 10, pp. 2026–2048, Oct. 1998.
[9] H. H. Yang, "Serial updating rule for blind separation derived from the method of scoring," IEEE Trans. Signal Process., vol. 47, no. 8, pp. 2279–2285, Aug. 1999.
[10] A. Hyvärinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Comput., vol. 9, pp. 1483–1492, 1997.
[11] A. Cichocki, J. Karhunen, W. Kasprzak, and R. Vigário, "Neural networks for blind separation with unknown number of sources," Neurocomput., vol. 24, pp. 55–93, 1999.
[12] V. Zarzoso and A. K. Nandi, "Adaptive blind-source separation for virtually any source probability density function," IEEE Trans. Signal Process., vol. 48, no. 2, pp. 477–489, Feb. 2000.
[13] H. Mathis, T. P. von Hoff, and M. Joho, "Blind separation of signals with mixed kurtosis signs using threshold activation functions," IEEE Trans. Neural Netw., vol. 12, no. 3, pp. 618–624, May 2001.
[14] T. Mei and F. Yin, "Decorrelation-based blind-source separation algorithm and convergence analysis," in Proc. 4th Int. Symp. Independent Component Analysis and Blind Signal Separation (ICA2003), 2003, pp. 457–461.
[15] L. Tong, Y. Inouye, and R. W. Liu, "Waveform-preserving blind estimation of multiple independent sources," IEEE Trans. Signal Process., vol. 41, no. 7, pp. 2461–2470, Jul. 1993.
[16] L. Tong and R. Liu, "Blind estimation of correlated source signals," in Proc. ACSSC, 1990, vol. 1, pp. 258–262.
[17] L. Tong, R. Liu, V. C. Soon, and Y. Huang, "Indeterminacy and identifiability of blind identification," IEEE Trans. Circuits Syst., vol. 38, no. 5, pp. 499–509, May 1991.
[18] L. Molgedey and H. G. Schuster, "Separation of a mixture of independent signals using time delayed correlations," Phys. Rev. Lett., vol. 72, no. 23, pp. 3634–3637, 1994.
[19] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, "A blind-source separation technique using second-order statistics," IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–443, Feb. 1997.
[20] K. Matsuoka, M. Ohya, and M. Kawamoto, "A neural net for blind separation of nonstationary signals," Neural Netw., vol. 8, no. 3, pp. 411–419, 1995.
[21] S. Choi, A. Cichocki, and S. Amari, "Equivariant nonstationary source separation," Neural Netw., vol. 15, pp. 121–130, 2002.
[22] S. Choi and A. Cichocki, "Blind separation of nonstationary sources in noisy mixtures," Electron. Lett., vol. 36, no. 9, pp. 848–849, 2000.
[23] S. Choi, A. Cichocki, and A. Belouchrani, "Blind separation of second-order nonstationary and temporally colored sources," in Proc. 11th IEEE Signal Process. Workshop, 2001, pp. 444–447.
[24] C. Chang, Z. Ding, S. F. Yau, and F. H. Y. Chan, "A matrix-pencil approach to blind separation of colored nonstationary signals," IEEE Trans. Signal Process., vol. 48, no. 3, pp. 900–907, Mar. 2000.
[25] Y. Li, J. Wang, and A. Cichocki, "Blind source extraction from convolutive mixtures in ill-conditioned multi-input multi-output channels," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 9, pp. 1814–1822, Sep. 2004.
[26] S. C. Douglas, H. Sawada, and S. Makino, "Natural gradient multichannel blind deconvolution and speech separation using causal FIR filters," IEEE Trans. Speech, Audio Process., vol. 13, no. 1, pp. 92–104, Jan. 2005.
[27] T. Mei, J. Xi, F. Yin, A. Mertins, and J. F. Chicharo, "Blind source separation based on time-domain optimization of a frequency-domain independence criterion," IEEE Trans. Audio, Speech, Language Process., vol. 14, no. 6, pp. 2075–2085, Dec. 2006.
[28] S. Amari, "Natural gradient works efficiently in learning," Neural Comput., vol. 10, pp. 251–276, 1998.
[29] J.-F. Cardoso and B. H. Laheld, "Equivariant adaptive source separation," IEEE Trans. Signal Process., vol. 44, no. 12, pp. 3017–3030, Dec. 1996.
[30] R. A. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1985, p. 477.
[31] S. Haykin, Adaptive Filter Theory, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1996, pp. 102, 363.
[32] D.-T. Pham and J.-F. Cardoso, "Blind separation of instantaneous mixtures of nonstationary sources," IEEE Trans. Signal Process., vol. 49, no. 9, pp. 1837–1848, Sep. 2001.
[33] A. Belouchrani and A. Cichocki, "Robust whitening procedure in blind-source separation context," Electron. Lett., vol. 36, no. 24, pp. 2050–2051, 2000.
[34] P. Georgiev and A. Cichocki, "Robust independent component analysis via time-delayed cumulant functions," IEICE Trans. Fund., vol. E86-A, no. 3, 2003.
Fuliang Yin was born in Fushun, Liaoning, China, in 1962. He received the B.S. degree in electronic engineering and the M.S. degree in communications and electronic systems from Dalian University of Technology (DUT), Dalian, China, in 1984 and 1987, respectively. He joined the Department of Electronic Engineering, DUT, as a Lecturer in 1987 and became an Associate Professor in 1991. He has been a Professor at DUT since 1994, and the Dean of the School of Electronic and Information Engineering of DUT since 2000. His research interests include digital signal processing, speech processing, image processing and pattern recognition, digital communication, and integrated circuit design.
Tiemin Mei was born in Liaoning, China, on June 29, 1964. He received the B.S. degree in physics from Sun Yat-sen University, Guangzhou, China, the M.S. degree in biophysics from China Medical University, Shenyang, China, and the Ph.D. degree in signal and information processing from Dalian University of Technology, Dalian, China, in 1986, 1991, and 2006, respectively. He was a Visiting Fellow in the School of Electrical, Computer and Telecommunications Engineering, The University of Wollongong, Wollongong, Australia, from 2004 to 2005. He has been on the academic staff of Shenyang Institute of Technology (Shenyang Ligong University), Shenyang, China, since 1996. His current research interests include stochastic signal processing and speech processing.
Jun Wang (S’89–M’90–SM’93–F’07) received the B.S. degree in electrical engineering and the M.S. degree in systems engineering from Dalian University of Technology, Dalian, China, and the Ph.D. degree in systems engineering from Case Western Reserve University, Cleveland, OH. He is a Professor in the Department of Mechanical and Automation Engineering at the Chinese University of Hong Kong (CUHK). Before joining CUHK in 1995, he held various academic positions at the University of North Dakota, Case Western Reserve University, and Dalian University of Technology. His current research interests include neural networks and their engineering applications. Dr. Wang is an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS and IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: PART B. He is also a past president of the Asia Pacific Neural Network Assembly and a past associate editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: PART C.