
Neurocomputing 70 (2007) 1574–1581 www.elsevier.com/locate/neucom

Letters

Semi-blind source extraction for fetal electrocardiogram extraction by combining non-Gaussianity and time-correlation

Zhenwei Shi, Changshui Zhang

State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, PR China

Received 5 April 2006; received in revised form 13 September 2006; accepted 5 October 2006. Communicated by D. Erdogmus. Available online 10 November 2006.

This work was supported by NSFC (60605002, 60475001, 10571018) and by the Chinese Postdoctoral Science Foundation (2005038075). Corresponding author: Tel.: +86 10 627 96 872; fax: +86 10 627 86 911. E-mail addresses: [email protected] (Z. Shi), [email protected] (C. Zhang).

0925-2312/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.neucom.2006.10.103

Abstract

Fetal electrocardiogram (FECG) extraction is a vital issue in biomedical signal processing and analysis. A promising approach is blind (semi-blind) source extraction. In this paper, we develop an objective function for the extraction of temporally correlated sources. The objective function is based on the non-Gaussianity and the autocorrelations of the source signals, and it contains the well-known mean squared error objective function of Barros and Cichocki [Extraction of specific signals with temporal structure, Neural Comput. 13 (9) (2001) 1995–2003] as a special case. By minimizing the objective function, we derive a source extraction algorithm. The algorithm extracts a clear FECG as the first extracted signal and is very robust to the estimation error of the time delay, making it an appealing method for obtaining an accurate and reliable FECG.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Blind signal extraction (BSE); Blind source separation (BSS); Independent component analysis (ICA); Fetal electrocardiogram (FECG)

1. Introduction

The extraction of the fetal electrocardiogram (FECG) using non-invasive techniques is an important challenge in biomedical signal processing and analysis. The FECG contains important information about the health and condition of the fetus. However, it is a non-trivial task to obtain an accurate and reliable FECG using several electrodes, because the FECG is always corrupted by various kinds of noise, such as the maternal electrocardiogram (MECG) with its extremely high amplitude, respiration and stomach activity, thermal noise, noise from electrode-skin contact, etc.

Generally speaking, two kinds of promising techniques have been proposed to address this problem.


One is blind source separation (BSS), or independent component analysis (ICA) [2,8,4,10,13], which extracts all the source signals from the observed signals. However, in many applications, extracting all the source signals from a large number of observed sensor signals can take a long time, while only a few source signals are actually of interest. For such applications, another technique, blind (semi-blind) signal extraction (BSE) [2], is a powerful candidate, since BSE learning algorithms can extract a single source signal from a linear mixture of source signals. Our aim is to obtain the FECG as the first extracted signal, so many BSE algorithms [2,8] are not suitable for extracting the FECG. Among them, a simple algorithm developed by Barros and Cichocki [1] (for simplicity, we call it the BC algorithm) provides a successful extraction of the FECG. This algorithm needs prior information about the optimal time delay at which the autocorrelation function of the desired FECG has a peak. In practice, however, the FECG extracted by this algorithm is very sensitive to the estimation error of the time delay and often contains noise contributions. To overcome these drawbacks, we propose a general objective function


which extends the objective function used in the BC algorithm. A main point of the present work is that we use a suitable function characterizing the statistical properties of the FECG, based on the non-Gaussianity and the autocorrelations of the FECG. We also derive a learning algorithm that optimizes the objective function.

The paper is organized as follows. Section 2 presents a general objective function and a learning algorithm for BSE. Section 3 reports experiments on artificial data and real-world ECG data. Section 4 discusses related work and draws some conclusions.

2. Proposed algorithm

2.1. Objective function

We observe sensor signals $x(t) = (x_1(t), \ldots, x_n(t))^T$ described by the matrix equation $x(t) = A s(t)$, where $A$ is an $n \times n$ unknown mixing matrix and $s(t) = (s_1(t), \ldots, s_n(t))^T$ is a vector of unknown, temporally correlated sources. We assume that the desired source signal $s_i$ (in the case of FECG extraction, $s_i$ is the desired FECG) has a specific temporal structure and can be modelled by a linear autoregressive model with a single predicting term:

$$s_i(t) = a\, s_i(t - \tau) + d s_i(t), \tag{1}$$

where $d s_i(t)$ is a zero-mean, i.i.d. (white) time series called the innovation, and $\tau$ is a specific time delay (the period length in samples, in the case of FECG extraction) satisfying $E\{s_i(t - \tau)\, s_i(t)\} \neq 0$.

Because we want to extract only the desired source signal, we design a single neural processing unit described by

$$y(t) = w^T x(t), \qquad \hat{y}(t) = y(t) - b\, y(t - \tau), \tag{2}$$

where $w = (w_1, \ldots, w_n)^T$ is the weight vector, $\hat{y}(t)$ represents an innovation, and $b$ is a coefficient. We assume that the measured sensor signals $x$ have already been passed through an $n \times n$ whitening filter $V$ such that the components of $\tilde{x}(t) = V x(t)$ are of unit variance and uncorrelated. We pose the following constrained minimization problem:

$$\min_{\|w\| = 1} E\{G(\hat{y}(t))\} = E\{G(w^T (\tilde{x}(t) - b\, \tilde{x}(t - \tau)))\}, \tag{3}$$

where $G$ is a differentiable function. The objective function combines both the non-Gaussianity and the time-correlations of the desired source signal. Indeed, if the source signal has no time dependencies, the innovation equals the signal itself and the objective function reduces to

$$\min_{\|w\| = 1} E\{G(\hat{y}(t))\} = E\{G(w^T \tilde{x}(t))\}. \tag{4}$$

This is the well-known ICA objective function based on the non-Gaussianity of the source signal [8]. In particular, if the function $G$ in (4) is chosen as the negative log-density of the innovation (here, the component $\hat{y}$), we can find one independent component by minimizing the objective function in (4). Thus, the function $G$ in (3) should be chosen according to the probability distribution of the innovation [7,8,11]. As a special case, if we choose $G(u) = u^2$, the objective function in (3) is exactly the mean squared error function used in the BC algorithm [1]. However, the innovation of the FECG is in general super-Gaussian, so a more suitable function $G$ should be chosen for the FECG signal. In fact, we have the following stability theorem, analogous to that of independent component analysis [9,8]:

Theorem 1. Assume that the input data follow the linear mixture model with whitened data $\tilde{x}(t) = V A s(t)$, where $V$ is the whitening matrix. Furthermore, assume that the innovations $d s_i(t) = s_i(t) - b_i s_i(t - \tau)$ $(i = 1, \ldots, n)$ are mutually statistically independent and have zero mean and unit variance (the coefficients $b_i$, $i = 1, \ldots, n$, are assumed to be known constants), and that $G$ is a sufficiently smooth, even, non-quadratic function. Then the local minima of $E\{G(w^T (\tilde{x}(t) - b_i \tilde{x}(t - \tau)))\}$ under the constraint $\|w\| = 1$ include one row of the mixing matrix $V A$ such that the corresponding independent innovation $d s_i(t) = s_i(t) - b_i s_i(t - \tau)$ satisfies

$$E\{d s_i(t)\, g(d s_i(t)) - g'(d s_i(t))\} < 0, \tag{5}$$

where $g$ is the derivative of $G$ and $g'$ is the derivative of $g$.

Proof. Assume that the input data follow the linear mixture model with whitened data $\tilde{x}(t) = V A s(t)$, where $V$ is the whitening matrix. Then the innovation processes $d s_i(t) = s_i(t) - b_i s_i(t - \tau)$ and $d\tilde{x}_i(t) = \tilde{x}_i(t) - b_i \tilde{x}_i(t - \tau)$ follow the same model: $d\tilde{x}(t) = V A\, d s(t)$. Thus, the theorem stated here is a corollary of Theorem 1 in [9] (or Theorem 8.1 in [8]), provided the innovations $d s_i(t) = s_i(t) - b_i s_i(t - \tau)$ $(i = 1, \ldots, n)$ are independent with zero mean and unit variance, and $G$ is a sufficiently smooth, even, non-quadratic function. □

From the theorem we see that if the innovations $d s_i(t) = s_i(t) - b_i s_i(t - \tau)$ $(i = 1, \ldots, n)$ are mutually statistically independent, the objective function in (3) admits the same stability analysis as the well-known ICA objective based on the non-Gaussianity of the innovation signal [8]. Thus, an even, non-quadratic function $G$ is suitable for extracting the non-Gaussian innovations. Section 3 demonstrates the advantages of the objective function experimentally, and Section 4 discusses a possible reason why the mean squared error function has limitations in FECG extraction and relates the objective function to other methods.
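To make condition (5) concrete, the following Monte Carlo check (our illustration, not part of the original paper) evaluates the left-hand side of (5) for a unit-variance Laplacian innovation, used here as a stand-in for a super-Gaussian FECG innovation. The smooth nonlinearity $g(u) = \tanh(u)$ gives a clearly negative value, while the quadratic contrast ($g(u) = u$, i.e. $G(u) = u^2$ up to scaling, as in the BC algorithm) gives a value near zero, leaving no stability margin.

```python
import numpy as np

rng = np.random.default_rng(0)
# Unit-variance Laplacian samples (scale b gives variance 2*b**2, so b = 1/sqrt(2)).
ds = rng.laplace(scale=1.0 / np.sqrt(2.0), size=1_000_000)

def stability_margin(u, g, g_prime):
    """Monte Carlo estimate of E{u*g(u) - g'(u)}; negative values satisfy condition (5)."""
    return float(np.mean(u * g(u) - g_prime(u)))

# g(u) = tanh(u): smooth surrogate for sign(u), suited to super-Gaussian innovations.
m_tanh = stability_margin(ds, np.tanh, lambda u: 1.0 - np.tanh(u) ** 2)

# g(u) = u, i.e. a quadratic G (the mean-squared-error contrast of the BC algorithm).
m_quad = stability_margin(ds, lambda u: u, np.ones_like)

print(f"g = tanh : {m_tanh:+.3f}  (negative, condition (5) holds)")
print(f"g = u    : {m_quad:+.3f}  (about zero, no stability margin)")
```

This is consistent with the discussion in Section 4 that the mean squared error contrast is well justified only when the innovation is close to Gaussian.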


2.2. Learning algorithm

Minimizing the objective function in (3), we derive a gradient descent BSE algorithm. The gradient of $E\{G(\hat{y}(t))\}$ with respect to $w$ and $b$ is

$$\frac{\partial E\{G(\hat{y}(t))\}}{\partial w} = E\{(\tilde{x}(t) - b\, \tilde{x}(t - \tau))\, g(w^T (\tilde{x}(t) - b\, \tilde{x}(t - \tau)))\}, \tag{6}$$

$$\frac{\partial E\{G(\hat{y}(t))\}}{\partial b} = -E\{w^T \tilde{x}(t - \tau)\, g(w^T (\tilde{x}(t) - b\, \tilde{x}(t - \tau)))\}, \tag{7}$$

where $g$ is the derivative of $G$. Thus, a gradient descent learning algorithm is obtained as

$$w \leftarrow w - \mu_w\, E\{(\tilde{x}(t) - b\, \tilde{x}(t - \tau))\, g(w^T (\tilde{x}(t) - b\, \tilde{x}(t - \tau)))\},$$
$$b \leftarrow b + \mu_b\, E\{w^T \tilde{x}(t - \tau)\, g(w^T (\tilde{x}(t) - b\, \tilde{x}(t - \tau)))\},$$
$$w \leftarrow w / \|w\|, \tag{8}$$

where $\mu_w$ and $\mu_b$ are learning rates. From Theorem 1, the function $g$ should be chosen as in ordinary ICA, according to the probability distribution of the innovation. If the innovation is super-Gaussian (as for the FECG), $g(u) = \mathrm{sign}(u)$ is suitable; it can also be approximated by the smoother function $g(u) = \tanh(a u)$, where $a \geq 1$ is a suitable constant. For a sub-Gaussian innovation, one could use $g(u) = u - \tanh(u)$ or $g(u) = u^3$, for example.
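For concreteness, here is a minimal NumPy sketch of the update rules in (8). It assumes the mixtures are whitened first and uses $g(u) = \tanh(a u)$ as the smooth surrogate for $\mathrm{sign}(u)$; the function and variable names (`whiten`, `extract_fecg`, `mu_w`, `mu_b`) are ours, not from the paper.

```python
import numpy as np

def whiten(x):
    """PCA whitening: rows of the returned array are uncorrelated with unit variance."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(xc))
    V = E @ np.diag(d ** -0.5) @ E.T
    return V @ xc, V

def extract_fecg(x, tau, n_iter=200, mu_w=0.5, mu_b=5e-4, a=1.0):
    """Gradient-descent BSE of eq. (8): extract one temporally correlated source.

    x   : (n_sensors, n_samples) observed mixtures
    tau : time delay in samples (the period of the desired source)
    """
    xt, _ = whiten(x)
    n, T = xt.shape
    w = np.zeros(n)
    w[0] = 1.0                       # initialization used in the paper's real-data experiment
    b = 1.0
    g = lambda u: np.tanh(a * u)     # smooth surrogate for sign(u), super-Gaussian innovations

    for _ in range(n_iter):
        dx = xt[:, tau:] - b * xt[:, :-tau]                 # innovation of the whitened mixtures
        u = w @ dx                                          # innovation of the current output
        grad_w = (dx @ g(u)) / (T - tau)                    # sample estimate of eq. (6)
        grad_b = ((w @ xt[:, :-tau]) @ g(u)) / (T - tau)    # sample estimate of -eq. (7)
        w = w - mu_w * grad_w                               # eq. (8), update for w
        b = b + mu_b * grad_b                               # eq. (8), update for b
        w /= np.linalg.norm(w)                              # project back onto the unit sphere
    return w @ xt, w, b
```

The first return value, $y(t) = w^T \tilde{x}(t)$, is the estimate of the desired source; in the FECG experiments below, $\tau$ is the period of the fetal heartbeat in samples.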

3. Experimental results

3.1. Experiments on artificial data

We generated six zero-mean, unit-variance source signals (2500 samples), shown in Fig. 1. From the top down, they were a breathing artifact, an electrode artifact, an FECG whose period was 112 samples (i.e., $\tau = 112$), an MECG, and two Gaussian signals. The observations $x$ were generated by a $6 \times 6$ random mixing matrix. We ran the BC algorithm [1] and our algorithm. In our algorithm, the nonlinearity was chosen as $g(u) = \mathrm{sign}(u)$ and the coefficient $b$ was initialized to 1. The learning rates were set to $\mu_w = 0.5$ and $\mu_b = 0.0005$. To measure the accuracy of extraction, we normalized the original FECG signal to $\|s\|_2 = 1$ and the estimated FECG signal to $\|\tilde{s}\|_2 = 1$; the error was computed as $\|\tilde{s} - s\|_2$. We report the mean values over 100 independent trials: the mean errors were 0.26 (our algorithm) and 0.68 (the BC algorithm).
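The error measure can be sketched as follows; resolving the sign indeterminacy of blind extraction by taking the smaller of the two residuals is our assumption and is not spelled out in the paper.

```python
import numpy as np

def extraction_error(s_true, s_est):
    """Residual norm between the true and estimated FECG, both scaled to unit L2 norm."""
    s = s_true / np.linalg.norm(s_true)
    s_hat = s_est / np.linalg.norm(s_est)
    # Blind extraction only recovers the source up to sign, so test both signs.
    return min(np.linalg.norm(s_hat - s), np.linalg.norm(s_hat + s))
```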

Fig. 1. Six artificial signals used in the simulation. (Sig1) Breathing artifact. (Sig2) Electrode artifact. (Sig3) FECG. (Sig4) MECG. (Sig5, Sig6) Two Gaussian signals.



Fig. 2. Comparison of the extracted FECGs in the simulations. (Top) The original FECG. (Middle) FECG1 is extracted by our algorithm. (Bottom) FECG2 is extracted by the BC algorithm.

The superiority of our algorithm over the BC algorithm is clear. Fig. 2 presents the original FECG and the FECGs extracted by the two BSE algorithms. Compared with the BC algorithm, our algorithm obtained a clearer FECG.

An important question is how the BSE algorithms perform during contractions, when the fetal signal is swamped by the significantly larger background electrical activity. To address this, we set the variance of the FECG to a very small value, say 0.0001, and kept the other original signals at unit variance. The performance of our algorithm and the BC algorithm was similar to the foregoing simulation: the mean errors were again 0.26 (our algorithm) and 0.68 (the BC algorithm) over 100 independent trials.

3.2. Experiments on real-world ECG data

We performed further experiments on real-world ECG data: the well-known recordings measured from a pregnant woman and distributed by De Moor [3]. Only 10 s of the recordings (resampled at 250 Hz) are displayed in Fig. 3. (Although De Moor's homepage states that the sampling frequency was 500 Hz, Barros et al. believed it was most likely 250 Hz [1].) We initialized the weight vector to $w = (1, 0, 0, \ldots)^T$ in both BSE algorithms. By carefully examining the autocorrelation of the sensor signal in channel 1 and using the same method as in [1], one can estimate the optimal time delay $\tau = 112$. The same nonlinearity and learning rates were chosen for our algorithm as on the artificial data. The two FECGs extracted by the two BSE algorithms are shown in Fig. 4.

In practice, errors in the estimated optimal time delay sometimes cannot be avoided. We therefore applied the two algorithms to the same ECG data but with different time delays $\tau$. In real mixtures, the true FECG signal is unknown; for the sake of a simple comparison, we adopted the following method. (One could also adopt the method of [10], which uses a suitable quantitative comparison, namely the estimated fetal heart-rate accuracy given ground truth from an independent measurement device or a human expert.) Since we believe that the optimal desired FECG is extracted at the time delay $\tau = 112$ for these data, we use the two FECGs estimated at $\tau = 112$ by the two algorithms as benchmarks. We then computed the correlation coefficients between the FECG estimated at each time delay $\tau$ and the corresponding optimal estimated FECG (all estimated FECGs were normalized to zero mean and unit variance). The results are plotted in Fig. 5. In this case, an estimated FECG whose correlation coefficient exceeds 0.9 can be considered a good extraction.
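A rough sketch of the two helper computations used here: reading $\tau$ off the autocorrelation of channel 1, and computing the correlation coefficient against the $\tau = 112$ benchmark. The peak-search range of 80–160 samples and the function names are illustrative assumptions; the paper only states that $\tau$ is estimated from the autocorrelation as in [1].

```python
import numpy as np

def estimate_tau(channel, lag_min=80, lag_max=160):
    """Pick the lag with the largest autocorrelation in [lag_min, lag_max] samples."""
    c = channel - channel.mean()
    acf = [np.dot(c[:-k], c[k:]) for k in range(lag_min, lag_max + 1)]
    return lag_min + int(np.argmax(acf))

def correlation_with_benchmark(fecg_est, fecg_bench):
    """Correlation coefficient against the tau = 112 benchmark extraction (as in Fig. 5)."""
    z = lambda s: (s - s.mean()) / s.std()
    # The absolute value guards against the sign indeterminacy of blind extraction.
    return float(abs(np.mean(z(fecg_est) * z(fecg_bench))))
```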


Fig. 3. ECG signals measured from a pregnant woman.

Fig. 4. Comparison of the FECGs extracted from the real-world ECG data at the optimal time delay $\tau = 112$. FECG1 is extracted by our algorithm; FECG2 is extracted by the BC algorithm.


Fig. 5. Correlation coefficients between the FECG estimated at each time delay $\tau$ and the corresponding optimal estimated FECG, for the two algorithms.

From the results, we can see that the BC algorithm is very sensitive to the estimation error of the time delay: the only satisfactory extracted FECG is obtained exactly at $\tau = 112$. Our algorithm, in contrast, works well even when the estimation error is quite large ($\tau = 108, \ldots, 136$ and $\tau = 139, \ldots, 145$). This implies that one can simply make use of the fact that the fetal heart beats roughly every 0.5 s (corresponding to 125 samples for these data) [1] and set $\tau = 125$ in our algorithm without examining the autocorrelation of the sensor signal at all, which is important in practice. For visual comparison, the FECGs extracted by the two algorithms are plotted in Fig. 6 ($\tau = 108, 113, 125, 145$).

4. Discussions and conclusions

4.1. Discussions

A method close to ours is complexity pursuit [7,11,12]. It also combines non-Gaussianity and time-correlations, and is an extension of projection pursuit to signals with time structure. The goal of complexity pursuit is to find projections of time series that have interesting structure, defined using criteria related to Kolmogoroff complexity [7,11]. The approximation of Kolmogoroff complexity used as a contrast function for the "most interesting" direction $w$ [11] has some similarity to the objective function in (3). However, the differences are also obvious. The original complexity pursuit algorithms [7,11] do not perform blind extraction of the FECG; in fact, those methods are closely related to sequential blind separation of time-dependent source signals and to ICA, and the time delay $\tau$ is usually taken to be 1. In the FECG extraction problem, by contrast, the time delay $\tau$ (as prior information) is estimated from the data. Thus, the original complexity pursuit algorithms are not suitable for extracting the FECG as the first output. Even if we choose the time delay $\tau$ close to the optimal one ($\tau = 112$), the complexity pursuit algorithms are also sensitive to the estimation error of the time delay, because the AR coefficients are estimated simply by a least-squares method [7,11].

However, the relation to complexity pursuit gives insight into why the FECG extracted by the BC algorithm [1] always contains more noise. The function $G$ used in (3) is related to the probability distribution of the innovation [7,11], so $G(u) = u^2$, the objective function of [1], probably does not describe the probability distribution of the innovation of the FECG signal well. In general, the innovation of the FECG is super-Gaussian, whereas $G(u) = u^2$ is well justified only if the innovation is close to Gaussian. Indeed, Theorem 1 also shows that if the innovations satisfy the independence conditions, an even, non-quadratic function is suitable for extracting the desired non-Gaussian innovations. This is also a possible reason why the BC algorithm is very sensitive to the estimation error of the time delay.

Another practical issue is that, like the BC algorithm, we have to choose a suitable time delay $\tau$ for the desired FECG in some cases. However, with a suitable cost function we obtain satisfactory results even without estimating the optimal time delay: we can simply make use of the fact that the fetal heart beats roughly every 0.5 s.


Fig. 6. Comparison of the extracted FECGs when estimation error of the time delay was introduced. (F1, F3, F5, F7) were extracted by our algorithm with $\tau = 108, 113, 125, 145$, respectively; (F2, F4, F6, F8) were extracted by the BC algorithm with $\tau = 108, 113, 125, 145$, respectively.

Thus, the algorithm is an appealing method for obtaining an accurate and reliable FECG in practice.

We emphasize that this paper addresses only blind source extraction: in our algorithm the first extracted signal (we extract only one signal) is the desired FECG, and this is an important feature of the algorithm. We do not address BSS, which extracts (or separates) all the source signals from the observed signals. In the very interesting paper of Hild et al. [6], the algorithm separates all the source signals from the observed signals using an AR-MOG source model. Unfortunately, even if we combined our algorithm with the AR-MOG source model and performed blind source extraction, we still could not ensure that the first extracted signal is the desired FECG; it might be the MECG or other noise. In our view, a suitable function is essential for extracting the desired FECG: it should be chosen based on the statistics of the FECG and cannot be chosen arbitrarily.

We use only the AR(1) model in this paper; in our simulations, higher-order AR models did not improve the extracted FECG. According to our experiments, the innovation of the FECG at the optimal time delay $\tau$ has the property of sparseness, meaning that only a small number of innovation samples differ significantly from zero. From the viewpoint of neurocomputing, we conclude that maximizing the sparseness of the innovation might be a learning principle for FECG extraction.
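One simple way to quantify this sparseness (our choice of measure, not the paper's) is the excess kurtosis of the innovation $y(t) - b\, y(t - \tau)$: a strongly positive value indicates a super-Gaussian, sparse innovation.

```python
import numpy as np

def innovation_kurtosis(y, b, tau):
    """Excess kurtosis of the innovation y(t) - b*y(t - tau) as a sparseness measure."""
    d = y[tau:] - b * y[:-tau]
    d = (d - d.mean()) / d.std()
    # Large positive values indicate a sparse (super-Gaussian) innovation.
    return float(np.mean(d ** 4) - 3.0)
```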

The objective function (3) presented here should be better rooted in, and connected to, the relevant theory in the field. A good starting point is the important paper by Gersho [5], who outlined an approach to linear estimation and linear regression with non-mean-square error criteria for random variables viewed as elements of normed linear spaces. Although those results on the optimum estimator in $L_p$ spaces were obtained in a supervised-learning setting, they deserve further study in order to find a deeper explanation in unsupervised learning such as ICA and BSS. Such a study is left for future work, as it falls beyond the scope of this short paper.


4.2. Conclusions

We have proposed a BSE algorithm for the extraction of the FECG signal. First, we obtain a general objective function that is able to describe the probability distribution of the innovation of the desired signal. Second, a clearer FECG signal can be obtained by the learning algorithm derived from the objective function. Furthermore, the proposed algorithm is very robust to the estimation error of the time delay, which makes it work well in practice. Owing to the generality of the presented BSE algorithm, we believe it can also be useful in other signal processing applications; with our algorithm it is easy to construct flexible, suitable functions to solve various kinds of problems.

Acknowledgments

The authors gratefully thank the anonymous reviewers and the associate editor for their insightful and helpful comments.

References

[1] A.K. Barros, A. Cichocki, Extraction of specific signals with temporal structure, Neural Comput. 13 (9) (2001) 1995–2003.
[2] A. Cichocki, S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, Wiley, New York, 2002.
[3] B. De Moor (Ed.), DaISy: Database for the Identification of Systems, <http://www.esat.kuleuven.ac.be/sista/daisy>.
[4] M.G. Gafari, J.A. Chambers, Fetal electrocardiogram extraction by sequential source separation in the wavelet domain, IEEE Trans. Biomed. Eng. 52 (3) (2005) 390–400.
[5] A. Gersho, Some aspects of linear estimation with non-mean square error criteria, in: Proceedings of the Third Asilomar Conference on Circuits and Systems, Pacific Grove, CA, December 1969.
[6] K.E. Hild II, H.T. Attias, S.S. Nagarajan, An EM method for spatio-temporal blind source separation using an AR-MOG source model, in: J. Rosca, et al. (Eds.), ICA 2006, Lecture Notes in Computer Science, vol. 3889, 2006, pp. 98–105.


[7] A. Hyvärinen, Complexity pursuit: separating interesting components from time series, Neural Comput. 13 (4) (2001) 883–898.
[8] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, New York, 2001.
[9] A. Hyvärinen, E. Oja, Independent component analysis by general nonlinear Hebbian-like learning rules, Signal Process. 64 (3) (1998) 301–313.
[10] D.E. Marossero, D. Erdogmus, N. Euliano, J.C. Principe, K.E. Hild II, Independent components analysis for fetal electrocardiogram extraction: a case for the data efficient Mermaid algorithm, in: Proceedings of Neural Networks for Signal Processing, Toulouse, France, 2003, pp. 399–408.
[11] Z. Shi, H. Tang, Y. Tang, A fast fixed-point algorithm for complexity pursuit, Neurocomputing 64 (2005) 529–536.
[12] Z. Shi, C. Zhang, Gaussian moments for noisy complexity pursuit, Neurocomputing 69 (7–9) (2006) 917–921.
[13] V. Zarzoso, A.K. Nandi, Noninvasive fetal electrocardiogram extraction: blind separation versus adaptive noise cancellation, IEEE Trans. Biomed. Eng. 48 (1) (2001) 12–18.

Zhenwei Shi received his B.S. and M.S. degrees in Mathematics from Inner Mongolia University, Huhhot, China, in 1999 and 2002, respectively, and his Ph.D. degree in Mathematics from Dalian University of Technology, Dalian, China, in 2005. He is currently a Postdoctoral Researcher in the Department of Automation, Tsinghua University, Beijing, China. His research interests include blind signal processing, pattern recognition, machine learning and neuroinformatics.

Changshui Zhang received his B.S. degree in Mathematics from Peking University, Beijing, China, in 1986, and his Ph.D. degree from the Department of Automation, Tsinghua University, Beijing, China, in 1992. He is currently a professor in the Department of Automation, Tsinghua University, and an Associate Editor of the journal Pattern Recognition. His interests include artificial intelligence, image processing, pattern recognition, machine learning and evolutionary computation.