Digital Signal Processing 22 (2012) 17–33
Blind source separation with time series variational Bayes expectation maximization algorithm

Shijun Sun a,*, Chenglin Peng b, Wensheng Hou b, Jun Zheng c, Yingtao Jiang d, Xiaolin Zheng b

a School of Physical Science and Technology, Zhanjiang Normal College, Zhanjiang, Guangdong 524048, China
b Bioengineering College, Chongqing University, Chongqing 400030, China
c Department of Computer Science, New Mexico Institute of Mining and Technology, Socorro, NM 87801, USA
d Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, NV 89154, USA

* Corresponding author. E-mail address: [email protected] (S. Sun).

Article history: Available online 8 October 2010. doi:10.1016/j.dsp.2010.09.005

Abstract
This paper presents a variational Bayes expectation maximization algorithm for time series, based on Attias' variational Bayesian theory. The proposed algorithm is applied to the blind source separation (BSS) problem to estimate both the source signals and the mixing matrix for the optimal model structure. The distribution of the mixing matrix is assumed to be a matrix Gaussian distribution to account for the correlation of its elements, and the inverse covariance of the sensor noise is assumed to be Wishart distributed to account for the correlation between sensor noises. A mixture of Gaussians model is used to approximate the distribution of each independent source. The rules to update the posterior hyperparameters and the posterior of the model structure are derived. The optimal model structure is selected as the one with the largest posterior. The source signals and the mixing matrix are estimated by applying LMS and MAP estimators, respectively, to the posterior distributions of the hidden variables and the model parameters for the optimal structure. The proposed algorithm is tested with synthetic data. The results show that: (1) the logarithm posterior of the model structure increases with the accuracy of the posterior mixing matrix; (2) the accuracies of the prior mixing matrix, the estimated mixing matrix, and the estimated source signals increase with the logarithm posterior of the model structure. The algorithm is also applied to magnetoencephalography (MEG) data to localize the sources of the equivalent current dipoles. © 2010 Elsevier Inc. All rights reserved.
Keywords: Time series; Variational Bayes; Expectation maximization algorithm; Blind source separation
1. Introduction

Blind source separation (BSS) has received considerable attention recently because of its wide applications in signal processing [1]. The goal of BSS is to extract the unobserved source signals from the observed mixed sensor signals. Under the assumption that the source signals are statistically mutually independent and are mixed linearly and instantaneously, the BSS problem can be written as
y = Ax + u,   (1)
where y is the K-dimensional observed sensor signal vector, x is the L-dimensional source signal vector, u is the K-dimensional sensor noise vector, and A is the K × L mixing matrix. The BSS problem is an inverse problem in which the source signals and the mixing matrix are estimated from the sensor signals; however, nothing is known about their properties except for the mutual statistical independence of the sources.
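For concreteness, the generative model of Eq. (1) can be simulated in a few lines. The following sketch uses hypothetical dimensions, and Laplacian sources are chosen only as an example of non-Gaussian, mutually independent signals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K = 8 sensors, L = 4 sources, T = 1000 samples.
K, L, T = 8, 4, 1000

A = rng.normal(size=(K, L))            # mixing matrix (unknown in practice)
x = rng.laplace(size=(L, T))           # mutually independent source signals
noise_cov = 0.1 * np.eye(K)            # sensor-noise covariance, Lambda^{-1}
u = rng.multivariate_normal(np.zeros(K), noise_cov, size=T).T

y = A @ x + u                          # observed sensor signals, Eq. (1)
```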
A commonly used method for the BSS problem is independent component analysis (ICA) [2–4]. ICA does not consider sensor noise, which generally should be taken into account: as the noise level increases, the performance of ICA algorithms deteriorates and the separation quality decreases. Moreover, in many situations there are a large number of sources but few sensors; in this case, one would like to lump the low-intensity sources together and treat them as noise while focusing on the high-intensity ones. Another major drawback of ICA is that, because of its inherent ambiguities, the mixing matrix and the magnitudes of the original source signals cannot be estimated, and the variances (energies), signs, and order of the independent components cannot be determined [5]. Although some methods based on ICA were proposed to overcome the ambiguities [6,7], the independent components and the mixing matrix of ICA cannot represent the original source signals and mixing matrix, respectively. In [8], the independent factor analysis (IFA) method was proposed to estimate the source signals and mixing matrix of the BSS problem. In IFA, the probability density function of the ith (i = 1, ..., L) component x_i of the source signal vector is described as a mixture of n_i Gaussians, s_i = 1, ..., n_i, with means μ_{is_i}, inverse variances ν_{is_i}, and mixing proportions ω_{is_i},
p(x_i) = \sum_{s_i=1}^{n_i} \omega_{is_i} N(x_i; \mu_{is_i}, \nu_{is_i}),   (2)

where N(·) denotes the density function of the Gaussian distribution.
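As an illustration, the MOG density of Eq. (2) can be evaluated directly; the component count and parameter values below are placeholders:

```python
import numpy as np

def mog_pdf(x, omega, mu, nu):
    """Mixture-of-Gaussians density of Eq. (2) for one source.

    omega, mu, nu: length-n_i arrays of mixing proportions, means, and
    inverse variances (precisions); x may be a scalar or an array.
    """
    x = np.asarray(x, dtype=float)[..., None]    # broadcast over components
    comp = np.sqrt(nu / (2 * np.pi)) * np.exp(-0.5 * nu * (x - mu) ** 2)
    return comp @ omega

# Illustrative bimodal source prior with two components.
p = mog_pdf(0.3, omega=np.array([0.5, 0.5]),
            mu=np.array([-1.0, 1.0]), nu=np.array([4.0, 4.0]))
```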
The sensor noises are Gaussian with zero mean and inverse covariance Λ. The model parameters {A, Λ, ω, μ, ν} are computed by the generalized expectation maximization (EM) algorithm, which performs maximum likelihood (ML) estimation of the parameters, and the source signals are reconstructed from the sensor signals by optimal nonlinear estimators. As the number of sources increases, the E-step of the IFA EM algorithm becomes computationally expensive. Therefore, factorized posteriors and the mean-field approximation were introduced into variational IFA to reduce the computational complexity for cases with a large number of sources. However, since the EM algorithm is only locally optimal and the posterior may have multiple maxima, the estimated model parameters and source signals depend heavily on the initial values of the model parameters [8]. Attias further proposed variational Bayes (VB), a practical framework for Bayesian computations in graphical models [9,10], which draws together variational ideas from intractable hidden variable models and Bayesian inference. For a fixed model, Y, H, and Θ denote the visible (data) nodes, the hidden nodes (variables), and the parameters, respectively. The variational Bayesian expectation maximization (VBEM) algorithm includes two steps. The first step is the variational Bayes expectation step (VBE-Step)
q(H|m) \propto \exp\langle \log p(Y, H|\Theta, m) \rangle_\Theta,   (3)
where the average ⟨·⟩_Θ is taken with respect to q(Θ|m); the second step is the variational Bayes maximization step (VBM-Step)
q(\Theta|m) \propto \exp\langle \log p(Y, H|\Theta, m) \rangle_H \; p(\Theta|m),   (4)
where the average ⟨·⟩_H is taken with respect to q(H|m), and p(Θ|m) is the parameter prior distribution. The structure posterior can then be written as
q(m) \propto \exp\left( \left\langle \log \frac{p(Y, H|\Theta, m)}{q(H|m)} \right\rangle_{H,\Theta} - \mathrm{KL}\big[q(\Theta|m) \,\|\, p(\Theta|m)\big] \right) p(m),   (5)
where the average ⟨·⟩_{H,Θ} is taken with respect to q(H, Θ|m) = q(H|m) q(Θ|m), and KL[q(Θ|m)‖p(Θ|m)] is the KL distance between the prior p(Θ|m) and the posterior q(Θ|m). In general, some information about the model parameters of the Bayesian network can be obtained from prior knowledge and utilized for estimation purposes. The more accurate the prior distribution is, the more accurately the parameters can be estimated; in other words, the accuracy of the estimation depends on the prior information rather than on the initial values of the parameters. Due to its superior computational performance, the VBEM algorithm has been widely applied to inverse problems. However, there are still some practical issues in applying the VBEM algorithm: (1) In the rules for updating the posterior parameters, a sum over the time series is used [11–14], which may cause overflow problems when the time series is long. (2) The covariance of the sensor noise is assumed to be diagonal [12,15,16]; in realistic scenarios the noises are generally correlated across sensors, so a diagonal noise covariance is an unreasonable assumption. (3) The elements of the mixing matrix are assumed to be independent [12,17]; in fact, the mixing matrix is usually derived from some physical mechanism and its elements are normally correlated. In this paper, we propose a time series VBEM algorithm to address these issues. In the time series VBEM algorithm, the data instances of the time series are averaged instead of being summed. The sensor noise inverse covariance and the mixing matrix
are assumed to be Wishart distributed and matrix Gaussian distributed, respectively, to take into account the correlations of the sensor noise and of the elements of the mixing matrix.

The rest of the paper is organized as follows. In Section 2, we present the details of the time series VBEM algorithm. In Section 3, we show how to apply the time series VBEM algorithm to the BSS problem. Section 4 presents the experimental results of the algorithm. Finally, Section 5 discusses the algorithm and concludes the paper.

2. Time series VBEM algorithm

In this section, we present the detailed derivation of the time series VBEM algorithm based on the variational Bayesian framework for graphical models of [10]. Let Y = {y^1, y^2, ..., y^T} denote the visible (data) nodes, H = {h^1, h^2, ..., h^T} the hidden nodes, Θ = {θ^1, θ^2, ..., θ^T} the parameters, and M = {m^1, m^2, ..., m^T} the model structures, where t = 1, 2, ..., T runs over the time instances. The logarithmic marginal likelihood of the data nodes can be bounded from below using Jensen's inequality,
\log p(Y) = \log \sum_M \int dH\, d\Theta\, q(H, \Theta, M|Y)\, \frac{p(Y, H, \Theta, M)}{q(H, \Theta, M|Y)} \;\ge\; F \equiv \sum_M \int dH\, d\Theta\, q(H, \Theta, M|Y) \log \frac{p(Y, H, \Theta, M)}{q(H, \Theta, M|Y)},   (6)
where q(H, Θ, M|Y) is a variational posterior distribution given the data set Y. If q(H, Θ, M|Y) equals the true posterior, the inequality in Eq. (6) becomes an equality. However, the computation of the true posterior is intractable and approximations must be made. We restrict q(H, Θ, M|Y) to the following factorized form,
q(H, \Theta, M|Y) = q(H|M, Y)\, q(\Theta|M, Y)\, q(M|Y),   (7)
in which the parameters and the hidden nodes are independent. Thus, F is a functional over q(H|M, Y), q(Θ|M, Y), and q(M|Y). Because the time instances are independent and identically distributed, the distributions in Eqs. (6)–(7) can be expressed as
p(Y, H, \Theta, M) = \prod_{t=1}^{T} p(y^t, h^t, \theta^t, m^t),   (8)

q(H|M, Y) = \prod_{t=1}^{T} q(h^t|m^t, Y),   (9)

q(\Theta|M, Y) = \prod_{t=1}^{T} q(\theta^t|m^t, Y),   (10)

q(M|Y) = \prod_{t=1}^{T} q(m^t|Y).   (11)
Hereafter, we shall always omit Y in the variational posteriors q(·|Y) for brevity. With the help of Eqs. (6)–(11), we obtain
F = \sum_{m^1} \cdots \sum_{m^T} \int \cdots \int dh^1\, d\theta^1 \cdots dh^T\, d\theta^T\; q(h^1|m^1) q(\theta^1|m^1) q(m^1) \cdots q(h^T|m^T) q(\theta^T|m^T) q(m^T) \log \prod_{t=1}^{T} \frac{p(y^t, h^t, \theta^t, m^t)}{q(h^t|m^t)\, q(\theta^t|m^t)\, q(m^t)},   (12)
which can be simplified with Bayes' rule, i.e.,
F = \sum_{t=1}^{T} \sum_{m^t} \int dh^t\, d\theta^t\, q(h^t|m^t) q(\theta^t|m^t) q(m^t) \log p(y^t, h^t|\theta^t, m^t)
+ \sum_{t=1}^{T} \sum_{m^t} \int d\theta^t\, q(\theta^t|m^t) q(m^t) \log p(\theta^t|m^t)
+ \sum_{t=1}^{T} \sum_{m^t} q(m^t) \log p(m^t) - \sum_{t=1}^{T} \sum_{m^t} q(m^t) \log q(m^t)
- \sum_{t=1}^{T} \sum_{m^t} \int dh^t\, q(h^t|m^t) q(m^t) \log q(h^t|m^t)
- \sum_{t=1}^{T} \sum_{m^t} \int d\theta^t\, q(\theta^t|m^t) q(m^t) \log q(\theta^t|m^t),   (13)
where the integration over the redundant variational posteriors has been implicitly carried out, p(θ^t|m^t) is the prior distribution over the parameters θ^t given structure m^t, and p(m^t) is the prior distribution over the structure m^t. If the parameters and the structures do not depend on the time instance, i.e., θ^1 = θ^2 = ··· = θ^T = θ and m^1 = m^2 = ··· = m^T = m, then Eq. (13) can be rewritten as
F = \sum_{t=1}^{T} \sum_{m} \int dh^t\, d\theta\, q(h^t|m) q(\theta|m) q(m) \log p(y^t, h^t|\theta, m)
+ T \sum_{m} \int d\theta\, q(\theta|m) q(m) \log p(\theta|m) + T \sum_{m} q(m) \log p(m)
- T \sum_{m} q(m) \log q(m) - \sum_{t=1}^{T} \sum_{m} \int dh^t\, q(h^t|m) q(m) \log q(h^t|m)
- T \sum_{m} \int d\theta\, q(\theta|m) q(m) \log q(\theta|m).   (14)
To find the optimal posteriors under the normalization conditions \frac{1}{T}\sum_{t=1}^{T}\sum_{m} \int dh^t\, q(h^t|m) q(m) = 1, \sum_{m} \int d\theta\, q(\theta|m) q(m) = 1, and \sum_{m} q(m) = 1 (because \int dh^t\, q(h^t|m) = 1, \int d\theta\, q(\theta|m) = 1, and \sum_{m} q(m) = 1), we define the following Lagrange function,

\tilde{F} = F - \lambda_1 \frac{1}{T}\sum_{t=1}^{T}\sum_{m} \int dh^t\, q(h^t|m) q(m) - \lambda_2 \sum_{m} \int d\theta\, q(\theta|m) q(m) - \lambda_3 \sum_{m} q(m),   (15)
where λ1 , λ2 , and λ3 are Lagrange multipliers. In the VBE-Step, we compute the variational posterior, q(ht |m), over the hidden nodes by solving δ F˜ /δq(ht |m) = 0 to get
\log q(h^t|m) = \int d\theta\, q(\theta|m) \log p(y^t, h^t|\theta, m) - \log z_1,   (16)
where z1 is the normalization constant. In the VBM-Step, we compute the variational posterior over the parameters by solving δ F˜ /δq(θ|m) = 0 to get
\log q(\theta|m) = \frac{1}{T}\sum_{t=1}^{T} \int dh^t\, q(h^t|m) \log p(y^t, h^t|\theta, m) + \log p(\theta|m) - \log z_2,   (17)
where z2 is the normalization constant. Finally, we compute the posterior over the structure by solving δ F˜ /δq(m) = 0 to get
\log q(m) = \frac{1}{T}\sum_{t=1}^{T} \int dh^t\, d\theta\, q(h^t|m) q(\theta|m) \log p(y^t, h^t|\theta, m) - \frac{1}{T}\sum_{t=1}^{T} \int dh^t\, q(h^t|m) \log q(h^t|m) - \mathrm{KL}\big[q(\theta|m) \,\|\, p(\theta|m)\big] + \log p(m) - \log z_3,   (18)
where z_3 is the normalization constant. Note that averaging rather than summing over the time series is used in Eqs. (17) and (18). Based on the above analysis, we can estimate all model parameters and hidden variables via the corresponding posterior distributions for each fixed model structure; the optimal model structure is the one with the largest structure posterior. For a fixed model structure, it remains to select the posterior distributions over the parameters and the hidden variables, and the time series VBEM algorithm then reduces to rules for updating the posterior hyperparameters.
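A minimal sketch of this loop is given below. All helper callables and methods (vbe_step, vbm_step, init_posterior, avg_log_evidence, kl_parameters, log_prior) are hypothetical placeholders for a concrete model; the sketch only shows how the three posteriors of Eqs. (16)–(18) interact and how the optimal structure is selected:

```python
def time_series_vbem(Y, structures, vbe_step, vbm_step, log_prior, n_iter=50):
    """Skeleton of the time series VBEM loop of Eqs. (16)-(18).

    Y is a length-T sequence of observations y^t.  vbe_step(y_t, q_theta)
    returns q(h^t|m) (Eq. (16)); vbm_step(Y, q_h) returns q(theta|m) using
    the AVERAGE over t (Eq. (17)); log_prior(m) is log p(m).  Each structure
    m is assumed to expose init_posterior(), avg_log_evidence(), and
    kl_parameters() -- hypothetical helpers for a concrete model.
    """
    log_qm = {}
    for m in structures:
        q_theta = m.init_posterior()
        for _ in range(n_iter):
            q_h = [vbe_step(y_t, q_theta) for y_t in Y]   # VBE-step, per t
            q_theta = vbm_step(Y, q_h)                    # VBM-step, averaged
        # Structure posterior, Eq. (18): averaged likelihood/entropy terms
        # minus the parameter KL penalty, plus the structure prior.
        log_qm[m] = (m.avg_log_evidence(Y, q_h, q_theta)
                     - m.kl_parameters(q_theta) + log_prior(m))
    best = max(log_qm, key=log_qm.get)   # optimal structure: largest posterior
    return best, log_qm
```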
Fig. 1. The Bayesian network of the probabilistic model of Eq. (1) at time instance t.
3. Estimating sources and mixing matrix in BSS with the time series VBEM algorithm

In this section, we apply the time series VBEM algorithm to estimate the original source signals and the mixing matrix from the sensor signals in the BSS problem. We start with the Bayesian network based on the probabilistic model of Eq. (1). By computing the joint conditional density under the given parameters and hidden nodes, and selecting the prior and posterior distributions over the parameters, we deduce the rules to update the posterior hyperparameters for estimating the parameters and the original source signals, and to calculate the posterior over the model structure. This method differs from the variational expectation maximization approaches to time series analysis in [18–20].

3.1. Bayesian networks

The Bayesian network of Eq. (1) at time instance t is shown in Fig. 1. Similar to IFA [8], the density of each independent source signal is approximated by the mixture of Gaussians (MOG) model (as shown by Eq. (2)) [21]. The hidden variables h^t at time instance t include the source signal vector x^t and the component indicators s^t of the MOG model, i.e., h^t = {x^t, s^t}, where s^t = {s^t_1, ..., s^t_L} and x^t = {x^t_1, ..., x^t_L}. The parameter θ includes the mixing matrix A, the inverse covariance Λ of the sensor noise, and the source MOG parameters {ω, μ, ν}, i.e., θ = {A, Λ, ω, μ, ν}, where ω stands for the mixing proportions, μ for the means, and ν for the inverse variances. Notice that μ depends on ν, as required by the variational Bayesian MOG model [10].

3.2. Joint conditional probabilistic density

For the time series VBEM algorithm of Eqs. (16)–(18), one can compute the joint conditional density by using Bayes' rule,
p(y^t, x^t, s^t | A, \Lambda, \omega, \mu, \nu) = p(y^t | x^t, A, \Lambda)\, p(x^t, s^t | \omega, \mu, \nu).   (19)
Since the sensor noise is Gaussian with zero mean and inverse covariance Λ, the probability of generating a particular sensor vector is

p(y^t | x^t, A, \Lambda) = \left|\frac{\Lambda}{2\pi}\right|^{1/2} \exp\left[-\frac{1}{2}(y^t - Ax^t)^T \Lambda (y^t - Ax^t)\right].   (20)
According to the MOG model, the conditional distribution can be written as
p(x^t, s^t | \omega, \mu, \nu) = \prod_{i=1}^{L} \omega_{is_i} \left(\frac{\nu_{is_i}}{2\pi}\right)^{1/2} \exp\left[-\frac{1}{2}\nu_{is_i}\left(x_i^t - \mu_{is_i}\right)^2\right].   (21)
Hereafter, we shall always omit m in the variational posteriors q(·) for brevity. The corresponding variational posterior distribution over the hidden variables, with parameters ω̂^t_{is_i}, μ̂^t_{is_i}, and ν̂^t_{is_i}, can be chosen in the same form as Eq. (21),
q(x^t, s^t) = \prod_{i=1}^{L} \hat{\omega}_{is_i}^t \left(\frac{\hat{\nu}_{is_i}^t}{2\pi}\right)^{1/2} \exp\left[-\frac{1}{2}\hat{\nu}_{is_i}^t\left(x_i^t - \hat{\mu}_{is_i}^t\right)^2\right].   (22)
3.3. Prior and posterior distributions over model parameters

For the Bayesian network shown in Fig. 1, the prior distributions over the parameters can be expressed as a product, p(θ) = p(A) p(Λ) p(ω) p(μ, ν). Since the elements of the mixing matrix are correlated, the prior distribution over the mixing matrix is assumed to be a matrix Gaussian distribution with hyperparameters the mean a and the inverse covariance b, which is a K × L × K × L fourth-order tensor,
p(A) = \left|\frac{\mathrm{mat}(b)}{2\pi}\right|^{1/2} \exp\left[-\frac{1}{2}\sum_{j=1}^{K}\sum_{i=1}^{L}\sum_{k=1}^{K}\sum_{l=1}^{L} (A_{ji} - a_{ji})\, b_{jikl}\, (A_{kl} - a_{kl})\right],   (23)
where mat(b) is a KL × KL matrix with elements (mat(b))_{(j−1)L+i, (k−1)L+l} = b_{jikl}. The prior distribution over the inverse covariance Λ of the sensor noise is assumed to be a Wishart distribution with hyperparameters the degrees of freedom ξ and the covariance η,
p(\Lambda) = \frac{1}{z_{\xi,\eta}} |\Lambda|^{(\xi - K - 1)/2} \exp\left[-\mathrm{tr}(\eta^{-1}\Lambda)/2\right],   (24)

where z_{\xi,\eta} = 2^{\xi K/2} \pi^{K(K-1)/4} |\eta|^{\xi/2} \prod_{j=1}^{K} \Gamma((\xi - j + 1)/2). The prior distribution over the mixing proportions ω of the source MOG model is assumed to be a symmetric Dirichlet distribution,
p(\omega) = \prod_{i=1}^{L} \Gamma\left(\sum_{s_i=1}^{n_i} \lambda_{is_i}\right) \prod_{s_i=1}^{n_i} \frac{\omega_{is_i}^{\lambda_{is_i}-1}}{\Gamma(\lambda_{is_i})}.   (25)
The prior distributions over μ and ν of the source MOG model are assumed to be Gaussian-Gamma,

p(\mu, \nu) = \prod_{i=1}^{L} \prod_{s_i=1}^{n_i} \left(\frac{\gamma_{is_i}\nu_{is_i}}{2\pi}\right)^{1/2} \exp\left[-\frac{1}{2}\gamma_{is_i}\nu_{is_i}(\mu_{is_i} - m_{is_i})^2\right] \frac{\nu_{is_i}^{\alpha_{is_i}-1}}{\Gamma(\alpha_{is_i})\,\beta_{is_i}^{\alpha_{is_i}}} \exp\left(-\frac{\nu_{is_i}}{\beta_{is_i}}\right).   (26)
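To make the prior families concrete, the following sketch draws one sample from each of Eqs. (23)–(26) with scipy.stats. The hyperparameter values are arbitrary placeholders; the matrix Gaussian is realized through vec(A), consistent with the mat(b) convention above, and scipy's df/scale parameters roughly play the roles of ξ and η:

```python
import numpy as np
from scipy import stats

K, L, n_i = 8, 4, 3   # placeholder sizes

# Matrix Gaussian prior on A, Eq. (23), via vec(A) ~ N(vec(a), mat(b)^{-1}).
a = np.zeros((K, L))
mat_b = np.eye(K * L)
A = stats.multivariate_normal.rvs(a.ravel(), np.linalg.inv(mat_b)).reshape(K, L)

# Wishart prior on the noise precision Lambda, Eq. (24).
Lam = stats.wishart.rvs(df=K + 2, scale=np.eye(K))

# Symmetric Dirichlet prior on the mixing proportions, Eq. (25).
omega_i = stats.dirichlet.rvs(np.ones(n_i))[0]

# Gaussian-Gamma prior on (mu, nu), Eq. (26): nu ~ Gamma(alpha, scale=beta),
# then mu | nu ~ N(m0, 1/(gamma * nu)).
alpha, beta, gamma, m0 = 2.0, 1.0, 1.0, 0.0
nu = stats.gamma.rvs(alpha, scale=beta)
mu = stats.norm.rvs(m0, 1.0 / np.sqrt(gamma * nu))
```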
The variational posterior distributions, with the hyperparameters marked by the notation ‘ˆ’, over the model parameters are chosen to be the same distribution families as the prior distributions,
q(A) = \left|\frac{\mathrm{mat}(\hat{b})}{2\pi}\right|^{1/2} \exp\left[-\frac{1}{2}\sum_{j=1}^{K}\sum_{i=1}^{L}\sum_{k=1}^{K}\sum_{l=1}^{L} (A_{ji} - \hat{a}_{ji})\, \hat{b}_{jikl}\, (A_{kl} - \hat{a}_{kl})\right],   (27)

q(\Lambda) = \frac{1}{z_{\hat{\xi},\hat{\eta}}} |\Lambda|^{(\hat{\xi} - K - 1)/2} \exp\left[-\mathrm{tr}(\hat{\eta}^{-1}\Lambda)/2\right],   (28)

q(\omega) = \prod_{i=1}^{L} \Gamma\left(\sum_{s_i=1}^{n_i} \hat{\lambda}_{is_i}\right) \prod_{s_i=1}^{n_i} \frac{\omega_{is_i}^{\hat{\lambda}_{is_i}-1}}{\Gamma(\hat{\lambda}_{is_i})},   (29)

q(\mu, \nu) = \prod_{i=1}^{L} \prod_{s_i=1}^{n_i} \left(\frac{\hat{\gamma}_{is_i}\nu_{is_i}}{2\pi}\right)^{1/2} \exp\left[-\frac{1}{2}\hat{\gamma}_{is_i}\nu_{is_i}(\mu_{is_i} - \hat{m}_{is_i})^2\right] \frac{\nu_{is_i}^{\hat{\alpha}_{is_i}-1}}{\Gamma(\hat{\alpha}_{is_i})\,\hat{\beta}_{is_i}^{\hat{\alpha}_{is_i}}} \exp\left(-\frac{\nu_{is_i}}{\hat{\beta}_{is_i}}\right),   (30)

where z_{\hat{\xi},\hat{\eta}} = 2^{\hat{\xi} K/2} \pi^{K(K-1)/4} |\hat{\eta}|^{\hat{\xi}/2} \prod_{j=1}^{K} \Gamma((\hat{\xi} - j + 1)/2). The variational posterior distributions over the model parameters, like the priors, can be expressed as a product, q(θ) = q(A) q(Λ) q(ω) q(μ, ν).

3.4. Mean-field equations

With the help of Eqs. (16), (19)–(30), one can obtain the variational posterior distributions over the hidden variables at time instance t at the VBE-Step, i.e., the parameters ω̂^t_{is_i}, μ̂^t_{is_i}, and ν̂^t_{is_i} can be written in the following form:
\hat{\nu}_{is_i}^t = \bar{A}_{ii} + \langle \nu_{is_i} \rangle,   (31)

\hat{\nu}_{is_i}^t \hat{\mu}_{is_i}^t + \sum_{l(\ne i)} \sum_{s_l=1}^{n_l} \bar{A}_{il}\, \hat{\omega}_{ls_l}^t \hat{\mu}_{ls_l}^t = \langle \nu_{is_i}\mu_{is_i} \rangle + \left(\langle A \rangle^T \langle \Lambda \rangle\, y^t\right)_i,   (32)

\log \hat{\omega}_{is_i}^t = \langle \log \omega_{is_i} \rangle + \frac{1}{2}\left(\hat{\nu}_{is_i}^t (\hat{\mu}_{is_i}^t)^2 - \log \hat{\nu}_{is_i}^t\right) - \frac{1}{2}\left(\langle \nu_{is_i}\mu_{is_i}^2 \rangle - \langle \log \nu_{is_i} \rangle\right),   (33)

\hat{\omega}_{is_i}^t \leftarrow \hat{\omega}_{is_i}^t \Big/ \sum_{s_i=1}^{n_i} \hat{\omega}_{is_i}^t,   (34)
where \bar{A} = \langle A^T \Lambda A \rangle; the explicit forms of all the averages can be found in Appendix A. In deriving Eq. (32), we have implicitly adopted the mean-field approximation. Notice that ν̂^t_{is_i} in Eq. (31) does not depend on the time instance [8]. After initializing ω̂^t_{is_i} by the method of Eq. (34), Eq. (32) can be expressed as a linear (\sum_{i=1}^{L} n_i) × (\sum_{i=1}^{L} n_i) system and solved for μ̂^t_{is_i} by standard methods. The new ω̂^t_{is_i} are then obtained from Eqs. (33), (34), and these values are substituted back into Eq. (32). All these steps are repeated until convergence.
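The following sketch iterates Eqs. (31)–(34) to a fixed point at one time instance. Rather than assembling the full linear system of Eq. (32), it sweeps the sources Gauss–Seidel style; all argument names and shapes are ours, and the MOG parameters stand for the posterior averages of Appendix A treated as point values:

```python
import numpy as np

def vbe_fixed_point(AtLam_y, A_bar, omega, mu, nu, n_iter=20):
    """Mean-field fixed point of Eqs. (31)-(34) at one time instance.

    AtLam_y: (<A>^T <Lambda> y^t), length L.  A_bar: <A^T Lambda A>, L x L.
    omega, mu, nu: lists of per-source arrays holding the (posterior-
    averaged) MOG parameters, treated here as point values.
    """
    L = len(omega)
    om_hat = [w.copy() for w in omega]                   # init via Eq. (34)
    nu_hat = [A_bar[i, i] + nu[i] for i in range(L)]     # Eq. (31)
    mu_hat = [np.zeros_like(m) for m in mu]
    for _ in range(n_iter):
        for i in range(L):
            # Eq. (32): remove the mean-field coupling to the other sources.
            cross = sum(A_bar[i, l] * (om_hat[l] @ mu_hat[l])
                        for l in range(L) if l != i)
            mu_hat[i] = (nu[i] * mu[i] + AtLam_y[i] - cross) / nu_hat[i]
            # Eq. (33): unnormalized log responsibilities ...
            log_om = (np.log(omega[i])
                      + 0.5 * (nu_hat[i] * mu_hat[i] ** 2 - np.log(nu_hat[i]))
                      - 0.5 * (nu[i] * mu[i] ** 2 - np.log(nu[i])))
            log_om -= log_om.max()                       # overflow guard
            # ... and Eq. (34): normalize within source i.
            om_hat[i] = np.exp(log_om) / np.exp(log_om).sum()
    return om_hat, mu_hat, nu_hat
```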
3.5. Update posterior hyperparameters

Based on Eqs. (17), (19)–(30), the variational posterior distributions over the model parameters can be computed at the VBM-Step. The hyperparameters of the posterior distributions can be written as

\hat{b}_{jikl} = b_{jikl} + \langle \Lambda_{jk} \rangle \frac{1}{T}\sum_{t=1}^{T} \langle x_i^t x_l^t \rangle_{x,s}, \qquad \sum_{k=1}^{K}\sum_{l=1}^{L} \hat{b}_{jikl}\,\hat{a}_{kl} = \sum_{k=1}^{K}\sum_{l=1}^{L} b_{jikl}\, a_{kl} + \frac{1}{T}\sum_{t=1}^{T}\sum_{k=1}^{K} \langle \Lambda_{jk} \rangle\, y_k^t \langle x_i^t \rangle_{x,s},   (35)
\hat{\xi} = \xi + 1, \qquad \hat{\eta}^{-1} = \eta^{-1} + \frac{1}{T}\sum_{t=1}^{T} y^t y^{tT} - \langle A \rangle \frac{1}{T}\sum_{t=1}^{T} \langle x^t \rangle_{x,s}\, y^{tT} - \frac{1}{T}\sum_{t=1}^{T} y^t \langle x^t \rangle_{x,s}^T \langle A \rangle^T + \left\langle A \left(\frac{1}{T}\sum_{t=1}^{T} \langle x^t x^{tT} \rangle_{x,s}\right) A^T \right\rangle,   (36)

\hat{\lambda}_{is_i} = \lambda_{is_i} + \frac{1}{T}\sum_{t=1}^{T} \hat{\omega}_{is_i}^t,   (37)
\hat{\gamma}_{is_i} = \gamma_{is_i} + \frac{1}{T}\sum_{t=1}^{T} \hat{\omega}_{is_i}^t, \qquad \hat{\gamma}_{is_i}\hat{m}_{is_i} = \gamma_{is_i} m_{is_i} + \frac{1}{T}\sum_{t=1}^{T} \hat{\omega}_{is_i}^t \langle x_i^t \rangle_{x|s},   (38)

\hat{\alpha}_{is_i} = \alpha_{is_i} + \frac{1}{T}\sum_{t=1}^{T} \hat{\omega}_{is_i}^t, \qquad \frac{1}{\hat{\beta}_{is_i}} = \frac{1}{\beta_{is_i}} + \frac{1}{2}\gamma_{is_i} m_{is_i}^2 - \frac{1}{2}\hat{\gamma}_{is_i}\hat{m}_{is_i}^2 + \frac{1}{2T}\sum_{t=1}^{T} \hat{\omega}_{is_i}^t \langle x_i^{t2} \rangle_{x|s}.   (39)
The averages are taken with respect to the posterior distributions over the parameters or the hidden variables; their explicit forms can be found in Appendix A. Eq. (35) is a linear system in the K × L unknowns â and can be solved by standard methods. Eqs. (35)–(39) show the necessity of averaging rather than summing over the data instances.
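As an example of the VBM-step updates, the sketch below solves Eq. (35) for â by flattening the precision tensor into mat(b̂). The argument names are ours, and the time averages are assumed to be precomputed:

```python
import numpy as np

def update_A_posterior(b, a, Lam_mean, xx_mean, yx_mean):
    """Posterior hyperparameters of the mixing matrix, Eq. (35).

    b: K x L x K x L prior precision tensor; a: K x L prior mean;
    Lam_mean = <Lambda> (K x K); xx_mean = (1/T) sum_t <x^t x^tT> (L x L);
    yx_mean = (1/T) sum_t y^t <x^t>^T (K x L).
    """
    K, L = a.shape
    # b_hat_{jikl} = b_{jikl} + <Lambda>_{jk} (1/T) sum_t <x_i^t x_l^t>.
    b_hat = b + np.einsum('jk,il->jikl', Lam_mean, xx_mean)
    # Right-hand side of Eq. (35): sum_{kl} b_{jikl} a_{kl} + <Lambda> yx_mean.
    rhs = (np.einsum('jikl,kl->ji', b, a) + Lam_mean @ yx_mean).ravel()
    # mat(.) of Section 3.3 is exactly the C-order reshape to KL x KL.
    a_hat = np.linalg.solve(b_hat.reshape(K * L, K * L), rhs).reshape(K, L)
    return a_hat, b_hat
```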
3.6. Estimate original source signals and parameters

To estimate the source signals and the parameters, we first initialize the posterior hyperparameters {â, b̂, ξ̂, η̂, λ̂, m̂, γ̂, α̂, β̂}, then compute {ω̂^t_{is_i}, μ̂^t_{is_i}, ν̂^t_{is_i}} by Eqs. (31)–(34) at each time instance t and the posterior hyperparameters by Eqs. (35)–(39). The new {â, b̂, ξ̂, η̂, λ̂, m̂, γ̂, α̂, β̂} are substituted back into Eqs. (31)–(34), and the procedure is repeated until convergence. At time instance t, we can estimate the original source signals by applying the least mean squares (LMS) estimator to the posterior distribution over the hidden nodes,
\hat{x}_i^t = \sum_{s_i=1}^{n_i} \hat{\omega}_{is_i}^t \hat{\mu}_{is_i}^t.   (40)
The model parameters can be estimated by applying the maximum a posteriori probability (MAP) estimators to the posterior distributions over the parameters,
\hat{A}_{ji} = \hat{a}_{ji},   (41)

\hat{\Lambda} = (\hat{\xi} - K - 1)\,\hat{\eta},   (42)

\hat{\omega}_{is_i} = (\hat{\lambda}_{is_i} - 1) \Big/ \sum_{s_i=1}^{n_i} (\hat{\lambda}_{is_i} - 1),   (43)

\hat{\mu}_{is_i} = \hat{m}_{is_i},   (44)

\hat{\nu}_{is_i} = \left(\hat{\alpha}_{is_i} - \frac{1}{2}\right)\hat{\beta}_{is_i}.   (45)
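A sketch of this final estimation stage, under an assumed container layout for the converged posterior quantities:

```python
import numpy as np

def estimate_sources_and_parameters(om_hat, mu_hat, post):
    """LMS sources (Eq. (40)) and MAP parameters (Eqs. (41)-(45)).

    om_hat[t][i], mu_hat[t][i]: converged responsibilities and means from
    the VBE-step; post: dict of converged posterior hyperparameters.
    """
    T, L = len(om_hat), len(om_hat[0])
    # Eq. (40): x_hat_i^t = sum_{s_i} om_hat^t_{is_i} mu_hat^t_{is_i}.
    x_hat = np.array([[om_hat[t][i] @ mu_hat[t][i] for i in range(L)]
                      for t in range(T)]).T
    A_hat = post['a_hat']                                    # Eq. (41)
    K = A_hat.shape[0]
    Lam_hat = (post['xi_hat'] - K - 1) * post['eta_hat']     # Eq. (42)
    om_map = [(lam - 1) / (lam - 1).sum()
              for lam in post['lambda_hat']]                 # Eq. (43)
    mu_map = post['m_hat']                                   # Eq. (44)
    nu_map = [(al - 0.5) * be for al, be
              in zip(post['alpha_hat'], post['beta_hat'])]   # Eq. (45)
    return x_hat, A_hat, Lam_hat, om_map, mu_map, nu_map
```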
The flow chart for estimating the source signals and the parameters with the time series VBEM algorithm is shown in Fig. 2.

Fig. 2. Flow chart for estimating the source signals and parameters by the time series VBEM algorithm.

3.7. Estimate optimal model structure

Upon obtaining the posterior distributions over the parameters and the hidden variables, we can compute the posterior over a fixed structure m by substituting Eqs. (19)–(30) into Eq. (18),
\log q(m) = \frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{L}\sum_{s_i=1}^{n_i} \hat{\omega}_{is_i}^t \Big[ \langle \log \omega_{is_i} \rangle + \frac{1}{2}\langle \log \nu_{is_i} \rangle - \frac{1}{2}\langle \nu_{is_i} \rangle \langle x_i^{t2} \rangle_{x|s} + \langle \nu_{is_i}\mu_{is_i} \rangle \langle x_i^t \rangle_{x|s} - \frac{1}{2}\langle \nu_{is_i}\mu_{is_i}^2 \rangle - \log \hat{\omega}_{is_i}^t - \frac{1}{2}\log \hat{\nu}_{is_i}^t + \frac{1}{2} \Big]
+ \frac{1}{2}\langle \log|\Lambda| \rangle - \frac{1}{2}\mathrm{tr}\Big[ \langle \Lambda \rangle \Big(\frac{1}{T}\sum_{t=1}^{T} y^t y^{tT}\Big) - 2\langle \Lambda \rangle \langle A \rangle \Big(\frac{1}{T}\sum_{t=1}^{T} \langle x^t \rangle_{x,s}\, y^{tT}\Big) \Big]
- \frac{1}{2}\mathrm{tr}\Big[ \langle A^T \Lambda A \rangle \Big(\frac{1}{T}\sum_{t=1}^{T} \langle x^t x^{tT} \rangle_{x,s}\Big) \Big]
- \mathrm{KL}[q(A)\|p(A)] - \mathrm{KL}[q(\Lambda)\|p(\Lambda)] - \mathrm{KL}[q(\omega)\|p(\omega)] - \mathrm{KL}[q(\mu,\nu)\|p(\mu,\nu)] + \log p(m) - \log z_3,   (46)
where the averages are taken with respect to the posterior distributions over the parameters or the hidden variables, KL[q(·)‖p(·)] is the KL distance between p(·) and q(·), and z_3 is the normalization constant. One can then select, among the considered structures, the optimal model structure, i.e., the one with the largest q(m).

4. Simulations and results

To show the performance of the proposed time series VBEM algorithm on the BSS problem, we implement the algorithm in Visual C++ and apply it to both synthetic and real MEG data.

4.1. Synthetic data

To obtain the synthetic data, we assume true source signals and a true mixing matrix. The sensor noises are assumed to be Gaussian distributed with zero mean and a known inverse covariance; the sensor signals are then obtained by Eq. (1). We estimate the mixing matrix and the original source signals from the sensor signals for several groups of prior hyperparameters; in this setting, each group of prior hyperparameters plays the role of a model structure. We compute the structure posterior with Eq. (46), and the optimal mixing matrix and original source signals are estimated under the optimal structure, i.e., the one with the largest structure posterior.
Fig. 3. Simulation for synthetic data. (a) The true source signals. (b) The sensor signals. (c) The dependence of the logarithm model structure posterior log q(m) on Accuracy(Â, A_true). (d) The dependences of Accuracy(Â, A_true), Accuracy(A_prior, A_true), and Accuracy(x̂, x_true) on the logarithm model structure posterior log q(m). (e) The estimated original source signals, denoised by the Sym8 wavelet package at level 3 for the optimal model structure, overlaid on the true source signals.
We define a quantity Accuracy(A′, A) to represent the accuracy of an estimated mixing matrix A′ compared with the true mixing matrix A, i.e.,
\mathrm{Accuracy}(A', A) = 1 - \frac{\sum_{j=1}^{K}\sum_{i=1}^{L} (A'_{ji} - A_{ji})^2}{\sum_{j=1}^{K}\sum_{i=1}^{L} A_{ji}^2}.   (47)
We also define a quantity Accuracy(x′, x) to represent the accuracy of the estimated source signals x′ compared with the true source signals x,
\mathrm{Accuracy}(x', x) = \frac{\sum_{t=1}^{T}\sum_{i=1}^{L} x'^t_i\, x^t_i}{\sqrt{\sum_{t=1}^{T}\sum_{i=1}^{L} (x'^t_i)^2 \; \sum_{t=1}^{T}\sum_{i=1}^{L} (x^t_i)^2}}.   (48)
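Both accuracy measures, as reconstructed above, are straightforward to compute:

```python
import numpy as np

def accuracy_matrix(A_est, A_true):
    """Accuracy(A', A) of Eq. (47)."""
    return 1.0 - np.sum((A_est - A_true) ** 2) / np.sum(A_true ** 2)

def accuracy_signals(x_est, x_true):
    """Accuracy(x', x) of Eq. (48): normalized inner product over (t, i)."""
    num = np.sum(x_est * x_true)
    den = np.sqrt(np.sum(x_est ** 2) * np.sum(x_true ** 2))
    return num / den
```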
The true source signals are shown in Fig. 3(a). The true mixing matrix has 8 rows and 4 columns. The sensor signals are shown in Fig. 3(b). The source MOG model parameters are estimated using the EM algorithm [22,23]. We test 35 groups of prior hyperparameters to estimate the mixing matrix and the original source signals and to examine the dependence of the model structure posterior and the estimation accuracy on the prior hyperparameters. The structure prior is assumed to be uniform, i.e., p(m) = 1/35, m = 1, ..., 35. For each group of prior hyperparameters, we perform the computation according to the flow chart of Fig. 2. As shown in Fig. 3(c), log q(m) increases with the accuracy of the estimated mixing matrix, Accuracy(Â, A_true). Conversely, as shown in Fig. 3(d), the larger the logarithm structure posterior log q(m) is, the larger the achieved Accuracy(Â, A_true), Accuracy(A_prior, A_true), and Accuracy(x̂, x_true), where A_prior is the prior mixing matrix estimated from the prior distribution of the mixing matrix. Fig. 3(d) also indicates that Accuracy(Â, A_true) is larger than Accuracy(A_prior, A_true), which shows that Â is always closer to A_true than A_prior. We can therefore select the optimal model structure, the one with the largest structure posterior, to estimate the mixing matrix and the original source signals. Because of the noise present in the sensors, the estimated sources must be denoised; the estimated source signals denoised by the Sym8 wavelet package [24–26] at level 3 are shown in Fig. 3(e).
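For the denoising step, a sketch along the lines of [24–26] is given below. It uses a plain sym8 discrete wavelet transform with a universal soft threshold (PyWavelets); this is one common choice, not necessarily the exact wavelet-packet procedure used for Fig. 3(e):

```python
import numpy as np
import pywt  # PyWavelets

def denoise_sym8(x_hat, level=3):
    """Soft-threshold wavelet denoising of one estimated source (sym8,
    3 levels).  The universal threshold is one common choice [24-26]."""
    coeffs = pywt.wavedec(x_hat, 'sym8', level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745   # noise-scale estimate
    thr = sigma * np.sqrt(2 * np.log(len(x_hat)))    # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft')
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, 'sym8')[:len(x_hat)]
```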
4.2. Real data

We apply the presented algorithm to MEG source localization. The real data are MEG signals evoked by a key-press action of the index finger of a normal subject, recorded with the CTF MEG system at the Third Military Medical University. Artifacts are removed from the MEG using ICA. In the equivalent current dipole (ECD) model [27–29] of magnetoencephalography (MEG), the magnetic field can be calculated by the lead field equation,
B = F J,   (49)
where F ≡ [F_θ F_φ] is the lead field matrix and J ≡ [J_θ J_φ]^T is the amplitude vector of the ECDs,

F_{\theta ji} \equiv \frac{\mu r_i}{4\pi |\vec{R}_j - \vec{r}_i|^3}\, \vec{e}_{\phi i} \cdot \vec{e}_{Rj}, \qquad F_{\phi ji} \equiv -\frac{\mu r_i}{4\pi |\vec{R}_j - \vec{r}_i|^3}\, \vec{e}_{\theta i} \cdot \vec{e}_{Rj},   (50)
where \vec{R}_j is the location of the jth superconducting quantum interference device (SQUID), \vec{e}_{Rj} is the radial direction at \vec{R}_j, \vec{r}_i is the location of the ith ECD, and \{\vec{e}_{ri}, \vec{e}_{\theta i}, \vec{e}_{\phi i}\} are the basis vectors of spherical polar coordinates at \vec{r}_i,

\vec{e}_{\phi i} \cdot \vec{e}_{Rj} = -N_{xj} \sin\phi_i + N_{yj} \cos\phi_i,   (51)

\vec{e}_{\theta i} \cdot \vec{e}_{Rj} = N_{xj} \cos\theta_i \cos\phi_i + N_{yj} \cos\theta_i \sin\phi_i - N_{zj} \sin\theta_i,   (52)
where {N_{xj}, N_{yj}, N_{zj}} are the direction cosines of \vec{R}_j. The lead field matrix is determined by the locations of the ECDs and the SQUIDs. In the inverse problem of MEG, the locations of the ECDs are estimated from the magnetic field. This can be considered a BSS problem: the amplitudes of the ECDs are the sources, the lead field matrix is the mixing matrix, and the magnetic field provides the mixed signals. We use the presented algorithm to estimate the lead field matrix from the magnetic field under some prior hyperparameters. Once the lead field matrix is obtained, the locations of the ECDs can be calculated using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) variable metric method [30,31]. Because the location of the ith ECD only depends on F_{θji} and F_{φji}, it can be computed from the ith columns of F_θ and F_φ, so the complexity can be considerably reduced. To avoid calculating trigonometric functions in the BFGS variable metric method, we introduce the parameters
w_{ri} = r_i, \qquad w_{\theta i} = \tan\frac{\theta_i}{2}, \qquad w_{\phi i} = \tan\frac{\phi_i}{2}.   (53)
The elements of the lead field matrix in Eq. (50) can then be rewritten as

F_{\theta ji} = \frac{\mu w_{ri}}{4\pi |\vec{R}_j - \vec{r}_i|^3}\, N_{\theta j}(w_{\phi i}),   (54)

F_{\phi ji} = \frac{\mu w_{ri}}{4\pi |\vec{R}_j - \vec{r}_i|^3}\, N_{\phi j}(w_{\theta i}, w_{\phi i}),   (55)
where
N_{\theta j}(w_{\phi i}) = -N_{xj}\, s(w_{\phi i}) + N_{yj}\, c(w_{\phi i}),   (56)

N_{\phi j}(w_{\theta i}, w_{\phi i}) = -N_{xj}\, c(w_{\theta i}) c(w_{\phi i}) - N_{yj}\, c(w_{\theta i}) s(w_{\phi i}) + N_{zj}\, s(w_{\theta i}),   (57)

|\vec{R}_j - \vec{r}_i|^2 = \left(R_{xj} - w_{ri}\, s(w_{\theta i}) c(w_{\phi i})\right)^2 + \left(R_{yj} - w_{ri}\, s(w_{\theta i}) s(w_{\phi i})\right)^2 + \left(R_{zj} - w_{ri}\, c(w_{\theta i})\right)^2,   (58)
where s(x) ≡ 2x/(1 + x²) and c(x) ≡ (1 − x²)/(1 + x²). The target function of the BFGS variable metric method is selected as the Euclidean distance between the ith column of the estimated mixing matrix (F̂_θ, F̂_φ) and that of the lead field matrix (F_θ, F_φ),
D_i(w_{ri}, w_{\theta i}, w_{\phi i}) = \sum_{j=1}^{K} \left[ (F_{\theta ji} - \hat{F}_{\theta ji})^2 + (F_{\phi ji} - \hat{F}_{\phi ji})^2 \right],   (59)
where K is the number of SQUIDs. By minimizing the Euclidean distance D_i(w_{ri}, w_{θi}, w_{φi}), the parameters {w_{ri}, w_{θi}, w_{φi}} of the ith ECD can be obtained and then converted into spherical polar coordinates and Cartesian coordinates. The MEG source localization is shown in Fig. 4. The results indicate that the cerebral activities are mainly concentrated in the posterior part of the contralateral central channels, corresponding to the motor cortex in the precentral gyrus. The result is close to that of the SAM algorithm of the CTF MEG system, shown in Fig. 5.

Fig. 4. MEG source localization. All green '+' symbols denote SQUIDs; the red points denote the locations of the ECDs. (a) Location of the first ECD; (b) location of the second ECD; (c) location of the third ECD. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

Fig. 5. MEG source localization by the SAM algorithm in the MEG151 system. (a) Location of the first ECD; (b) location of the second ECD; (c) location of the third ECD.
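A sketch of the per-dipole localization, using scipy's BFGS minimizer on the target function of Eq. (59); the starting point, the handling of the permeability constant, and all argument names are our assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def s(x): return 2 * x / (1 + x ** 2)           # sin via half-angle, Eq. (53)
def c(x): return (1 - x ** 2) / (1 + x ** 2)    # cos via half-angle

def locate_ecd(F_col_hat, R, N, mu0=4e-7 * np.pi, w0=(5.0, 0.5, 0.5)):
    """Locate one ECD by minimizing D_i of Eq. (59) with BFGS.

    F_col_hat: the ith columns of (F_theta_hat, F_phi_hat) stacked into a
    length-2K vector; R (K x 3): SQUID positions; N (K x 3): their
    direction cosines.
    """
    def lead_field(w):
        wr, wt, wp = w
        r = wr * np.array([s(wt) * c(wp), s(wt) * s(wp), c(wt)])
        d3 = np.sum((R - r) ** 2, axis=1) ** 1.5               # |R_j - r_i|^3
        N_theta = -N[:, 0] * s(wp) + N[:, 1] * c(wp)           # Eq. (56)
        N_phi = (-N[:, 0] * c(wt) * c(wp)
                 - N[:, 1] * c(wt) * s(wp) + N[:, 2] * s(wt))  # Eq. (57)
        coef = mu0 * wr / (4 * np.pi * d3)
        return np.concatenate([coef * N_theta, coef * N_phi])  # Eqs. (54)-(55)

    target = lambda w: np.sum((lead_field(w) - F_col_hat) ** 2)  # Eq. (59)
    res = minimize(target, w0, method='BFGS')
    return res.x   # (w_r, w_theta, w_phi), convertible back via Eq. (53)
```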
5. Discussion and conclusion

In this section, we discuss the time complexity of the proposed algorithm, its ability to separate the original source signals and to estimate the mixing matrix, and draw conclusions.

5.1. Time complexity

The time complexity of this algorithm rests with Eq. (32), which is executed at a cost of O((\sum_{i=1}^{L} n_i)^3) for each model structure, iteration, and time instance. The overall time complexity is therefore O(N_m N_{it} T (\sum_{i=1}^{L} n_i)^3), where N_m stands for the number of model structures and N_{it} is the number of iterations. This time complexity is the same as that of variational IFA [8].
5.2. Performance of blind source separation

After extensive tests of the proposed algorithm, we find that the more accurate the prior distributions of the original source signals are, the better the performance of the blind source separation. The algorithm is insensitive to the hyperparameters {λ_{s_i}, m_{s_i}, γ_{s_i}, α_{s_i}, β_{s_i}} of the prior distributions of the source MOG model parameters and to the hyperparameters ξ and η of the prior distribution of the inverse covariance of the sensor noise, but it is sensitive to the hyperparameters a and b of the prior distribution of the mixing matrix.

5.3. Performance of estimating the mixing matrix

As shown in Fig. 3(d), the accuracy of estimating the mixing matrix largely depends on the hyperparameters a and b of the prior distribution of the mixing matrix. The difference between the true mixing matrix and the estimated mixing matrix results from the nature of the inverse problem; the performance of estimating the mixing matrix depends on the prior knowledge of the mixing process.

5.4. Comparison with ICA and variational IFA

With respect to independent source separation, if the noise is weak compared with the signals and the densities of the independent sources are selected properly in ICA, the time series VBEM algorithm has no distinct advantage over ICA. However, although ICA is faster than the proposed algorithm, the independent components and the unmixing matrix extracted by ICA cannot represent the original source signals and mixing matrix. In some real situations, e.g., the localization of dipole-based EEG/MEG sources, the original source signals and the mixing matrix play very important roles. In variational IFA, because prior distributions of the parameters are not used, the estimated source signals and model parameters largely depend on the initial values of the model parameters, and the convergence speed is low. In the time series VBEM algorithm, by contrast, the optimal model parameters are sought in the space determined by the prior distributions of the parameters, so the estimated parameters depend on the prior hyperparameters rather than on the initial values of the posterior hyperparameters, and fast convergence can be obtained. Moreover, the time series VBEM algorithm can
determine the optimal model structure by the posteriors of the model structures, while variational IFA cannot. Within the general Bayesian theory, the prior hyperparameters, which are derived from the prior knowledge accumulated in the research, are the key to finding the optimal parameters.

5.5. Conclusions

In this paper, we have presented the time series VBEM algorithm based on VB theory and applied it to the BSS problem for estimating the original source signals and the mixing matrix. In the proposed algorithm, averaging rather than summing over the time series is performed. To capture the correlation between the sensor noises, the sensor noise inverse covariance is assumed to be Wishart distributed; in addition, the mixing matrix is assumed to be matrix Gaussian distributed to take into account the correlation of its elements. The time series VBEM algorithm consists of computing the posterior distributions of the hidden variables (the time series VBE-Step), the posterior distributions of the model parameters (the time series VBM-Step), and the posterior distribution of the model structure. The optimal model structure is selected as the one with the largest posterior, and the source signals and mixing matrix are then estimated by applying LMS and MAP estimators, respectively, to the posterior distributions of the hidden variables and the model parameters for the optimal structure. The performance of the proposed algorithm is tested with both synthetic and real data. The results show that the logarithm posterior of the model structure increases with the accuracy of the posterior mixing matrix, and that the accuracies of the prior mixing matrix, the estimated mixing matrix, and the estimated source signals increase with the logarithm posterior of the model structure. The algorithm is applied to MEG data to localize the sources of the equivalent current dipoles; the predicted results are comparable to those given by the SAM algorithm of the CTF MEG system.
Acknowledgments

The present research was supported by the National Natural Science Foundation of China (NSFC, No. 30770546) and the Chongqing Natural Science Foundation (No. 2006BB2043, No. 2007BB5148). The authors would like to thank Dr. Xing-Gang Wu for many valuable discussions.

Appendix A

In this appendix, we present the equations needed for computing the averages in Eqs. (31)–(39):
\langle x_i^t \rangle_{x,s} = \sum_{s_i=1}^{n_i} \int dx_i^t\, q(x_i^t, s_i)\, x_i^t = \sum_{s_i=1}^{n_i} \hat{\omega}_{is_i}^t \hat{\mu}_{is_i}^t,   (A.1)

\langle x_i^t x_l^t \rangle_{x,s} = \begin{cases} \langle x_i^t \rangle_{x,s} \langle x_l^t \rangle_{x,s}, & l \ne i, \\ \sum_{s_i=1}^{n_i} \hat{\omega}_{is_i}^t \left( (\hat{\mu}_{is_i}^t)^2 + 1/\hat{\nu}_{is_i}^t \right), & l = i, \end{cases}   (A.2)

\langle x_i^t \rangle_{x|s} = \int dx_i^t\, q(x_i^t|s_i)\, x_i^t = \int dx_i^t\, N(x_i^t; \hat{\mu}_{is_i}^t, \hat{\nu}_{is_i}^t)\, x_i^t = \hat{\mu}_{is_i}^t,   (A.3)

\langle x_i^t x_l^t \rangle_{x|s} = \begin{cases} \hat{\mu}_{is_i}^t \hat{\mu}_{ls_l}^t, & l \ne i, \\ (\hat{\mu}_{is_i}^t)^2 + 1/\hat{\nu}_{is_i}^t, & l = i, \end{cases}   (A.4)

\langle \Lambda \rangle = \int d\Lambda\, q(\Lambda)\, \Lambda = \hat{\xi}\hat{\eta},   (A.5)

\langle \log|\Lambda| \rangle = \int d\Lambda\, q(\Lambda) \log|\Lambda| = \sum_{j=1}^{K} \psi\big((\hat{\xi} + 1 - j)/2\big) + K \log 2 + \log|\hat{\eta}|,   (A.6)

\langle A \rangle = \int dA\, q(A)\, A = \hat{a},   (A.7)

\langle A^T C A \rangle_{il} = \begin{cases} \sum_{j=1}^{K}\sum_{k=1}^{K} C_{jk}\, \hat{a}_{ji} \hat{a}_{kl}, & l \ne i, \\ \sum_{j=1}^{K}\sum_{k(\ne j)}^{K} C_{jk}\, \hat{a}_{ji} \hat{a}_{ki} + \sum_{j=1}^{K} C_{jj} \left( \hat{a}_{ji}^2 + 1/\hat{b}_{jiji} \right), & l = i, \end{cases}   (A.8)

\langle \log \omega_{is_i} \rangle = \psi(\hat{\lambda}_{is_i}) - \psi(\hat{\lambda}_i), \quad \text{where } \hat{\lambda}_i = \sum_{s_i=1}^{n_i} \hat{\lambda}_{is_i},   (A.9)

\langle \nu_{is_i} \rangle = \hat{\alpha}_{is_i}\hat{\beta}_{is_i},   (A.10)

\langle \log \nu_{is_i} \rangle = \psi(\hat{\alpha}_{is_i}) + \log \hat{\beta}_{is_i},   (A.11)

\langle \nu_{is_i}\mu_{is_i} \rangle = \hat{\alpha}_{is_i}\hat{\beta}_{is_i}\hat{m}_{is_i},   (A.12)

\langle \nu_{is_i}\mu_{is_i}^2 \rangle = \hat{\alpha}_{is_i}\hat{\beta}_{is_i}\hat{m}_{is_i}^2 + 1/\hat{\gamma}_{is_i}.   (A.13)
The KL distances between the prior distribution and the posterior distribution over the model parameters in Eq. (46) can be defined as:
\mathrm{KL}[q(A)\|p(A)] = \frac{1}{2}\log\frac{|\mathrm{mat}(\hat{b})|}{|\mathrm{mat}(b)|} + \frac{1}{2}\mathrm{tr}\big[\mathrm{mat}(b)\,\mathrm{mat}(\hat{b})^{-1}\big] + \frac{1}{2}\sum_{j=1}^{K}\sum_{i=1}^{L}\sum_{k=1}^{K}\sum_{l=1}^{L} (\hat{a}_{ji} - a_{ji})\, b_{jikl}\, (\hat{a}_{kl} - a_{kl}) - \frac{KL}{2},   (A.14)

\mathrm{KL}[q(\Lambda)\|p(\Lambda)] = \log\frac{z_{\xi,\eta}}{z_{\hat{\xi},\hat{\eta}}} + \frac{\hat{\xi} - \xi}{2}\langle \log|\Lambda| \rangle + \frac{1}{2}\mathrm{tr}\big[(\eta^{-1} - \hat{\eta}^{-1})\langle \Lambda \rangle\big],   (A.15)

\mathrm{KL}[q(\omega)\|p(\omega)] = \sum_{i=1}^{L}\Big[ \log\frac{\Gamma(\hat{\lambda}_i)}{\Gamma(\lambda_i)} + \sum_{s_i=1}^{n_i}\Big( \log\frac{\Gamma(\lambda_{is_i})}{\Gamma(\hat{\lambda}_{is_i})} + (\hat{\lambda}_{is_i} - \lambda_{is_i})\big(\psi(\hat{\lambda}_{is_i}) - \psi(\hat{\lambda}_i)\big) \Big)\Big],   (A.16)

where \lambda_i = \sum_{s_i=1}^{n_i} \lambda_{is_i} and \hat{\lambda}_i = \sum_{s_i=1}^{n_i} \hat{\lambda}_{is_i},

\mathrm{KL}[q(\mu,\nu)\|p(\mu,\nu)] = \sum_{i=1}^{L}\sum_{s_i=1}^{n_i}\Big[ \hat{\alpha}_{is_i}\Big(\frac{\hat{\beta}_{is_i}}{\beta_{is_i}} - 1\Big) + (\hat{\alpha}_{is_i} - \alpha_{is_i})\psi(\hat{\alpha}_{is_i}) + \log\frac{\Gamma(\alpha_{is_i})}{\Gamma(\hat{\alpha}_{is_i})} + \alpha_{is_i}\log\frac{\beta_{is_i}}{\hat{\beta}_{is_i}} + \frac{1}{2}\Big( \log\frac{\hat{\gamma}_{is_i}}{\gamma_{is_i}} + \gamma_{is_i}\hat{\alpha}_{is_i}\hat{\beta}_{is_i}(\hat{m}_{is_i} - m_{is_i})^2 + \frac{\gamma_{is_i}}{\hat{\gamma}_{is_i}} - 1 \Big)\Big].   (A.17)
References

[1] S. Choi, A. Cichocki, H.M. Park, S.Y. Lee, Blind source separation and independent component analysis: A review, Neural Inf. Process. 6 (1) (2005) 1–57.
[2] P. Comon, Independent component analysis: A new concept? Signal Process. 36 (1994) 287–314.
[3] A.J. Bell, T.J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Comput. 7 (1995) 1129–1159.
[4] A. Hyvarinen, Fast and robust fixed-point algorithms for independent component analysis, IEEE Trans. Neural Netw. 10 (3) (1999) 626–634.
[5] A. Hyvarinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, New York, 2001.
[6] Chu-Jun Yao, Hai-Lin Liu, Zhi-Tao Cui, Mixing matrix recovery of underdetermined source separation based on sparse representation, in: International Conference on Computational Intelligence and Security, 2007, pp. 1–5.
[7] A. Cichocki, P. Georgiev, Blind source separation algorithms with matrix constraints, IEICE Trans. Fundamentals E86-A (2003) 522–531.
[8] H. Attias, Independent factor analysis, Neural Comput. 11 (1999) 803–851.
[9] H. Attias, Inferring parameters and structure of latent variable models by variational Bayes, in: Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 1999, pp. 21–30.
[10] H. Attias, A variational Bayesian framework for graphical models, in: S. Solla, T. Leen, K.-R. Muller (Eds.), Advances in Neural Information Processing Systems, MIT Press, 2000, pp. 209–215.
[11] W.D. Penny, J. Kilner, F. Blankenburg, Robust Bayesian general linear models, NeuroImage 36 (2007) 661–671.
[12] S. Roberts, R. Choudrey, Data decomposition using independent component analysis with prior constraints, Pattern Recogn. 36 (2003) 1813–1825.
[13] N. Nasios, A.G. Bors, Variational learning for Gaussian mixture models, IEEE Trans. Syst. Man Cybern. B 36 (2006) 849–862.
[14] J.T. Nelson, A. Eduardo, W.D. Penny, Bayesian M/EEG source reconstruction with spatio-temporal priors, NeuroImage 39 (2008) 318–335.
[15] J. Kiebel, J. Daunizeau, C. Phillips, J. Friston, Variational Bayesian inversion of the equivalent current dipole model in EEG/MEG, NeuroImage 39 (2008) 728–741.
[16] M. Zumer, T. Attias, K. Sekihara, S. Nagarajan, Probabilistic algorithms for MEG/EEG source reconstruction using temporal basis functions learned from data, NeuroImage 41 (2008) 924–940.
[17] S. Nagarajan, T. Attias, E. Hild, K. Sekihara, A graphical model for estimating stimulus-evoked brain responses from magnetoencephalography data with large background brain activity, NeuroImage 30 (2006) 400–416.
[18] T. Minka, J. Lafferty, Expectation-propagation for the generative aspect model, in: Proc. Conf. Uncertainty in Artificial Intelligence, Edmonton, AB, Canada, Aug. 1–4, 2002, pp. 352–359.
[19] N. Nasios, A.G. Bors, Blind source separation using variational expectation-maximization algorithm, in: Computer Analysis of Images and Patterns, in: Lect. Notes Comput. Sci., vol. 2756, 2003, pp. 442–450.
[20] S.J. Roberts, W.D. Penny, Variational Bayes for generalized autoregressive models, IEEE Trans. Signal Process. 50 (9) (2002) 2245–2257.
[21] Sam Roweis, Zoubin Ghahramani, A unifying review of linear Gaussian models, Neural Comput. 11 (2) (1999) 305–345.
[22] J. Miller, J. Thomas, Detectors for discrete-time signals in non-Gaussian noise, IEEE Trans. Inf. Theory 18 (2) (1972) 241–250.
[23] T. Moon, The expectation-maximization algorithm, IEEE Signal Process. Mag. 13 (6) (1996) 47–60.
[24] D.L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inf. Theory 41 (3) (1995) 613–627.
[25] D.L. Donoho, I.M. Johnstone, Ideal denoising in an orthonormal basis chosen from a library of bases, Comp. Rend. Acad. Sci. Ser. A 319 (1994) 1317–1322.
[26] D.L. Donoho, I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika 81 (1994) 425–455.
[27] D.B. Geselowitz, On the magnetic field generated outside an inhomogeneous volume conductor by internal current sources, IEEE Trans. Magn. MAG-6 (1970) 346–347.
[28] M. Hämäläinen, R. Hari, R.J. Ilmoniemi, J. Knuutila, O.V. Lounasmaa, Magnetoencephalography – theory, instrumentation, and applications to noninvasive studies of the working human brain, Rev. Mod. Phys. 65 (1993) 413–497.
[29] J. Sarvas, Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem, Phys. Med. Biol. 32 (1) (1987) 11–22.
[30] W.C. Davidon, Variable metric method for minimization, SIAM J. Optim. 1 (1) (1991) 1–17.
[31] J. Nocedal, S.J. Wright, Numerical Optimization, second edition, Springer-Verlag, New York, 2006.
Shijun Sun received the B.S. degree in physics from the Southeast University for Nationalities, China, in 1986, the M.S. degree in theoretical physics from Chongqing University in 2000, and the Ph.D. degree in Biomedical Engineering from Chongqing University in 2010. His current research interests are medical signal processing, DNA sequence analysis, and business intelligence.
Chenglin Peng graduated from Chongqing University and Chengdu Telecommunication College successively. He is a professor and a doctoral and postdoctoral supervisor at Chongqing University. His current research interests are medical apparatus, telemedicine, and micro-system technology.
Wensheng Hou received the B.S. degree in physics from Southwest Normal University in 1991, and the M.S. and Ph.D. degrees in Biomedical Engineering from Chongqing University in 1998 and 2001, respectively. He is a professor at Chongqing University. He was a visiting researcher in the Clinical Neuroscience Research Team of the University of Surrey, and in the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas. His current research interests include neuroengineering and rehabilitation, biomedical signal and image processing, and medical instrumentation.
Jun Zheng received the B.S. and M.S. degrees in Electrical Engineering from Chongqing University, China, in 1993 and 1996, respectively, the M.S.E. degree in Biomedical Engineering from Wright State University, Dayton, Ohio, in 2001, and the Ph.D. degree in Computer Engineering from the University of Nevada, Las Vegas in 2005. He was an assistant professor in the Department of Computer Science at Queens College, The City University of New York from 2005 to 2008. Currently he is an assistant professor in the Department of Computer Science at New Mexico Institute of Mining and Technology. His current research interests are wireless networks, digital signal and image processing, and computer architectures.

Dr. Yingtao Jiang is an Associate Professor in the Department of Electrical and Computer Engineering in the Howard R. Hughes College of Engineering at the University of Nevada, Las Vegas. His research interests are VLSI architecture, circuit-level design techniques for DSP, networking, medical and telecommunication systems, computer architectures, and nanotechnologies for biological and medical applications.
Xiaolin Zheng received the B.S., M.S., and Ph.D. degrees in biomedical engineering from Chongqing University, China, in 1982, 1987, and 1995 respectively. He is now a professor and doctoral supervisor in Chongqing University. His current research interests are biomedical signal detection and processing, bioMEMS system, biological magnetic stimulation technology, and medical image processing.