Variant functional approximations for latent Gaussian models (Technical Report of Statistics department, TCD, 10, January, 2011)
1
Ji Won Yoon, 1 Simon Wilson, 2 Koray Kayabol and 2 Ercan E. Kuruoglu 1 Statistics department, Dublin, Ireland 2 CNR, Pisa, Italy {yoonj, swilson}@tcd.ie,{koray.kayabol, ercan.kuruoglu}@isti.cnr.it Abstract
The integrated nested Laplace approximation (INLA) [Rue et al., 2009] is now a well-known functional approximation algorithm for implementing Bayesian inference in latent Gaussian models but has some limitations: it is unable to handle a high dimensional model parameter θ, and makes a poor approximation when the posterior is multi-modal and the likelihood is highly nonGaussian. Two types of algorithm are proposed to address these limitations: (a) a combination of INLA and Monte Carlo methods and (b) analytic approximations with higher order moments. We test the performance of algorithms on a nonlinear stochastic velocity model for exchange current rate data between Euros and Dollars and a factor analysis model with both synthetic datasets and real data from multi-spectral extra-terrestrial microwave maps.
1
Introduction
Latent Gaussian models, a subclass of structured additive regression models, have a wide range of application domains in statistics, signal processing and machine learning. Examples are regression with an additive mixed linear model [Fahrmeir and Lang, 1999; McCulloch et al., 2008] and random walk model [Rue and Held, 2005], dynamic linear models [West and Harrison, 1997], spatial and spatiotemporal models [Besag et al., 1991; Best et al., 2005; Brix and Diggle, 2001]. The integrated nested Laplace approximation (INLA) [Rue et al., 2009] is a fast and accurate functional approximation algorithm for implementing Bayesian inference with such models when compared to conventional Monte Carlo simulation. The contribution of this paper is to propose several variants of Laplace Approximation (LA) to overcome some of limitations of INLA: inability to deal with high dimensionality of the model parameters, posterior multi-modality and highly skewed likelihoods. This is illustrated through application to both a stochastic velocity model and a factor analysis model.
2
Statistical Background
The latent Gaussian model has the following structure for observations y in terms of ! latent variables x and hyperparameters θ: p(y | x, θ) = j p(yj | xj , θ) (so observations are conditionally independent), p(x | θ) is a Gaussian Markov random field (GMRF) with precision matrix Q, or at least a Gaussian with sparse precision matrix, and there is a some prior p(θ) on the model parameters. A sparse precision matrix as the prior of the latent Gaussian model speeds up computation, of which the most popular is through the Gaussian Markov random field (GMRF) model [Weir, 1997; Hanson and Wecksung, 1983; Hunt, 1977; Therrien, 1993].
2.1 Integrated Nested Laplace Approximation The integrated nested Laplace approximation (INLA) [Rue et al., 2009] approximates the marginal posterior p(x|y) by " p(x|y) = p(x|y, θ)p(θ|y)dθ " ≈ p˜(x|y, θ)˜ p(θ|y)dθ # ≈ p˜(x|y, θj )˜ p(θi |y)∆θj (1) θj
where
p˜(θ|y) =
$ p(x, y, θ) $$ . pG (x|y, θ) $x=x∗ (θ)
(2)
Here, pG (x|y, θ) denotes a Gaussian approximation and x∗ (θ) is its mode. The mode x∗ (θ) is typically calculated by a numerical optimization of the log posterior. A set of discrete values θj is defined over which p˜(θ|y) is computed by first finding a mode µ∗θ of log p˜(θ|y) by a quasiNewton optimization and computing the Hessian H∗θ at that mode. A grid search is then conducted from the mode in all directions until log p˜(µ∗θ |y) − log p˜(θ|y) > δπ where δπ is a given threshold. This gives a region over which a grid of θj values may be defined.
3 Proposed algorithms Problems with INLA occur if the dimension of θ is even modestly large e.g. much above 5 because the grid over θ space becomes too large. There are also problems if p(θ|y)
or p(x|y, θ) are multi-modal or severely skewed (i.e. far from Gaussian). In order to address the above problems, we propose two classes of approach. The first addresses a high dimension θ through a Monte Carlo approach to generate samples of θ but not x. The other approach is still a functional approximation, where higher order moments of the function are used. In the multi-modal case, the Gaussian approximation results in large error in the approximated distribution, i.e. given a reasonable threshold δ, |p(x∗ |y, θ) − pG (x∗ |y, θ)| > δ. Several variant LA algorithms are proposed in Algorithm 1.
3.1 IS-LA The first approach is a hybrid approach between importance sampling and LA, named IS-LA. It uses: " p(x|y) = p(x|y, θ)p(θ|y)dθ " ≈ p(x|y, θ)˜ p(θ|y)dθ " p˜(θ|y) ≈ p(x|y, θ) q(θ|y)dθ q(θ|y) # 1 ≈ % p(x|y, θi )wi , i wi i where θi ∼ q(θ|y) and its weights is defined as $ p(y,x,θi ) $ $ p(x|y,θ ) i p˜(θi |y) x=x∗ (θi ) wi = = . q(θi |y) q(θi |y)
The proposal function q(θi |y) can be defined in different ways. The prior distribution (IS-LA(1) ) The proposal distribution is the prior distribution: q(θ|y) = p(θ). In general importance sampling has a serious problem in obtaining incorrect weights in the tail area when the tail of the proposal distribution is too small. This problem is avoided since the$ terms are analytically cancelled off as $ wi = p(y|x,θ)p(x|θ) . In addition, using the prior as pG (x|y,θ) $ ∗ x=x (θ)
the proposal distribution does not require Newton-style optimization.
An approximation to the posterior distribution (IS-LA(2) ) Although the prior distribution has many benefits, it may not be close to the posterior. It may be better to use a proposal that depends on the data; ideally q(θ|y) = p(θ|y). However, it is rather difficult to obtain a reasonable p(θ|y) if p(y | x, θ) is either non-linear or non-Gaussian. A Laplace approximation can be used: q(θ|y) = pG (θ|y) via the quasi-Newton approach of Section 2.1. Although this approach works efficiently, it can suffer from incorrect weights when values are sampled in the tail.
3.2 MH-LA Another hybrid approach combines LA and the MetropolisHastings (MH) algorithm. Samples are generated from
p(θ|y) using the Laplace approximation as the MH proposal& distribution. ' The accept probability is A = " " " ˜ |y)q(θ|θ ) min 1, p(θ where θ and θ denote current and " p(θ|y)q(θ ˜ |θ) $ " $ " p(y,x,θ ) $ new samples respectively, and p˜(θ |y) = p(x|y,θ" ) $ . x=x∗ (θ " )
There are several possibilities for the proposal function, as below. A random walk One of the simplest proposal functions is a random walk " " " q(θ |θ) = N (θ ; θ, Σθ ) where θ and θ denote the previous and current samples respectively. In addition, Σθ can be chosen either manually or systematically. An approximation to the posterior distribution The proposal function used in IS-LA can be used in MHLA. For instance, the optimal proposal function is designed " " as q(θ |θ) = pG (θ |y), as by the quasi-Newton approach of Section 2.1.
3.3 Marginalisation based LA (MLA) Multi-modality and high skewness cannot be dealt with by these hybrid approaches since the proposal function itself depends on the Laplace approximation. A functional approximation based on the marginal likelihood with higher order moments can be used to address the problem. The marginal posterior of θ has the form: " p(θ|y) ∝ p(θ) exp {log p(y|x, θ) + log p(x|θ)} dx "x = p(θ) exp {R(x; θ)} dx x ( ) p(θ) ˜(x; θ) = exp f (3) 1/2 |Q∗x (θ)| where µ∗x (θ) and Q∗x (θ) are a mode and the negative Hessian matrix of R(x; θ). And the R(x; θ) is as following:
1 − (x − µ∗x (θ))T Q∗x (θ)(x − µ∗x (θ)) + f (x; θ) 2 1 ≈ − (x − µ∗x (θ))T Q∗x (θ)(x − µ∗x (θ)) + f˜(x; θ). 2 Here, it is not straightforward to obtain the approximation f˜(x; θ) so that we propose an approximation to the mean estimate of Eπ(x) (f (x)): * f˜(x; θ) = Eπ(x) (f (x; θ)) = f (x; θ)π(x)dx + % = Z1 x(i) ∈S exp(R(x(i) ; θ)) R(x(i) ; θ) , + 12 (x(i) − µ∗x (θ))T Q∗x (θ)(x(i) − µ∗x (θ)) (4) % (i) where Z = x(i) ∈S exp(R(x ; θ)) and there are K elements in S. As K becomes extremely larger, a more accurate estimate of the distribution is obtained in a manner of Monte carlo method. However, in this paper, we use only small number of points (K ≤ 5) for light computation as INLA does. For example, we use K = 5 and the important points are obtained by the following set: S = {x : µ∗x (θ) + jVΛ1/2 1, j = −2, −1, 0, 1, 2} where Q∗x (θ) = (VΛVT )−1 and 1 is a R(x; θ)
=
vector with ones for simplicity. V and Λ are obtained by eigen-decomposition. Note that if we use K = 1 the proposed MLA becomes exactly the conventional INLA since f˜(x; θ) = R(µ∗x (θ); θ). In this point of view, MLA is a superposition of INLA with varying K. The mode µ∗θ and its negative Hessian matrix H∗θ of p˜(θ|y) with respect to θ are found by a quasi-Newton method. A greedy search is used in the same manner as conventional INLA to obtain a discrete grid of points θj over which p˜(θ|y) is evaluated. Of course, if the model is linear and Gaussian, p(y|θ) is in closed form and Eq. (2) is unnecessary.
3.4 Multiple initial points An alternative approach to the multiple mode problem is to repeat the search for modes of p(x | y, θ) from different initial points. Each search yields a mode and Hessian. Of course there is no guarantee that all modes will be discovered, even if the number of them is known.
4 Stochastic velocity model
ft |f1:t−1 , τ, φ
∼ N (0, exp(ηt )) = h + ft 1 # 1 ∼ N (µi , 1/τ ) 2 i=0
x=x (θ)
4: 5: 6: 7:
8: 9:
The financial time series are often modelled by stochastic velocity models that are highly nonlinear and non-Gaussian. We tested our proposed algorithm on two different datasets: one from artificial data plotted in Fig. 1-(a) and the other from Euro-dollar exchange currency rate in Fig. 1-(b). The artificial dataset were drawn from the Eq. (5) and the real exchange currency rate dataset is obtained between 1st September 2010 and 20th January 2011. In this manuscript, we set the ground truth for the synthetic observations as h = 0, φ = 0.4, and τ = 2.2. yt |ηt ηt
Algorithm 1 Variant Laplace Approximations 1: Choose an approach, type∈ {INLA, IS-LA, MH-LA, MLA}. Obtain a mode and its negative hessian matrix by a quasi newton approach for p˜(θ|y). 2: if type is not MLA then & ' $ $ 3: (µ∗θ , H∗θ ) = arg maxθ log pp(y,x,θ) $ G (x|y,θ) ∗
(5)
where µi = φ1−i (1 − φ)i ft−1 + 3i for a mixture model, x1:T = (η1:T , µ) and θ = (φ, τ ). We used the following priors: h ∼ N (0, 1), φ ∼ B(2, 2) and τ ∼ G(1, 0.1) where N , B and G denote the normal, beta and gamma distributions respectively. Two top sub-graphs of Fig. 2 demonstrate the reconstructed distributions of p˜(θ|y) of the synthetic dataset via INLA and MLA. As we can see in the figures, the approximated distribution via MLA is rather different from that via INLA since the p(x|y, θ) has multimodality and skewness due to the mixture model. In order to monitor the performance at the unimodal case, the mixture model of Eq. (5) is replaced to a unimodal model for the real dataset, i.e., µi = φft−1 for all is. The results of the real exchange currency rates are also processed in Fig. 2-(c, d). It founds that MLA and INLA have the identical results from this figure given the unimodal model.
5 Application to multi-spectral image source separation In the multi-spectral source separation problem, the data consists of nf images of J pixels, that usually correspond to
10: 11: 12: 13: 14:
where x∗ (θ) be a mode of pG (x|y, θ). else Calculate µ∗x (θ) and Q∗x (θ) are a mode and Hessian matrix of log p(y|x, θ)(x|θ). Obtain V and Λ by eigendecomposition of Q∗x (θ). Build a set S with K elements by using µ∗x (θ) and Q∗x (θ), i.e. x(i) = µ∗x (θ) + jVΛ1/2 1 for j = K−1 − K−1 2 , · · · , 0, · · · , 2- . ( ). (µ∗ , H∗ ) = arg maxθ log p(θ) 1/2 exp f˜(x; θ) θ
θ
|Q∗ x (θ)|
where f˜(x; θ) = Eπ(x) f ((x; θ)) with S from Eq. (4). end if Obtain θi s given the following strategies: if type is INLA then After finding (µ∗θ , H∗θ ), do a grid search from the mode in all directions until the log p˜(µ∗θ |y)−log p˜(θ|y) > δπ where δπ is a given threshold. else if IS-LA then Draw samples from the optimal proposal function, q(θ|y) = p(θ|µ∗θ , H∗−1 ). θ Calculate the weights by $ p(y,x,θ (i) ) $ (i) (i) pG (x|y,θ ) $x=x∗ (θ (i) ) p˜(θ |y) wi = = . q(θ(i) |y) p(θ(i) |µ∗θ , H∗−1 ) θ
15: else if MH-LA then 16: Draw samples from the proposal function as / ∗ such 0 op" ∗−1 17:
timal proposal function q(θ |θ) = p θ|µθ , Hθ Calculate the acceptance ratio by A 1 2 " " p(θ ˜ |y)q(θ;θ ) min 1, p(θ|y)q(θ " ˜ ;θ)
.
=
18: else if MLA then 19: After finding (µ∗θ , H∗θ ), do a grid search from the mode
in all directions until the log p˜(µ∗θ |y)−log p˜(θ|y) > δπ where δπ is a given threshold. 20: end if % 21: Estimate p(x|y) = θi p(x|y, θi )˜ p(θi |y)∆θi .
images at different frequencies v1 , · · · , vnf . The data at pixel j are denoted yj ∈ 'nf , j = 1, 2, · · · , J, while Yk = (y1k , · · · , yJk )T denotes the image at frequency vk . The observed images are believed to be built up of linear combinations of ns sources, represented by the vectors xj ∈ 'ns . We assume that the yj follow a standard statistical independent components analysis model, so that they can be represented as a linear combination of the xj such that yj = Axj + ej where A is an nf × ns mixing matrix and ej is a vector of nf independent Gaussian error
20
15
10
Rates (observation)
5
0
−5
−10
−15
−20
0
5
10
15
20
25 Time
30
35
40
45
50
(a) Synthetic data 0.36 0.34
Rates (observation)
0.32 0.3 0.28 0.26 0.24 0.22
10
20
30
40
50 Time
60
70
80
90
(b) Real exchange rate data (Euro and Dollar)
5.1 Simulated data example
The first example is a set of 9 images of size 16 × 16 constructed from 3 synthetic sources. 30 GHz
44 GHz 2
2
4
4
4
6
6
6
8
8
8
10
10
10
0.8
12
12
12
0.7
0.7
14
14
14
16
16
16
0.6
0.6
0.5
0.5
0.4 0.3
5
10
15
0.2
0.4
0.6
0.8
0.5
1
1.5 τ
(a) φ
2
2.5
3
(b) τ
5
6
8
8
10
10
12
12
12
14
14
14
16 5
10
15
15
4
6
8 10
16
10
217 GHz 2
4
6
0 0
1
φ
15
2
4
0.1
0 0
10
143 GHz
2
0.2
0.1
5
100 GHz
0.4 0.3
0.2
S
70 GHz
2
0.8
p(τ| y)
p(φ| y)
Figure 1: Two observations in Stochastic Velocity Model for synthetic data (a) and real exchange rate data between Euros and Dollars from 1st September 2010 to 20th January 2011 (b)
inal INLA or MLA are the best choices in this simple linear model. It is also noted that, in this model, the likelihood is p(y|x, θ) = N (Y; Bx, C−1 ). Further, the conditional posterior p(x|y, θ) is also exactly Gaussian with precision Q∗ = Q + BT CB and mean µ∗ = Q∗−1 BT CY. Hence the marginalized likelihood is $ $1/2 1 2 $C$ , |Q|1/2 1 + ∗T ∗ ∗ T p(y|θ) = $$ $$ exp µ Q µ − y Cy . 2π 2 |Q∗ |1/2
16 5
10
15
5
10
15
−3
0.045
6
x 10
0.04
0.03
4
0.025
p(τ| y)
p(φ| y)
Figure 3: Observed signals for the synthetic data
5
0.035
0.02 0.015
3
2
0.01
R
1
0.005 0 0
0.2
0.4
0.6 φ
(c) φ
0.8
1
0 0
1
2
3
4 τ
5
6
7
8
(d) τ
Figure 2: Model parameters in Stochastic Velocity Model for synthetic data (S) and real exchange rate data (R): INLA (blue solid line) and MLA (red dot line) terms with precisions τ = {τ1 , · · · , τnf }. We also define Xi = {xji |j = 1, · · · , J} to be the image of the ith source. Stacking the Yk and Xi as y = (Y1T , · · · , YnTf )T and x = (XT1 , · · · , XTns )T , and stacking the error terms by frequency E, we have y = Bx + E where B = A ⊗ IJ×J is the Kronecker product of A with the J × J identity matrix. The vector of error terms E is zero-mean Gaussian with precision matrix C = diag(τ1 1J , τ2 1J , . . . , τnf 1J ), where 1J is a vector of ones of length J. A so-called intrinsic GMRF prior is used for each source Xi , defined as follows. Let ∆ji be differenced values of Xi at pixel j. A second order random walk can be defined by % letting ∆ji = j " ∈ne(j) Xj " i − |ne(j)|Xji be independent zero-mean Gaussians with precision φi , for i = 1, . . . , ns where | · | represents the cardinality of a particular set and ne(j) is the set of neighbouring pixels of j. It can be shown that the resulting( distribution of) Xi is of Gaussian form: p(Xi |θ) ∝ exp − φ2i Xi T QXi where Qf = DT D and D is a J × J Toeplitz matrix. The stacked set of latent vari.T ables x = X1 T , X2 T , . . . , Xns T is zero-mean Gaussian with precision Q = diag(φ1 Qf , · · · , φns Qf ). However the precision matrix is not of full rank; this is the intrinsic GMRF and is widely used as a prior distribution in Bayesian latent models; see [Rue and Held, 2005]. This is a model of the form where INLA can be applied, with the latent variables x having precision matrix Q. Orig-
The extracted sources obtained by the different algorithms, and other common separation approaches, are plotted in Fig. 4. Except for fastICA, the ground truth was well recovered in most approaches. Tables 1 and 2 demonstrate the performance based on a measure of goodness of fit Peak Signalto-Inference Ratio (PSIR) and running time respectively. It can be seen that the proposed variant LAs have similar performance to MCMC in accuracy while they are considerably faster. In addition, we can check the linearity of the model by checking the similarity between results from INLA and MLA, which should give an identical result in that case; it is seen that that happens for our linear model example. LS FastICA SMICA MCMC INLA(1) INLA(2) IS-LA(1) IS-LA(2) MH-LA MLA
CMB 28.13± 6.26 10.95±18.4 16.33±130 41.47±18.0 40.87±16.3 40.93±17.2 41.19±16.4 41.11±15.1 41.13± 16.1 41.77± 12.3
Synchrotron 45.10±6.66 4.780±95.3 15.11±136 45.17±7.51 44.90± 6.25 44.90±6.23 44.94±6.29 44.93±6.37 44.92± 6.27 45.05±6.49
Dust 18.28±6.19 NaN ± NaN 18.05± 39.7 27.80±11.3 27.90± 11.3 27.89±11.2 27.78± 11.7 27.90± 11.3 27.89±11.2 27.83±11.2
Table 1: PSIR comparison for the synthetic data: INLA(1) and INLA(2) used GF and GMRF priors respectively. Also, LS is acronym of (General) Least Square.
5.2 Example of separation of the cosmic microwave background from all-sky microwave maps The discovery of the cosmic microwave background (CMB) is strong evidence for the big bang theory of the formation
GT
fastICA
LS
cmb
MCMC
cmb
INLA(1)
cmb
INLA(2)
cmb
IS-LA(2)
cmb
MH-LA
cmb
MLA
cmb
cmb
2 2
2
2
2
2
2
2
2
4
4
4
4
4
4
4
4
6
6
6
6
6
6
6
6
8
8
8
8
8
8
8
8
4
6
8
10
(a)
10
12
10
12
14
4
6
8
10
12
14
16
4
6
8
10
12
14
16
4
6
8
10
12
14
16
4
6
8
10
12
14
16
4
6
8
10
12
14
16
4
6
8
10
12
14
16
12
14
16 2
synchrotron
10
12
14
16 2
synchrotron
10
12
14
16 2
synchrotron
10
12
14
16 2
synchrotron
10
12
14
16 2
10
12
14
16
2
10
12
14
16
14
16 2
4
6
synchrotron
8
10
12
14
16
16 2
4
6
synchrotron
8
10
12
14
16
2
4
6
synchrotron
8
10
12
14
16
10
12
14
16
10
12
14
16
synchrotron
2 2
2
2
2
2
2
2
2
4
4
4
4
4
4
4
4
4
6 6
8
6
8
10
(b)
4
6
8
10
12
14
16
4
6
8
10
12
14
16
6
4
6
8
dust
10
12
14
16
4
6
8
dust
10
12
14
16
6
8
dust
10
12
14
16
4
6
8
dust
10
12
14
16
8
10
12
14
16 2
6
8
12
14
16 4
10
12
14
2
6
8
10
12
16 2
6
8
10
14
16 2
6
8
12
14
16 2
10
12
14
16
2
10
12
14
16
8
10
12
14
6
8
10
12
14
16 2
4
6
8
dust
10
12
14
16
16 2
4
6
8
dust
10
12
14
16
2
4
6
8
dust
dust
2 2
2
2
2
2
2
2
2
4
4
4
4
4
4
4
4
4
6 6
8
10
(c)
12
14
16 2
4
6
8
10
12
14
16
6
6
6
6
6
6
6
8
8
8
8
8
8
8
8
10
10
10
10
10
10
10
10
12
12
12
12
12
12
12
12
14
14
14
14
14
14
14
14
16
16
16
16
16
16
16
16
2
4
6
8
10
12
14
16
2
4
6
8
10
12
14
16
2
4
6
8
10
12
14
16
2
4
6
8
10
12
14
16
2
4
6
8
10
12
14
16
2
4
6
8
10
12
14
16
2
4
6
8
10
12
14
16
2
4
6
8
Figure 4: Comparison of restored three signals for the synthetic data: GT (Ground Truth) and LS (Least Square) LS 0.00785 ±0.0315
fastICA 9.8226 ±18.506
SMICA 0.6961 0.1081
MCMC 1729.2 ±140.42
INLA(1) 2.3283 ±0.7104
INLA(1) 3.1743 ±0.6661
IS-LA(2) 23.824 ±3.4309
MH-LA 26.836 ±5.25
MLA 14.634 ±3.5447
Table 2: Run time comparison for the synthetic example and development of the universe. According to the theory, the early universe was smaller and hotter but cooled as it expanded. Once the temperature cooled to about 3000K, photons were free to propagate without being scattered off ionized matter; the CMB is an image of this event and is visible across the entire sky. Three satellites have been launched to measure the CMB: the cosmic background explorer (COBE), Wilkinson microwave anisotropy probe (WMAP) and most recently the Planck surveyor. Planck is the highest resolution data to date, of the order of 107 pixels across the sky measured at 9 channels. Unfortunately, the signals measured by these satellites as in Fig. 5 contain radiation not only from CMB but also contributions from a number of other sources, namely foreground radiations and extragalactic sources in addition to antenna receiver noise. Foreground sources from our galaxy include synchrotron, dust and free-free emission. Therefore, the separation of the CMB signal from other sources is an important stage in the production of CMB maps. [Kuruoglu, 2010].
(a) Ka
(b) K
(d) V
(e) W
(c) Q
Figure 5: Observed WMAP 7 year data To date, there have been several attempts to achieve it in a Bayesian framework: Gaussian Mixture prior [Wilson et al., 2008], t-distribution prior [Kayabol et al., 2009a] and Markov Random Field (MRF) prior [Kayabol et al., 2009b]. Some of these are fully Bayesian source separation methods which are developed to separate the underlying CMB from the mixed observed signals of extraterrestrial microwaves made at sev-
eral frequencies. Mixing Matrix Structure In this application, A is parameterized and denoted A(θ). Each column of A(θ) is the contribution to the observation of a source at different frequencies, which is written as a function of the frequencies and θ. These parameterizations are approximations that come from the current state of knowledge about how the sources are generated. Here, we merely state the parameterization that we are going to use, and refer to [Eriksen et al., 2007] for a more detailed exposition on the background to them. It is assumed that the CMB is the first source and therefore, it corresponds to the first column of A(θ). It is modelled as a black body at a temperature, and its contribution is a known constant at each frequency. The parametrization of the mixing matrix is given as g(vi ) , g(v ) & 1' κ vi s Ai2 (θ) = and v1 & '1+κd exp(ηv1 /kB T1 ) − 1 vi Ai3 (θ) = exp(ηvi /kB T1 ) − 1 v1 3 42 exp(ηvi /kB T0 ) i where g(vi ) = kηv (exp(ηvi /kB T0 )−1)2 , T0 = 2.725 is B T0 the average CMB temperature in Kelvin, T1 = 18.1, η is the Plank constant and kB is Boltzmann’s constant. The ratio g(vi )/g(v4 ) is designed to ensure that A41 (θ) = 1 as we constraint the fourth row of A(θ) to be ones. There are two unknown model parameters for A, for synchrotron κs ∈ {κs : −3.0 ≤ κs ≤ −2.3} and the spectral indeces for dust κd ∈ {κd : 1 ≤ κd ≤ 2}. Ai1 (θ)
=
Prior There are two types of hidden variables: sources x and model parameters θ, which consists of κs , κd and a precision parameter φi for the GMRF prior of each The prior for !nsource. s θ is designed as p(τ1:ns , κs , κd ) = [ i=1 p(τi )] p(κs )p(κd ) where p(τi ) = G(τi ; αi , βi ), p(κs = U(κs ; ακs , βκs ) and p(κd ) = U(κd ; ακd , βκd ). Let G and U denote the Gamma
distribution and the uniform distribution and α and β are hyper-parameters which are fixed in this paper: we assign this hyper parameter to generate flat prior. Results We analysed the seven year WMAP 7 year data by using the MLA. WMAP data consist of 5 images of J = 3 × 220 = 3, 145, 728 pixels (see Fig. 5) which were divided into 6144 blocks of 512 pixels for the analysis (we used blocking technique for WMAP data). Assuming 3 sources, the processed images via MLA are as shown in Fig. 6.
(a) CMB
(b) Syncrotron
(c) Dust
Figure 6: Estimated Sources of WMAP by MLA
6
Conclusion
The well-known integrated nested Laplace approximation (INLA) has several limitations. In this paper, several variants of Laplace Approximation (LA) are proposed to tackle such serious limitation of the conventional INLA. In order to solve the high dimensionality over the model parameter space, Monte Carlo (MC) simulation are hybridized with LA. This approach still practically fast and efficient since MC draws samples only from model parameter space. The other approach based on approximated maginalized likelihood provides the potential solution for the multi-modality and skewness problem.
References [Besag et al., 1991] J. Besag, J. York, and A. Mollie. Bayesian image restoration with two applications in spatial statistics. Annuals of the Institute of Statistical Mathematics, 43:1–59, 1991. [Best et al., 2005] N. Best, S. Richardson, and A. Thomson. A comparison of bayesian spatial models for disease mapping. Statistical Methods in Medical Research, 14:35–59, 2005. [Brix and Diggle, 2001] A. Brix and P. J. Diggle. Spatiotemporal prediction for log-gaussian cox processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(4):823–841, 2001. [Eriksen et al., 2007] H. K. Eriksen, C. Dickinson, C. R. Lawrence, and C. Baccigalupi. Cosmic microwave background component separation by parameter estimation. The Astrophysical Journal, 2007. [Fahrmeir and Lang, 1999] L. Fahrmeir and S. Lang. Bayesian inference for generalized additive mixed models based on markov random field priors. Journal of the Royal Statistical Society. Series C (Applied Statistics), 134, 1999.
[Hanson and Wecksung, 1983] K. M. Hanson and G. W. Wecksung. Bayesian approach to limited-angle reconstruction in computed tomography. Journal of the Optical Society of America, 73:1501–1509, November 1983. [Hunt, 1977] B. R. Hunt. Bayesian methods in nonlinear digital image restoration. IEEE Transactions on Computers, 26(3):219–229, March 1977. [Kayabol et al., 2009a] K. Kayabol, E. E. Kuruoglu, B. Sankur J. L. Sanz, E. Salerno, and D. Herranz. Adaptive langevin sampler for separation of t-distribution modelled astrophysical maps. CNR Technical report, September 2009. [Kayabol et al., 2009b] K. Kayabol, E. E. Kuruoglu, and B. Sankur. Bayesian separation of images modelled with mrfs using mcmc. IEEE Transactions on Image Processing, 18(5):982–994, May 2009. [Kuruoglu, 2010] E. E. Kuruoglu. Bayesian source separation for cosmology. IEEE Signal Processing Magazine, pages 43–54, January 2010. [McCulloch et al., 2008] C. E. McCulloch, S. R. Searle, and J. M. Neuhaus. Generalized, Linear, and Mixed Models, Second Edition. Wiley Series in Probabilisty and statistics, 2008. [Rue and Held, 2005] H. Rue and L. Held. Gaussian Markov Random Field: Theory and Applications. Chapman & Hall/CRC, 2005. [Rue et al., 2009] H. Rue, S. Martino, and N.s Chopin. Approximate Bayesian inference for latent gaussian models by using integrated nested laplace approximations. Journal Of The Royal Statistical Society Series B, 71(2):319– 392, 2009. [Therrien, 1993] C. Therrien. An approximate method of evaluating the joint likelihood for first-order gmrfs. IEEE Transactions on Image Processing, 2(4):520–523, October 1993. [Weir, 1997] I. S. Weir. Fully Bayesian Reconstruction from Single-Photon Emission Computed Tomography Data. Journal of the Ameerican Statistical Association, 92(437):49–60, March 1997. [West and Harrison, 1997] M. West and J. Harrison. Bayesian forecasting and dynamic models (2nd ed.). Springer-Verlag New York, Inc., New York, NY, USA, 1997. [Wilson et al., 2008] S. P. Wilson, E. E. Kuruoglu, and E. Salerno. Fully Bayesian source separation of astrophysical images modelled by mixture of Gaussians. IEEE Journal of selected topics in signal processing, 2(5):685–696, October 2008.