
Fast unsupervised Bayesian image segmentation with adaptive spatial regularisation

arXiv:1502.01400v1 [stat.CO] 5 Feb 2015

Marcelo Pereyra and Steve McLaughlin

Abstract

This paper presents two new Bayesian estimation techniques for hidden Potts-Markov random fields, with application to fast K-class image segmentation. The techniques are derived by first conducting a small-variance-asymptotic (SVA) analysis of an augmented Bayesian model in which the spatial regularisation and the integer-constrained terms of the Potts model are decoupled. The evaluation of this SVA Bayesian estimator is then relaxed into a partially convex problem that can be computed efficiently by iteratively solving a total-variation denoising problem and a least-squares clustering (K-means) problem, both of which can be solved straightforwardly, even in high dimensions, and with parallel computing techniques. This leads to a fast semi-supervised Bayesian image segmentation methodology that can be easily applied in large 2D and 3D scenarios or in applications requiring low computing times. Following on from this, we extend the proposed Bayesian model and inference technique to the case where the value of the spatial regularisation parameter is unknown and removed from the model by marginalisation. This leads to a new fully unsupervised fast Bayesian segmentation algorithm in which the strength of the spatial regularisation is adapted automatically to the observed image during the inference procedure. Experimental results on synthetic and real images, as well as extensive comparisons with state-of-the-art algorithms, confirm that the proposed semi-supervised and unsupervised methodologies offer extremely fast convergence and produce accurate segmentation results, with the important additional advantage of self-adjusting regularisation parameters.

This work was funded in part by a Marie Curie Intra-European Fellowship for Career Development, in part by the SuSTaIN program (EPSRC grant EP/D063485/1) at the Department of Mathematics, University of Bristol, in part by a postdoctoral fellowship from the French Ministry of Defence, and in part by the EPSRC via grant EP/J015180/1. This paper was presented in part at EUSIPCO'14 in Lisbon, September 2014. Marcelo Pereyra is with the University of Bristol, School of Mathematics, University Walk, BS8 1TW, UK (e-mail: [email protected]). Steve McLaughlin is with Heriot-Watt University, Engineering and Physical Sciences, Edinburgh, EH14 4AS, UK (e-mail: [email protected]).

Index Terms: Image segmentation, Bayesian methods, spatial mixture models, Potts-Markov random field, convex optimisation.

I. INTRODUCTION

Image segmentation is a canonical inverse problem which involves classifying image pixels into clusters that are spatially coherent and have well-defined boundaries. It is widely accepted that this task can be formulated as a statistical inference problem, and most state-of-the-art image segmentation methods compute solutions by performing statistical inference (e.g., computing penalised maximum likelihood or maximum-a-posteriori estimates). In this paper we propose a small-variance-asymptotics estimator for hidden Potts-Markov random fields (MRFs), a powerful class of statistical models that is widely used in Bayesian image segmentation methods (see [1]–[3] for examples in hyperspectral, ultrasound and fMRI imaging). Despite the wide range of applications, performing inference on hidden Potts MRFs remains a computationally challenging problem, and thus most image processing methods compute approximate estimators obtained via convex relaxations, local optimisation, Monte Carlo estimation, etc.

Small-variance-asymptotics estimators were introduced in [4] as a computationally efficient framework for performing inference in Dirichlet process mixture models and have recently been applied to other important machine learning classification models, such as the Beta process and sequential hidden Markov models [5], as well as to the problem of configuration alignment and matching [6]. Here we exploit these same techniques for the hidden Potts MRF to develop two fast-converging image segmentation methodologies that deliver accurate segmentation results in very few iterations. We first present a semi-supervised image segmentation method for the case of unknown class statistical parameters (e.g., class means) and a fixed spatial regularisation parameter. The second method is an extension to the unsupervised case where both the class parameters and the regularisation parameter are unknown.

The paper is organised as follows: in Section II we present a brief background to Bayesian image segmentation using the Potts MRF. This is then followed by a description of our proposed

semi-supervised methodology, and its extension to the unsupervised case. In Section IV the two methodologies are applied to some standard example images and compared to other image segmentation approaches from the state of the art. Finally, some brief conclusions are drawn in Section V.

We emphasise at this point that there are several other approaches to fast image segmentation; however, to the best of our knowledge there is no approach that automatically adapts the amount of spatial regularisation to the observed image (though this can be achieved by using more computationally intensive strategies such as Monte Carlo approximations [7], [8], variational Bayes approximations [9], and mean-field-like approximations [10], [11]). In particular, several papers have considered convex models and convex approximations that can be solved efficiently by convex optimisation [12]–[16], approximate estimators computed with graph-cut and message-passing algorithms [17]–[21], and fast steepest-descent methods for active contour models [22]–[25]. In Section IV we present comparisons with many of these state-of-the-art approaches and discuss their connection to our work.

II. BACKGROUND

We begin by recalling the standard Bayesian model used in image segmentation problems, which is based on a finite mixture model and a hidden Potts-Markov random field. For simplicity we focus on univariate Gaussian mixture models. However, the results presented hereafter can be generalised to all exponential-family mixture models (e.g., mixtures of multivariate Gaussian, Rayleigh, Poisson, Gamma, Binomial, etc.) by following the approach described in [26].

Let y_n ∈ ℝ denote the nth observation (i.e., pixel or voxel) in a lexicographically vectorised image y = (y_1, . . . , y_N)ᵀ ∈ ℝᴺ. We assume that y is made up of K regions {C_1, . . . , C_K} such that the observations in the kth class are distributed according to the following conditional marginal observation model

$$
y_n \mid n \in C_k \sim \mathcal{N}(\mu_k, \sigma^2),
\tag{1}
$$

where µ_k ∈ ℝ represents the mean intensity of class C_k. For identifiability we assume that µ_k ≠ µ_j for all k ≠ j.

To perform segmentation, a label vector z = (z_1, . . . , z_N)ᵀ is introduced to map or classify observations y to classes C_1, . . . , C_K (i.e., z_n = k if and only if n ∈ C_k). Assuming that observations are conditionally independent given z and given the parameter vector µ = (µ_1, . . . , µ_K), the likelihood of y can be expressed as follows:

$$
f(y \mid z, \mu) = \prod_{k=1}^{K} \prod_{n \in S_k} p_{\mathcal{N}}(y_n \mid \mu_k, \sigma^2),
\tag{2}
$$

with S_k = {n : z_n = k}.

A Bayesian model for image segmentation is then defined by specifying the prior distribution of the unknown parameter vector (z, µ). The prior for z is the homogeneous K-state Potts MRF [27]

$$
f(z \mid \beta) = \frac{1}{C(\beta)} \exp[\beta H(z)],
\tag{3}
$$

with regularisation hyper-parameter β ∈ ℝ⁺, Hamiltonian

$$
H(z) = \sum_{n=1}^{N} \sum_{n' \in \mathcal{V}(n)} \delta(z_n = z_{n'}),
\tag{4}
$$

where 𝒱(n) is the index set of the neighbours of the nth voxel and δ(·) is the Kronecker function, and normalising constant

$$
C(\beta) = \sum_{z} \exp[\beta H(z)].
\tag{5}
$$
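To make the role of the Hamiltonian concrete, here is a minimal Python sketch (our own illustration; the function name potts_hamiltonian is hypothetical, not from the paper) that evaluates H(z) on a 2D label image with the first-order 4-pixel neighbourhood of Fig. 1:

```python
import numpy as np

def potts_hamiltonian(z):
    """Evaluate H(z) = sum_n sum_{n' in V(n)} delta(z_n = z_{n'}) for a 2D
    label image z with the first-order 4-pixel neighbourhood: the double sum
    visits every unordered neighbour pair twice, once from each side."""
    h = (z[:, :-1] == z[:, 1:]).sum()   # horizontally adjacent equal pairs
    v = (z[:-1, :] == z[1:, :]).sum()   # vertically adjacent equal pairs
    return 2 * int(h + v)

# Larger H(z) means a spatially smoother labelling, so the prior (3)
# favours configurations with many equal neighbours when beta > 0.
z = np.array([[0, 0, 1],
              [0, 1, 1]])
print(potts_hamiltonian(z))  # 4 equal neighbour pairs, each counted twice: 8
```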

Bidimensional MRFs are commonly considered as prior distributions for z for 2D images, and tridimensional MRFs for 3D images (the most commonly used first-order 2D and 3D neighbourhoods 𝒱(n) are depicted in Fig. 1). Notice that the Potts prior f(z|β) is defined conditionally on a known value of β, which is assumed to be specified a priori by the practitioner. This assumption is relaxed in Section III-B by using a Bayesian technique to adjust the value of β automatically. To distinguish clearly between the supervised and unsupervised approaches, here we denote the dependence on β explicitly. In a similar fashion, the class means are considered prior independent and assigned Gaussian priors µ_k ∼ N(0, ρ²) with fixed variance ρ²,

$$
f(\mu) = \prod_{k=1}^{K} p_{\mathcal{N}}(\mu_k \mid 0, \rho^2)
\tag{6}
$$

(to simplify notation the dependence of distributions on the fixed quantity ρ² is omitted). Then, using Bayes' theorem and taking into account the conditional independence structure of the model (see Fig. 2), the joint posterior distribution of (z, µ) given y and β can be expressed as follows:

$$
f(z, \mu \mid y, \beta) \propto f(y \mid z, \mu)\, f(z \mid \beta)\, f(\mu),
\tag{7}
$$

where ∝ denotes proportionality up to a normalising constant that can be retrieved by setting ∫ f(z, µ|y, β) dz dµ = 1. The graphical structure of this Bayesian model is summarised in Fig. 2 below. Notice the Markovian structure of z and that the observations y_n are conditionally independent given the model parameters z, µ and σ².

Fig. 1. 4-pixel (left) and 6-voxel (right) neighbourhood structures. The pixel/voxel considered appears as a void red circle whereas its neighbours are depicted in full black and blue.

Fig. 2. [Left] Directed acyclic graph of the standard Bayesian model for image segmentation (parameters with fixed values are represented using black boxes). [Right] Local hierarchical representation of the hidden Potts MRF and the observed image for 4 neighbouring pixels.

Finally, given the Bayesian model (7), a segmentation of y is typically obtained by computing the maximum-a-posteriori (MAP) estimator

$$
\hat{z}_1, \hat{\mu}_1 = \operatorname*{argmax}_{z, \mu} f(z, \mu \mid y, \beta),
\tag{8}
$$

which can also be obtained by solving the equivalent optimisation problem

$$
\hat{z}_1, \hat{\mu}_1 = \operatorname*{argmin}_{z, \mu} -\log f(z, \mu \mid y, \beta).
\tag{9}
$$

Unfortunately these optimisation problems are known to be NP-hard due to the combinatorial nature of the Potts Hamiltonian H(z) defined in (4). As mentioned previously, modern

image segmentation methods based on (7) typically address this issue by using approximate (local) integer optimisation algorithms (e.g., graph-cut, message passing) [18]–[20], and more recently with convex relaxations of the Potts model (see for instance [13], [14]).

III. PROPOSED METHOD

A. Small-variance and convex approximations

This section presents a new approach for performing inference on z based on a small-variance asymptotics analysis combined with a convex relaxation of the Potts MRF. This approximation will lead to a new estimator of z that can be computed very efficiently in large-scale scenarios by iteratively solving a convex total-variation denoising problem and a least-squares clustering problem, which we solve straightforwardly with Chambolle's optimisation algorithm [28] and with K-means [29].

We begin by introducing a carefully selected auxiliary vector x such that y and (z, µ) are conditionally independent given x, and such that the posterior f(x, z, µ|y) has the same maximisers as (7) (after projection onto the space of (z, µ)). More precisely, we define a random vector x ∈ ℝᴺ with degenerate prior

$$
f(x \mid z, \mu) = \prod_{k=1}^{K} \prod_{n \in S_k} \delta(x_n - \mu_k),
\tag{10}
$$

and express the likelihood of y given x, z and µ as

$$
f(y \mid x, z, \mu) = f(y \mid x) = \prod_{n=1}^{N} p_{\mathcal{N}}(y_n \mid x_n, \sigma^2).
$$

The prior distributions for z and µ remain as defined in (3) and (6). The posterior distribution of x, z, µ is given by

$$
f(x, z, \mu \mid y, \beta) \propto f(y \mid x)\, f(x \mid z, \mu)\, f(z \mid \beta)\, f(\mu)
\propto \left( \prod_{k=1}^{K} \prod_{n \in S_k} p_{\mathcal{N}}(y_n \mid x_n, \sigma^2)\, \delta(x_n - \mu_k) \right) \exp[\beta H(z)]\, f(\mu).
\tag{11}
$$

Notice that from an inferential viewpoint (11) is equivalent to (7), in the sense that marginalising x in (11) results in (7). Moreover, we define H*(z) as the "complement" of the Hamiltonian H(z), in the sense that for any z ∈ [1, . . . , K]ᴺ,

$$
H(z) + H^*(z) = N|\mathcal{V}|,
$$

where |𝒱| denotes the cardinality of the neighbourhood structure 𝒱. For the Potts MRF this complement is given by

$$
H^*(z) \triangleq \sum_{n=1}^{N} \sum_{n' \in \mathcal{V}(n)} \delta(z_n \neq z_{n'}).
\tag{12}
$$

Replacing H(z) = N|𝒱| − H*(z) in (11), we obtain

$$
f(x, z, \mu \mid y, \beta) \propto \left( \prod_{k=1}^{K} \prod_{n \in S_k} p_{\mathcal{N}}(y_n \mid x_n, \sigma^2)\, \delta(x_n - \mu_k) \right) \exp[-\beta H^*(z)]\, f(\mu).
\tag{13}
$$

Notice that (13) produces the same MAP segmentation of y as (9), given that

$$
\hat{x}_2, \hat{z}_2, \hat{\mu}_2 = \operatorname*{argmin}_{x, z, \mu} -\log f(x, z, \mu \mid y)
$$

verify ẑ₂ = ẑ₁ and µ̂₂ = µ̂₁, and that x̂₂ is perfectly determined by ẑ₂, µ̂₂ through (10). Furthermore, noting that H*(z) only measures whether neighbouring labels are identical or not, regardless of their values, it is easy to check that the posterior (13) remains unchanged if we substitute H*(z) with H*(x):

$$
f(x, z, \mu \mid y) \propto \left( \prod_{k=1}^{K} \prod_{n \in S_k} p_{\mathcal{N}}(y_n \mid x_n, \sigma^2)\, \delta(x_n - \mu_k) \right) \exp[-\beta H^*(x)]\, f(\mu).
\tag{14}
$$

Finally, we make the observation that for the first-order neighbourhoods of Fig. 1 we have H*(x) = 2‖∇x‖₀, where ‖∇x‖₀ = ‖∇_h x‖₀ + ‖∇_v x‖₀ denotes the ℓ₀ norm of the horizontal and vertical components of the first-order discrete gradient of x, and

$$
f(x, z, \mu \mid y, \beta) \propto \left( \prod_{k=1}^{K} \prod_{n \in S_k} p_{\mathcal{N}}(y_n \mid x_n, \sigma^2)\, \delta(x_n - \mu_k) \right) \exp[-2\beta \|\nabla x\|_0]\, f(\mu).
\tag{15}
$$

The graphical structure of this hierarchical Bayesian model is summarised in Fig. 3 below. Notice that in this model x separates y and σ² from the other model parameters, that the MRF now enforces spatial smoothness on x rather than on z, and that the elements of z are prior independent. We are now ready to conduct a small-variance asymptotics analysis on (15) and derive the asymptotic MAP estimator of x, z, µ, which is defined as [4]

$$
\operatorname*{argmin}_{x, z, \mu} \; \lim_{\sigma^2 \to 0} -\sigma^2 \log f(x, z, \mu \mid y, \beta).
$$
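Before proceeding, a quick numerical sanity check of the identity H*(x) = 2‖∇x‖₀ used above (again our own illustration, with hypothetical helper names):

```python
import numpy as np

def h_star(x):
    """H*(x): ordered first-order neighbour pairs with x_n != x_{n'}."""
    h = (x[:, :-1] != x[:, 1:]).sum()
    v = (x[:-1, :] != x[1:, :]).sum()
    return 2 * int(h + v)

def grad_l0(x):
    """||grad x||_0 = ||grad_h x||_0 + ||grad_v x||_0 (nonzero differences)."""
    return int((np.diff(x, axis=1) != 0).sum() + (np.diff(x, axis=0) != 0).sum())

x = np.random.default_rng(0).integers(0, 3, size=(8, 8)).astype(float)
assert h_star(x) == 2 * grad_l0(x)  # the identity holds for any x
```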

Fig. 3. [Left] Directed acyclic graph of the proposed Bayesian model, augmented by the auxiliary variable x decoupling µ and z from y (parameters with fixed values are represented using black boxes). [Right] Local representation of the three layers of the proposed augmented model for 4 neighbouring pixels.

First, we use the fact that δ(s) = lim_{τ²→0} p_N(s | 0, τ²) to express (15) as follows:

$$
\begin{aligned}
f(x, z, \mu \mid y, \beta) &\propto \lim_{\tau^2 \to 0} \left( \prod_{k=1}^{K} \prod_{n \in S_k} p_{\mathcal{N}}(y_n \mid x_n, \sigma^2)\, p_{\mathcal{N}}(x_n \mid \mu_k, \tau^2) \right) \exp[-2\beta \|\nabla x\|_0]\, f(\mu) \\
&\propto \lim_{\tau^2 \to 0} \left( \prod_{k=1}^{K} \prod_{n \in S_k} \exp\left[ -\frac{(x_n - y_n)^2}{2\sigma^2} - \frac{(x_n - \mu_k)^2}{2\tau^2} \right] \right) \exp[-2\beta \|\nabla x\|_0]\, f(\mu).
\end{aligned}
\tag{16}
$$

Then, in a manner akin to Broderick et al. [4], we allow the model's hyper-parameters to scale with σ² in order to preserve the balance between the prior and the likelihood and avoid a trivial limit. More precisely, we set β = β′/2σ² and assume that σ² vanishes at the same rate as τ². Then, the limit of −σ² log f(x, z, µ|y) as σ² → 0 is given by

$$
\lim_{\sigma^2 \to 0} -\sigma^2 \log f(x, z, \mu \mid y) = \sum_{k=1}^{K} \sum_{n \in S_k} \left[ \frac{1}{2}(x_n - y_n)^2 + \frac{1}{2}(x_n - \mu_k)^2 \right] + \beta' \|\nabla x\|_0.
\tag{17}
$$
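For readers who want the intermediate step (a short derivation we add here, under the stated scalings τ² = σ² and β = β′/2σ²), the factors of (16) contribute to the limit (17) as follows:

$$
-\sigma^2 \log \exp\!\left[-\frac{(x_n - y_n)^2}{2\sigma^2}\right] = \frac{1}{2}(x_n - y_n)^2,
\qquad
-\sigma^2 \log \exp\!\left[-\frac{(x_n - \mu_k)^2}{2\tau^2}\right] \to \frac{1}{2}(x_n - \mu_k)^2,
$$

$$
-\sigma^2 \log \exp\!\left[-2\beta \|\nabla x\|_0\right] = 2\sigma^2 \cdot \frac{\beta'}{2\sigma^2}\, \|\nabla x\|_0 = \beta' \|\nabla x\|_0,
$$

while −σ² log f(µ) vanishes because the prior (6) does not depend on σ².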

The asymptotic MAP estimators of x, z, µ are then given by

$$
\operatorname*{argmin}_{x, z, \mu} \sum_{k=1}^{K} \sum_{n \in S_k} \left[ \frac{1}{2}(x_n - y_n)^2 + \frac{1}{2}(x_n - \mu_k)^2 \right] + \beta' \|\nabla x\|_0.
\tag{18}
$$

Unfortunately (18) is still NP-hard due to ‖∇x‖₀. However, unlike (9), (18) can be approximated by a convex optimisation problem over ℝᴺ coupled with a separable integer optimisation problem that can be split into N trivial problems over {1, . . . , K}. More precisely, we replace ‖∇x‖₀ by the convex approximation ‖∇x‖₁₋₂ and propose the following estimators of x, z, µ:

$$
\hat{x}_3, \hat{z}_3, \hat{\mu}_3 = \operatorname*{argmin}_{x, z, \mu} \sum_{k=1}^{K} \sum_{n \in S_k} \left[ \frac{1}{2}(x_n - y_n)^2 + \frac{1}{2}(x_n - \mu_k)^2 \right] + \beta'\, \mathrm{TV}(x),
\tag{19}
$$

where TV(x) = ‖∇x‖₁₋₂ is the (convex) isotropic total-variation pseudo-norm of x [30].

Notice that x̂₃, ẑ₃, µ̂₃ can be very efficiently computed by iteratively minimising (19) w.r.t. x, z and µ. The minimisation of (19) w.r.t. x is equivalent to a total-variation denoising problem that can be very efficiently solved, even in high-dimensional scenarios, by using modern convex optimisation techniques (in this paper we used a parallel implementation of Chambolle's algorithm [28]). The minimisation of (19) w.r.t. z (with x and µ fixed) is a trivial separable integer problem that can be formulated as N independent minimisation problems over {1, . . . , K}. Similarly, the minimisation with respect to µ is a trivial least-squares fitting problem with an analytic solution. Also note that iteratively minimising (19) with respect to z and µ, with fixed x, is equivalent to solving a least-squares clustering problem with the popular K-means algorithm [29]. Algo. 1 below summarises the proposed semi-supervised image segmentation algorithm based on (19), which iteratively solves a total-variation denoising problem and a least-squares clustering (K-means) problem until convergence. We note at this point that because the overall minimisation problem is not convex, the solution obtained by iterative minimisation of (19) might depend on the initial values of x, z, µ. In all our experiments we have used the initialisation x(0) = 2y, z = [1, . . . , 1]ᵀ, µ = [0, . . . , 0]ᵀ, which produced good estimation results.
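To make the x-step explicit (a worked identity we add; it is implicit in the text above), completing the square in the two quadratic terms of (19) gives, for each pixel n with label z_n = k,

$$
\frac{1}{2}(x_n - y_n)^2 + \frac{1}{2}(x_n - \mu_k)^2
= \left( x_n - \frac{y_n + \mu_k}{2} \right)^{\!2} + \frac{(y_n - \mu_k)^2}{4},
$$

so that, with z and µ fixed, minimising (19) over x is exactly a total-variation (ROF [30]) denoising problem applied to the image (y + µ_z)/2, with effective regularisation parameter β′/2.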

Algorithm 1 Bayesian segmentation algorithm with fixed β′
1: Input: Image y, regularisation parameter β′, maximum number of iterations T.
2: Initialise x(0) = 2y, z = [1, . . . , 1]ᵀ, µ = [0, . . . , 0]ᵀ.
3: for t = 1 : T do
4:   Compute x(t) minimising (19), with fixed z = z(t−1) and µ = µ(t−1), by using Chambolle's algorithm [28].
5:   Compute z(t) and µ(t) by least-squares clustering of x(t), using the K-means algorithm [29].
6:   if z(t) ≠ z(t−1) then
7:     Set t = t + 1.
8:   else
9:     Exit to line 12.
10:  end if
11: end for
12: Output: Segmentation z(t), µ(t).
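To fix ideas, the following Python sketch mirrors the structure of Algo. 1. This is our own illustration, not the authors' MATLAB implementation: we assume scikit-image's denoise_tv_chambolle as the TV solver and scikit-learn's KMeans for the clustering step, and the weight β′/2 relies on the completing-the-square identity given earlier, so the exact mapping between skimage's weight parameter and β′ should be checked against the solver's convention.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle
from sklearn.cluster import KMeans

def segment_fixed_beta(y, K, beta_prime, T=50):
    """Sketch of Algo. 1: alternate a TV-denoising x-update with a K-means
    (z, mu)-update until the labels stop changing. y is a 2D float image."""
    x = 2.0 * y                          # x^(0) = 2y (as in Algo. 1; the
                                         # x-update below overwrites it)
    z = np.zeros(y.shape, dtype=int)     # z^(0) = [1, ..., 1]^T
    mu = np.zeros(K)                     # mu^(0) = [0, ..., 0]^T
    for _ in range(T):
        # x-update: TV denoising of (y + mu_z)/2 with weight beta'/2,
        # by the completing-the-square identity above
        m = 0.5 * (y + mu[z])
        x = denoise_tv_chambolle(m, weight=0.5 * beta_prime)
        # (z, mu)-update: least-squares clustering of x by K-means
        km = KMeans(n_clusters=K, n_init=10).fit(x.reshape(-1, 1))
        z_new = km.labels_.reshape(y.shape)
        mu = km.cluster_centers_.ravel()
        if np.array_equal(z_new, z):     # labels unchanged: converged
            break
        z = z_new
    return z, mu
```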

B. Adaptation of the regularisation parameter β′

We consider that a main contribution of this paper is to present a fully unsupervised extension of Algo. 1 where the value of the regularisation parameter β′ is adjusted automatically during the segmentation process. The Bayesian model and estimation technique described above require practitioners to specify the value of β (through β = β′/2σ²). However, it is well known that appropriate values for regularisation parameters can be highly image-dependent and sometimes difficult to select a priori, thus requiring practitioners to set parameter values heuristically or by visual cross-validation. A powerful Bayesian strategy to circumvent this problem is to model β as an additional random variable, assign it an appropriate prior distribution f(β), define an augmented hierarchical Bayesian model that includes β within its unknown parameter vector, and finally remove β from this model by marginalisation (i.e., by integrating out β from the joint posterior density f(x, z, µ, β|y)). The resulting marginalised model can then be used for inference on z and µ without specifying the value of β. This statistical inference strategy has been successfully applied to the hidden Potts model in [7] and demonstrated on several challenging image segmentation problems. Unfortunately, in order to include β in the model and subsequently estimate its value or marginalise it, it is necessary to compute the normalising constant of the Potts model C(β) defined in (5), which is a reputedly intractable problem. This difficulty can be addressed by using state-of-the-art Monte Carlo methods as in [7], [8], which have a very high computational cost,

or by replacing C(β) with a tractable approximation (see [7] for a detailed survey of the state of the art on this topic up to 2012). In this paper we follow a pseudo-likelihood approach [31] and use the analytic approximation C(β) ∝ β⁻ᴺ, which enables the conjugate gamma hyper-prior f(β) = γᵅ β^(α−1) exp(−γβ)/Γ(α) with fixed parameters α and γ. We then define the joint prior f(z, β) = f(z|β)f(β), where f(z|β) is the Potts MRF (3), and obtain the augmented posterior distribution f(x, z, µ, β|y) ∝ f(y|x)f(x|z, µ)f(µ)f(z|β)f(β), which includes β as an unknown variable. The rationale for replacing the fixed regularisation parameter β by the random variable β ∼ Gamma(α, γ) is that the values of α and γ can be selected such that the amount of regularisation enforced by the Potts MRF is driven by the data and the impact of f(β) on the inferences is minimal (note that appropriate values for α and γ will be specified later through a small-variance asymptotics analysis). Moreover, in order to marginalise β from the model we notice that β is conditionally independent of y given z; to be precise, that f(x, z, µ, β|y) = f(β|z)f(x, z, µ|y). Therefore integrating f(x, z, µ, β|y) with respect to β is equivalent to redefining the posterior distribution (15) with the marginal prior

$$
f(z) = \int_{\mathbb{R}^+} f(z, \beta)\, d\beta
\propto \int_{\mathbb{R}^+} \beta^{N} \exp[-\beta H^*(z)]\, \beta^{\alpha-1} \exp(-\gamma\beta)\, d\beta
\propto \left[ H^*(z) + \gamma \right]^{-(\alpha+N)},
\tag{20}
$$

which leads to the following (marginal) posterior distribution

$$
f(x, z, \mu \mid y) \propto \left( \prod_{k=1}^{K} \prod_{n \in S_k} p_{\mathcal{N}}(y_n \mid x_n, \sigma^2)\, \delta(x_n - \mu_k) \right) f(\mu)\, \left( \|\nabla x\|_0 + \gamma/2 \right)^{-(\alpha+N)},
\tag{21}
$$

which does not depend on the regularisation parameter β. The graphical structure of this hierarchical Bayesian model is summarised in Fig. 4 below.
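For completeness, the last step in (20) is the standard Gamma integral (a step we spell out here for the reader):

$$
\int_0^{\infty} \beta^{\alpha+N-1}\, e^{-\beta [H^*(z)+\gamma]}\, d\beta
= \frac{\Gamma(\alpha+N)}{[H^*(z)+\gamma]^{\alpha+N}}
\propto [H^*(z)+\gamma]^{-(\alpha+N)}.
$$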


Fig. 4. [Left] Directed acyclic graph of the proposed Bayesian model with marginalisation of the regularisation parameter β (parameters with fixed values are represented using black boxes). [Right] Local representation of three layers of the model for 4 neighbouring pixels.

Finally, we conduct a small-variance asymptotics analysis on (21) to derive an asymptotic marginal MAP estimator of x, z, µ. Proceeding in a similar fashion to steps (16)-(19) described above, and by allowing the hyper-parameter α to scale with σ² as α = N/σ² to preserve the balance between the prior and the likelihood and avoid a trivial limit, we obtain the following estimators:

$$
\hat{x}_4, \hat{z}_4, \hat{\mu}_4 = \operatorname*{argmin}_{x, z, \mu} \sum_{k=1}^{K} \sum_{n \in S_k} \left[ \frac{1}{2}(x_n - y_n)^2 + \frac{1}{2}(x_n - \mu_k)^2 \right] + N \log\left[ \mathrm{TV}(x) + 1 \right],
\tag{22}
$$

where we have set γ = 2 such that the penalty log(TV(x) + γ/2) ≥ 0.

Again, the estimates x̂₄, ẑ₄, µ̂₄ can be very efficiently computed by iteratively minimising (22) w.r.t. x, z and µ. The joint minimisation of (22) w.r.t. z and µ remains unchanged and is computed by performing a least-squares (K-means) clustering on x. The minimisation of (22) w.r.t. x involves solving the non-convex optimisation problem

$$
\operatorname*{argmin}_{x} \sum_{k=1}^{K} \sum_{n \in S_k} \left[ \frac{1}{2}(x_n - y_n)^2 + \frac{1}{2}(x_n - \mu_k)^2 \right] + N \log\left[ \mathrm{TV}(x) + 1 \right],
\tag{23}
$$

which was studied in detail in [32]. Essentially, given some initial condition v⁽⁰⁾ ∈ ℝᴺ, (23) can be efficiently minimised by majorisation-minimisation, that is, by iteratively solving the following sequence of convex total-variation denoising problems:

$$
v^{(\ell+1)} = \operatorname*{argmin}_{x} \sum_{k=1}^{K} \sum_{n \in S_k} \left[ \frac{1}{2}(x_n - y_n)^2 + \frac{1}{2}(x_n - \mu_k)^2 \right] + \beta'_{\ell}\, \mathrm{TV}(x),
\quad \text{with } \beta'_{\ell} = \frac{N}{\mathrm{TV}[v^{(\ell)}] + 1},
\tag{24}
$$

in which β′_ℓ plays the role of an effective regularisation parameter.
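The majorisation step behind (24) follows from the concavity of the logarithm (a reasoning step we add for clarity): for any v ∈ ℝᴺ,

$$
N \log\left[ \mathrm{TV}(x) + 1 \right] \;\le\; N \log\left[ \mathrm{TV}(v) + 1 \right] + \frac{N \left[ \mathrm{TV}(x) - \mathrm{TV}(v) \right]}{\mathrm{TV}(v) + 1},
$$

with equality at x = v. Replacing the log penalty in (23) by this upper bound and discarding the terms that do not depend on x leaves the weighted TV penalty β′_ℓ TV(x) with β′_ℓ = N/(TV[v⁽ℓ⁾] + 1), which is precisely (24); each inner iteration therefore decreases the objective of (23).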

The proposed fully unsupervised segmentation algorithm based on (22) is summarised in Algo. 2 below. Perhaps unexpectedly, Algo. 2, which results from removing β from the model by marginalisation, is computationally equivalent to an instance of Algo. 1 in which the value of β′ is self-adjusted within the algorithm in an adaptive manner. Again, because the overall minimisation problem is not convex, the solution obtained by iterative minimisation of (22) might depend on the initial values of x, z, µ, which we have set to x(0) = 2y, z = [1, . . . , 1]ᵀ, µ = [0, . . . , 0]ᵀ in all of our experiments. In addition, the experiments reported in Section IV have been computed using L = 25, though the sequence (24) typically converged to a stable v and β′_ℓ within the first 5 iterations. In all experiments we set ε = 10⁻³.

IV. EXPERIMENTAL RESULTS AND OBSERVATIONS

In this section we demonstrate empirically the proposed Bayesian image segmentation methodologies with a series of experiments and comparisons with state-of-the-art algorithms. All experiments have been conducted using the algorithm parameters T = 50, L = 25, ε = 10⁻³, and computed on an Intel i7 quad-core workstation running MATLAB 2013a.

A. Experiment 1

In the first experiment we compare the proposed supervised and unsupervised methods with other fast image segmentation techniques. We have chosen these methods because they represent different efficient algorithmic approaches to computing image segmentations (e.g., MRF energy minimisation solved by graph-cut, active contours solved by Riemannian gradient descent, and two convex models solved by convex optimisation). The specific methods used in the comparison are as follows:

• Chan-Vese active contour by natural gradient descent [22]. We compare with this natural gradient method to assess the computational speed of our approach (to our knowledge this method is currently the fastest for all of the images we use).
• Hidden Potts MRF by graph max-flow/min-cut [33]. This comparison provides an indication of the approximation error induced by the SVA and convex approximation (recall that the graph-cut approximates computationally the exact solution of the hidden Potts MRF).
• The fast global minimisation algorithm (FGMA) of [12].
• The two-stage smoothing-followed-by-thresholding algorithm (TSA) of [15].


Algorithm 2 Unsupervised Bayesian segmentation algorithm
1: Input: Image y, maximum number of outer iterations T and inner iterations L, tolerance ε.
2: Initialise x(0) = 2y, z = [1, . . . , 1]ᵀ, µ = [0, . . . , 0]ᵀ.
3: for t = 1 : T do
4:   Set v(0) = x(t−1).
5:   for ℓ = 0 : L do
6:     Set β′_ℓ = N/{TV[v(ℓ)] + 1}.
7:     Compute v(ℓ+1) using (24), with fixed z = z(t−1) and µ = µ(t−1), using Chambolle's algorithm [28].
8:     if (N/{TV[v(ℓ+1)] + 1} − β′_ℓ) ≥ εβ′_ℓ then
9:       Set ℓ = ℓ + 1.
10:    else
11:      Exit to line 14.
12:    end if
13:  end for
14:  Set x(t) = v(L).
15:  Compute z(t) and µ(t) by least-squares clustering of x(t) using the K-means algorithm [29].
16:  if z(t) ≠ z(t−1) then
17:    Set t = t + 1.
18:  else
19:    Exit to line 22.
20:  end if
21: end for
22: Output: Segmentation z(t), µ(t), and β′_eff = N/(TV[x(t)] + 1).
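Analogously, here is a minimal sketch of Algo. 2's adaptive inner loop (again our own illustration, under the same assumptions as the Algo. 1 sketch; tv is a naive isotropic TV evaluator we define only for the β′ update):

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle
from sklearn.cluster import KMeans

def tv(x):
    """Naive isotropic TV pseudo-norm of a 2D image (forward differences,
    zero-padded at the boundary), used only to update beta'."""
    dh = np.pad(np.diff(x, axis=1), ((0, 0), (0, 1)))
    dv = np.pad(np.diff(x, axis=0), ((0, 1), (0, 0)))
    return float(np.sqrt(dh**2 + dv**2).sum())

def segment_unsupervised(y, K, T=50, L=25, eps=1e-3):
    """Sketch of Algo. 2: like Algo. 1, but with the effective regularisation
    parameter beta'_l = N / (TV(v^(l)) + 1) re-estimated in an inner MM loop."""
    N = y.size
    x = 2.0 * y                                 # x^(0) = 2y
    z = np.zeros(y.shape, dtype=int)
    mu = np.zeros(K)
    for _ in range(T):
        v = x                                   # v^(0) = x^(t-1)
        beta = N / (tv(v) + 1.0)
        for _ in range(L):                      # inner MM iterations (24)
            m = 0.5 * (y + mu[z])               # TV-denoising target
            v = denoise_tv_chambolle(m, weight=0.5 * beta)
            beta_new = N / (tv(v) + 1.0)
            converged = abs(beta_new - beta) < eps * beta
            beta = beta_new
            if converged:                       # beta' has stabilised
                break
        x = v
        km = KMeans(n_clusters=K, n_init=10).fit(x.reshape(-1, 1))
        z_new = km.labels_.reshape(y.shape)
        mu = km.cluster_centers_.ravel()
        if np.array_equal(z_new, z):            # labels unchanged: stop
            break
        z = z_new
    return z, mu, N / (tv(x) + 1.0)             # z, mu and effective beta'
```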

The comparisons with the FGMA and TSA are interesting because they involve convex models that are also solved by convex optimisation, so in that sense they are similar to our method. However, FGMA and TSA solve more difficult convex programs, whereas we only solve a total-variation denoising problem, which is computationally simpler by comparison.


To guarantee that the comparisons are fair, we have applied the six algorithms considered in this paper to three images with very different compositions: the synthetic shape image from [24] (which has also been used in [22]), and the "bacteria" and "lungs" images from the supplementary material of [12]. It should be noted that the images have been selected because they are composed of different types and numbers of objects, objects with different (regular and irregular) shapes, different noise levels, and a range of potential segmentation solutions. With regard to the algorithms used for comparison with the methods outlined in this paper, where possible we have used the MATLAB codes made available by the respective authors. It should be noted that these are mainly MATLAB scripts; however, the graph-cut method is written in C++ (the implementation of [34] was used here), so it has a slight advantage in terms of computational performance as a consequence.

We emphasise at this point that we do not seek to explicitly compare the accuracy of the methods because: 1) in the majority of images considered there is no objective ground truth; 2) the "correct" segmentation is often both subjective and application-specific; and 3) the segmentations can often be marginally improved by fine-tuning the regularisation parameters. What our experiments seek to demonstrate is that all the methods provide similar segmentation results, but that our method is generally faster than the competing algorithms considered if the regularisation parameter β′ is fixed a priori, or is as fast but without the requirement of specifying the value of β′.

Figures 5, 6 and 7 show the segmentation results obtained for the images "shapes", "lungs" and "bacteria" with each method and using K = 2. For each image we provide the segmentation results obtained with Algo. 2 (where β′ is estimated from the data) and with Algo. 1 by setting β′ a priori to a value that is very different from the one estimated from the data. This is done to illustrate the robustness of our method to a misspecification of the regularisation parameter. It can be observed that all six methods produced similar segmentation results that are in good visual agreement with each other (note, however, that many other state-of-the-art methods produce significantly less accurate segmentation results; see, e.g., the experiments reported in [22], [24]).

Tables I, II and III report the computing times associated with these experiments. We observe that all six methods converged in a small number of iterations. In particular, Algo. 1, Algo. 2, the natural gradient descent [22] and FGMA [12] always converged in only 2 or 3 iterations. We also observe that Algo. 1 produced the fastest results, closely followed by the natural gradient descent, which is the fastest state-of-the-art method for this type of two-class image segmentation problem [22]. These methods were approximately 2 to 3 times faster than the graph-cut [33], FGMA [12] and two-stage [15] algorithms. Algo. 2 was on average 3 times slower than Algo. 1, due to the additional computations related to the non-convex program (23); however, we emphasise that this algorithm has the property of automatically adapting the value of the regularisation parameter β′.

Fig. 5. Comparison with the state-of-the-art methods [22], [33], [12], and [15] using the synthetic shape image from [24] (216 × 187 pixels, additive Gaussian noise, SNR 5.36 dB). Panels: (a) Algo. 2 (β′_eff = 40.3); (b) Algo. 1 with β′ = 25; (c) natural gradient [22]; (d) graph-cut [33]; (e) FGMA [12]; (f) TSA [15].

TABLE I
EXPERIMENT 1: CONVERGENCE AND COMPUTING TIMES FOR THE "SHAPES" IMAGE DISPLAYED IN FIGURE 5.

Method                          Iterations   Comp. time (sec)
Proposed (fixed β′)             2            0.22
Proposed (unknown β′)           2            0.52
Natural gradient [22]           2            0.24
Potts-MRF via graph-cut [33]    n/a          0.41
FGMA [12]                       3            0.45
TSA [15]                        13           0.71

TABLE II
CONVERGENCE AND COMPUTING TIMES FOR THE "LUNG" IMAGE DISPLAYED IN FIG. 6.

Method                          Iterations   Computing time (seconds)
Proposed (fixed β′)             2            0.18
Proposed (unknown β′)           2            0.65
Natural gradient [22]           2            0.21
Potts-MRF via graph-cut [33]    n/a          0.30
FGMA [12]                       3            0.32
TSA [15]                        16           1.96

B. Experiment 2

An important property of the proposed method is that it can be applied to problems with more than two classes (as opposed to the natural gradient descent method from [22], which was specifically designed for K = 2). In this second experiment we apply the proposed methods to one slice of a 3D in-vivo MRI image of a human brain (256 × 256 pixels), which is depicted in Fig. 8(a). Note that this image is composed of three biological tissues (cerebrospinal fluid, white matter, and grey matter) with complex shapes and textures, making the segmentation problem challenging. The segmentation obtained with our method with unknown β′ and K = 3 is depicted in Fig. 8(b), and with β′ = 1.0 and K = 3 in Fig. 8(c). These results have been computed in only 2 iterations (0.23 seconds) and 2 iterations (0.16 seconds) respectively, confirming that the proposed algorithms can produce accurate segmentation results in very few iterations. By comparison, Fig. 8(d) depicts the segmentation results obtained by full Bayesian inference with a Gauss-Potts MRF model with unknown means (µ₁, µ₂, µ₃) and variances (σ₁², σ₂², σ₃²), as well as unknown regularisation parameter β.

Fig. 6. Comparison with the state-of-the-art methods [22], [33], [12], and [15] using the lung image (336 × 336 pixels) from the supplementary material of [12]. Panels: (a) Algo. 2 (β′_eff = 0.09); (b) Algo. 1 with β′ = 1.0; (c) natural gradient [22]; (d) graph-cut [33]; (e) FGMA [12]; (f) TSA [15].

Fig. 7. Comparison of the supervised and unsupervised methods with the state-of-the-art algorithms [22], [33], [12] and [15] using the bacteria image (380 × 380 pixels) from the supplementary material of [12]. Panels: (a) Algo. 2 (β′_eff = 0.15); (b) Algo. 1 with β′ = 10; (c) natural gradient [22]; (d) graph-cut [33]; (e) FGMA [12]; (f) TSA [15].

Note that the segmentation of Fig. 8(d) corresponds to the marginal maximum-a-posteriori estimators of each hidden label z1, . . . , zN (after marginalising with respect to all the other unknown parameters), which we computed with the state-of-the-art Markov chain Monte Carlo algorithm of [7].

TABLE III
CONVERGENCE AND COMPUTING TIMES FOR THE "BACTERIA" IMAGE DISPLAYED IN FIG. 7.

Method                          Iterations   Computing time (seconds)
Proposed (fixed β′)             2            0.24
Proposed (unknown β′)           2            0.80
Natural gradient [22]           2            0.18
Potts-MRF via graph-cut [33]    -            0.30
FGMA [12]                       2            0.47
TSA [15]                        19           3.02

The computation of these marginal estimators is very expensive and as a result required 152 seconds. It is worth noting that in this experiment the variances of the classes were very different (σ₁ ≠ σ₂ ≠ σ₃) and that their values are very large (σ₁ ≈ 21, σ₂ ≈ 10, σ₃ ≈ 39). So in this example the small-variance assumption is not accurate, yet the segmentation results are sensible, suggesting that our method is robust to heteroscedasticity and to large variances.

V. CONCLUSIONS

We have presented a new approximate Bayesian estimator for hidden Potts-Markov random fields. The estimator is based on a small-variance-asymptotic analysis of an augmented Bayesian model and a convex relaxation. This estimator can be very efficiently computed by using an alternating direction scheme based on a total-variation denoising step and a least-squares (K-means) clustering step, both of which can be computed straightforwardly, even in large 2D and 3D scenarios, and with parallel computing techniques. The resulting new image segmentation methodology converges extremely fast and produces accurate segmentation results in only a few iterations. We then presented an extension of this technique for the case where the value of the spatial regularisation parameter is unknown and removed from the model by marginalisation. This led to a new fully unsupervised fast Bayesian segmentation algorithm in which the strength of the spatial regularisation is adapted automatically to the observed image during the inference procedure. A detailed analysis of the theoretical properties of small-variance-asymptotic estimators in general, and in particular of the methods described in this paper, is currently under investigation. Potential future research topics include the

Fig. 8. Segmentation of a brain MRI image (256 × 256 pixels). Panels: (a) brain MRI; (b) Algo. 2 (β′_eff = 0.11); (c) Algo. 1 with β′ = 1.0; (d) ABC-MCMC [7].

extension of these methods to non-Gaussian statistical models from the exponential family and their application to ultrasound and PET image segmentation, extensions to models with an unknown number of classes K, and comparisons with other Bayesian segmentation methods based on alternative hidden MRF models that can also be solved by convex optimisation, such as [16].

REFERENCES

[1] O. Eches, N. Dobigeon, and J.-Y. Tourneret, "Enhancing hyperspectral image unmixing with spatial correlations," IEEE Trans. Geoscience and Remote Sensing, vol. 49, no. 11, pp. 4239–4247, Nov. 2011.
[2] M. Pereyra, N. Dobigeon, H. Batatia, and J.-Y. Tourneret, "Segmentation of skin lesions in 2D and 3D ultrasound images using a spatially coherent generalized Rayleigh mixture model," IEEE Trans. Med. Imag., vol. 31, no. 8, pp. 1509–1520, Aug. 2012.
[3] T. Vincent, L. Risser, and P. Ciuciu, "Spatially adaptive mixture modeling for analysis of fMRI time series," IEEE Trans. Med. Imag., vol. 29, no. 4, pp. 1059–1074, April 2010.
[4] T. Broderick, B. Kulis, and M. I. Jordan, "MAD-Bayes: MAP-based asymptotic derivations from Bayes," Journal of Machine Learning Research, vol. 28, no. 3, pp. 226–234, 2013.
[5] A. Roychowdhury, K. Jiang, and B. Kulis, "Small-variance asymptotics for hidden Markov models," in Advances in Neural Information Processing Systems 26, C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, Eds., 2013, pp. 2103–2111. [Online]. Available: http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/1045.pdf
[6] P. J. Green, "MAD-Bayes matching and alignment for labelled and unlabelled configurations," in Geometry Driven Statistics, I. L. Dryden and J. T. Kent, Eds. Chichester: Wiley, 2015, ch. 19, pp. 365–375.
[7] M. Pereyra, N. Dobigeon, H. Batatia, and J.-Y. Tourneret, "Estimating the granularity parameter of a Potts-Markov random field within an MCMC algorithm," IEEE Trans. Image Process., vol. 22, no. 6, pp. 2385–2397, June 2013.
[8] M. Pereyra, N. Whiteley, C. Andrieu, and J.-Y. Tourneret, "Maximum marginal likelihood estimation of the granularity coefficient of a Potts-Markov random field within an MCMC algorithm," in Statistical Signal Processing (SSP), 2014 IEEE Workshop on, June 2014, pp. 121–124.
[9] C. McGrory, D. Titterington, R. Reeves, and A. Pettitt, "Variational Bayes for estimating the parameters of a hidden Potts model," Statistics and Computing, vol. 19, no. 3, pp. 329–340, Sept. 2009.
[10] G. Celeux, F. Forbes, and N. Peyrard, "EM procedures using mean field-like approximations for Markov model-based image segmentation," Pattern Recognition, vol. 36, no. 1, pp. 131–144, Jan. 2003.
[11] F. Forbes and G. Fort, "Combining Monte Carlo and mean field like methods for inference in hidden Markov random fields," IEEE Trans. Image Process., vol. 16, no. 3, pp. 824–837, March 2007.
[12] X. Bresson, S. Esedoglu, P. Vandergheynst, J.-P. Thiran, and S. Osher, "Fast global minimization of the active contour/snake model," J. Math. Imaging Vis., vol. 28, no. 2, pp. 151–167, June 2007.
[13] V. Kolmogorov, M. Heath, I. Re, P. H. S. Torr, and M. Wainwright, "An analysis of convex relaxations for MAP estimation of discrete MRFs," Journal of Machine Learning Research, pp. 71–106, 2009.
[14] N. Komodakis, N. Paragios, and G. Tziritas, "MRF energy minimization and beyond via dual decomposition," IEEE Trans. Patt. Anal. Mach. Intell., vol. 33, no. 3, pp. 531–552, March 2011.
[15] X. Cai, R. H. Chan, and T. Zeng, "A two-stage image segmentation method using a convex variant of the Mumford-Shah model and thresholding," SIAM J. Imaging Sci., vol. 6, no. 1, pp. 368–390, Aug. 2013.
[16] J. Bioucas-Dias, F. Condessa, and J. Kovacevic, "Alternating direction optimization for image segmentation using hidden Markov measure field models," in IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, Feb. 2014, pp. 90190P–90190P.
[17] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. Patt. Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[18] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Trans. Patt. Anal. Mach. Intell., vol. 26, pp. 65–81, 2004.
[19] V. Kolmogorov, "Convergent tree-reweighted message passing for energy minimization," IEEE Trans. Patt. Anal. Mach. Intell., vol. 28, no. 10, pp. 1568–1583, Oct. 2006.
[20] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient belief propagation for early vision," Int. J. Computer Vision, vol. 70, no. 1, pp. 41–54, 2006.
[21] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother, "A comparative study of energy minimization methods for Markov random fields with smoothness-based priors," IEEE Trans. Patt. Anal. Mach. Intell., vol. 30, no. 6, pp. 1068–1080, 2008.
[22] M. Pereyra, H. Batatia, and S. McLaughlin, "Exploiting information geometry to improve the convergence properties of variational active contours," IEEE J. Sel. Topics Signal Processing, vol. 7, no. 4, pp. 1–8, Aug. 2013.
[23] M. Pereyra, H. Batatia, and S. McLaughlin, "Exploiting information geometry to improve the convergence of nonparametric active contours," IEEE Trans. Image Process., vol. 1, pp. 1–10, 2015.
[24] L. Bar and G. Sapiro, "Generalized Newton-type methods for energy formulations in image processing," SIAM J. Imaging Sciences, vol. 2, no. 2, pp. 508–531, 2009.
[25] G. Sundaramoorthi, A. Yezzi, A. Mennucci, and G. Sapiro, "New possibilities with Sobolev active contours," International Journal of Computer Vision, vol. 84, no. 2, pp. 113–129, May 2009.
[26] K. Jiang, B. Kulis, and M. I. Jordan, "Small-variance asymptotics for exponential family Dirichlet process mixture models," in Advances in Neural Information Processing Systems 25, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 2012, pp. 3167–3175.
[27] F. Y. Wu, "The Potts model," Rev. Mod. Phys., vol. 54, no. 1, pp. 235–268, Jan. 1982.
[28] A. Chambolle, "An algorithm for total variation minimization and applications," J. Math. Imaging Vis., vol. 20, no. 1-2, pp. 89–97, 2004.
[29] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. University of California Press, 1967, pp. 281–297.
[30] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, vol. 60, no. 1-4, pp. 259–268, Nov. 1992.
[31] J. Besag, "On the statistical analysis of dirty pictures," J. Roy. Stat. Soc. Ser. B, vol. 48, no. 3, pp. 259–302, 1986.
[32] J. Oliveira, J. Bioucas-Dias, and M. Figueiredo, "Adaptive total variation image deblurring: A majorization-minimization approach," Signal Process., vol. 89, no. 9, pp. 1683–1693, 2009.
[33] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Trans. Patt. Anal. Mach. Intell., vol. 26, no. 9, pp. 1124–1137, Sept. 2004.
[34] S. Bagon, "MATLAB wrapper for graph cut," December 2006. [Online]. Available: http://www.wisdom.weizmann.ac.il/~bagon
