
Adv Comput Math DOI 10.1007/s10444-008-9104-5

Enforcing local context into shape statistics Byung-Woo Hong · Stefano Soatto · Luminita A. Vese

Received: 23 October 2007 / Accepted: 1 October 2008 © Springer Science + Business Media, LLC 2008

Abstract The paper presents a variational framework to compute first- and second-order statistics of an ensemble of shapes undergoing deformations. Geometrically "meaningful" correspondence between shapes is established via a kernel descriptor that characterizes local shape properties. Such a descriptor allows retaining geometric features such as high-curvature structures in the average shape, unlike conventional methods where the average shape is usually smoothed out by generic regularization terms. The obtained shape statistics are integrated into segmentation as prior knowledge. The effectiveness of the method is demonstrated through experimental results with synthetic and real images.

Keywords Shape descriptor · Shape statistics · Variational framework · Segmentation

Mathematics Subject Classification (2000) 68T45

Communicated by Lixin Shen and Yuesheng Xu. For any questions regarding this manuscript, please contact the corresponding author, Byung-Woo Hong.

B.-W. Hong (corresponding author), School of Computer Science and Engineering, Chung-Ang University, Seoul, Korea. e-mail: [email protected]

S. Soatto, Computer Science Department, University of California, Los Angeles, CA, USA. e-mail: [email protected]

L. A. Vese, Department of Mathematics, University of California, Los Angeles, CA, USA. e-mail: [email protected]

B.-W. Hong et al.

1 Introduction

Our goal is to compute statistics of an ensemble of shapes that capture their first-order (average) and second-order (deviation) properties. These can be used to define and learn a prior model on the set of shapes, which in turn is key in applications such as image registration, segmentation, and shape classification. The process is illustrated pictorially in Fig. 1 and is important in medical image analysis and computational anatomy [18]. Along the way, we seek a way to establish "meaningful" correspondence between any pair of shapes, where "meaning" entails respect of the local shape context, as we will make precise shortly. We restrict our attention to planar shapes, represented as planar, closed, simple curves, not necessarily smooth, although some of the ideas naturally extend to three-dimensional embedded surfaces.

There are many ways to represent shapes, and to endow them with a metric and probabilistic structure. The simplest is a (finite-dimensional) collection of points modulo the action of a finite-dimensional group. The ensuing theory has been nicely worked out (see [13] and references therein) and the inference algorithms are simple and efficient, but the power of these methods is severely limited in applications where there are no well-defined "landmark points," or there is no time to manually select them, as is the case in the analysis of massive data sets in computational anatomy.

Infinite-dimensional representations of shapes as templates (grayscale or binary images representing continuous closed curves) under the action of infinite-dimensional domain deformations were pioneered by Grenander [17]. The basic premise is that each object of interest (shape) is obtained from a representative element (template) through the transitive action of the deformation group (under the composition operation). That is, given a template, any other shape can be "reached" by applying a suitable domain deformation.
The set of shapes can then be endowed with a probability structure by placing a measure on the deformation group, which allows defining a “mean,” a “covariance function” and a Gibbs-type (Langevin) distribution on shapes. Alternatively, one can define a (cord) distance and define a notion of “average shape” as the element (itself a shape) that minimizes the distance from the ensemble. Such notions of “average” or “mean,” however, are not

Fig. 1 a Examples of 10 overlapped corpus callosum shapes. b Shape distribution with mean (black line) and deviation (colored dots). c Segmentation result of the corpus callosum in an MRI image using shape information

Enforcing local context into shape statistics

well-defined in absolute terms, since the distance between any two shapes can be made arbitrarily small by a transitive action of the group. To make these notions well-posed one has to introduce regularization on the set of possible domain deformations, and the resulting statistics depend crucially on the choice of regularizer [7, 27]. In [40] the deformation is separated into a finite-dimensional group that acts at zero cost, and a diffeomorphism, whose energy is minimized. Still, the resulting average is an expression of the choice of regularizer, which destroys important shape information present in the ensemble data. The problem of a meaningful shape average is bypassed in [6] by defining second-order statistics directly from pairwise distances between any two elements of the ensemble. However, first-order statistics are important in many application domains, so the problem of defining a meaningful average remains.

All these difficulties arise from the infinite-dimensionality of the representation, and are not present in the finite-dimensional case. The assumption that any shape can be reached by a domain deformation, together with the absence of local context information in the representation, means that any point in a template can be transformed to any other point in a target shape, and the determination of correspondence is left to generic regularization terms such as elasticity priors. Another issue that is often overlooked in the existing literature is the fact that correspondence, and hence ensemble statistics, are only meaningful when computed relative to a scale, so one is left with tweaking the amount of smoothness in the regularizer by guessing the amount of "noise" in the data.

In this paper, we introduce the use of local shape context in the computation of first- and second-order ensemble statistics. This is, to the best of our knowledge, novel, and constitutes our contribution.
We exploit existing scale-spaces of local features [20] to capture the local shape context. We exploit the resulting notion of "meaningful" correspondence [20] to define and efficiently compute ensemble statistics: we introduce a notion of "meaningful mean" and "sample covariance" of the ensemble, and exploit the linearity of the deformation field to perform principal component analysis. These statistics can be used to define a normal distribution on the set of shapes, which can serve as a prior for segmentation, registration, or classification. Our contribution is the introduction of these statistics, and of simple algorithms to compute them and to infer their values from the ensemble. Although segmentation with a shape prior is not the main focus of our work, we also show applications of our integral shape features and average shapes to this problem.

There has been a lot of work on segmentation with shape priors. We wish to mention here those works most relevant or related to our approach. These can be divided into two categories: those proposed for general applications in computer vision, and those geared specifically towards medical applications. In the first category, on general techniques, we wish to mention the work of Bresson et al. [2], which combines three terms: the geodesic

B.-W. Hong et al.

active contours, the piecewise-smooth Chan-Vese segmentation model, and a shape prior based on PCA. In the earlier work of Cootes et al. [9], shapes are represented by points on the boundary; mean shapes are obtained, and statistics and PCA techniques are applied to the point positions, yielding point distribution models (PDM); an initial shape in the parameter space is moved towards boundaries in the image using an edge function and motion in the normal direction. In the work of Dambreville et al. [11], kernel PCA is used for the shape prior, and this is combined with a binary segmentation criterion based on Chan-Vese segmentation, as in the present paper. In the recent work of Kim et al. [21], the binary Chan-Vese segmentation is also combined with shape prior information; the shapes are defined by signed distance functions and Parzen density estimates are used, after pose alignment of the shapes.

In the category of segmentation with shape priors for medical imaging, we wish to mention Davatzikos et al. [12], where aligned training shapes represented by coordinates of points (landmarks) are used, which are then transformed into the wavelet domain; PCA is applied in the wavelet domain, and segmentation is obtained by local motion of the landmarks. The work of Kohlberger et al. [22] is applied to segmentation with shape priors for 4D left-myocardium images, using level sets, the binary Chan-Vese segmentation model and PCA (a prior alignment of the shapes is also performed). In the work of Lecellier et al. [23], a region-based segmentation similar to [4] is used, and the shape prior is imposed by minimizing the distance between the moments of the prior and of the unknown shape. In the work of Taron et al. [41], signed distance functions are used to represent the shapes, which are registered and segmented through thin-plate transformations in three dimensions.
In the work of Yan and Kassim [44], shapes are again represented by signed distance functions, shape variations are measured by performing PCA and SVD, and MAP shape estimation together with a deformable minimal path is used for the segmentation.

Our work differs from all of the above-mentioned work. We propose here to use integral kernels applied to characteristic functions to discriminate shapes (which encodes shape information and differs, for instance, from using signed distance functions); this is combined with the binary segmentation method [4] as in other work; however, we also use a deformation field as an unknown in the segmentation and registration approach. We do not claim that our method is superior to all other methods. We propose a different approach to shape representation and illustrate its advantages and its applicability in practice. This is a continuation and extension of our prior preliminary work [20].

In the next subsections we recall existing results on local features (Section 1.1) and meaningful correspondence (Section 1.3) in order to introduce our notation and terminology and make the paper self-contained. In the following two sections we introduce the first-order (Section 2) and second-order statistics (Section 3), and integrate them into segmentation as prior knowledge (Section 4). Then, we illustrate their use in experiments on real examples (Section 5).

Enforcing local context into shape statistics

1.1 Shape representation using integral kernels

While features are any statistics of the image, not necessarily sufficient ones, a representation is a bijective function of the data and can be used to retrieve it, possibly up to a class of transformations (i.e. it contains the "same information"). Examples of direct representations of shape include parameterized curves describing the boundary, which are not intrinsic and present obvious problems. An alternative approach consists of representing shapes by maps, i.e. functions defined from R² into R. Such representations, of which the Level Set framework [25, 29] is a prototype, are simpler to handle and better adapted to segmentation. For example, the binary image itself is a representation (1 inside a closed, bounded and connected domain D, 0 outside). Another celebrated choice is the signed distance function to the contour of the shape, d(x) = ±dist(x, ∂D), with d(x) < 0 inside ∂D and d(x) > 0 outside ∂D. Its computation is based on its characterization as the unique viscosity solution d(x) of the nonlinear partial differential equation (PDE) |∇d(x)| = 1 satisfying d(x) = 0 for all x ∈ ∂D, d(x) < 0 for all x inside ∂D, and d(x) > 0 for all x outside ∂D.

These two representations (binary image and signed distance function) have been used successfully for registration and segmentation [5, 10, 31, 32]. Nevertheless, both suffer from a lack of characterization of the local geometry of the shape when dealing with deformable shape priors for segmentation. In order to address these shortcomings, we introduce in this section a new representation of shape, together with a notion of feature that characterizes geometrical properties of the shape up to a given scale. We first give the necessary notations and definitions.
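Before proceeding, the signed distance function just described can be computed on a pixel grid without solving the PDE, using a Euclidean distance transform. A minimal sketch (not from the paper; the helper name `signed_distance` and the test shape are ours):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(S):
    """Signed distance to the boundary of D, with the paper's convention:
    d < 0 inside D, d > 0 outside (S is 1 inside D, 0 outside)."""
    inside = S.astype(bool)
    d_out = distance_transform_edt(~inside)  # distance to D, positive outside
    d_in = distance_transform_edt(inside)    # distance to the complement, positive inside
    return d_out - d_in

S = np.zeros((64, 64))
S[20:44, 20:44] = 1.0                        # a square shape D
d = signed_distance(S)
print(d[32, 32] < 0, d[0, 0] > 0)            # negative inside, positive outside
```

This discrete construction also satisfies |∇d| ≈ 1 away from the boundary, mirroring the viscosity-solution characterization above.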
By (planar) shape we mean a closed, bounded and connected region D ⊂ R² with finite perimeter (we focus here on regions D without holes), represented by a characteristic function

S(x) = S_D(x) (= χ_D(x)) = 1 if x ∈ D, 0 if x ∉ D,    (1)

defined for x ∈ Ω ⊂ R², with D ⊂ Ω. This can be visualized as a binary image, where Ω is the rectangular image domain, as in Fig. 2a. By feature we mean any statistic of the data, that is, any deterministic function(al) of S. In particular, we consider multi-scale features F defined by linear functionals with a family of kernels K indexed by a scale σ: specifically, σ ∈ R⁺, K : R² × R⁺ → R⁺; (x, σ) ↦ Kσ(x). For instance, we consider the isotropic Gaussian kernel

Kσ(x) = (1/(σ√2π)) exp(−|x|²/(2σ²)).


Fig. 2 a Schematic illustration of the calculation of the shape feature. b–d Examples of the shape feature Rσ(x|S) = S(x) (Kσ ∗ (1 − S(x))) of a at scales σ = 7 (b), σ = 15 (c), and σ = 55 (d) (image size is 200 × 200)

For example, a linear feature is the functional Fσ : L¹(Ω) → L¹(R²); S ↦ Kσ ∗ S = Fσ(·|S), where the convolution is defined by

Kσ ∗ S(x) = ∫_Ω Kσ(x − y) S(y) dy, ∀ x ∈ R².    (2)
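As a minimal numerical sketch (the square test shape is ours), the linear feature Fσ = Kσ ∗ S can be computed with a Gaussian filter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

S = np.zeros((200, 200))
S[60:140, 60:140] = 1.0            # binary shape: a square

sigma = 7.0
F = gaussian_filter(S, sigma)      # F_sigma(x|S) = (K_sigma * S)(x)

# Deep inside the shape F stays near 1, far outside near 0, and on a
# locally straight piece of the boundary it is close to 1/2.
print(F[100, 100] > 0.9, F[10, 10] < 0.1, 0.4 < F[100, 60] < 0.6)
```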

Notice that the feature changes the domain of definition of S, although for kernels with finite support, and Ω sufficiently larger than D, this is inconsequential. Similar features have been used before in the literature by Thompson et al. [33] and Manay et al. [26], among others [16, 19, 37]. It should be noticed that in the limit σ → 0 they constitute a proper scale-space that can be related to the curvature of the boundary ∂D of D, although it does not entail computation of derivatives (hence the name "integral invariant signatures" sometimes used for Fσ(x|S)). However, the scale-space representation holds only in the limit, and in general this feature destroys the boundary information present in S. For these reasons, we prefer to work with a non-linear feature designed to retain boundary information, given by one of the following two expressions, Rσ : L¹(Ω) → L¹(R²):

S ↦ Rσ(x|S) = S(x) (Kσ ∗ (1 − S))(x),    (3)

or a symmetrized one, Rσ : L¹(Ω) → L¹(R²),

S ↦ Rσ(x|S) = S(x) (Kσ ∗ (1 − S))(x) + (1 − S(x)) (Kσ ∗ S)(x).    (4)
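A minimal numerical sketch of features (3) and (4) (the L-shaped test shape and probe points are ours, not the paper's): the feature vanishes far from the boundary, and on the boundary its value discriminates convex corners from straight edges and concave corners, since it measures how much exterior a point "sees" within the kernel window:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def R_simple(S, sigma):
    """Feature (3): S * (K_sigma * (1 - S))."""
    return S * gaussian_filter(1.0 - S, sigma)

def R_sym(S, sigma):
    """Feature (4): the symmetrized version."""
    return S * gaussian_filter(1.0 - S, sigma) + (1.0 - S) * gaussian_filter(S, sigma)

# L-shaped (partly concave) shape
S = np.zeros((200, 200))
S[50:150, 50:150] = 1.0
S[50:100, 100:150] = 0.0           # cut out a quadrant

R = R_simple(S, sigma=15)
convex = R[149, 50]                # convex corner: sees roughly 3/4 exterior
edge = R[120, 50]                  # straight edge: sees roughly 1/2 exterior
concave = R[100, 100]              # concave corner: sees roughly 1/4 exterior
print(convex > edge > concave)     # corners are discriminated from edges
```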

These features may seem complicated at first, but are indeed quite intuitive as illustrated in Fig. 2a, and the results with different scales using the first (simpler) feature are shown in (b)-(d). Shape representation using the second (symmetric) feature and comparison with the signed distance function are shown in Fig. 3. Figure 3b shows the values of our (symmetric) shape representation for a bow-tie. Figure 3c shows the signed distance function for the same bow-tie as a comparison. In these illustrations, we can see that our shape features encode more information than in the binary representation: it includes the original boundary information as in the binary representation,


Fig. 3 Comparison of shape representations for a bow-tie shape: a the binary image (image size is 300 × 444); b our symmetric shape representation Rσ(x|S) = S(x) (Kσ ∗ (1 − S(x))) + (1 − S(x))(Kσ ∗ S(x)) with σ = 30; c the signed distance function to the boundary

since we use S(x) also in its definition, but it also explicitly contains local shape information (up to a scale σ): corners, for instance (where the curvature is high), are now distinguished from straight segments of zero curvature on the boundary, and this information is propagated inside the shape, so it is strongly encoded in the representation (see for instance Fig. 2b or Fig. 3b). Also, convex regions can be distinguished from concave regions. With the binary representation or the signed distance function, additional calculations are needed, such as the computation of curvature and other derivatives, to distinguish such features.

Our shape representation enjoys several desirable properties. It is robust to "noise": Fig. 4 shows that our shape representation is insensitive to lack of differentiability of the contour, unlike the curvature or the normal vectors of the signed distance function. This is due to the fact that regularization is implicit in our representation. Our representation "propagates" shape information inside and outside the boundary, since its value, unlike that of a binary image, depends on the local geometry. Moreover, our shape representation is fast and easy to compute. The calculation of the shape representations is based on convolution

Fig. 4 Comparison of shape representations for a noisy bow-tie, by computing the angle of the gradient at every point: a the binary image (image size is 300 × 444); b normal directions to the integral kernel representation Rσ(x|S) = S(x) (Kσ ∗ (1 − S(x))) + (1 − S(x))(Kσ ∗ S(x)) with σ = 30; c normal directions to the signed distance function


and summation operations (easier than computing other popular representations, for instance the signed distance function). In the derivation of the Euler-Lagrange equation for the energy functional based on the shape representation, the Gaussian kernel provides computational convenience. Naturally, other kernels can be used instead (not only the Gaussian), and our choice is dictated by mathematical convenience. Note also that, inside the shape, our representation is the solution of the heat equation with the binary image S(x) as initial data:

∂u/∂σ = Δ_x u, u(x, 0) = S(x).

This establishes a connection with the work of Gorelick et al. [16], which proposes representing shapes as the solution of a Poisson equation. Unlike [16] and other work based on the signed distance function, we do not solve any PDE to construct our representation, but use the convolution of the kernel with the binary image. Finally, our shape representation captures the "context" of the given shape, in the sense that the value of the feature at a point is a local statistic of the shape in a neighborhood of that point. This can be thought of as an extension of the ideas of [1] to infinite-dimensional representations, although for complexity reasons only the first-order statistic (the mean relative to the kernel K) is computed, rather than the entire distribution (discrete histogram) as in [1]. We will show in the next section that our shape representation using integral kernels, which is simpler to compute than the signed distance function, has advantages over the binary representation for matching shapes.

1.2 Domain deformations and distance between shapes via shape matching

Given two shapes of the same topology, defined by functions S1, S2 : Ω → {0, 1}, under the assumptions above it is possible to transform one into the other by a warping, that is, a domain deformation h : Ω → R² such that h(Ω) = Ω and

S1(x) = S2(h(x)), ∀ x ∈ Ω.    (5)
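Returning briefly to the heat-equation connection noted at the end of Section 1.1: up to the usual time reparameterization (running ∂u/∂t = Δu up to time t = σ²/2 matches Gaussian smoothing at scale σ), the claim can be checked numerically. A sanity-check sketch of our own, using an explicit Euler scheme:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

S = np.zeros((100, 100))
S[30:70, 30:70] = 1.0                       # binary shape

sigma = 4.0
u = S.copy()
dt = 0.1                                    # stable explicit step (dt <= 0.25 in 2-D)
for _ in range(int(sigma**2 / 2 / dt)):     # diffuse up to time t = sigma^2 / 2
    u = u + dt * laplace(u)                 # u <- u + dt * Laplacian(u)

g = gaussian_filter(S, sigma)               # Gaussian smoothing at scale sigma
print(float(np.max(np.abs(u - g))) < 0.05)  # the two smoothings nearly coincide
```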

In deformable templates, h is chosen to belong to a space of transformations that act on the set of shapes, so that it is possible to transform any shape (template) into any other (target), that is, to achieve equality in (5). The distance between shapes is defined as the energy of such a warping. There are infinitely many warpings that satisfy (5), so the distance is defined as the one that minimizes such an energy in a suitably chosen class [17], for instance d(S1, S2) = min_h ‖h‖ subject to (5), where h is a diffeomorphism and ‖·‖ is some chosen norm of integral form. This can be rephrased as a variational problem, for instance

d(S1, S2) = inf_{h∈H} ∫_Ω |S1(x) − S2(h(x))| dx + α Ereg(h),    (6)



where H is a suitable function space. The first term in the above sum, which we denote Edata(S1, S2|h), describes the data fidelity, that is, how well the warped template fits the target, whereas the second is a regularization term (equivalent to a norm ‖·‖ of the space H) that renders the problem well posed; α can be interpreted as a Lagrange multiplier. The regularization term, which we denote Ereg(h), can be taken for example from linear elasticity [8]:

Ereg(h) = (1/2) ∫_Ω { λ(div h)² + 2μ Σ_{i,j=1}^{2} (εij(h))² } dx,    (7)

where λ, μ > 0 are the Lamé coefficients of the material under deformation, and

div h = (h1)x1 + (h2)x2,  εij(h) = ½((hi)xj + (hj)xi).

On the boundary of the domain Ω, we impose h(x) = x as boundary condition.

1.3 "Meaningful" correspondence

A distance between shapes, as defined above, implicitly determines a correspondence between them: a point x in the interior of the set that defines S1, that is, S1(x) = 1, is mapped to a point h(x) in the interior of the set that defines S2, S2 ◦ h(x) = 1; similarly for the exterior, or background, and the boundary of such a set. Since the data fidelity term can be made zero by infinitely many warpings h, it alone does not establish a unique correspondence, which is therefore determined by the choice of regularization term Ereg. In this sense we say that the correspondence determined by a generic regularizer (one depending only on h), such as (7), is meaningless: choose a different regularizer and you get a different correspondence, even for the same data. One would like a meaningful correspondence to map corners to corners, and straight segments to straight segments. But the regularizer is independent of the data: it does not "see" corners or straight edges, and neither does the resulting correspondence. The same considerations apply to registering grayscale templates, rather than binary ones [27].
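The elasticity regularizer (7) can be evaluated discretely with finite differences. A sketch of our own, following the common convention of evaluating the energy on the displacement u = h(x) − x (with unit grid spacing and the Lamé coefficients as free parameters):

```python
import numpy as np

def elastic_energy(u1, u2, lam=0.0, mu=1.0):
    """Discrete linear-elasticity energy of a displacement field (u1, u2)."""
    u1_1, u1_2 = np.gradient(u1)           # partial derivatives along the two axes
    u2_1, u2_2 = np.gradient(u2)
    div = u1_1 + u2_2                      # div u
    e11, e22 = u1_1, u2_2                  # strain tensor eps_ij = (u_i,j + u_j,i)/2
    e12 = 0.5 * (u1_2 + u2_1)
    strain_sq = e11**2 + e22**2 + 2.0 * e12**2
    return 0.5 * np.sum(lam * div**2 + 2.0 * mu * strain_sq)

n = 32
zero = np.zeros((n, n))
ones = np.ones((n, n))
yy, xx = np.mgrid[0:n, 0:n]
shear = 0.1 * xx.astype(float)             # displacement growing along one axis

print(elastic_energy(zero, zero),          # identity warp: zero energy
      elastic_energy(ones, ones),          # rigid translation: zero energy
      elastic_energy(shear, zero) > 0)     # shear deformation costs energy
```

Note that rigid translations cost nothing under this displacement convention, which is why the boundary condition h(x) = x is needed to pin the warp down.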
The lack of meaning in correspondence is due to the basic premise of deformable templates, that the group is infinite-dimensional and acts transitively on the template. The problem disappears when the group is finite-dimensional, as in [13], or when the data is not in the same orbit1 of the template [40]. Since we want to maintain the flexibility that comes from an infinite-dimensional warp h (we do not want to constrain it to be finite-dimensional, or to discretize it at the outset), we need to either make the regularizer “smart,” i.e. dependent

1 Each template S defines an orbit under the group action, h ↦ S ◦ h, which has the structure of an equivalence class.


on the data S1, S2, or to change the data fidelity term Edata so that it can support a meaningful notion of correspondence.² We opt for the second approach, where the "meaning" of correspondence is given by the notion of local context, captured by the feature defined in (3) or (4). For instance, if the value of the feature at a point x ∈ Ω of the template is "distinctive," in the sense that no other point has the same value, it can be uniquely matched on the target. For these "distinctive" points, correspondence is determined by the data fidelity term. The regularizer is only needed to fill in loci of points that have constant feature value, i.e. where the feature is uninformative. Following this rationale, we introduce a new measure of data fidelity between two shapes S1 and S2 as

Eshape(h|S1, S2) = ∫_Ω |Rσ(x|S1) − Rσ(h(x)|S2)|² dx    (8)
and define their meaningful correspondence relative to the feature Rσ with the minimizer of the energy 

h∗ = arg min_h { Eshape(h|S1, S2) + α Ereg(h) },

where Ereg is a generic regularizer independent of the data, such as (7). Using the shape feature (4), the minimization of Eshape(h|S1, S2) + α Ereg(h) with respect to h is performed in a variational framework using gradient descent. The Euler-Lagrange equation corresponding to this energy yields the gradient direction for h,

∂h/∂t = −∂Eshape/∂h − α ∂Ereg/∂h,

with each term given by

∂Eshape/∂h = ∇S2 ◦ h · [ (Rσ(x|S1) − Rσ(h(x)|S2)) · (Kσ ∗ (2S2 ◦ h − 1)) + Kσ ∗ ((Rσ(x|S1) − Rσ(h(x)|S2)) · (2S2 ◦ h − 1)) ],

∂Ereg/∂h = −(μ Δh + (λ + μ)∇(div h)).

Given two shapes S1 and S2 with different scales, the scales also need to be optimized in the matching process, and the gradient descent for the scale matching can be derived easily. Shape matching results for the bow-tie are shown in Fig. 5, where the target shape is perfectly matched by the deformed source shape. The robustness of our representation is illustrated in Fig. 5 (second row), where the deformed bow-tie is matched successfully at the proper scale, which preserves the bow-tie shape while neglecting noise.
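The descent can be illustrated with a deliberately simplified sketch of our own: instead of the full Euler-Lagrange terms above, it takes a first-order "demons-style" step (residual times warped feature gradient, which is the pointwise gradient of the data term), and smooths the displacement with a Gaussian as a crude stand-in for the elastic regularizer. All names and parameter values here are assumptions, not the paper's:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def R_sym(S, sigma):
    """Symmetric integral-kernel feature (4)."""
    return S * gaussian_filter(1.0 - S, sigma) + (1.0 - S) * gaussian_filter(S, sigma)

n = 64
S1 = np.zeros((n, n)); S1[20:44, 20:44] = 1.0   # source square
S2 = np.zeros((n, n)); S2[24:48, 24:48] = 1.0   # target: same square, shifted

sigma = 5.0
R1, R2 = R_sym(S1, sigma), R_sym(S2, sigma)
g0, g1 = np.gradient(R2)                        # feature gradient of the target
yy, xx = np.mgrid[0:n, 0:n].astype(float)
u = np.zeros((n, n)); v = np.zeros((n, n))      # displacement: h(x) = x + (u, v)

def E(u, v):                                    # data term E_shape with feature (4)
    R2h = map_coordinates(R2, [yy + u, xx + v], order=1)
    return float(np.sum((R1 - R2h) ** 2))

E0 = E(u, v)
for _ in range(200):
    R2h = map_coordinates(R2, [yy + u, xx + v], order=1)
    g0h = map_coordinates(g0, [yy + u, xx + v], order=1)
    g1h = map_coordinates(g1, [yy + u, xx + v], order=1)
    resid = R2h - R1
    u = gaussian_filter(u - 2.0 * resid * g0h, 1.0)  # descend, then smooth
    v = gaussian_filter(v - 2.0 * resid * g1h, 1.0)
print(E(u, v) < E0)                             # the matching energy decreases
```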

2 Furthermore, the notion of correspondence is only meaningful if it is associated with a notion of scale, as argued in [26].


Fig. 5 Shape matching and correspondence results for the deformed bow-tie (with and without noise), with adapted scale parameter σ. [First row] (left) Deformed bow-tie (target); (middle) deformed bow-tie superimposed on the original bow-tie (source); (right) overlap of the warped source and the target after matching. [Second row] (left) Deformed "noisy" bow-tie (target); (middle) deformed noisy bow-tie superimposed on the original bow-tie (source); (right) overlap of the warped source and the target after matching. [Third row] Displacement vector field representing the diffeomorphism for the matching between (left) the deformed bow-tie, (right) the deformed noisy bow-tie and the original bow-tie. [Fourth row] Dense correspondences along the boundary obtained by the matching between shapes: (left) deformed, (right) deformed with noise. In the experiments, σ = 7 for the deformed shape and σ = 14 for the noisy deformed shape; α = 0.005, μ = 1, λ = 0 are used, and the image size is 192 × 256. Notice that a larger scale σ was necessary in the correspondence matching of the noisy shape, while preserving a smooth diffeomorphism

In order to illustrate what we mean by "meaningful" correspondence, and also that our proposed shape feature is superior to the binary representation, in Figs. 6 and 7 we show correspondences for shapes that have distinct "parts," obtained by the matching method presented in this section with the symmetric shape feature (4), and (for Fig. 6 only) by the method using the binary representation presented in Section 1.2, both with the same regularizer coming from linear elasticity and solved using gradient descent. Ideally, we want our algorithm to put these parts into correspondence, but without conducting a semantic analysis

Fig. 6 Correspondence and matching results for rectangles with moving spikes (comparison with the binary representation). [First row] (left) Rectangle with a spike on a side; (middle) rectangle with the spike moved to the right; (right) overlap of the two shapes. [Second row] Correspondences along the boundary based on (left) the kernel representation and (4); (right) the binary representation and (6). [Third row] Displacement vector field formed by the optimal diffeomorphism based on (left) the kernel representation and (4); (right) the binary representation and (6). [Fourth row] Dense correspondences along the boundary obtained by the optimal diffeomorphism based on (left) the kernel representation and (4); (right) the binary representation and (6). In the experiments, σ = 7, α = 0.0005, μ = 1, λ = 0 are used and the image size is 128 × 160

of each shape. For instance, in the case of a rectangle with a spike, even though there is no "right or wrong" way to move the spike (the spike could disappear from the left figure and reappear slightly displaced on the right; or the spike could simply translate to the right between the two figures), we would like the tip of the spike in one figure to correspond to the tip of the spike in the other. For the case of a human figure, we would like the tips of the various limbs to correspond in the two figures. Our representation of shape enables this behavior, since it encodes a local scale-space of curvature (the local shape feature) and attempts to match regions with similar local features to each other. This can be seen qualitatively in Fig. 6. We also notice that, although the chosen regularizer based on linear elasticity may not in general allow smooth large deformations without regridding, a smooth displacement field is still obtained using our shape feature (unlike in the case based on the binary representation, where the deformed grid is not smooth; see Fig. 6). Moreover, our representation of shapes includes and generalizes

Fig. 7 Correspondence and matching results for a human body with different hand and leg positions (multiple scales used in a coarse-to-fine manner). [Top row] Silhouettes of two different human bodies. [Bottom row] (left) Superimposed images of the two human body silhouettes; (right) correspondences. In this matching experiment, three scales σ = 12, 10, 7 are used in a coarse-to-fine manner, and α = 0.0005, μ = 1, λ = 0. The image size is 128 × 160. The shape representation used enables different "parts" to be put in correspondence, without performing explicit shape analysis

the binary representation. In addition, especially for more complex shapes, we have the possibility of using several scales σ in a coarse-to-fine manner, as is done in Fig. 7. This helps the algorithm avoid getting trapped in an undesirable local minimum. In the next section we will show how this definition can be used to compute meaningful shape statistics.

1.4 Noise and violations of the model

The premise of deformable templates that "each target can be reached from the template by the action of a domain deformation" is challenged when the data (images or shapes) are corrupted by noise (e.g. segmentation errors for planar shapes, quantization of the pixel grid) or by unmodeled phenomena (e.g. specularities in grayscale images, occlusions), since the modeling power of the infinite-dimensional group transformation is wasted in over-fitting [17]. This issue is usually addressed by regularizing the warpings, again at the expense of making the final result crucially dependent on the choice of regularizer. In our formulation of the problem, where local context features are matched rather than binary templates, this issue is attacked on two fronts. First, the choice of feature can be made to reduce the effects of specific noise sources, for instance pixelization or irregularity of the contour [26]. Second, matching is associated with a scale σ, and therefore one can adaptively search for the best scale in matching pairs. Examples of feature values with varying scales are illustrated in Fig. 2. Note that σ can be thought of as a regularization parameter, but one that is adapted to the data, rather than being a generic smoothness term.


2 Shape averaging

Given an ensemble of shapes {S1, S2, . . . , Sn}, one of the simplest statistics of practical interest is their average shape,³ defined as yet another shape M that is on average closest to the ensemble. In other words, we look for M that minimizes

Σ_{i=1}^{n} d(M, Si).

If we choose as distance d(M, S) = min_h ∫_Ω |M(x) − S(h(x))| dx, without a regularizer, the mean would not be well-defined, because such a distance can be made zero for any shape M.⁴ If we regularize the distance, the resulting average will depend on the choice of regularizer, hence be "meaningless" in the same sense as the correspondence discussed above; notice also that the determination of the mean defines its correspondence to each of the given shapes Si. Therefore, we adopt a new definition of the average shape relative to the feature Rσ, as follows. Since we are looking for a shape M represented by a binary image, in order to guarantee its structure we write it explicitly as a characteristic binary function M = H(φ), where H(φ) : Ω → {0, 1} is defined using the 1-dimensional Heaviside function

H(z) = 1 if z ≥ 0, 0 if z < 0,

and φ : Ω → R is an implicit representation of M: φ ≥ 0 inside the average shape or on its boundary, and φ < 0 outside the average shape. The average shape is then defined by the function φ. In order to find the shape average, therefore, we minimize the energy

E = Σ_{i=1}^{n} [ ∫_Ω |Rσ(x|H(φ)) − Rσ(hi(x)|Si)|² dx + α Ereg(hi) ],    (9)



with respect to φ and h1, ..., hn, where Rσ(x|H(φ)) = H(φ)(x) · (Kσ ∗ (1 − H(φ)))(x).
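For intuition, the feature Rσ can be sketched numerically with a Gaussian kernel standing in for Kσ; the grid size, σ, and the square test shape below are illustrative choices, not taken from the paper's experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_context_feature(S, sigma):
    """R_sigma(x|S) = S(x) * (K_sigma * (1 - S))(x): at points inside the
    shape, the Gaussian-weighted fraction of the sigma-neighborhood that
    lies outside the shape; zero at points outside the shape."""
    return S * gaussian_filter(1.0 - S, sigma)

# Illustrative binary shape: a 40x40 square on a 100x100 grid.
S = np.zeros((100, 100))
S[30:70, 30:70] = 1.0

R = local_context_feature(S, sigma=3.0)
# R vanishes outside the shape and deep inside it, and is largest at
# high-curvature boundary points (a corner "sees" more exterior than an edge).
print(R[10, 10])              # exactly 0 (outside the shape)
print(R[31, 31] > R[31, 50])  # True: corner value exceeds mid-edge value
```

This locality is what allows matching in Rσ to distinguish a corner from a straight edge, which the binary values S(x) alone cannot.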

We do so via alternating minimization and gradient descent. The associated Euler-Lagrange equations are solved iteratively by first fixing the hi's and

³ Note that we do not yet use the notion of mean shape, since we have not yet defined a distribution of measures on the set of shapes.

⁴ This point was also raised by Charpiat et al. in [6], who bypassed it by avoiding a notion of average shape and looking instead at the matrix of pairwise distances d̃ij = d̃(Si, Sj) with respect to a chordal distance d̃.

Enforcing local context into shape statistics

performing one step in the opposite direction of the gradient of E with respect to φ:

$$\frac{\partial \phi}{\partial t} = \delta(\phi)\left[ \big(K_\sigma * (1 - H(\phi))\big) \cdot \sum_{i=1}^{n} \big( R_\sigma(H(\phi); x) - R_\sigma(S_i; h_i(x)) \big) - K_\sigma * \left( H(\phi) \cdot \sum_{i=1}^{n} \big( R_\sigma(H(\phi); x) - R_\sigma(S_i; h_i(x)) \big) \right) \right],$$

where t ≥ 0 is the iteration index. Then, fixing φ, we perform an iteration in the opposite direction of the gradient of E with respect to hi = (hi,1 , hi,2 ):

$$\frac{\partial h_i}{\partial t} = \big( R_\sigma(H(\phi); x) - R_\sigma(S_i; h_i(x)) \big) \cdot \Big[ -(\nabla S_i \circ h_i) \cdot \big( K_\sigma * (1 - S_i \circ h_i) \big) + (S_i \circ h_i) \cdot \big( (K_\sigma * \nabla S_i) \circ h_i \big) \Big] + \mu\, \Delta h_i + (\lambda + \mu)\, \nabla(\operatorname{div} h_i).$$

We could initialize the iteration with a generic shape average (e.g. a circle) or with the average of the signed distance functions of the Si's as in [24]. In practice, since we deal with non-convex minimizations, we prefer the second approach: from the average of the signed distance functions we obtain an initial level set function and a characteristic function, given by H((1/n) Σi SDFi). In all our calculations, the transformations h or hi are initialized by the identity map. In Fig. 8 we show the difference between the mean shapes obtained using our proposed feature-based method and the method based on the binary representation only, on simple but demonstrative shapes with bumps at different locations. Notice how the scheme without the feature produces a mean shape with smoothed corners, whereas the results obtained with our proposed method preserve the shape perfectly, taking into account the specific

Fig. 8 Comparison of average shapes obtained from two rectangular shapes with bumps at different locations using the proposed scheme without feature based on the binary representation only, and our feature-based method using the first shape feature (3). The first column shows the two given shapes. The second column shows the deformation fields x − h(x) that represent the warping of the shapes by the scheme without feature. The third column shows warped shapes from the original shapes colored in red to the mean by the scheme without feature. Columns 4 and 5 show the results as in columns 2 and 3, but obtained by our proposed method with feature. The last column gives the mean shape by the method without feature (top) and the mean shape by our method with feature (bottom). (The parameters σ = 7, α = 0.005, λ = 0.5, μ = 1 are used and image size is 100 × 100)


shape features. The preservation of the feature in the mean shape indicates that meaningful correspondences are established between the mean shape and each shape in the ensemble. The average shape obtained by conventional warping is affected both by the choice of regularizer and by the data fidelity term, and therefore has smoothed corners; our average shape is not affected by the choice of a simple regularizer and instead preserves significant geometric features that are present in the original shapes. This is due to the meaningful correspondence enforced by matching context features instead of binary templates.
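The initialization from the average of signed distance functions, M0 = H((1/n) Σi SDFi), can be sketched as follows; the Euclidean distance transform stands in for the SDF computation, and the two offset squares are illustrative, not from the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(S):
    """SDF of a binary shape S: positive inside, negative outside,
    matching the phi >= 0 inside convention used for the average shape."""
    return distance_transform_edt(S) - distance_transform_edt(1 - S)

def initial_average(shapes):
    """Threshold the mean of the SDFs: M0 = H((1/n) sum_i SDF_i)."""
    phi0 = np.mean([signed_distance(S) for S in shapes], axis=0)
    return (phi0 >= 0).astype(float)

# Two illustrative offset squares on an 80x80 grid.
S1 = np.zeros((80, 80)); S1[20:50, 20:50] = 1
S2 = np.zeros((80, 80)); S2[30:60, 30:60] = 1
M0 = initial_average([S1, S2])   # a binary shape "between" S1 and S2
```

The mean of the SDFs also provides the initial level set function φ directly, before thresholding.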

3 Shape variations

In order to analyze the variability of an ensemble {S1, S2, ..., Sn} relative to the average shape M, we compute second-order statistics by finding the principal modes of the sample covariance. Because the ensemble is obtained from the average via a warping, Si(hi) ≈ M, and warpings are vector fields that can be composed linearly, this can be performed by simple linear statistical analysis, that is, principal component analysis (PCA). This linear compositionality is one of the key advantages of working within the framework of deformable templates. Other shape representations, for instance signed-distance functions, are not closed under linear operations. In order to perform PCA, we first compute the average deformation field at every point x ∈ Ω:

$$\nu = \frac{1}{n} \sum_{i=1}^{n} h_i.$$

Then the sample covariance of the deformation fields is constructed (writing the transformations ν and hi as long vectors) as

$$\Sigma = \frac{1}{n} \sum_{i=1}^{n} (h_i - \nu)(h_i - \nu)^T.$$

The principal modes of variation are the eigenvectors vi of the covariance Σ, and a deformation h can be approximated using the first k eigenvectors vi, i = 1, ..., k, corresponding to the largest eigenvalues λi, by

$$h = \nu + \sum_{i=1}^{k} \xi_i v_i,$$

where ξi ∈ R are weights for the different modes. An instance of a shape S within the ensemble can be approximated by projection onto the principal modes along with the representative average shape, that is, by finding the coefficients ξi such that

$$S \approx M\left( \nu + \sum_{i=1}^{k} \xi_i v_i \right)$$


in the L² norm. In the experimental section we will show the average shape and principal modes of variation of a number of real ensembles of shapes. The shape average and principal modes of variation, together with the linear structure of the deformation fields, can be used to define a Gaussian distribution on an ensemble of shapes: {Si}i=1,...,n ∼ N(ν, Σ), with ν and Σ defined above. This can be used as a prior for segmentation, as illustrated in Section 4. Note that the definition of a shape average and covariance via linear statistical analysis of diffeomorphic domain deformations is not new; in fact, it is one of the staples of deformable templates. What is different here is that such deformations are computed not based on the binary template, but based on the feature Rσ, which includes a notion of scale and induces a correspondence based on local context.
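The PCA of this section operates on the deformation fields written as long vectors; a minimal sketch (the function and variable names are ours, and the tiny random fields are purely illustrative):

```python
import numpy as np

def pca_deformations(h_list, k):
    """Mean field nu, top-k eigenvalues lambda_i, and principal modes v_i
    of the sample covariance of the deformation fields (each h_i raveled
    into a long vector)."""
    H = np.stack([h.ravel() for h in h_list])        # n x d data matrix
    nu = H.mean(axis=0)
    C = (H - nu).T @ (H - nu) / len(h_list)          # d x d sample covariance
    w, V = np.linalg.eigh(C)                         # ascending eigenvalues
    order = np.argsort(w)[::-1][:k]                  # keep the k largest
    return nu, w[order], V[:, order]

def reconstruct(nu, V, xi):
    """h = nu + sum_i xi_i v_i."""
    return nu + V @ xi

# Illustrative data: three small random 2D deformation fields.
rng = np.random.default_rng(0)
fields = [rng.normal(size=(4, 4, 2)) for _ in range(3)]
nu, lam, V = pca_deformations(fields, k=2)
xi0 = V.T @ (fields[0].ravel() - nu)   # projection coefficients of field 0
```

With n samples the centered fields span at most n − 1 directions, so k = n − 1 modes reproduce each training field exactly; in practice k is chosen much smaller than the grid dimension d, and for large grids the eigenvectors are better obtained from the small n × n Gram matrix than from the d × d covariance.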

4 Application: segmentation with shape prior

Analysis of medical images often requires the detection of anatomical structures. The process of "segmenting" these structures from the images using only low-level information (e.g. intensity values) is often unsatisfactory because of low contrast, noise, and the intrinsic variability of the target shape, so top-down information, in the form of prior knowledge on the anatomical structure of interest learned from examples, can greatly benefit the analysis [14, 43, 45]. Incorporating prior knowledge in segmentation is an important area of work in medical imaging, with many significant contributions. In this section we explore the use of our representation and shape statistics in segmentation using the level set framework [29], as commonly done in [3, 4, 30, 39, 42]. We evolve the boundary of the shape via an embedding function φ, which is a convenient means to compute our feature and also allows easy computation of a measure of dissimilarity to the target shape [35]. The basic underlying segmentation model we use is [4]. We propose two models with prior. In the first one, our prior is essentially a Gaussian density learned using PCA from a set of training shapes, incorporated into level-set segmentation, following the line of work pioneered by [10, 24, 36, 42]. Our goal here is to illustrate the role of the shape context, hence we keep the statistical model as simple as possible. Finite-dimensional models for registration have also been used in [5, 34]. For comparison, in the second model we use only one shape as prior.

4.1 Model 1

To illustrate our approach, assume that we have a training set of shapes S1, ..., Sn, and compute their average shape M as in Section 2. As a byproduct, we get the deformations h1, ..., hn that warp each Si into the average shape M. Following the derivation in Section 3, we construct a deformable template M(ν + Σ_{i=1}^k ξi vi), allowing the scalar weights ξi to vary, as illustrated in Figs. 11


and 14. Now we wish to detect a shape, defined by the unknown binary function H(φ), in a given image I, such that H(φ) is also close to M(ν + Σ_{i=1}^k ξi vi). We consider the energy function

$$\inf_{\phi, \xi} E(\phi, \xi) = E_{seg}(\phi) + \beta\, E_{prior}(\phi, \xi) \tag{10}$$

(with ξ = (ξ1, ..., ξk)), where for the first term we use [4], given by

$$E_{seg}(\phi) = \gamma \int_\Omega |\nabla H(\phi)|\, dx + \int_\Omega |I(x) - c^+|^2 H(\phi)\, dx + \int_\Omega |I(x) - c^-|^2 H(-\phi)\, dx,$$

where γ is a constant, and

$$E_{prior}(\phi, \xi) = \int_\Omega \left| H(\phi) - M\left( \nu + \sum_{i=1}^{k} \xi_i v_i \right) \right|^2 dx,$$

where each ξi, i = 1, ..., k, is optimized within the range [−η√λi, η√λi] for a constant η, as in [15], where η = 3 was used. We minimize E(φ, ξ) by an alternating procedure using the Euler-Lagrange equations and time-dependent regularization for an artificial time t > 0, given by gradient descent. The time-varying constants c⁺ and c⁻ are defined as functions of I and φ by

$$c^+(t) = \frac{\int_\Omega I(x)\, H(\phi(x,t))\, dx}{\int_\Omega H(\phi(x,t))\, dx}, \qquad c^-(t) = \frac{\int_\Omega I(x)\, H(-\phi(x,t))\, dx}{\int_\Omega H(-\phi(x,t))\, dx},$$

$$\frac{\partial \phi}{\partial t} = \delta(\phi)\left[ \gamma\, \operatorname{div}\!\left( \frac{\nabla \phi}{|\nabla \phi|} \right) - |I - c^+|^2 + |I - c^-|^2 - 2\beta \left( H(\phi) - M\left( \nu + \sum_{i=1}^{k} \xi_i v_i \right) \right) \right],$$

and, for i = 1, ..., k,

$$\frac{\partial \xi_i}{\partial t} = 2\beta \int_\Omega \left( H(\phi) - M\left( \nu + \sum_{j=1}^{k} \xi_j v_j \right) \right) (\nabla M \cdot v_i)\, dx,$$

where ∇M is evaluated at ν + Σ_{j=1}^k ξj vj. The initial shape in the alternating minimization is given by the mean shape. We would like to mention that our minimization problem with shape prior is a non-convex optimization problem, with several local and global minimizers. We do not guarantee that we compute a global minimizer; moreover, such a global minimizer may not be unique in theory. Thus, the segmentation results may depend on the initial guess. However, it is known that the region-based global segmentation criterion [4], based on the piecewise-constant Mumford-Shah model [28], is less dependent on the initial guess than edge-based models (for such models, the location of the initial curve would be very important). In the future, we propose to use in practice an H¹ Sobolev gradient instead of the standard L² gradient; it has been shown in [38] that with such a modification the results are much less dependent on local variations.
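The time-varying constants c⁺ and c⁻ are simply the mean intensities inside and outside the current zero level set; a minimal sketch (the function names and test image are ours):

```python
import numpy as np

def heaviside(phi):
    """Sharp Heaviside H(phi); the actual evolution uses a regularized
    Heaviside/delta, but the sharp version suffices for the constants."""
    return (phi >= 0).astype(float)

def region_means(I, phi):
    """Chan-Vese constants: c_plus = mean of I where phi >= 0 (inside),
    c_minus = mean of I where phi < 0 (outside)."""
    H = heaviside(phi)
    c_plus = (I * H).sum() / H.sum()
    c_minus = (I * (1 - H)).sum() / (1 - H).sum()
    return c_plus, c_minus

# Piecewise-constant test image matching the level set: 1 inside a disc.
y, x = np.mgrid[:64, :64]
phi = 15.0 - np.hypot(x - 32, y - 32)    # phi >= 0 inside the disc
I = heaviside(phi)
print(region_means(I, phi))  # -> (1.0, 0.0)
```

On this ideal piecewise-constant image the fidelity terms |I − c⁺|² H(φ) and |I − c⁻|² H(−φ) both vanish, which is why the curvature and prior terms then drive the evolution.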


4.2 Model 2

This is the model proposed in our preliminary work [20], which we recall here for comparison; only one shape of the ensemble is used for the prior information, instead of a variable shape close to the mean shape of the ensemble. In the previous approach, "Model 1", the diffeomorphisms hi are computed in a preprocessing step; these are then used in the computation of the average shape, of the deformation modes vi, and of a deformation h as a linear combination of the average deformation and the modes. The segmentation with shape prior is then carried out with respect to the level set function φ and the weights in the linear combination. When we use only one shape prior S (given by the binary representation), we keep the diffeomorphism h as an unknown, which is constrained by the shape term with our shape feature and by the regularization given by linear elasticity. The shape energy Eshape incorporated into the matching energy penalizes local deviations of the target shape from the template. However, unless properly restricted, a general diffeomorphism does not preserve the local shape of the template, and leads instead to a perfect matching to the target regardless of its shape (provided that it is in the same diffeomorphic equivalence class). Our goal, on the other hand, is to preserve correspondence of local shape during the entire evolution of the diffeomorphism from the template to the target shape. In the shape energy, restriction of the shape space under the diffeomorphism is usually addressed by generic regularizers, such as linear elasticity, that do not preserve local shape. Therefore, an additional regularization term is needed, designed to preserve local shape during global deformation. The segmentation with one shape prior [20] can be achieved by minimizing

$$E(\phi, h) = E_{seg}(\phi) + \beta\, E_{shape}(h|H(\phi), S) + \alpha\, E_{reg}(h) + \tau\, E_{diff}(h|S), \tag{11}$$

where S is the given prior shape (binary representation), and the first three terms are as previously defined: Eshape is defined in (8), Ereg in (7), and Eseg in the previous subsection. The last term, Ediff(h|S), is given by

$$E_{diff}(h|S) = \int_\Omega \big| R_\sigma(x|S \circ h) - R_\sigma(h(x)|S) \big|^2 dx.$$

This energy penalizes large deformations of the template in terms of the local shape feature. Its gradient is given by

$$\begin{aligned} \frac{\partial E_{diff}}{\partial h} ={}& G \cdot \nabla S \circ h \cdot \{K_\sigma * (1 - S \circ h)\} - \nabla S \circ h \cdot \{K_\sigma * (G \cdot S \circ h)\} \\ &- G \cdot \nabla S \circ h \cdot \{K_\sigma * (S \circ h)\} + \nabla S \circ h \cdot \{K_\sigma * (G \cdot (1 - S \circ h))\} \\ &- \Big[ G \cdot \big( \nabla S \cdot (K_\sigma * (1 - 2S)) + (1 - 2S) \cdot (K_\sigma * \nabla S) \big) \Big] \circ h, \end{aligned}$$

where G = Rσ(x|S ∘ h) − Rσ(h(x)|S) (here, the shape feature (4) is used).
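A numerical sanity check on Ediff (a sketch; the Gaussian kernel, bilinear warping, grid, and shapes are our illustrative choices): the identity map gives exactly zero, while a non-rigid squeeze does not. Note that a pure translation also leaves Ediff at zero, up to image-boundary effects, so the term penalizes genuine changes of local shape rather than pose.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def R(S, sigma):
    """Local context feature R_sigma(.|S) = S * (K_sigma * (1 - S))."""
    return S * gaussian_filter(1.0 - S, sigma)

def E_diff(S, h_rows, h_cols, sigma):
    """E_diff(h|S) = sum_x |R_sigma(x | S o h) - R_sigma(h(x) | S)|^2,
    with h given as per-pixel target coordinates (h_rows, h_cols)."""
    S_h = map_coordinates(S, [h_rows, h_cols], order=1)   # S o h (bilinear)
    term1 = R(S_h, sigma)                                 # R_sigma(x | S o h)
    term2 = map_coordinates(R(S, sigma),                  # R_sigma(h(x) | S)
                            [h_rows, h_cols], order=1)
    return ((term1 - term2) ** 2).sum()

S = np.zeros((64, 64)); S[20:44, 20:44] = 1.0
rows, cols = np.mgrid[:64, :64].astype(float)

print(E_diff(S, rows, cols, sigma=2.0))            # identity map: 0.0
print(E_diff(S, rows, 0.8 * cols, sigma=2.0) > 0)  # non-rigid squeeze: True
```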


Now, all the necessary ingredients of the coupled Euler-Lagrange equations in (φ, h) being known, in practice we solve

$$\frac{\partial \phi}{\partial t} = -\frac{\partial E_{seg}}{\partial \phi} - \beta\, \frac{\partial E_{shape}}{\partial \phi}, \qquad \frac{\partial h}{\partial t} = -\alpha\, \frac{\partial E_{reg}}{\partial h} - \beta\, \frac{\partial E_{shape}}{\partial h} - \tau\, \frac{\partial E_{diff}}{\partial h}.$$

This method uses as initialization the shape prior given by the binary template S.

5 Experiments on real data

In addition to the results shown in the previous sections, we present in this section more challenging experimental results, to illustrate the performance of our model in the computation of shape statistics and in their exploitation to perform segmentation with shape prior on real data. We apply the proposed method to compute statistics of an ensemble of real corpora callosa from 10 different individuals, shown in the first row of Fig. 9. We first compute the average shape, shown in the middle column of Fig. 11, together with the warped shapes in the second row of Fig. 9; the deformation fields hi between each shape and the average are presented in the third row of Fig. 9. We apply PCA to the obtained deformation fields, and the first three principal modes are shown in Fig. 10. In Fig. 11 we illustrate the shape variability captured by the second-order statistics relative to the average (middle column) via the variations (first and last columns) obtained with weights ξi ∈ {−18√λi, 18√λi} of the first three principal modes vi, i = 1, 2, 3. We repeat these experiments on 12 different hand shapes, shown in Fig. 12. The average shape is computed together with the domain deformations. The three principal modes of variation v1, v2, v3 are shown in Fig. 13. In Fig. 14 we illustrate the variability from the average shape by changing the weights ξi for vi, with ξi ∈ {−6√λi, 6√λi}.

Fig. 9 Average shape results obtained from 10 different corpus callosum shapes using our method. (First row) Original shapes. (Second row) Warped shapes to the average shape. (Third row) Deformation fields hi that represent the warping of the shapes to the average shape. (The parameters σ = 7, α = 0.01, λ = 0.5, μ = 1 are used and image size is 120 × 200)

Fig. 10 Three principal modes of variation obtained from the warping in Fig. 9

Fig. 11 Shape statistics from the shapes shown in Fig. 9. The three principal modes vi, i = 1, 2, 3, shown in Fig. 10 are used with weights ξi ∈ {−18√λi, 0, 18√λi}; ξi = 0 corresponds to the average

Fig. 12 A set of 12 different hand shapes

Fig. 13 Three principal modes of variation obtained from the warping in Fig. 12

Fig. 14 Shape statistics from the shapes shown in Fig. 12. The three principal modes vi, i = 1, 2, 3, shown in Fig. 13 are used with weights ξi ∈ {−6√λi, 0, 6√λi}; ξi = 0 corresponds to the average

Finally, in Fig. 15 we illustrate the use of our prior for segmentation, using the average shape and the three principal modes as a shape prior (model (10)). The corpus callosum in the original data does not exhibit enough contrast to allow successful segmentation using only low-level (intensity) data (the Chan-Vese segmentation method without prior has been used; different results would be obtained using different length parameters γ). However, using the proposed shape prior (10) allows successful segmentation and adaptation to the target shape, even in very noisy cases. Moreover, the model is less dependent on the choice of parameters. For a first comparison, we also show in Fig. 16 segmentation results on a hand image with occlusion and on a very similar brain image using the previously proposed model (11). An advantage of Model 1 is that, once the mean shape has been computed together with the hi's, model (10) is simpler than model (11), since the diffeomorphism is restricted to a subspace of finite dimension. In Fig. 16 we also show, in the fourth column, segmentation with a shape prior given by the binary representation only, which is less satisfactory, especially for the brain image; this shows the advantage of using our proposed shape feature with integral kernel over the binary representation. Finally, we show additional segmentation experiments and comparisons on simple but illustrative examples with noise and occlusion: Figs. 17 (very noisy data) and 18 (occluded object). The rectangular shapes with bumps from Fig. 8 and their mean shape were used in Model 1 (with our shape descriptor), compared with Model 1 with a prior that does not use our shape descriptor (based on the binary representation only), and with segmentation without any shape prior. The


Fig. 15 Model 1: robustness with respect to noise. Segmentation results on a brain MRI image (top row) and its noisy versions (middle and bottom rows). (Left column) Source images. (Middle column) Segmentations without shape prior (the Chan-Vese segmentation method without prior has been used). (Right column) Segmentations with shape prior (10). The average shape shown in Fig. 11 and the three principal modes shown in Fig. 10 are used for the shape prior model. (The parameters γ = 10⁵, β = 4500 (top), β = 5500 (middle) and β = 5000 (bottom) are used, and the image size is 200 × 200)

Fig. 16 Model 2 and comparisons. Segmentation results of a partially occluded hand (top row) and corpus callosum (bottom row). From left to right: image with an object to be segmented; shape prior superimposed on the image; segmentation Eseg without using any shape prior; segmentation with a shape prior introducing the additional Eshape energy computed from the binary representation only; segmentation using both Eshape and Ediff with the proposed shape feature and model (11). Parameters: (hand) image size 152 × 184, γ = 400, α = 0.1, β = 4000, σ = 5, τ = 2000; (brain) image size 306 × 250, γ = 20, α = 0.1, β = 610, σ = 7, τ = 12885


Fig. 17 Comparison of the segmentation results on a synthetic image with noise. a Original image. b Original image with Gaussian noise. c Initialization for the segmentation. d Segmentation result without shape prior, using Eseg(φ). e Segmentation result with a shape prior obtained without using the shape descriptor, i.e. E(φ, ξ) with the binary representation. f Segmentation result with a shape prior obtained using our shape descriptor, i.e. E(φ, ξ) with Rσ. The common parameters γ = 10⁵, β = 10⁶ are used for segmentation, σ = 35 for the shape descriptor, and the image size is 500 × 500

objective of these additional experiments is to demonstrate the effectiveness of our shape descriptor as a shape prior in segmentation. The segmentation results with a shape prior obtained without using the shape descriptor (binary image) did not preserve geometrically characteristic features with high curvature, such as corners, whereas the segmentation results with a shape prior obtained using our shape descriptor effectively preserved those characteristic features (the segmentation without any shape prior may require many more iterations to converge to some stable shape). In our experiments, we have chosen the parameters according to the characteristics of the shapes and images. We may want to rely more on the data or on the shape depending on the given image and the prior shapes: a too large γ makes the data and shape terms less important, while if β is too small the data term dominates, and a larger β produces less influence from the data. There is no universal set of parameters that works perfectly for all cases. However, the results in Fig. 15 show that, thanks to the shape prior term, very similar parameters can be used for images from the same category


Fig. 18 Comparison of the segmentation results on a synthetic image with occlusion. a Original image. b Original image with occlusion. c Initialization for the segmentation. d Segmentation result without shape prior, using Eseg(φ). e Segmentation result with a shape prior obtained without using the shape descriptor, i.e. E(φ, ξ) with the binary representation. f Segmentation result with a shape prior obtained using our shape descriptor, i.e. E(φ, ξ) with Rσ. The common parameters γ = 10⁵, β = 10⁴ are used for segmentation, σ = 35 for the shape descriptor, and the image size is 500 × 500

but with different levels of noise (this would be ideal in practical applications, where we do not want to choose parameters for every independent test). For this reason the length coefficient γ has been kept the same, and the values of β are very similar. Each individual result could be improved by a better search over the parameters, but those parameters would then depend more on the specific image; for instance, γ can be taken smaller when there is no noise. We expect that, with further normalization of the individual terms and of the parameters as functions of the image size and noise level, the parameters can be kept very similar across different tests.

6 Conclusions and discussion

We have exploited a local feature that captures shape context to establish meaningful correspondence between shapes in an ensemble, and used it to: find matchings and meaningful correspondences between shapes; compute shape statistics of the first order (average) and second order (variance);


successfully segment images with a shape prior. These are computed using infinite-dimensional deformation fields and an integral kernel that is robust to noise and to deviations from the ideal deformable-template model. We have computed the statistics by performing principal component analysis on the set of deformation fields from each shape to the average. Satisfactory experimental results on real and synthetic data have been shown to validate the proposed method, in comparison with analysis of the binary templates and in the presence of noise. We have also introduced a novel segmentation algorithm using a shape prior modeled by the proposed shape statistics via meaningful correspondences, and shown promising segmentation results. This model can use an ensemble of shapes as prior, and seems to be simpler than the one based on a single shape prior proposed by the authors in prior work.

Acknowledgements The authors would like to thank the editors Lixin Shen and Yuesheng Xu for their invitation to contribute to the special issue on mathematical methods for image processing. The authors would also like to thank the anonymous referees for all their suggestions and comments that helped to improve the presentation of the manuscript. Byung-Woo Hong's research was supported by the Ministry of Knowledge Economy, Korea, under the HNRC (Home Network Research Center) at ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment). Stefano Soatto and Luminita Vese's research was supported by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 RR021813, entitled Center for Computational Biology (CCB). Finally, Luminita Vese acknowledges additional funding from the NSF-DMS grants 0714945 and 0312222.

Appendix

To minimize the energy (9), we use alternating minimization and gradient descent. Thus, we compute and solve the system formed by the associated Euler-Lagrange equations, parameterized by an artificial time t. We give here the details of the derivation. If we fix h1, ..., hn and minimize with respect to φ, we let

$$F(\epsilon) = \sum_{i=1}^{n} E_{shape}\big(h_i; H(\phi + \epsilon\tilde\phi), S_i\big),$$

$$\begin{aligned} F'(\epsilon) = \sum_{i=1}^{n} \int_\Omega & \big( R_\sigma(H(\phi + \epsilon\tilde\phi); x) - R_\sigma(S_i(h_i(x))) \big) \\ & \times \Big[ \delta(\phi + \epsilon\tilde\phi)\, \tilde\phi\; K_\sigma * \big(1 - H(\phi + \epsilon\tilde\phi)\big) + H(\phi + \epsilon\tilde\phi)\; K_\sigma * \big({-\delta(\phi + \epsilon\tilde\phi)\, \tilde\phi}\big) \\ & \quad\; - \delta(\phi + \epsilon\tilde\phi)\, \tilde\phi\; K_\sigma * H(\phi + \epsilon\tilde\phi) + \big(1 - H(\phi + \epsilon\tilde\phi)\big)\; K_\sigma * \big(\delta(\phi + \epsilon\tilde\phi)\, \tilde\phi\big) \Big]\, dx. \end{aligned}$$


Then

$$\begin{aligned} F'(0) = \sum_{i=1}^{n} \int_\Omega & \big( R_\sigma(H(\phi); x) - R_\sigma(S_i(h_i(x))) \big) \\ & \times \Big[ \delta(\phi)\, \tilde\phi\; K_\sigma * (1 - H(\phi)) + H(\phi)\; K_\sigma * \big({-\delta(\phi)\, \tilde\phi}\big) \\ & \quad\; - \delta(\phi)\, \tilde\phi\; K_\sigma * H(\phi) + (1 - H(\phi))\; K_\sigma * \big(\delta(\phi)\, \tilde\phi\big) \Big]\, dx, \end{aligned}$$

or

$$\begin{aligned} F'(0) = \sum_{i=1}^{n} \int_\Omega \Big[ & \big( R_\sigma(H(\phi); x) - R_\sigma(S_i(h_i(x))) \big)\; \delta(\phi)\, \tilde\phi\; K_\sigma * (1 - 2H(\phi)) \\ & + \delta(\phi)\, \tilde\phi\; K_\sigma * \Big( (1 - 2H(\phi)) \big( R_\sigma(H(\phi); x) - R_\sigma(S_i(h_i(x))) \big) \Big) \Big]\, dx. \end{aligned}$$
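The passage from the first form of F′(0) to the second combines the four bracketed terms pairwise and uses the self-adjointness of convolution with the symmetric kernel, stated here for completeness (boundary effects on Ω are neglected):

```latex
\int_\Omega f(x)\,(K_\sigma * g)(x)\,dx
  = \int_\Omega\!\!\int_\Omega f(x)\,K_\sigma(x-y)\,g(y)\,dy\,dx
  = \int_\Omega g(y)\,(K_\sigma * f)(y)\,dy,
\qquad \text{since } K_\sigma(-z) = K_\sigma(z).
```

Applying this identity with g = δ(φ)φ̃ moves the test function φ̃ out of the convolutions, which is what allows the Euler-Lagrange equation in φ to be read off from F′(0).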

Thus, the Euler-Lagrange equation in φ, parameterized by t ≥ 0, is

$$\frac{\partial \phi}{\partial t} = \delta(\phi)\left[ \big( K_\sigma * (1 - 2H(\phi)) \big) \cdot \sum_{i=1}^{n} \big( R_\sigma(H(\phi); x) - R_\sigma(S_i(h_i(x))) \big) + K_\sigma * \left( (1 - 2H(\phi)) \cdot \sum_{i=1}^{n} \big( R_\sigma(H(\phi); x) - R_\sigma(S_i(h_i(x))) \big) \right) \right].$$

Now, varying only hi = (hi,1, hi,2), we obtain in a similar way the associated Euler-Lagrange equation, parameterized by t ≥ 0:

$$\frac{\partial h_i}{\partial t} = -\nabla S_i \circ h_i \cdot \Big[ \big( R_\sigma(H(\phi); x) - R_\sigma(S_i \circ h_i; x) \big) \cdot \big( K_\sigma * (2\, S_i \circ h_i - 1) \big) + K_\sigma * \Big( \big( R_\sigma(H(\phi); x) - R_\sigma(S_i \circ h_i; x) \big) \cdot (2\, S_i \circ h_i - 1) \Big) \Big] + \mu\, \Delta h_i + (\lambda + \mu)\, \nabla(\operatorname{div} h_i).$$

References

1. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE TPAMI 24(4), 509–522 (2002)
2. Bresson, X., Vandergheynst, P., Thiran, J.: A variational model for object segmentation using boundary information and shape prior driven by the Mumford-Shah functional. IJCV 28(2), 145–162 (2006)
3. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. In: Proceedings ICCV'95, pp. 694–699, Cambridge, June 1995
4. Chan, T., Vese, L.: Active contours without edges. IEEE TIP 10(2), 266–277 (2001)
5. Chan, T., Zhu, W.: Level set based shape prior segmentation. In: Proceedings CVPR'05, pp. 20–26, San Diego, June 2005

6. Charpiat, G., Faugeras, O., Keriven, R.: Image statistics based on diffeomorphic matching. In: Proceedings ICCV'05, Beijing, October 2005
7. Christensen, G., Rabbitt, R., Miller, M.: Deformable template using large deformation kinematics. IEEE TIP 5(10), 1437–1447 (1996)
8. Ciarlet, P.G.: Mathematical Elasticity. Vol. I, Series "Studies in Mathematics and its Applications". North-Holland, Amsterdam (1988)
9. Cootes, T., Taylor, C., Cooper, D., Graham, J.: Active shape models—their training and application. CVIU 61(1), 38–59 (1995)
10. Cremers, D., Kohlberger, T., Schnörr, C.: Shape statistics in kernel space for variational image segmentation. Pattern Recogn. 36(9), 1929–1943 (2003)
11. Dambreville, S., Rathi, Y., Tannenbaum, A.: Shape-based approach to robust image segmentation using kernel PCA. In: Proceedings CVPR'06, pp. 977–984, New York, 17–22 June 2006
12. Davatzikos, C., Tao, X., Shen, D.: Hierarchical active shape models, using the wavelet transform. IEEE TMI 22(3), 414–423 (2003)
13. Dryden, I., Mardia, K.: Statistical Shape Analysis. Wiley, New York (1998)
14. Felzenszwalb, P.: Representation and detection of deformable shapes. IEEE TPAMI 27(2), 208–220 (2005)
15. Fletcher, T., Lu, C., Pizer, S.M., Joshi, S.: Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE TMI 23(8), 995–1005 (2004)
16. Gorelick, L., Galun, M., Sharon, E., Basri, R., Brandt, A.: Shape representation and classification using the Poisson equation. In: Proceedings CVPR'04, pp. 61–67, Washington, DC, June 2004
17. Grenander, U.: General Pattern Theory. Oxford University Press, Oxford (1993)
18. Grenander, U., Miller, M.: Computational anatomy: an emerging discipline. Quart. Appl. Math. 56(4), 617–694 (1998)
19. Guo, H., Rangarajan, A., Joshi, S., Younes, L.: Non-rigid registration of shapes via diffeomorphic point matching. In: Proceedings ISBI'04, Arlington, 15–18 April 2004
20. Hong, B.-W., Prados, E., Soatto, S., Vese, L.: Shape representation based on integral kernels: application to image matching and segmentation. In: Proceedings CVPR'06(1), pp. 833–840, New York, 17–22 June 2006
21. Kim, J., Çetin, M., Willsky, A.: Nonparametric shape priors for active contour-based image segmentation. Signal Process. 87(12), 3021–3044 (2007)
22. Kohlberger, T., Cremers, D., Rousson, M., Ramaraj, R.: 4D shape priors for level set segmentation of the left myocardium in SPECT sequences. Lecture Notes in Comput. Sci. 4190, 92–100 (2006)
23. Lecellier, F., Jehan-Besson, S., Fadili, J., Aubert, G., Revenu, M., Saloux, E.: Region-based active contour with noise and shape priors. In: Proceedings ICIP'06, pp. 1649–1652, Atlanta, 8–11 October 2006
24. Leventon, M., Grimson, E., Faugeras, O.: Statistical shape influence in geodesic active contours. In: Proceedings CVPR'00, pp. 316–323, Hilton Head Island, June 2000
25. Malladi, R., Sethian, J.A., Vemuri, B.C.: Shape modeling with front propagation: a level set approach. IEEE TPAMI 17(2), 158–175 (1995)
26. Manay, S., Cremers, D., Hong, B.-W., Yezzi, A., Soatto, S.: Shape matching via integral invariants. IEEE TPAMI 28(10), 1602–1617 (2006)
27. Miller, M., Younes, L.: Group actions, homeomorphisms, and matching: a general framework. IJCV 41(1–2), 61–84 (2001)
28. Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42, 577–684 (1989)
29. Osher, S., Sethian, J.: Fronts propagating with curvature dependent speed: algorithms based on the Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
30. Paragios, N., Deriche, R.: Geodesic active regions and level set methods for supervised texture segmentation. IJCV 46(3), 223–247 (2002)
31. Paragios, N., Rousson, M., Ramesh, V.: Matching distance functions: a shape-to-area variational approach for global-to-local registration. In: Proceedings ECCV'02(2). LNCS, vol. 2351, pp. 775–789, Copenhagen, 27 May–2 June 2002
32. Paragios, N., Rousson, M., Ramesh, V.: Non-rigid registration using distance functions. Comput. Vis. Image Underst. 89, 142–165 (2003)

33. Pitiot, A., Delingette, H., Toga, A.W., Thompson, P.: Learning object correspondences with the observed transport shape measure. In: Proceedings IPMI 2003. LNCS, vol. 2732, pp. 25–37, Ambleside, July 2003
34. Riklin-Raviv, T., Kiryati, N., Sochen, N.: Unlevel-sets: geometry and prior-based segmentation. In: Proceedings ECCV'04. LNCS, vol. 3024, pp. 50–61, Prague, 11–14 May 2004
35. Rousson, M., Paragios, N.: Shape priors for level set representations. In: Proceedings ECCV'02(2). LNCS, vol. 2351, pp. 78–92, Copenhagen, 27 May–2 June 2002
36. Rousson, M., Paragios, N., Deriche, R.: Active shape models from a level set perspective. TR 4984, INRIA (2003)
37. Sebastian, T.B., Klein, P.N., Kimia, B.B.: On aligning curves. IEEE TPAMI 25(1), 116–124 (2003)
38. Sundaramoorthi, G., Yezzi, A., Mennucci, A.C.: Sobolev active contours. IJCV 73(3), 345–366 (2007)
39. Siddiqi, K., Berube, Y., Tannenbaum, A., Zucker, S.: Area and length minimizing flows for shape segmentation. IEEE TIP 7(3), 433–443 (1998)
40. Soatto, S., Yezzi, A.: DEFORMOTION: deforming motion, shape average and the joint registration and segmentation of images. In: Proceedings ECCV'02(3). LNCS, vol. 2352, pp. 32–47, Copenhagen, 27 May–2 June 2002
41. Taron, M., Paragios, N., Jolly, M.-P.: From uncertainties to statistical model building and segmentation of the left ventricle. In: Proceedings ICCV'07, Rio de Janeiro, 14–20 October 2007
42. Tsai, A., Yezzi, A., Wells, W., Tempany, C., Tucker, D., Fan, A., Grimson, W.E., Willsky, A.: Model-based curve evolution technique for image segmentation. In: Proceedings CVPR'01(I), pp. 463–467 (2001)
43. Vemuri, B., Ye, J., Chen, Y., Leonard, C.: Image registration via level-set motion: applications to atlas-based segmentation. MIA 7, 1–20 (2003)
44. Yan, P., Kassim, A.: Medical image segmentation using minimal path deformable models with implicit shape priors. IEEE TITM 10(4), 677–684 (2006)
45. Yu, T., Luo, J., Singhal, A., Ahuja, N.: Shape regularized active contour based on dynamic programming for anatomical structure segmentation. In: Proceedings SPIE Medical Imaging: Image Processing, pp. 419–430, San Diego, 13–15 February 2005