Salient Contour Detection using a Global Contour Discontinuity Measurement

Hongzhi Wang
Stevens Institute of Technology
[email protected]

John Oliensis
Stevens Institute of Technology
[email protected]

Abstract

Salient contour grouping/detection is one of the major topics in perceptual organization, which explores what contours truly mean. Extending the notion of contour discontinuity beyond low level discontinuity, we introduce a new perspective: a salient contour represents a sharp change in the ability to organize the image into meaningful parts. This view yields a novel, general edge energy measure with a clear perceptual meaning. We also describe a new multiscale saliency function and a variational method for optimizing it. Experiments on real images validate our method.

1. Introduction

In this paper, we present a new method for detecting perceptually salient closed contours. The goal is simple: given an image, we want to find a closed contour that separates the most salient figure from the background. Contour grouping is an old problem in perceptual organization, which explores what truly defines natural boundaries. Two main criteria have been applied to define contours: natural boundaries have smooth shapes, and boundaries organize images into distinct meaningful parts. These criteria lead to two main categories of contour grouping approaches, edge based and region based. Edge based methods, e.g. [1,5,6,19,22], use edge detectors, e.g. [2,14], to find edge segments and then apply Gestalt principles such as smoothness to group the segments into longer, more salient contours. However, edge based methods neglect region information and suffer from the unreliability of local edge detection, especially in highly textured regions. Unlike edge based methods, region based methods, e.g. [10,13,15,20,25], consider the image statistics within the regions partitioned by contours. Salient contours should partition images into regions that are sufficiently uniform and distinct, where the goodness of the partition is measured by a criterion such as the Minimal Description Length (MDL) principle. Shortcomings of these methods include the lack of a boundary smoothness requirement and inaccurate boundary localization. For example, Zhu and Yuille [24] report that pixels located

on the borders of the distributions of adjacent regions are likely to be misclassified, resulting in inaccurate contours. This is because region based methods emphasize global image statistics and ignore local image statistics along the contour, so that local information is overruled. Some proposed methods combine edge based and region based approaches to exploit the advantages of both, attempting to respect both local image properties and global region information. For example, Jermyn and Ishikawa [9] use Green's theorem to convert region energy into boundary energy so that local computation at the boundaries can also include region information. Though it allows an efficient globally optimal solution, their method requires a properly chosen region energy to work well. More importantly, their method can only account for simple statistical relations between figures and backgrounds, e.g. that figures are brighter or darker than backgrounds. This is because Green's theorem requires the region energy to have continuous first derivatives, which does not hold for images in general. Images may be discontinuous even over the region of a single object. As a result, local measurements at boundaries may not give accurate or complete information about regions. To accurately evaluate a contour's energy with respect to regions, the regions' information should be used explicitly. To this end, Paragios and Deriche model images as mixtures of Gaussian homogeneous regions. Edge energy is then evaluated based on a globally learned Gaussian mixture model [16,17]. However, their model based approach has limitations. First, it makes the strong assumption that distinct regions are homogeneous, which oversegments nonhomogeneous figures. Second, though this assumption facilitates the model learning procedure, Gaussian-mixture model acquisition is still not a trivial task. We propose a new contour energy measure. Our key contribution is a new criterion for natural image boundaries: a good boundary represents a sharp change in the ability to organize the image into distinct meaningful parts. This new perspective makes our contour energy measure quite general. As we will show, the measure respects both global and local information and can exploit any kind of image statistics. We also propose a multiscale contour saliency function

and give a variational method to find optimal contours using this saliency function.

1.1. Related work

There is a huge amount of work on perceptually salient contour detection and grouping. We briefly discuss some representative works. Sha'ashua and Ullman's saliency network [19] is one of the earliest edge based contour grouping methods. They define saliency based on curve properties such as smoothness and curve length. Recent generalizations include [1]. Elder and Zucker enforce the closure constraint in contour grouping [6], and Wang et al.'s ratio contour [22] is also along these lines. [5] describes contour grouping that exploits knowledge of the objects of interest. In Williams and Jacobs' stochastic completion field theory [23], contours are generated by random walks of particles. The most salient contour corresponds to the maximum likelihood particle path. Mahamud et al. [12] incorporate the contour closure constraint into the stochastic completion approach. Contour grouping is closely related to image segmentation, which focuses on grouping pixels with similar image properties. Shi and Malik's normalized cut approach [21] is a representative work. The normalized cut maximizes the similarity within groups and the dissimilarity between different groups. The global normalized criterion can be optimized efficiently. A drawback of normalized cut is that it lacks control over contour geometry. Later extensions [13] incorporate edges into the similarity measures, i.e. the similarity between two pixels should be small if there is an edge between them. Since the edge segments are detected locally, their unreliability can cause oversegmentation: though these methods perform better at the boundaries between figures and backgrounds, they perform worse at boundaries inside figures or backgrounds. [15] is another representative work, in which Mumford and Shah assume that an image can be approximated by a piecewise smooth function. Another related area is active contours (snakes) and level set methods. We use active contour techniques to optimize our saliency function. A traditional snake has two energy components: edge energy and contour smoothness energy [10]. It is guaranteed to converge to a local minimum. Region based snakes have also been proposed, e.g. Chan and Vese [3] use a single intensity value to represent each segment. The Paragios and Deriche approach discussed earlier is another representative work [16,17], which uses level sets.

2. Salient Contour Measurements

In this section, we analyze the relevant qualities of contours and propose measures for them. Then we define our saliency function such that large saliency values favor perceptually salient contours.

2.1. Contour as an image organizer

Salient contours can be viewed as image organizers because they segment images into distinct meaningful parts. With salient contours, images are represented in a high level form. Organization is a fundamental quality of contours and has been explored thoroughly [4,11,15,24]. We consider that a salient contour segments an image into two parts: the figure, the region inside the contour, and the background, the region outside and on the contour (for computational reasons, see equation (4), we define a contour to belong to the background). To measure the high level image organization, different functions can be used depending on the application. For general use, we consider that contours segment images into distinct regions with common statistical properties, and the MDL principle provides a good measure. We use the entropy encoding length to represent the organization function O(C) of contour C:

$$O(C) = -\log \prod_{i=0:255} \left(\frac{h_{fg}(i)}{N_{fg}}\right)^{h_{fg}(i)} \left(\frac{h_{bg}(i)}{N_{bg}}\right)^{h_{bg}(i)} = -\sum_{i=0:255} \left[h_{fg}(i)\log h_{fg}(i) + h_{bg}(i)\log h_{bg}(i)\right] + N_{fg}\log N_{fg} + N_{bg}\log N_{bg} \quad (1)$$

where h_fg and h_bg are the smoothed image-property histograms of the figure region and the background region, respectively. (Let H_fg be the original image-property histogram of the figure region. Then h_fg = H_fg * G, where G is a Gaussian function.) N_fg and N_bg are the numbers of pixels in the two regions, respectively. For simplicity, we use intensity as the image property, but other image properties, such as texture, could be used too. Note that encoding length is not the only possible choice for the contour's organization function. If we consider figure and background as two different classes, salient contours can be considered as classifiers, and a discriminative function could be used to maximize the distance between the two classes.
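To make the organization measure concrete, here is a minimal sketch of the entropy encoding length O(C) of equation (1), assuming 8-bit intensities, a binary figure mask, and Gaussian-smoothed histograms as described above; the function names (smoothed_hist, organization) and the use of numpy/scipy are our own illustration, not the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smoothed_hist(values, sigma=2.0, bins=256):
    """Intensity histogram smoothed with a Gaussian, as h = H * G in the text."""
    hist, _ = np.histogram(values, bins=bins, range=(0, bins))
    return gaussian_filter1d(hist.astype(float), sigma)

def organization(image, figure_mask, sigma=2.0):
    """Entropy encoding length O(C) of equation (1).

    image: 2D array of 8-bit intensities.
    figure_mask: boolean array, True inside the contour (figure),
                 False outside and on the contour (background).
    """
    eps = 1e-12                      # avoid log(0) for empty bins
    h_fg = smoothed_hist(image[figure_mask], sigma)
    h_bg = smoothed_hist(image[~figure_mask], sigma)
    n_fg = float(figure_mask.sum())
    n_bg = float((~figure_mask).sum())
    return (-np.sum(h_fg * np.log(h_fg + eps) + h_bg * np.log(h_bg + eps))
            + n_fg * np.log(n_fg) + n_bg * np.log(n_bg))
```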

2.2. Contour as an indicator of discontinuity

One major property of boundaries is that they indicate discontinuity between the image parts sharing the same boundary. This property has been successfully applied from a local viewpoint, e.g. for contour detection based on discontinuities in the intensity derivatives [2], but when we consider a contour as indicating discontinuity in a high level image representation, e.g. the discontinuity between a figure and a background, the local viewpoint is inadequate. Our major contribution is to introduce a new global discontinuity criterion for boundaries: a salient contour should

represent a discontinuity in the high level image representation, that is, a sharp change in the ability to organize the image into distinct meaningful parts. Note that our high level discontinuity criterion differs from Paragios and Deriche's method in which the high level (Gaussian mixture) image representation is used just for boundary energy evaluation [16,17]. As we show later, our new approach has immediate advantages: it gives a general contour energy measure incorporating global and local criteria well, without the difficult task of model estimation and without the problem of oversegmenting salient non-homogeneous regions. We define a new global contour discontinuity measure as the derivative of the high level image representation function (1) with respect to the whole contour:

$$D(C) = \frac{dO(C)}{dC} \quad (2)$$

D(C) measures the change in O(C) with respect to a move of contour C. To define D(C) explicitly, we first need to specify the contour's movements. We define that, in a positive movement, each contour point moves one pixel along the outward normal direction, and in a negative movement, contour points move inward by one pixel. As illustrated in Fig. 1a, contours C+ and C- are obtained from C by a positive and a negative movement, respectively. Using our discrete representation, we define D(C) as:

$$D(C) = O(C^+) - O(C) \quad (3)$$

Our definition is motivated by computational tractability and the requirement that the contour be stable against overall changes in its scale. Note that our definition uses the pixel grid explicitly. We are assuming that the smoothness regularization described later will give our method a meaningful continuous limit. Since we define a contour to be a part of the background, O(C+) can be conveniently computed using the contour's local image statistics and equation (1) by:

$$O(C^+) = -\log \prod_{i=0:255} \left(\frac{h_{fg}(i)+h_C(i)}{N_{fg}+N_C}\right)^{h_{fg}(i)+h_C(i)} \left(\frac{h_{bg}(i)-h_C(i)}{N_{bg}-N_C}\right)^{h_{bg}(i)-h_C(i)} \quad (4)$$


Figure 1. (a). Contours C+ and C- are generated from contour C by a positive movement and a negative movement, respectively. (b). Region A's contour is more salient than region B's contour because region A's contour indicates a larger encoding-length increment, but the two contours have the same discontinuity energy if evaluated by local measurements. The line thickness indicates the saliency.

where h_C is the smoothed image-property histogram of the contour region and N_C is the number of pixels on C. The difference between D(C) and local edge detection methods is illustrated in Fig. 1b. Note that the only difference between O(C) and O(C+) is that the local contour's image data switch from the background to the figure. To understand D(C) better, we make the following approximations. We assume that the size of a salient figure is about half the image, which means N_fg ≈ N_bg >> N_C. Then we have log(N_fg) ≈ log(N_fg + N_C) ≈ log(N_bg) ≈ log(N_bg - N_C) and h_fg(i), h_bg(i) >> h_C(i) for all i. Using (1), (3) and (4), we have

$$D(C) \approx N_C \log\frac{N_{fg}+N_C}{N_{bg}-N_C} + \sum_{i=0:255} h_C(i)\log\frac{h_{bg}(i)-h_C(i)}{h_{fg}(i)+h_C(i)} \quad (5)$$

Equation (5) shows that D(C) can be considered as an evaluation of the local contour's image properties based on both the global and local image statistics. The first term favors a large figure region. The statistical relation between a figure and a background is represented by the discriminative function h_bg/h_fg, which can be as complex as needed. As illustrated in Fig. 2a, D(C) reaches its maximum when the discrimination between the figure and the background is large and h_C has a distribution similar to h_bg/h_fg. Hence, D(C) respects global and local image properties as desired.
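Below is a hedged sketch of the approximate discontinuity measure D(C) of equation (5). It assumes binary masks for the figure and for the contour pixels and reuses the hypothetical smoothed_hist helper from the previous sketch; the name discontinuity_approx is ours.

```python
import numpy as np

def discontinuity_approx(image, figure_mask, contour_mask, sigma=2.0):
    """Approximate D(C) from equation (5).

    figure_mask: pixels strictly inside the contour.
    contour_mask: pixels on the contour itself (part of the background).
    Uses smoothed_hist() from the earlier sketch.
    """
    eps = 1e-12
    h_fg = smoothed_hist(image[figure_mask], sigma)           # figure histogram
    h_c = smoothed_hist(image[contour_mask], sigma)           # contour histogram
    bg_mask = ~(figure_mask | contour_mask)
    h_bg = smoothed_hist(image[bg_mask], sigma) + h_c         # background includes the contour
    n_fg, n_c = float(figure_mask.sum()), float(contour_mask.sum())
    n_bg = float(bg_mask.sum()) + n_c
    first = n_c * np.log((n_fg + n_c) / (n_bg - n_c))
    ratio = (h_bg - h_c + eps) / (h_fg + h_c + eps)
    second = np.sum(h_c * np.log(np.maximum(ratio, eps)))
    return first + second
```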

Figure 2. (a). The contour of the black square has the maximum discontinuity energy because the figure is distinct from the background and h_C has a distribution similar to h_bg/h_fg. (b). The O(C) space of the salient contour in (a). A perfect salient contour not only organizes the image well but also indicates a big organization loss if the contour moves away from it (a deep minimum).

2.3. Contour geometry constraint

The last contour quality that we consider is the statistical geometry constraint of the contour shape. Represented

by the curvature at each contour point, natural shapes have highly kurtotic curvature distributions [24], which can be modeled by generalized Laplacian distributions [5], p(x) = A e^{-|x/a|^b}. In the MDL framework, encoding length can be used to describe the goodness of a contour's shape [11]. Hence, we can tell how good a shape is from the minimum encoding length under the generalized Laplacian distribution,

$$L(C) = \min_{a,b} E(a, b, C) \quad (6)$$

where E(a, b, C) is the encoding length of C using the generalized Laplacian distribution characterized by a and b. Finding the optimal a and b requires nonlinear optimization [5]. To avoid this complex computation, instead of the above encoding strategy we use the lower bound on the encoding length given by entropy encoding to represent the goodness of a contour shape. Furthermore, since natural shapes show multiscale properties, contours should be grouped at different scales [18,20]. We define the contour geometry constraint measure as:

$$L(C) = -\log \left[ \prod_{j=1:m} \prod_{i=-n:n} \left(\frac{h^j(i, C)}{N_C}\right)^{h^j(i,C)} \right]^{\frac{1}{m}} \quad (7)$$

where h^j(i, C) is the smoothed contour-curvature histogram of C at the jth scale, m is the number of scales, and n is the largest curvature. L(C) computes the average encoding length of C over the different scales; the smaller L(C) is, the better the shape.
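The following sketch evaluates the multiscale shape term L(C) of equation (7) from per-scale curvature samples. The binning range, the smoothing width, and the name curvature_encoding_length are our assumptions rather than the paper's implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def curvature_encoding_length(curvatures_per_scale, n_bins=64, sigma=1.0):
    """Average entropy encoding length of the contour's curvature, equation (7).

    curvatures_per_scale: list of 1D arrays, one array of point curvatures per scale.
    """
    eps = 1e-12
    total = 0.0
    for curv in curvatures_per_scale:
        n_c = float(len(curv))
        lim = np.max(np.abs(curv)) + eps             # plays the role of n in the sum
        hist, _ = np.histogram(curv, bins=n_bins, range=(-lim, lim))
        h = gaussian_filter1d(hist.astype(float), sigma)
        total += -np.sum(h * np.log(h / n_c + eps))  # -sum_i h^j log(h^j / N_C)
    return total / len(curvatures_per_scale)         # average over the m scales
```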

2.4. Contour saliency function

We are ready to define a saliency function based on the above measures. The saliency function should correctly model the relations between these different measures. Fowlkes et al. [8] study the roles of edge based and region based image cues in image segmentation and find that which cues perform best depends on the situation. Hence, for different images the weights on different cues may differ. Since it is not clear how to weight our cues, we construct the saliency function using intuitive rules. We consider the organization and discontinuity qualities to be equally important for a salient contour and put the same weight on them. Since salient contours are found by gradient descent (section 3), our method favors contours with a deep minimum in O(C); see Fig. 2b. Requiring small O(C) and large D(C) makes sense for discontinuous derivatives as in Fig. 2b; with our implicit regularization, it can also be thought of as requiring a large second derivative. The intuitive meaning is that a salient contour organizes the image well and, also, changes in it cause a big organization loss. Our criterion also helps to overcome contours with trivial local minima, even global minima, in O(C) and favors perceptually

significant figures. For example, our good result in Fig. 5 requires the discontinuity criterion. In choosing how to weight our several cues, we must consider the effects of scale. The organization function, O(C), grows roughly proportionally with the image size, or area, while the other two functions, D(C) and L(C), grow with the length of C. Since we assume that the size of the most salient figure is roughly half the full image, the salient contour length is about 2π (area/(2π))^0.5. To compensate for this scale variance, we scale the organization function by a coefficient α = (2π/area)^0.5. Our contour saliency function is:

$$Saliency(C) = -\alpha O(C) + D(C) - \beta L(C) \quad (8)$$

where β is the only free variable and controls the prior of the contour geometry constraint.
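Putting the pieces together, a minimal sketch of the saliency of equation (8); it calls the hypothetical helpers from the previous sketches, and beta is the single free parameter discussed in the experiments.

```python
import numpy as np

def saliency(image, figure_mask, contour_mask, curvatures_per_scale, beta=1.0):
    """Saliency(C) = -alpha * O(C) + D(C) - beta * L(C), equation (8)."""
    area = float(image.size)
    alpha = np.sqrt(2.0 * np.pi / area)   # compensates O(C)'s growth with image area
    o = organization(image, figure_mask)
    d = discontinuity_approx(image, figure_mask, contour_mask)
    l = curvature_encoding_length(curvatures_per_scale)
    return -alpha * o + d - beta * l
```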

3. Variational optimization

To search for the optimal salient contour based on function (8), we use a variational method (the snake technique). Each contour point v evolves according to the derivative:

$$\frac{\partial Saliency(C)}{\partial v} = -\alpha\frac{\partial O(C)}{\partial v} + \frac{\partial D(C)}{\partial v} - \beta\frac{\partial L(C)}{\partial v} \quad (9)$$

We give ∂O(C+)/∂v in detail; ∂O(C)/∂v can be computed in the same way.

$$\frac{\partial O(C^+)}{\partial v} = A\, n_v \quad (10)$$

where n_v is the outward unit normal to C and A is computed by:

$$A = \sum_{i=0}^{255} \left[ (1 + \log(h_{fg}(i) + h_C(i)))\left(\frac{\partial h_{fg}(i)}{\partial v} + \frac{\partial h_C(i)}{\partial v}\right) + (1 + \log(h_{bg}(i) - h_C(i)))\left(\frac{\partial h_{bg}(i)}{\partial v} - \frac{\partial h_C(i)}{\partial v}\right) \right] - (1 + \log(N_{fg} + N_C))\left(\frac{\partial N_{fg}}{\partial v} + \frac{\partial N_C}{\partial v}\right) - (1 + \log(N_{bg} - N_C))\left(\frac{\partial N_C}{\partial v} - \frac{\partial N_{bg}}{\partial v}\right) \quad (11)$$

The partial derivatives in A are computed locally by:

$$\frac{\partial h_{fg}(i)}{\partial v} = -\frac{\partial h_{bg}(i)}{\partial v} = \frac{G(i - I_+) + G(i - I_-)}{2} \quad (12)$$

$$\frac{\partial h_C(i)}{\partial v} = \frac{G(i - I_+) - G(i - I_-)}{2} \quad (13)$$

$$\frac{\partial N_{fg}}{\partial v} = -\frac{\partial N_{bg}}{\partial v} = 1, \qquad \frac{\partial N_C}{\partial v} = \frac{2\pi\left(\frac{N_{fg}+1}{\pi}\right)^{0.5} - 2\pi\left(\frac{N_{fg}-1}{\pi}\right)^{0.5}}{2} \quad (14)$$

G is the same Gaussian function used to smooth the image-property histograms in equation (1). I_+ and I_- are the image property values of the pixels reached after a positive and a negative contour movement from v, respectively.
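For illustration, a sketch of the local derivative terms of equations (12)-(14) at a single contour point; gaussian_kernel_value, the argument names, and the handling of the smoothing width are our assumptions, not the authors' code.

```python
import numpy as np

def gaussian_kernel_value(x, sigma=2.0):
    """G(x): the histogram-smoothing Gaussian evaluated at x."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def local_derivatives(i, intensity_out, intensity_in, n_fg, sigma=2.0):
    """Per-bin and per-count derivatives at one contour point v.

    intensity_out: I+, the pixel value one step along the outward normal.
    intensity_in:  I-, the pixel value one step along the inward normal.
    """
    g_out = gaussian_kernel_value(i - intensity_out, sigma)
    g_in = gaussian_kernel_value(i - intensity_in, sigma)
    dh_fg = (g_out + g_in) / 2.0             # eq. (12); dh_bg = -dh_fg
    dh_c = (g_out - g_in) / 2.0              # eq. (13)
    dn_fg = 1.0                              # eq. (14); dN_bg = -1
    dn_c = (2 * np.pi * np.sqrt((n_fg + 1) / np.pi)
            - 2 * np.pi * np.sqrt((n_fg - 1) / np.pi)) / 2.0
    return dh_fg, dh_c, dn_fg, dn_c
```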

∂L/∂v is computed by:

$$\frac{\partial L}{\partial v} = -\frac{1}{m}\left[ \sum_{j=1:m} \sum_{i=-n:n} (\log h^j(i, C) + 1)\frac{\partial h^j(i, C)}{\partial v} + (\log(N_C) + 1)\frac{\partial N_C}{\partial v} \right] \quad (15)$$

$$\frac{\partial h^j(i, C)}{\partial v} = \mathrm{sign}\!\left(i - \left(\frac{d\theta}{ds}\right)^j_v\right) G_\kappa\!\left(i - \left(\frac{d\theta}{ds}\right)^j_v\right) \quad (16)$$

G_κ is the same Gaussian function used to smooth the contour-curvature histograms in equation (7). (dθ/ds)^j_v is the curvature of contour point v at the jth scale. ∂(dθ/ds)^j_v/∂v can be computed using Fig. 3 by noticing that any contour point v and its adjacent contour points, p1 and p2, can be represented in the coordinate system in Fig. 3 after a rigid body translation from the image coordinate system. The curvature at v is:

$$\left(\frac{d\theta}{ds}\right)^j_v = \arctan\!\left(\frac{y_0}{L - x_0}\right) + \arctan\!\left(\frac{y_0}{L + x_0}\right) \quad (17)$$

∂(dθ/ds)^j_v/∂v can be computed from equation (17) and then translated back to the original image coordinate system.

Figure 3. Curvature of a contour point v is computed using its adjacent contour points, p1 and p2.
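As an illustration of equation (17) and the geometry of Fig. 3, the sketch below estimates the turning angle at a contour point from its two neighbors. The placement of p1 and p2 on the local x-axis at (-L, 0) and (L, 0) is our reading of Fig. 3, arctan2 is used in place of arctan to avoid division by zero, and the function name is ours. At the jth scale, p1 and p2 would plausibly be taken further apart along the contour, though the exact scheme is not spelled out here.

```python
import numpy as np

def curvature_at_point(p1, v, p2):
    """Turning angle at v given its neighbors p1 and p2, following equation (17).

    The three points are mapped into a frame whose x-axis joins p1 and p2,
    with the origin at their midpoint, so that p1 = (-L, 0) and p2 = (L, 0).
    """
    p1, v, p2 = map(np.asarray, (p1, v, p2))
    mid = (p1 + p2) / 2.0
    axis = p2 - p1
    L = np.linalg.norm(axis) / 2.0            # half the distance between p1 and p2
    u = axis / (2.0 * L)                      # unit x-axis of the local frame
    w = np.array([-u[1], u[0]])               # unit y-axis of the local frame
    x0, y0 = np.dot(v - mid, u), np.dot(v - mid, w)
    return np.arctan2(y0, L - x0) + np.arctan2(y0, L + x0)
```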

Our method starts with a contour at the image border and updates the contour until it converges to a local minimum. It is worth pointing out that an extra uniform inward force can be used to shrink the contour. This force can help to overcome contours with small values of D(C); in this sense, the additional inward force acts like a threshold. If an additional force is used, the contour no longer moves by pure gradient descent, so we record the most salient contour encountered during the evolution. If the contour splits, only the most salient part is kept in our current implementation.
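A high-level sketch of the evolution loop just described: gradient-based updates of the contour points along their normals, an optional uniform inward force, and bookkeeping of the most salient contour seen so far. All function and parameter names are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def evolve_contour(image, contour, saliency_fn, gradient_fn, normals_fn,
                   steps=500, step_size=0.5, inward_force=0.0):
    """Evolve a closed contour (N x 2 array of points) and keep the best one.

    saliency_fn(image, contour) -> scalar saliency, equation (8).
    gradient_fn(image, contour) -> per-point derivative of the saliency, equation (9).
    normals_fn(contour) -> per-point outward unit normals (N x 2).
    """
    best_contour = contour.copy()
    best_saliency = saliency_fn(image, contour)
    for _ in range(steps):
        grad = gradient_fn(image, contour)            # move along the saliency gradient
        normals = normals_fn(contour)
        contour = contour + step_size * grad[:, None] * normals
        contour = contour - inward_force * normals    # optional shrinking force
        s = saliency_fn(image, contour)
        if s > best_saliency:                         # record the most salient contour
            best_saliency, best_contour = s, contour.copy()
    return best_contour
```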

4. Experiments

To test the performance of our method, we ran our algorithm on publicly available real images and compared it with Normalized Cuts (NC), Chan and Vese's method [3] (CV), and Paragios and Deriche's method [16,17] (PD). The Gaussian-mixture model in PD was estimated using the program from [7], which, like PD, uses the MDL principle to estimate the number of components.

Figure 4. Size: 120 × 80, β = 0.5, 5 scales are used. (a). Our method; (b). PD; (c). CV; (d). NC.

In Fig. 4, though the bottom contour of the bear is partly occluded by grass, our algorithm successfully overcomes the strong fake edges at the top of the image and converges to the real salient contour. Without using local image information, the other two region based methods cannot locate low contrast boundaries accurately. The normalized cut results are produced by the toolbox available at http://www.seas.upenn.edu/~timothee/software_ncut. In Fig. 5, we show how our method works on a textured background. In Fig. 6 and Fig. 7, we show our method's performance on images with cluttered backgrounds. The noisy edge segments do not affect the convergence of the contour. Note that the region based methods cannot accurately locate contours with low contrast. The value of our parameter β should reflect the contrast between the figure and the background and the complexities of the contour shape and the background. In principle, if the contrast is large, the shape is simple, and the background is cluttered, e.g. Fig. 5, then β should be large; otherwise β should be small, e.g. Fig. 4. In our implementation, the β value is manually selected, but setting it automatically is also possible.


Figure 5. Size: 120 × 80, β = 1, 5 scales are used. (a). Our method; (b). PD; (c). CV; (d). NC.


Figure 6. Size: 80 × 120, β = 1, 5 scales are used. (a). Our method; (b). PD; (c). CV; (d). NC.

image     size       β
(a)-(e)   80 × 120   1
(f)       80 × 120   0.6
(g)       80 × 120   0.5
(h)       64 × 128   1
(i)       120 × 80   1.5
(j)       120 × 80   1
(k)       120 × 80   0.25
(l)       120 × 80   2

Table 1. Parameters used for the tests in Fig. 8.


Figure 7. Size: 120 × 80, β = 1, 5 scales are used. (a). Our method; (b). PD; (c). CV; (d). NC.

The number of scales used to encode a contour is also manually selected. We find that the selection of this number is not crucial: with more scales, the convergence of the contour is smoother and easier. In all our tests, we use 5 scales. Smoothing the histograms in equations (1) and (7) also helps to smooth the convergence of the contour. Some other results are given in Fig. 8. The image size and the β value for each image are given in Table 1.

5. Discussions and Conclusions

We introduce a new perspective on natural salient contours: a salient contour represents a sharp change in the ability to organize the image into meaningful parts. This view extends the discontinuity measure from a low level image representation to a high level one. From this definition we devise a new edge energy measure, the derivative of a global image function with respect to the whole contour. We show that this measure combines the image statistics of both global regions and local contour regions and can handle complex statistical relations between figures and backgrounds. We also propose a new contour saliency function, which can overcome trivial local minima in organizing images, and its variational solution. Our approach is validated by experiments on real images. Our current method only searches for the most salient figure, but it could also be applied hierarchically to detect multiple salient figures. It is also possible to incorporate our approach into other techniques. For instance, as discussed in section 1.1, when normalized cuts include edge cues in their similarity measures, they may oversegment; our method can help to group the oversegmented parts.


Figure 8. Some other sample salient contour results

References

[1] A. Berengolts and M. Lindenbaum. On the Distribution of Saliency. CVPR, 2004.
[2] J. Canny. A Computational Approach to Edge Detection. IEEE Trans. PAMI, 9(6):679–698, 1986.
[3] T.F. Chan and L.A. Vese. Active Contours Without Edges. IEEE Trans. Image Processing, 10(2):266–277, 2001.
[4] J.H. Elder. Are Edges Incomplete? IJCV, 34:97–122, 1999.
[5] J.H. Elder, A. Krupnik and L.A. Johnston. Contour Grouping with Prior Models. IEEE Trans. PAMI, 25(6):661–674, 2003.
[6] J.H. Elder and S.W. Zucker. Computing Contour Closure. ECCV, 1996.
[7] M. Figueiredo and A.K. Jain. Unsupervised Learning of Finite Mixture Models. IEEE Trans. PAMI, 24(3):381–396, 2002.
[8] C. Fowlkes, D. Martin and J. Malik. Learning Affinity Functions for Image Segmentation: Combining Patch-based and Gradient-based Approaches. CVPR, 2003.
[9] I.H. Jermyn and H. Ishikawa. Globally Optimal Regions and Boundaries as Minimum Ratio Weight Cycles. IEEE Trans. PAMI, 23(10):1075–1088, 2001.
[10] M. Kass, A. Witkin and D. Terzopoulos. Snakes: Active Contour Models. IJCV, 1(4):321–331, 1988.
[11] Y.G. Leclerc. Constructing Simple Stable Descriptions for Image Partitioning. IJCV, 3(1):73–102, 1989.
[12] S. Mahamud, L. Williams, K. Thornber and K. Xu. Segmentation of Multiple Salient Closed Contours from Real Images. IEEE Trans. PAMI, 25(4):1–12, 2003.
[13] J. Malik, S. Belongie, T. Leung and J. Shi. Contour and Texture Analysis for Image Segmentation. IJCV, 43(1), 2001.
[14] D.R. Martin, C.C. Fowlkes and J. Malik. Learning to Detect Image Boundaries Using Local Brightness, Color, and Texture Cues. IEEE Trans. PAMI, 26(5), 2004.
[15] D. Mumford and J. Shah. Optimal Approximation by Piecewise Smooth Functions, and Associated Variational Problems. Comm. Pure and Applied Math., 577–684, 1989.
[16] N. Paragios and R. Deriche. Coupled Geodesic Active Regions for Image Segmentation. ECCV, 2000.
[17] N. Paragios and R. Deriche. Geodesic Active Regions and Level Set Methods for Supervised Texture Segmentation. IJCV, 223–247, 2002.
[18] X. Ren and J. Malik. A Probabilistic Multi-Scale Model for Contour Completion Based on Image Statistics. ECCV, 2002.
[19] A. Sha'ashua and S. Ullman. Structural Saliency: The Detection of Globally Salient Structure Using a Locally Connected Network. ICCV, 1988.
[20] E. Sharon, A. Brandt and R. Basri. Completion Energies and Scale. IEEE Trans. PAMI, 22(10):1117–1131, 2000.
[21] J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE Trans. PAMI, 22(8):888–905, 2000.
[22] S. Wang, T. Kubota, J.M. Siskind and J. Wang. Salient Closed Boundary Extraction with Ratio Contour. IEEE Trans. PAMI, 27(4):546–561, 2005.
[23] L.R. Williams and D.W. Jacobs. Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience. Neural Computation, 9(4):837–858, 1997.
[24] S.C. Zhu. Embedding Gestalt Laws in Markov Random Fields. IEEE Trans. PAMI, 21(11):1170–1187, 1999.
[25] S.C. Zhu and A. Yuille. Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multiband Image Segmentation. IEEE Trans. PAMI, 18(9):884–900, 1996.