
Image Segmentation as a Classification task

Pornpat Nikamanon Department of Computer Science University of California, Irvine [email protected]

Dennis Park Department of Computer Science University of California, Irvine [email protected]

Abstract We implement a two-class classification model for image segmentation. This project draws loosely on [1]. In a preprocessing stage, an image is oversegmented into superpixels by the normalized cut algorithm. For a segment formed by merging superpixels, we define two features based on similarity in brightness. We train a logistic regression classifier to combine these features. As ground truth, we use a database of segmentations produced by humans. Finally, we conduct experiments and qualitatively evaluate the results.

1

Introduction

X. Ren and J. Malik introduced a classification model for image segmentation in [1]. The main theme of that paper is establishing a reliable way to determine what a "good" segmentation is. The criterion of "good" segmentation is not easily defined because it is subjective by nature. Given a single image, the segmentations produced by humans vary over a wide range according to where the drawer puts the emphasis or how much detail he or she prefers. Therefore, it is not natural to always minimize or maximize a certain objective function to obtain the "best" segmentation. Using a two-class classification model to determine just goodness or badness is reasonable, because there is a general consensus about which segmentations are good or bad. The evaluation of a segmentation is roughly based on proximity, similarity and good continuation. The principle of proximity states the intuitive idea that if two regions are close to each other, they are more likely to be part of one grouping. The principle of good continuation states that natural objects tend to assume a smooth boundary line rather than radical points. The principle of similarity is the most important and fundamental in determining grouping. This principle is twofold: 1. intra-region similarity: the elements in a region are similar in brightness and texture and have low contour energy inside the region; 2. inter-region (dis)similarity: the elements in different regions are dissimilar in brightness and texture and have high contour energy on the boundary between the regions. The authors of [1] found a way to obtain and evaluate good segmentations by using a database of human-segmented images [2]. They first constructed a logistic classifier with the features stated above, and trained it with human-segmented images. The database of [2]

enables supervised learning in the image segmentation field. For positive examples, they used the human-segmented images in the database. For negative examples, they used the human segmentations randomly matched with different images. In this project, we implement a simplified version of the classification model that appears in [1]. The difference from the classifier of [1] is that we use only a subset of the features used in [1]. The methods to calculate the features are modified to be simple and intuitive. Also, after we train the classifier, we evaluate it only qualitatively instead of taking advantage of the quantitative method used in [1].

2

Methodology

2.1

Training phase

Figure 1 is the roadmap of training phase. Each step will be described in detail in this section.

Figure 1: road map of training phase

2.1.1

Normalized Cut

As a preprocessing step, we oversegment an image into superpixels using the normalized cut introduced in [3]. The segmentation problem in this step is treated as a graph partitioning problem. The algorithm exploits the hierarchical nature of partitioning, and is recursively applied to the segmented portions until each segment meets a certain threshold condition. In our implementation, the threshold is the number of superpixels in an image. We can assume that each superpixel is homogeneous in size and brightness. The homogeneity of brightness is achieved by using pixel intensity as the cutting criterion.
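The pairwise affinity used by normalized cut can be sketched as follows. This is a minimal illustration, not the report's implementation: the parameter values `sigma_i`, `sigma_x`, `r` and the dense double loop are assumptions for clarity (a real implementation would use a sparse matrix).

```python
import numpy as np

def affinity(features, positions, sigma_i=0.1, sigma_x=10.0, r=15.0):
    """Pairwise affinity for normalized cut (sketch).

    features: (n,) intensity per pixel; positions: (n, 2) coordinates.
    Weights combine feature similarity and spatial proximity; pixel
    pairs farther apart than radius r get zero weight.
    """
    n = len(features)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dx = np.linalg.norm(positions[i] - positions[j])
            if dx < r:
                df = np.linalg.norm(np.atleast_1d(features[i] - features[j]))
                W[i, j] = np.exp(-df**2 / sigma_i**2) * np.exp(-dx**2 / sigma_x**2)
    return W
```

The cut itself then works on the eigenvectors of the normalized graph Laplacian built from `W`, recursing until the target number of superpixels is reached.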

$$
w_{ij} =
\begin{cases}
\exp\!\left(-\dfrac{\|F(i)-F(j)\|^2}{\sigma_I^2}\right)\exp\!\left(-\dfrac{\|X(i)-X(j)\|^2}{\sigma_X^2}\right) & \text{if } \|X(i)-X(j)\| < r \\[2ex]
0 & \text{otherwise}
\end{cases}
$$

2.1.2

Features of Segment

Given a segment, we extract two features related to brightness from it. The two features are measurements of intra-region similarity and inter-region dissimilarity of brightness. In [1], those features are calculated using the χ2 distance between histograms of brightness values. In this project, we devise a simple and intuitive method to compute the features.

$$B_{int}(S) = \frac{1}{\mathrm{Var}[I(q_i)]}, \quad q_i \in S$$

$$B_{ext}(S) = \frac{1}{m} \sum_{q \in S_B} \frac{1}{n} \sum_{i \in C_q} \big( I(q) - \mu(i) \big)^2$$

where $S_B = \{p \mid p \in S,\ p \text{ is on the border}\}$, $C_q = \{p \mid p \notin S,\ p \text{ is contiguous to } q\}$, $\mu(i)$ is the mean intensity of the segment that includes superpixel $i$, $m = |S_B|$ and $n = |C_q|$.
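A minimal sketch of the two features, assuming superpixel intensities are given as plain lists; the fixed value 100 for zero-variance segments is taken from the report, while the function and argument names are illustrative.

```python
import numpy as np

def b_int(intensities):
    """Intra-region feature: inverse variance of superpixel intensities.

    For a single-superpixel segment the variance is zero, so a large
    fixed value (100, as in the report) is returned instead.
    """
    v = np.var(intensities)
    return 100.0 if v == 0 else 1.0 / v

def b_ext(border, neighbor_means):
    """Inter-region feature.

    border: intensities I(q) of the border superpixels of segment S.
    neighbor_means: for each border superpixel, the list of mean
    intensities mu(i) of the segments its outside neighbors belong to.
    Returns the mean, over border superpixels, of the mean squared
    error against those neighboring segment means.
    """
    total = 0.0
    for iq, mus in zip(border, neighbor_means):
        total += np.mean([(iq - mu) ** 2 for mu in mus])
    return total / len(border)
```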

Given a segment S, Bint(S) represents intra-region similarity by calculating the variance of intensity among all superpixels in S and taking its inverse. In this way, we make Bint large for segments with high homogeneity in brightness. For a segment with only one superpixel we set Bint(S) to a reasonably large fixed number, because the normalized cut step guarantees that each superpixel is homogeneous in brightness. The computation of Bext(S) involves only the superpixels on the border. For each superpixel on the border, we calculate the mean squared error with respect to the mean intensity of each segment it is contiguous to. In the figure above, superpixel q faces 1′ and 2′, which belong to segment S′, and 3′, which belongs to S″. The contribution of q to Bext(S) is $\frac{1}{3}\left\{ 2\big(I(q) - \mu(S')\big)^2 + \big(I(q) - \mu(S'')\big)^2 \right\}$. Bext(S) is calculated as the mean of such values over all superpixels on the border.

2.1.3

Generating training data

The dataset for training consists of an equal number of good examples and bad examples. Given a normalized-cut image, we first generate a segmentation whose edges are a subset of the superpixel boundaries and almost the same as the human-marked segmentation. We already know from the property of normalized cut that human-marked boundaries are almost "covered by" superpixel boundaries with a tolerance of only a few pixels.
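The mapping from superpixels to human-marked segments can be sketched as a majority-overlap vote over the two label maps; the representation as integer label images and the function name are assumptions of this sketch.

```python
import numpy as np

def assign_superpixels(sp_labels, human_labels):
    """Assign each superpixel to the human-marked segment it overlaps most.

    sp_labels, human_labels: integer label maps of the same shape
    (superpixel id per pixel, human segment id per pixel).
    Returns a dict mapping superpixel id -> human segment id.
    """
    assignment = {}
    for sp in np.unique(sp_labels):
        mask = sp_labels == sp
        # Count how many pixels of this superpixel fall in each human segment
        segs, counts = np.unique(human_labels[mask], return_counts=True)
        assignment[sp] = int(segs[np.argmax(counts)])
    return assignment
```

Merging all superpixels assigned to the same human segment yields the segmentation whose segments are labeled as good examples; running the same assignment against a mismatched image's human segmentation yields the bad examples.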

Figure 2: generating good examples (left) and bad examples (right)

For each superpixel, we compute the overlapping ratio with each human-marked segment and classify the superpixel as part of the segment that overlaps it most. We use all the segments in this segmentation as 'good segments' and label them as one. Bad segments are generated by matching the superpixels to the human-marked segmentation of a different image. The image and its human segmentation are randomly picked, and we label the bad segments as zero. Figure 2 shows examples of generating good segments and bad segments.

2.1.4

Training the classifier

Having formulated a training dataset of features and binary labels, we can train a binary classifier. We use a simple logistic regression classifier, which was covered intensively in the previous lectures. We linearly combine the features and use the result as the argument of a non-linear sigmoid function:

$$G(S) = \sum_j c_j F_j(S) - \theta$$
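A minimal sketch of the activation and of fitting the weights by iteratively reweighted least squares; the ridge term, iteration count, and data layout are assumptions of this sketch, not the report's exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_irls(F, y, n_iter=20):
    """Fit logistic regression by IRLS (sketch).

    F: (n, d) feature matrix (here d = 2: Bint, Bext); y: (n,) binary
    labels. A column of ones is appended so the last weight plays the
    role of the intercept -theta.
    """
    X = np.hstack([F, np.ones((len(F), 1))])
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        R = p * (1 - p)
        # Newton / IRLS step: w += (X^T R X)^-1 X^T (y - p);
        # a tiny ridge keeps the system well conditioned.
        H = X.T @ (X * R[:, None])
        w += np.linalg.solve(H + 1e-8 * np.eye(len(w)), X.T @ (y - p))
    return w
```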

The weights cj and the intercept term θ are learned by the well-known iteratively reweighted least squares (IRLS) algorithm.

2.2

Testing phase

Testing the classifier is not straightforward, since we have to find a good segmentation, not a good segment. Therefore we need a criterion for determining the best segmentation based on our classifier. If that segmentation appears among most of the segmentations drawn by humans, we can conclude that our classifier works well. In [1], the evaluation is done formally by using the benchmark program of [2]. Here we check this only qualitatively. The following is the roadmap of the testing phase:

Figure 3: road map of testing phase

We know how to determine good segments. Therefore, we can regard the best segmentation as the one that includes the most good segments. This leads us to the following criterion: the best segmentation is the one maximizing the sum of the activation function used in the training phase:

$$f(\mathcal{S}) = \sum_{S \in \mathcal{S}} \Big( \sum_j c_j F_j(S) - \theta \Big)$$
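This objective is just the activation summed over all segments of a candidate segmentation; a minimal sketch, where `features_fn`, the segment representation, and the parameter packing are illustrative assumptions:

```python
def objective(segments, features_fn, c, theta):
    """Sum of activations G(S) over all segments of a segmentation.

    segments: iterable of segments; features_fn(S) returns the feature
    vector (Bint(S), Bext(S)); c and theta are the learned weights and
    intercept of the logistic classifier.
    """
    return sum(sum(cj * Fj for cj, Fj in zip(c, features_fn(S))) - theta
               for S in segments)
```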

To find the best segmentation, we have to traverse the whole segmentation space. We borrow the idea for doing this from [1]. The random search involves three basic moves: (1) shift: a superpixel is shifted from its segment to an adjacent segment; (2) merge: two adjacent segments are merged into one; and (3) split: a segment is split into two. The splitting operation is done by clustering the superpixels composing the segment into two parts. This clustering is also used to initialize the search. At each step, the algorithm picks one of the moves above and constructs a new segmentation $\mathcal{S}'$. If $f(\mathcal{S})$ is improved by this step, we accept it. Otherwise, we accept it with probability $\exp\big((f(\mathcal{S}') - f(\mathcal{S}))/T\big)$, where T decreases linearly over time. While testing the classifier, we found that the segmentations had a tendency to contain an excessive number of segments. This problem is also mentioned in [1]; it arises because the objective function f is not normalized w.r.t. the number of segments. To solve this problem, a prior distribution on segment size |S| is introduced to fit the empirical distribution. We adopt this solution as well, modifying the objective function to:

$$\tilde{f}(\mathcal{S}) = f(\mathcal{S}) + \sum_{S \in \mathcal{S}} \left( -\frac{(\log|S| - \mu_s)^2}{2\sigma_s^2} \right)$$

This prior distribution decreases the probability of generating a split move when the resulting segments would be far from the typical segment size.
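The acceptance rule and the size prior described above can be sketched as follows; the function names and default parameter values (besides µs = log 20 and σs = 2.0, which appear in the experiments) are assumptions of this sketch.

```python
import math
import random

def accept(f_new, f_old, T):
    """Annealed acceptance rule for the random search (sketch).

    Improving moves are always accepted; worsening moves are accepted
    with probability exp((f_new - f_old) / T), with T decreasing over
    time so the search gradually becomes greedy.
    """
    if f_new >= f_old:
        return True
    return random.random() < math.exp((f_new - f_old) / T)

def size_prior(sizes, mu_s=math.log(20), sigma_s=2.0):
    """Log-size prior term added to the objective: penalizes segments
    whose log-size deviates from the empirical mean mu_s."""
    return sum(-(math.log(s) - mu_s) ** 2 / (2 * sigma_s ** 2) for s in sizes)
```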

3

Experimental Results

Before the normalized cut step, we resized each image to 240-by-160, because the normalized cut algorithm involves computing eigenvectors of a matrix whose size is on the order of the number of pixels in the image (400-by-300 is already impractical). We set the number of superpixels in an image to 200, so each superpixel contains about 200 pixels. Each cut took about 4 minutes on a 1.73 GHz Pentium. An example of a normalized cut is shown in Figure . From the human-marked segmentation database [2], we generated a training dataset consisting of 486 good segments and 385 bad segments. These segments include ones that consist of only a single superpixel. In calculating Bint(S) for such a segment, we would take the inverse of the variance (which is 0), making the computation infeasible. Therefore, we set Bint(S) of these segments to a fixed number of 100. These segments were excluded when training the classifier, since they do not represent correct points in the feature space. Filtering them out, we had 399 good segments and 331 bad segments for training. Figure 4 shows the feature space superimposed with the decision boundary trained with the IRLS algorithm. The figure also shows the infeasible points at the 100 extreme of the Bint axis. The trained weights were c1 = 0.0363, c2 = 37.7582 and θ = −2.9616, where c1 and c2 are the coefficients of Bint(S) and Bext(S), respectively. Training took about 4 minutes. We set the parameters µs = log 20 (20 superpixels) and σs = 2.0 in the random search. It took 6-10 minutes to find the segmentation for one image. The results are at Figure ....

4

Conclusion

In this project, we trained a logistic regression classifier to perform image segmentation. As a preprocessing step, we oversegmented each image into a fixed number of superpixels using normalized cut. We defined two features of a segment that represent the similarity within

Figure 4: feature space and decision boundary

Figure 5: Segmentation Results

a segment and the dissimilarity between segments, respectively. The dataset for training the classifier was generated using human-marked segmentations. Positive data was generated by matching segments produced by merging superpixels to the correct image. Negative data was generated by matching them to a randomly picked image. In the testing phase, we defined an objective function to find the best segmentation. This objective function has an additional term to keep our algorithm from producing an excessive number of segments. Our classifier is not perfect, for a few reasons. The most obvious one is that we only considered brightness as the determining feature. Real-world images have other important properties, such as texture and curvilinear continuity. Once one finds a way to formulate these features, they can be handled immediately in our framework in the same way brightness is. That is, we can define two features for each property, corresponding to intra-region similarity and inter-region dissimilarity, respectively. We can also include global features such as symmetry or object models to improve performance. There is also significant room for improvement in devising statistically sophisticated ways to extract features from the raw image data. Although we tried to define the features to be as informative as possible, they are still based on ad-hoc methods. Better features could be extracted by including distribution information.

References

[1] X. Ren and J. Malik. Learning a Classification Model for Segmentation. In ICCV '03, volume 1, pages 10-17, 2003.

[2] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics. In ICCV '01, volume 2, pages 416-423, 2001.

[3] J. Shi and J. Malik. Normalized Cuts and Image Segmentation. In CVPR '97, pages 731-737, 1997.