Entropy Descriptor for Image Classification Hongyu Li, Junyu Niu, and Jiachen Chen
Huibo Liu
School of Computer Science Fudan University Shanghai,China
School of Software Engineering Tongji University Shanghai,China
{hongyuli,jyniu,0361040}@fudan.edu.cn
[email protected] ABSTRACT
model of training images, which could be used to determine the class label of test images. The classification strategy here is to find the best position of a test image in all cycles. The test image is grouped into the class providing the best position, i.e., the class with the least entropy increase resulting from inserting the image to this position. In this study, the optimization problem is solved using tabu search. To expediate the procedure of finding the optimal cycle, parallel computing on GPU is adopted to simultaneously calculate the entropy of each element.
This paper presents a novel entropy descriptor in the sense of geometric manifolds. With this descriptor, entropy cycles can be easily designed for image classification. Minimizing this entropy leads to an optimal entropy cycle where images are connected in the semantic order. During classification, the training step is to find an optimal entropy cycle in each class. In the test step, an unknown image is grouped into a class if the entropy increase as the result of inserting the image into the cycle of this class is relatively least. The proposed approach can generalize well on difficult image classification problems where images with same objects are taken in multiple views. Experimental results show that this entropy descriptor performs well in image classification and has potential in the image-based modeling retrieval.
2.
ENTROPY DESCRIPTOR
The semantic representation of images is the key to the success of image classification. In this study, we propose GEOmetric Manifold ENtropy (GEOMEN) to describe the semantic similarity of images in the feature space. Specifically, given a set of image feature vectors X = {xi |xi ∈ Rm , i = 1, 2, ..., n}. We first define an entropy cycle of length n as a closed path without self-intersections. Each vector in this cycle is connected with two neighbors and the corresponding connection order O is symbolized as {o1 , o2 , . . . , on , o1 }, where the entry corresponds to the index of vectors. Then, the GEOMEN of the set X with the order O is represented as the average of the entropy on every point in the cycle as,
Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; I.4.8 [Image Processing and Computer Vision]: Scene-Analysis—Object recognition
General Terms Algorithms, Performance, Experimentation
S(X, O) =
Keywords image classification, entropy minimization, cycle
n 1∑ s(X, O, i). n i=1
(1)
and each s(X, O, i) is the sum of two components: the spatial position p(X, O, i) and geometric g(X, O, i) as,
1. INTRODUCTION
s(X, O, i) = αp(X, O, i) + g(X, O, i).
Image classification is the task of classifying images according to their object category, which is significant for both effective image organization and retrieval. This problem has been the subject of many recent papers [2, 3] using classifiers based on support vector machine (SVM). Achieving high image classification accuracy, however, is quite challenging. This is partly because semantically related images may reside in an embedded manifold and not a linear hyperplane in the feature space. In this paper, we focus our attention on solving this problem and propose a novel entropy descriptor for image classification. The proposed entropy is used to describe an embedded manifold with its geometrical features. Image classification is implemented through organizing images as a semantically continuous cycle with entropy minimization. The optimal cycles are actually the extracted
(2)
Here, α is used to adjust the contribution of p to GEOMEN. GEOMEN essentially represents the smoothness and sharpness of the cycle with the connection order O. In addition, it is also a metric of disorder and similarity of the data in the embedded manifolds. Since image ranking can be thought as the problem of extracting a 1-dimensional manifold, actually a curve, we only consider the representation of GEOMEN on 1-dimensional curves. The spatial component of GEOMEN is measured by the Euclidean distance, p(X, O, i) = ∥xoi − xo(i+1) ∥2 , where xoi and xo(i+1) are continuously connected in the cycle order O. The geometric component of GEMOEN is composed of two terms: the curvature κ of curves and a regularization term ρ. That is, g(X, O, i) = κ2 (xo(i−1) , xoi , xo(i+1) ) + ρ2 (xo(i−1) , xoi , xo(i+1) , xo(i+2) ).
Copyright is held by the author/owner(s). SIGIR’10, July 19–23, 2010, Geneva, Switzerland. ACM 978-1-60558-896-4/10/07.
(3)
The reason of introducing the regularization term is that the dis-
753
datasets are tested: UMIST face1 , car, gesture, and aircraft. For the convenience of 10-fold cross-validation, we discarded those classes with the sample number less than 20 in the UMIST face database. The car images were taken with natural background clutters. Each gesture class includes 20 images taken from multiple views. The aircraft images were produced through projecting threedimensional aircraft models into different planes, which is like changing the viewpoint by rotating the models in some sense. To compare with state-of-the-art methods, the classifier based on SVM was also tested on the datasets stated above. All test results are presented in Table 1, where the second and third columns respectively represents the number of classes and images in each dataset. The last two columns are the average classification accuracy respectively obtained using our approach and SVM. The value in parentheses denotes the confidence interval. Although it is well known that SVM has the strong ability of generalization in image classification, the proposed framework obviously has the better performance than SVM, higher accuracy and smaller confidence interval. In addition, we employed the NVIDIA GTX 260 GPU to speed up computation. There is distinct improvement of almost 100 times in the computation speed when the size of a class in the aircraft dataset is only 100. Moreover, the speedup ability is proportional to the size of a dataset, which means that our approach has advantages in handling large-scale datasets.
Table 1: Image classification on different datasets. datasets class image ours(%) SVM(%) UMIST 13 360 97.52(1.93) 85.84(3.59) car 15 450 93.54(1.64) 71.04(5.33) gesture 11 220 91.36(4.19) 78.64(7.20) aircraft 20 2000 81.33(2.21) 65.34(3.22)
crete curvature is sensitive to noise. The regularization term can significantly improve the robustness of our algorithm.
3. IMAGE CLASSIFICATION Since there is a clear distinction between two classes, the entropy will greatly increase at boundary points. This enlightens us that misclassifying a point into a class must lead the sharp increase of the entropy of this class with an optimal cycle. In reverse, if a point is correctly grouped into a class, the entropy of this class with an optimal cycle will only change a little, where the principal idea of the proposed classification framework comes from. In our image classification framework, each class is first respectively trained to obtain an optimal cycle (model). Then an unknown image is assigned a class label through comparing the entropy increase after inserting this image into each optimal cycle. In order to find the optimal cycle, we need to minimize the GEOMEN, O∗ = arg min S(X, O). In this study, we approximate the global minimum of the entropy through a simplified tabu search method. The details about the tabu search method can be found in reference [4]. Since the calculation of entropy of each element is independent, parallel computing is a good way to speed up such an optimization problem. This study makes full use of the ability of graphic processing unit (GPU) in parallel computing and implements the tabu search procedure with the CUDA programming technology. The optimal cycle of each class is actually the extracted model of training images, which could be used to determine the class label of unknown (test) images. The strategy of classification here is to find the best position of a test image Q in all cycles. We means by "best" that the increase of entropy ∆S due to adding this test image to the current position is the least among all positions. The test image is grouped into the class providing the best position. In this case, however, the decision strategy appears too simple to provide correct class information due to the disturbance of acquisition noise and computation error. Therefore, we bring in the idea of k-NN and make it suitable for our framework. Specifically, we pick first k best positions as possible candidates, and then rank all classes according to the following criterion, r(Ci , Q) = (count(Ci ) − β)/avg(Ci ), where parameter β is introduced to prevent over-fitting. count(Ci ) denotes the number of candidates belonging to class Ci . avg(Ci ) is the average entropy increase that candidates ∑ in class Ci result in and can be formulated as, avg(Ci ) = ( Ci ∆S)/count(Ci ), where ∆S represents the entropy increase if image Q is added. The rank r(Ci , Q) implies the confidence of image Q belonging to class Ci . Obviously, high confidence requires more count and less avg when β is fixed. Therefore, image Q is considered as a member of the class with highest rank.
5.
CONCLUSIONS
The good classification performance on the UMIST face and gesture datasets demonstrates that the proposed framework is quite promising in the application of face and gesture analysis. In spite of the existence of background clutter in the car images, our method can still work well with the average accuracy 93.54% and confidence interval 1.64, much better than SVM with the average accuracy 71.04% and confidence interval 5.33. The success in classifying the projection images of aircraft models makes possible the future image-based model retrieval. Anyway, no matter how different the viewpoint of images is or how disordered the image background is, the proposed classification method can always perform well, which demonstrates its feasibility and robustness.
6.
ACKNOWLEDGMENTS
This research was partially supported by National High Technology Research and Development Program 2009AA01Z429, Shanghai Leading Academic Discipline Project B114, and Natural Science Foundation of China Grant 60903120.
7.
REFERENCES
[1] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In CIVR ’07, pages 401–408. [2] K.-S. Goh, E. Chang, and K.-T. Cheng. SVM binary classifier ensembles for image classification. In CIKM ’01, pages 395–402, 2001. [3] C. Wang, D. M. Blei, and F.-F. Li. Simultaneous image classification and annotation. In CVPR 2009, pages 1903–1910. [4] C. Zhang, H. Li, Q. Guo, J. Jia, and I.-F. Shen. Fast active tabu search and its application to image retrieval. In IJCAI’09, pages 1333–1338, 2009.
4. EXPERIMENTS The image feature adopted in the following experiments is the Pyramid of Histograms Orientation Gradients (PHOG) [1]. Four
1
754
http://images.ee.umist.ac.uk/danny/database.html