A Pyramid Nearest Neighbor Search Kernel for Object Categorization

Hong Cheng, Rongchao Yu
University of Electronic Science and Technology of China, China
{hchenguestc, yurongchao.prmi}@gmail.com

Yiguang Liu
Sichuan University, China
[email protected]

Zicheng Liu
Microsoft Research, Redmond, USA
[email protected]

Abstract

Nearest-Neighbor based Image Classification (NNIC) has drawn considerable attention in the past several years because it does not require classifier training. Like an orderless Bag-of-Features image representation, traditional NNIC ignores global geometric correspondence. In this paper, we present a technique to exploit global geometric correspondence in a nearest neighbor classifier framework. We divide an image into increasingly fine sub-regions, as in the Spatial Pyramid Matching (SPM) approach, and introduce a Pyramid Nearest Neighbor Search kernel that measures the search similarity between a local descriptor and a feature set in each pyramid window. Instead of using a fixed weighting as in SPM, the weights of the pyramid windows are learned in a class-dependent manner, yielding a class-specific geometric correspondence. Finally, an optimal nearest neighbor classifier framework is developed to incorporate the kernel functions over the different pyramid windows. We evaluate the proposed approach on a number of public datasets and show that the results significantly outperform existing techniques.

1. Introduction

In the past several years, there has been a great deal of interest in Nearest Neighbor based Image Classifiers (NNIC) [2, 8, 6, 1]. Boiman et al. [2] drew renewed attention to the effectiveness of nonparametric NNIC and proposed a method called Naive Bayes Nearest Neighbor (NBNN). Under the Naive Bayes assumption, the direct 'Image-to-Class' (I2C) nearest neighbor classifier is the optimal image classifier, with better generalization capability. Since then, I2C nearest neighbor approaches [8, 6, 1] have become increasingly popular in the computer vision community.

One shortcoming of the traditional NNIC approach is that it ignores global geometric correspondence. On the other hand, global geometric correspondence was successfully exploited in the learning-based classifier of Lazebnik et al. [4]. They proposed Spatial Pyramid Matching (SPM), which divides an image into increasingly fine sub-regions and computes histograms of local features inside each sub-region. This approach, along with its extensions [3, 7], has been shown to significantly improve classification performance.

To overcome this shortcoming, we propose a Pyramid Nearest Neighbor Search (PNNS) kernel that leverages global geometric correspondence in a nearest neighbor classifier framework. The remainder of the paper is organized as follows. Section 2 introduces the framework of the proposed approach. Section 3 presents the pyramid search kernel. We discuss the Optimal Naive Bayes Pyramid Nearest Neighbor classifier in Section 4. Experimental results are reported in Section 5, and Section 6 concludes the paper.

2. An Overview of the Proposed Approach

In this paper, we propose an Optimal Naive Bayes Pyramid Nearest Neighbor (ONBPNN) classifier. This classifier is built on top of the proposed Pyramid Nearest Neighbor Search kernel. An image is divided into multiple pyramid levels as in SPM [4], and a nearest neighbor search is performed for each pyramid window at each pyramid level. Instead of using a fixed weighting scheme, we allow the weights to vary among the pyramid windows, and these weights are learned in a class-dependent manner. The kernel functions over the pyramid windows are then combined in the ONBPNN classifier.

Figure 1 illustrates the framework of the proposed approach, which consists of a training step and a testing step. Given a set of training samples $F_1$ with class labels, we first extract local descriptors (e.g., SIFT). For each descriptor, we store its global geometric information, including its pyramid level index and window index. We then learn the weights $\omega_s^c$, which are window-dependent and class-dependent, using validation samples $F_2$; the learning fits the object deformation observed in different windows of different classes. In addition, we learn the affine correction parameters $\alpha^c$ and $\beta^c$ using $F_2$. In summary, the output of the training step comprises the training feature pool, $\alpha^c$, $\beta^c$, and $\omega_s^c$. In the testing step, we extract features along with their pyramid level and window indexes; using the object model parameters obtained from training, the ONBPNN classifier makes the classification decision for each test sample.

Figure 1: The framework of the proposed ONBPNN approach.
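The window bookkeeping described above can be made concrete with a small sketch. The following Python snippet is a minimal illustration under our own assumptions, not the paper's implementation: it maps a descriptor's normalized image position to a flat window index at every pyramid level, where the consecutive layout of window indices across levels is our choice for illustration.

```python
def window_indices(u, v, L=3):
    """Map a normalized keypoint position (u, v) in [0, 1) to its flat
    window index at each pyramid level l = 0 .. L-1.

    Level l has 2**l intervals per axis, so (4**l - 1) // 3 windows
    precede it in the flat layout (a layout we assume for illustration).
    """
    indices = []
    for l in range(L):
        cells = 2 ** l                      # intervals per axis at level l
        col = min(int(u * cells), cells - 1)
        row = min(int(v * cells), cells - 1)
        offset = (4 ** l - 1) // 3          # windows in all coarser levels
        indices.append(offset + row * cells + col)
    return indices

# A descriptor at the image center lies in window 0 at level 0,
# window 4 at level 1, and window 15 at level 2 (21 windows in all for L=3).
print(window_indices(0.5, 0.5))  # -> [0, 4, 15]
```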

3. The Pyramid Nearest Neighbor Search (PNNS) Kernel

Let $X$ and $Y$ be two sets of feature vectors. Define a sequence of windows at pyramid levels $0, 1, \dots, L-1$, such that there are $2^l$ intervals along each axis at pyramid level $l$, for a total of $(4^L - 1)/3$ windows in the window set $S$. Let $X_s \subseteq X$ and $Y_s \subseteq Y$ denote the subsets of vectors belonging to the $s$th window. Intuitively, pyramid search works by placing a sequence of increasingly coarse grids over the feature space and taking a weighted sum of the search scores, each corresponding to an individual pyramid window. A point $x \in X$ seeks its nearest neighbor $y \in Y$ within each pyramid window in $S$, and the search scores of the different pyramid windows are weighted accordingly.

Inspired by the sum match kernel proposed by Lyu [5], we propose the Pyramid Nearest Neighbor Search kernel in a multi-class setting,
$$K(X, Y) = \phi(X)^{\mathsf{T}} \phi(Y) = \sum_{s \in S} \omega_s^c K_s^c(X, Y), \qquad (1)$$
where $K_s^c(X, Y)$ denotes the kernel function between $X_s$ and $Y_s^c \subseteq Y_s$, the set of vectors from the $c$th class within the $s$th window. Here, $\omega_s^c$ denotes the importance of the $s$th window for the $c$th class in object categorization. The weight map reflects the geometric variability: if pyramid level 0 has the largest weight over all the windows, the object class has high geometric variability; otherwise, the class has a good geometric correspondence that can be leveraged in recognition. Instead of using fixed weights as in [4], the window-dependent and class-dependent weights $\omega_s^c$ are learned from data to adapt to the different global geometric correspondences of different object classes. Furthermore, the kernel function $K_s^c(\cdot)$ in Eqn. (1) can be written as
$$K_s^c(X, Y) = \sum_{k=1}^{|X|} K_s^c(x_k, Y) = \sum_{k=1}^{|X|} \Big( \min_{y_s \in Y_s^c} \| x_k - y_s \|^2 \Big) \, \delta(x_k \in X_s), \qquad (2)$$
where $\delta(\cdot)$ is an indicator function: $\delta(x \in X_s) = 1$ if $x$ lies within the $s$th window, and $0$ otherwise. We would like to point out that $K(X, Y)$ is not a Mercer kernel [5].
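To make Eqns. (1) and (2) concrete, here is a minimal NumPy sketch of the per-window search score and its weighted combination, assuming descriptors have already been bucketed by window index (e.g., with the helper from Section 2). Brute-force search is used for clarity; a k-d tree or approximate nearest neighbor index would be the practical choice. The container names (`X_by_window`, `omega_c`) are illustrative, not from the paper.

```python
import numpy as np

def kernel_s_c(X_s, Y_s_c):
    """Eq. (2) for one window s and class c: for each descriptor x_k that
    falls in the window, add the squared distance to its nearest neighbor
    in the class-c pool of that window."""
    if len(X_s) == 0 or len(Y_s_c) == 0:
        return 0.0
    # |X_s| x |Y_s_c| matrix of squared Euclidean distances
    d2 = ((X_s[:, None, :] - Y_s_c[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).sum())

def pnns_kernel(X_by_window, Yc_by_window, omega_c):
    """Eq. (1) for a single class c: weighted sum of per-window scores.
    X_by_window / Yc_by_window map window index s -> (n, D) arrays;
    omega_c maps s -> the learned weight of window s for class c."""
    empty = np.empty((0, 0))
    return sum(omega_c[s] * kernel_s_c(X_by_window.get(s, empty),
                                       Yc_by_window.get(s, empty))
               for s in omega_c)
```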

4. The Optimal Naive Bayes Pyramid Nearest Neighbor Classifier

I2C nearest neighbor classifiers such as NBNN [2] and NBMIM [8] may suffer a serious performance drop when faced with an unbalanced training set, i.e., one with very different numbers of feature points across classes. This problem was also observed in [1]. The classification results can be overwhelmingly biased toward categories with larger training sets, resulting in abnormally low accuracy for the smaller ones. In many real-world applications, the available training data are likely to be unbalanced for the following reasons: 1) some categories have richer sources, so it is easy to collect a larger training set; 2) even for images in the same category, there is always unpredictable variation in the number of local feature points.

Partly inspired by [1], we propose the Optimal Naive Bayes Pyramid Nearest Neighbor classifier using the Pyramid Nearest Neighbor Search kernel. An overview of the basic Naive Bayes Nearest Neighbor approach and its implementation can be found in [2]. For an image $I$ with hidden class label $c$, we have features $X = \{x_k\}_k \subset \mathbb{R}^D$. The log-likelihood of a visual feature $x$ relative to an image label $c$ is
$$-\log P(x \mid c) = -\log \left\{ \frac{1}{Z^c} \exp\left( -\frac{K(x, Y)}{2(\sigma^c)^2} \right) \right\} = \frac{\sum_{s \in S} \omega_s^c K_s^c(x, Y)}{2(\sigma^c)^2} + \log Z^c, \qquad (3)$$
where $Z^c = |Y^c| (2\pi)^{D/2} (\sigma^c)^D$ is a normalization factor obtained by normalizing the probability density function. Furthermore, we obtain the log-likelihood of an image $I$ with the Naive Bayes formulation
$$-\log P(I \mid c) = -\log \prod_{x_k \in X} P(x_k \mid c) = \sum_{k=1}^{|X|} \left( \frac{\sum_{s \in S} \omega_s^c K_s^c(x_k, Y)}{2(\sigma^c)^2} + \log Z^c \right) = \alpha^c \sum_{s \in S} \omega_s^c K_s^c(X, Y) + |X| \beta^c, \qquad (4)$$
where $\alpha^c = \frac{1}{2(\sigma^c)^2}$ and $\beta^c = \log Z^c$ are the parameters of an affine transformation, which play the role of correcting the affine distance [1]. Thus, under the Naive Bayes assumption, the optimal class prediction $\hat{c}_I$ for image $I$ is
$$\hat{c}_I = \operatorname{argmin}_c \left\{ \alpha^c \sum_{s \in S} \omega_s^c K_s^c(X, Y) + |X| \beta^c \right\}. \qquad (5)$$

In Eqn. (5), we need to estimate the parameters $\alpha^c$, $\beta^c$, and $\omega_s^c$. First, both $\alpha^c$ and $\beta^c$ can be learned with a linear programming toolkit in the same way as in [1]. Second, we learn the weights $\omega_s^c$ on the validation samples $F_2$ with a multi-class AdaBoost technique [9].
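Under the same illustrative containers as the kernel sketch in Section 3, the decision rule of Eqn. (5) reduces to a few lines. This is a sketch, not the paper's implementation: `pnns_kernel` is reused from the earlier snippet, and the parameter dictionaries are hypothetical names for the learned $\alpha^c$, $\beta^c$, and $\omega_s^c$.

```python
import numpy as np

def onbpnn_predict(X_by_window, n_features, pools, omega, alpha, beta):
    """Eq. (5): predict argmin_c alpha[c] * K(X, Y^c) + |X| * beta[c].
    pools[c] maps window index s -> the class-c training descriptors in s;
    n_features is |X|, the number of descriptors in the test image."""
    best_class, best_score = None, np.inf
    for c in pools:
        # weighted per-window search score for class c (Eqs. (1)-(2)),
        # corrected by the learned affine parameters
        score = alpha[c] * pnns_kernel(X_by_window, pools[c], omega[c]) \
                + n_features * beta[c]
        if score < best_score:
            best_class, best_score = c, score
    return best_class
```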

5. Experimental Results and Analysis

5.1. Experimental Setup

We evaluate the proposed method on two commonly used public datasets, Scene-15 and Caltech-101, briefly described as follows.

Scene-15: Each category of this dataset has 200 to 400 images, and the average image size is around 300 × 250 pixels. Following [4], we choose 100 random images per category as the training set $F_1$. The remaining images are divided into two equal parts, using one part as $F_2$ for estimating the weights of the pyramid windows and the other part for testing, and vice versa.

Caltech-101 (5 Classes): To closely follow the experimental setup of [1], we choose the five most popular classes from the Caltech-101 dataset: airplanes, car-side, faces, motorbikes, and background. These images exhibit relatively little clutter and variation in object scale and pose. We use the same experimental setup as in [1]: images are resized to a maximum of 300 × 300 pixels before processing, and the training, testing, and validation sets each contain 30 randomly selected images per class.

We compare the proposed ONBPNN approach with NBNN [2], ONBNN [1], SPM [4], and Image-to-Class Distance Metric Learning (I2CDML) [6]. To better evaluate the role of global geometric information in nearest neighbor classifiers, we also report results for the Naive Bayes Pyramid Nearest Neighbor (NBPNN), which corresponds to the case without parameter optimization, i.e., $\alpha^c = 1$, $\beta^c = 0$.

5.2. The Scene-15 Dataset

In this subsection, we evaluate the proposed ONBPNN approach on the Scene-15 dataset. The first experiment compares the proposed approach to two I2C nearest neighbor approaches, NBNN and NBPNN, as shown in Table 1. The parameters of the ONBPNN approach are set to $L = 3$, giving $|S| = 21$ windows. From Table 1, we can see that: (1) compared to the traditional NBNN approach, ONBPNN remarkably improves the accuracy; (2) ONBPNN outperforms NBPNN by a large margin; (3) both ONBPNN and NBPNN are better than NBNN.

Table 1: Performance evaluation on the Scene-15 dataset using NBNN, NBPNN, and ONBPNN.

Method      NBNN           NBPNN          ONBPNN
Accuracy    71.59±0.74%    77.89±1.54%    85.09±0.30%

The second experiment evaluates the effectiveness of the proposed Pyramid Nearest Neighbor Search kernel against two other spatial pyramid approaches, the Spatial Pyramid Matching (SPM) kernel of [4] and I2CDML [6], as shown in Table 2. Our pyramid nearest neighbor search kernel uses the same pyramid generating strategy as [4] and [6], yet the proposed approach gives better performance.

Table 2: Performance evaluation on the Scene-15 dataset using SPM, I2CDML, and ONBPNN.

Method      SPM [4]        I2CDML [6]     ONBPNN
Accuracy    81.40±0.50%    83.7±0.49%     85.09±0.30%

Finally, we show the weight maps of the windows at each pyramid level in Figure 2. The three pyramid levels are represented by the left, middle, and right portions, respectively, and the weight value is represented by the vertical axis. From this figure, we can see that: (1) each window of each class plays a different role in object categorization, resulting in different weight values; (2) for all classes, the level-0 window has a relatively large weight, i.e., it plays an important role in object categorization for every class. It is interesting to note that this observation is very different from SPM's weighting scheme: SPM empirically sets the weight of level $l$ to $1/2^L$ for $l = 0$ and to $1/2^{L-l+1}$ for $l = 1, 2, \dots, L-1$ (e.g., for $L = 3$, levels 0, 1, and 2 receive weights 1/8, 1/8, and 1/4), so the level-0 window has the smallest weight over all pyramid levels.

Figure 2: The weight maps of all 15 classes on the Scene-15 dataset.

5.3. The Caltech-101 (5 Classes) Dataset

Table 3 shows the classification accuracies on Caltech-101 using three approaches: NBNN, ONBNN, and ONBPNN. From this table, we can see that the proposed approach outperforms ONBNN. ONBPNN obtains the largest weights at the higher pyramid levels for most classes, which indicates that finer-level spatial search is beneficial to object recognition in this dataset.

Table 3: Performance evaluation on the Caltech-101 dataset.

Method      NBNN [1]       ONBNN [1]      ONBPNN
Accuracy    73.07±4.02%    89.77±2.31%    93.33±5.27%

6. Conclusions

In this paper, we have proposed an Optimal Naive Bayes Pyramid Nearest Neighbor classifier. We have introduced a Pyramid Nearest Neighbor Search kernel that enhances object matching by leveraging global geometric correspondence. The approach has been evaluated on two popular public object categorization datasets, and the experiments show that it is robust to high geometric variability and outperforms existing approaches.

Acknowledgement

This research was partially supported by a grant from the NSFC (No. 61075045), the Program for New Century Excellent Talents in University (NCET-10-0292), the National Basic Research Program of China (No. 2011CB707000), and the Fundamental Research Funds for the Central Universities. We also thank the anonymous reviewers for their valuable suggestions.

References

[1] R. Behmo, P. Marcombes, A. Dalalyan, and V. Prinet. Towards optimal Naive Bayes nearest neighbors. In ECCV, 2010.
[2] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In CVPR, 2008.
[3] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In CIVR, 2007.
[4] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[5] S. Lyu. Mercer kernels for object recognition with local features. In CVPR, 2005.
[6] Z. Wang, Y. Hu, and L. Chia. Image-to-class distance metric learning for image classification. In ECCV, 2010.
[7] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 2009.
[8] J. Yuan, Z. Liu, and Y. Wu. Discriminative subvolume search for efficient action detection. In CVPR, 2009.
[9] J. Zhu, S. Rosset, H. Zou, and T. Hastie. Multi-class AdaBoost. Technical Report, University of Michigan, Ann Arbor, 2006.