Binary Invariant Cross Color Descriptor Using Galaxy Sampling

21st International Conference on Pattern Recognition (ICPR 2012) November 11-15, 2012. Tsukuba, Japan

Guo-Hao Huang 1, Chun-Rong Huang 1,2
1. Institute of Networking and Multimedia, National Chung Hsing University, Taiwan
2. Dept. of Computer Science and Engineering, National Chung Hsing University, Taiwan
{n99083003, crhuang}@cs.nchu.edu.tw

Abstract

In this paper, we propose a new descriptor that is computed by comparing invariant cross color channels of pairs of points in a local patch. To obtain the sampled pairs of points efficiently, a galaxy sampling pattern is proposed. As shown in the experiments, our descriptor using invariant cross color channels and galaxy sampling achieves the best performance in most cases with only a slight increase in computation time.

1. Introduction

Finding correspondences between images of a scene is an important problem in computer vision with many applications [1][2]. One of the most common ways to obtain correspondences is to use feature descriptors [1][2][3]. Feature descriptors aim to provide photometrically and geometrically invariant descriptions of the local image patch surrounding a feature point. Moreover, they are designed to be discriminative so that they can be uniquely defined and accurately matched. The scale-invariant feature transform (SIFT) [1] is one of the most representative descriptors. However, the computation and matching of SIFT are time consuming [4]. To reduce the computational burden, two approaches can be considered. The first is to increase descriptor matching efficiency. Reducing the feature dimension of the descriptor is an intuitive way to decrease matching time, since fewer dimensions imply less matching computation. Traditional dimensionality reduction methods, such as principal component analysis (PCA) [5], have been applied in PCA-SIFT [6] and the gradient location and orientation histogram (GLOH) [4]. The other approach is to increase the computational efficiency of the similarity computation between descriptors. For example, KD-trees [7] and locality sensitive hashing [8] are applied to increase matching efficiency at the same descriptor dimension.

Besides matching efficiency, it is also important to develop new descriptors that require much less construction time. Speeded up robust features (SURF) [9] was proposed to reduce the computational complexity of SIFT [1]. Instead of computing edge orientations, a relative relationship modeled by subtractions between pixels was proposed in [10]. Several descriptors, such as the contrast context histogram (CCH) [2] and the local binary pattern (LBP) [3], are also built on this relative relationship concept, which was further extended to binary strings by binary robust independent elementary features (BRIEF) [11].

In this paper, we propose a new efficient descriptor based on the relative relationship between pixels. Two issues must be addressed. The first is how many pixels in the region surrounding the keypoint should be sampled for descriptor computation: sampling more pixels increases the dimension of the descriptor and its computation time, whereas sampling too few pixels decreases its discriminability, as indicated in [2][11]. The second issue is how to model the relative relationship. As shown in [2][3][11], intensity is typically used to represent the relative relationship between pixels. Van de Sande et al. [12] have shown that applying a color invariant space to traditional intensity-based descriptors can significantly increase discriminability, but how to efficiently apply a color invariant space to represent the relative relationship between pixels remains an open problem. To deal with these two issues, we propose galaxy sampling to select pixels from the local region surrounding the keypoint. Based on the sampling results, pairs of points are constructed for relative relationship comparisons between color channels of the color invariant space, and the resulting binary strings form our descriptor. As shown in the experiments, the discriminability of our descriptor increases significantly in most cases using the proposed galaxy sampling and the cross color channel concept. Moreover, the computation time only slightly increases compared to intensity-based methods.



Algorithm 1 Generate the galaxy sampling pattern.
Require: A keypoint pc which represents the galaxy center G, the celestial body distribution Dl, and the orbital radius Rl for each level l of the galaxy.
1. Let the origin of the coordinate system of R be located at pc.
2. In level 0:
3.   Generate the celestial body subgroup of G.
4.   for a = 0 to D1 do
       Generate fixed star Fa using Algorithm 2.
       Generate the celestial body subgroup of Fa.
5.     for b = 0 to D2 do
         Generate planet Pb using Algorithm 2.
         Generate the celestial body subgroup of Pb.
6.       for c = 0 to D3 do
           Generate satellite Sc using Algorithm 2.
7.       end for
8.     end for
9.   end for
10. Return the galaxy sampling pattern.

Figure 1. The galaxy sampling pattern.

2. Method

2.1. Pixel Sampling Using Galaxy Pattern

Algorithm 2 Generate the location of a celestial body.
Require: l, Rl, Dl, Ol, tl, where l ∈ {1, 2, 3}, {O1 = G, O2 = Fa, O3 = Pb} and {t1 = a, t2 = b, t3 = c}.
1. Compute the angle θ = 2π tl / Dl.
2. Compute the location p of the celestial body: p = [x, y]^T = [cos(θ), sin(θ)]^T Rl + Ol.
3. Return p.
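The location rule in Algorithm 2 can be sketched as follows. This is a minimal Python illustration, not the authors' implementation; the function and argument names are ours.

```python
import math

def celestial_body_location(t_l, D_l, R_l, origin):
    """Algorithm 2 (sketch): place the t_l-th of D_l bodies on the orbit of
    radius R_l around its parent body located at `origin` = O_l."""
    theta = 2.0 * math.pi * t_l / D_l        # step 1: theta = 2*pi * t_l / D_l
    x = math.cos(theta) * R_l + origin[0]    # step 2: p = [cos(theta), sin(theta)]^T * R_l + O_l
    y = math.sin(theta) * R_l + origin[1]
    return (x, y)

# Example: the 3rd of 16 fixed stars on an orbit of radius 12 around the
# galaxy center G, which sits at the keypoint, i.e. the local origin (0, 0).
print(celestial_body_location(3, 16, 12, (0.0, 0.0)))
```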

To build a descriptor for a keypoint, the first step is to decide which pixels in the neighboring region R surrounding the keypoint should be used to construct the descriptor. In SIFT [1] and CCH [2], all of the pixels in R are used. In contrast, only particular pixels in R are sampled in LBP [3] and BRIEF [11]. Constructing descriptors from all of the pixels in R does not offer advantages in construction speed, storage efficiency, or recognition rate [11]. Thus, choosing proper pixels becomes an important issue for building distinctive descriptors with less construction time. As also shown in [11], sampling sufficient pixels from R offers a good representation of the image patch, while regular sampling patterns cannot represent the local patch well. According to their experiments, Gaussian random sampling is suggested for sampling pixels in R when building BRIEF. However, the discriminability of BRIEF varies with the random sampling pattern, and the matching results of BRIEF obtained from different patterns are hard to reproduce. To eliminate the instability of random sampling, we propose a new sampling method, called galaxy sampling, inspired by the shape of a barred spiral galaxy, to sample pixels in R for building our descriptor. The structure of the galaxy sampling pattern is flexible and easy to construct according to the requirements. There are four levels in the galaxy sampling pattern: the center of the galaxy (level 0), the fixed stars of the galaxy (level 1), the planets of the fixed star systems (level 2), and the satellites of the planets (level 3). As shown in Figure 1, a keypoint pc represents the center G (red circle) of the galaxy. Several fixed stars F (yellow circles) surround G. Each fixed star may contain several planets P (green circles) to form its own solar system, and satellites S (black crosses) may exist for each planet.

Given a local region R surrounding a keypoint pc, we set the origin of the local coordinate system at pc. Because the celestial bodies of the galaxy have a hierarchical relationship, we first define the location of G, then generate the fixed stars and the planets, and finally generate the satellites according to the planets. Detailed procedures are given in Algorithm 1 and Algorithm 2. After the galaxy pattern is generated, the celestial bodies (fixed stars, planets, and satellites) give the sampled pixel locations p = (x, y) in R. In practice, we can not only control the numbers of fixed star systems, planets, and satellites, but also decide whether fixed stars, planets, or satellites exist in the galaxy with only a few parameter adjustments.
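As a concrete illustration of Algorithms 1 and 2, the following Python sketch generates the sampled offsets level by level. The function names, the predicate arguments used to thin the hierarchy, and the example configuration are our own assumptions, not the authors' code.

```python
import math

def body(t, D, R, origin):
    """Algorithm 2: the t-th of D bodies on the orbit of radius R around `origin`."""
    theta = 2.0 * math.pi * t / D
    return (math.cos(theta) * R + origin[0], math.sin(theta) * R + origin[1])

def galaxy_pattern(D=(16, 16, 16), R=(12, 8, 4),
                   star_has_planets=lambda a: True,
                   planet_has_satellites=lambda a, b: True):
    """Algorithm 1 (sketch): return sampled offsets grouped by level, relative
    to the keypoint pc at the origin.  D[l-1] and R[l-1] are the body count and
    orbital radius of level l (1: fixed stars, 2: planets, 3: satellites).  The
    two predicates decide which parents actually carry children, so the
    hierarchy can be thinned with a few parameter adjustments."""
    levels = {0: [(0.0, 0.0)], 1: [], 2: [], 3: []}   # level 0: galaxy center G
    for a in range(D[0]):                             # fixed stars F_a around G
        F = body(a, D[0], R[0], (0.0, 0.0))
        levels[1].append(F)
        if not star_has_planets(a):
            continue
        for b in range(D[1]):                         # planets P_b around F_a
            P = body(b, D[1], R[1], F)
            levels[2].append(P)
            if not planet_has_satellites(a, b):
                continue
            for c in range(D[2]):                     # satellites S_c around P_b
                levels[3].append(body(c, D[2], R[2], P))
    return levels

# Example: a thinned configuration in the spirit of the experiments -- 16 fixed
# stars, 4 of them with 16 planets each, and 16 of those planets with 16
# satellites each, giving 16 + 4*16 + 16*16 = 336 sampled points besides G.
pattern = galaxy_pattern(star_has_planets=lambda a: a % 4 == 0,
                         planet_has_satellites=lambda a, b: b % 4 == 0)
print({level: len(points) for level, points in pattern.items()})
```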


2.2. Binary Invariant Cross Color Descriptor

The galaxy sampling pattern defines which pixels in R are sampled to construct the descriptor. To obtain the relative relationship between pixels, we group the sampled points at the same level of the galaxy sampling pattern into pairs: each celestial body p_{i,l} is compared with the celestial body p_{j,l} on its opposite side at level l. Each pair of points is connected by a dotted line in Figure 1. As indicated in [12], integrating color into descriptors significantly enhances the discriminability of state-of-the-art descriptors. In this paper, we apply the transformed color space, which has been shown to be an efficient and effective color invariant space [12], to represent the color information of R. The transformed color space (R', G', B') is defined as follows:

\[
\begin{pmatrix} R' \\ G' \\ B' \end{pmatrix}
=
\begin{pmatrix}
(R - \mu_R)/\sigma_R \\
(G - \mu_G)/\sigma_G \\
(B - \mu_B)/\sigma_B
\end{pmatrix},
\tag{1}
\]

where \mu_C and \sigma_C are the mean and the standard deviation of the distribution of channel C computed over the whole image, and C ∈ {R, G, B}. A naive way is to apply each channel of the transformed color space to build the relative relationship between the pair of sampled points. However, the relative relationship can also be defined across channels, which produces more relative relationships and increases the discriminability of the descriptor. We define a relative relationship pair on R using the invariant cross color channels as follows:

\[
t\big(p_{i,l}^{C_m}, p_{j,l}^{C_n}\big) =
\begin{cases}
1, & \text{if } p_{i,l}^{C_m} < p_{j,l}^{C_n}, \\
0, & \text{otherwise},
\end{cases}
\tag{2}
\]

where p_{i,l}^{C_m} and p_{j,l}^{C_n} are the values of the invariant color channels C_m and C_n of the sampled points p_i and p_j at the same level l of the galaxy sampling pattern, respectively. Note that C_m ∈ {R', G', B'} and C_n ∈ {R', G', B'}, so there are nine cross channel combinations. Because some combinations contain the same absolute values, we only consider six pairs among them: {(R', R'), (R', G'), (R', B'), (G', G'), (G', B'), (B', B')}. By comparing all pairs of sampled points under the cross color channels, a binary string is built as the descriptor of pc. To reduce the effect of noise, we compute the sum of the values of each invariant color channel over the 9×9 region surrounding each sampled pixel.
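The following NumPy sketch illustrates Eqs. (1) and (2): the per-channel normalization, the 9×9 summation around each sampled point, and the cross-channel comparisons that yield the bit string. The names are ours, the point pairs are assumed to be supplied externally (e.g., opposite-side pairs at the same level of the galaxy pattern), and the border handling is a simplification rather than the authors' choice.

```python
import numpy as np

# The six (C_m, C_n) channel combinations kept out of the nine, as in the text;
# indices refer to (R', G', B').
CHANNEL_PAIRS = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]

def transformed_color(image_rgb):
    """Eq. (1): normalize each channel by its mean and standard deviation
    computed over the whole image, giving (R', G', B')."""
    img = image_rgb.astype(np.float64)
    mu = img.mean(axis=(0, 1))
    sigma = img.std(axis=(0, 1)) + 1e-12      # avoid division by zero on flat channels
    return (img - mu) / sigma

def patch_sum(channel, x, y, half=4):
    """Sum over the 9x9 region around (x, y); windows are clipped at the border."""
    h, w = channel.shape
    xi, yi = int(round(x)), int(round(y))
    x0, x1 = max(xi - half, 0), min(xi + half + 1, w)
    y0, y1 = max(yi - half, 0), min(yi + half + 1, h)
    return channel[y0:y1, x0:x1].sum()

def descriptor_bits(image_rgb, keypoint, point_pairs):
    """Eq. (2): one bit per (point pair, channel pair); the bit is 1 when the
    C_m response at p_i is smaller than the C_n response at p_j."""
    inv = transformed_color(image_rgb)
    kx, ky = keypoint
    bits = []
    for (pi, pj) in point_pairs:              # offsets of a pair at the same level
        vi = [patch_sum(inv[..., c], kx + pi[0], ky + pi[1]) for c in range(3)]
        vj = [patch_sum(inv[..., c], kx + pj[0], ky + pj[1]) for c in range(3)]
        for (m, n) in CHANNEL_PAIRS:
            bits.append(1 if vi[m] < vj[n] else 0)
    return np.array(bits, dtype=np.uint8)
```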

To reduce the computational burden, we implement this comparison using integral images built from each invariant color channel. Since our descriptor is constructed as a binary string, we use the Hamming distance, as in [11], to measure descriptor similarity.
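For the matching step, here is a small sketch of packing the bit string into bytes and comparing descriptors by Hamming distance, in the spirit of BRIEF [11]. The brute-force nearest-neighbour loop and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pack(bits):
    """Pack a 0/1 vector into bytes (e.g., a 1008-bit string becomes 126 bytes)."""
    return np.packbits(np.asarray(bits, dtype=np.uint8))

def hamming(a, b):
    """Hamming distance between two packed descriptors (number of differing bits)."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def nearest_matches(descs_a, descs_b):
    """Brute-force nearest neighbour in Hamming space: for every descriptor in
    descs_a, return the index of its closest descriptor in descs_b and the distance."""
    matches = []
    for da in descs_a:
        dists = [hamming(da, db) for db in descs_b]
        j = int(np.argmin(dists))
        matches.append((j, dists[j]))
    return matches
```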


3. Experiments

In the experiments, we use six public test image sequences (http://www.robots.ox.ac.uk/~vgg/research/affine) for evaluation. Figure 2 shows some examples of the datasets. Each dataset contains six images; we match the first image against the remaining five images in each set and construct the recall versus 1-precision curves described in [4] for performance evaluation. We compare our method with several state-of-the-art approaches, including SIFT [1], SURF [9], and BRIEF [11]. We use the galaxy sampling pattern shown in Figure 1: there are 16 fixed stars in the galaxy, four of the fixed stars have their own solar systems with 16 planets each, and 16 of these planets each contain 16 satellites. A total of 336 (= 16 + 4×16 + 16×16) points are sampled for our descriptor, which is far fewer than the 512 sampled points suggested for BRIEF-32 [11]. The orbital radii of F, P, and S are 12, 8, and 4, respectively. For a fair comparison, we use the SURF detector to detect keypoints. Note that our descriptor does not use any keypoint information, such as edge orientations or scale properties, for rotation and scale invariance. We implement our descriptor with gray levels (Galaxy_G), invariant color channels (Galaxy_N), and invariant cross color channels (Galaxy_C). The descriptor dimensions of these three versions are 21, 63, and 126 bytes, respectively (168 point pairs compared over one, three, and six channel combinations). As shown in Figure 3, our method and BRIEF outperform SIFT and SURF in most cases, except on the Graffiti dataset (Figure 3(c)). Because neither our descriptor nor BRIEF is designed to be rotation invariant, the results on the Graffiti dataset are degraded.

Figure 2. Images used for evaluation: (a) Bikes and (b) Trees (image blurring), (c) Graffiti and (d) Wall (viewpoint changes), (e) Leuven (lighting changes), and (f) Ubc (JPEG compression).



Table 1. Computation time.

Method      Time (s)
BRIEF        70.20
Galaxy_G     64.76
Galaxy_N     75.89
Galaxy_C     71.71
SURF        171.16
SIFT         88.17

In the remaining cases, our descriptor with invariant cross color channels outperforms both our descriptor without invariant cross color channels and BRIEF. Although the feature dimension of our gray-level descriptor is smaller than that of BRIEF, its results are still better than those of BRIEF in most cases. Table 1 shows the total computation time over the six datasets, including descriptor construction time and descriptor matching time. All of the methods were implemented in C++ using OpenCV on an Intel i7 3.4 GHz computer. Although our method employs the invariant cross color channels, its computation time only slightly increases and is still much lower than that of SIFT and SURF.

Figure 3. Recall versus 1-precision results on (a) Bikes, (b) Trees, (c) Graffiti, (d) Wall, (e) Leuven, and (f) Ubc.

4. Conclusion

We have developed a new descriptor using the galaxy sampling pattern and the relative relationship between invariant cross color channels. In future work, we will add rotation invariance to handle viewpoint change and rotation cases.

5. Acknowledgement

This research was supported by grants NSC100-2221-E-005-085 and NSC101-2221-E-005-086-MY3 from the National Science Council, Taiwan.

References

[1] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[2] C.-R. Huang, C.-S. Chen, and P.-C. Chung. Contrast context histogram - An efficient discriminating local descriptor for object recognition and image matching. Pattern Recognition, 41(10):3071-3077, 2008.
[3] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971-987, 2002.
[4] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615-1630, 2005.

[5] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.
[6] Y. Ke and R. Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In IEEE Conference on Computer Vision and Pattern Recognition, pages 506-513, 2004.
[7] C. Silpa-Anan and R. Hartley. Optimised KD-trees for fast image descriptor matching. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8, 2008.
[8] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: An efficient alternative to SIFT or SURF. In International Conference on Computer Vision, pages 2564-2571, 2011.
[9] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. Computer Vision and Image Understanding, 110(3):346-359, 2008.
[10] R. Zabih and J. Woodfill. Non-parametric local transforms for computing visual correspondence. In European Conference on Computer Vision, pages 151-158, 1994.
[11] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary robust independent elementary features. In European Conference on Computer Vision, pages 778-792, 2010.
[12] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1582-1596, 2010.
