A scale invariant interest point detector for discriminative blob detection

Luis Ferraz¹ and Xavier Binefa²

¹ Universitat Autònoma de Barcelona, Department of Computing Science, Barcelona, Spain, [email protected]
² Universitat Pompeu Fabra, Department of Information and Communication Technologies, Barcelona, Spain, [email protected]

Abstract. In this paper we present a novel scale invariant interest point detector of blobs that incorporates the idea of blob movement along the scales. The trajectory of a blob through scale-space is shown to be valuable information for estimating the most stable locations and scales of the interest points. Our detector evaluates interest points in terms of their own trajectory along the scales and its evolution, obtaining non-redundant and discriminant features. Moreover, we present a differential geometry view of how interest points can be detected: we propose to analyze the Gaussian curvature to classify image regions as blobs, edges or corners. Our interest point detector has been compared with some of the most important scale invariant detectors on infrared (IR) images, outperforming them in terms of the number of interest points detected and the discrimination of the interest points.

Key words: discriminative features, blob detection, Gaussian curvature, blob trajectory
1 Introduction
Interest point detection algorithms have been shown to be well suited for feature extraction. The main goal of these algorithms is to extract features that are invariant to some viewing conditions. Scale invariant detectors estimate both the location and the scale of these features. Different scale invariant detectors have been developed over the past few years; among the most important are Laplacian of Gaussian (LoG) [1], Difference of Gaussians (DoG) [2], Harris-Laplace [3], Hessian-Laplace [3], salient regions [4], Maximally Stable Extremal Regions (MSER) [5] and Speeded-Up Robust Features (SURF) [6]. Typically, detectors are based on a multi-scale analysis of the image [7]. The scale-space can be built using different scale normalized operators, such as Laplace filters or difference of Gaussians filters. For these detectors an interest point is detected if a local 3D extremum is present and if its absolute value is higher than a
threshold. Therefore, blobs at different scales are not related to each other, and the same blob can be detected many times along the scale-space. To avoid this problem, our proposal is to estimate the trajectory of blobs along scales and to select the scale and location that best represent each blob.

From a differential geometry point of view, images can be understood as surfaces with three types of regions according to their Gaussian curvature: elliptical regions, parabolic regions and hyperbolic regions. These region types allow images to be seen in a simple way, where elliptical regions can be understood as blobs, parabolic regions as contours or plane regions, and hyperbolic regions as corners or saddles. In order to extract this differential structure we use the full Hessian matrix [8] at each point. This approach outperforms Laplacian-based operators, which are more oriented towards obtaining rotationally invariant information [9].

In this paper we compare LoG, DoG, Harris-Laplace and Hessian-Laplace. Each one constructs its scale-space in a different way:

– The LoG filters each scale with a scale adapted Laplacian filter. Blobs are the extrema in its 3D neighborhood (maxima for bright blobs and minima for dark blobs).
– The DoG is a computational cost optimization of LoG: it uses the subtraction of Gaussians with different sigma to approximate the Laplacian filter. Lowe [2] improves DoG to select blobs with a minimal gray level variation and a circular shape.
– The Harris-Laplace detector calculates corners at the different scales using a scale adapted Harris operator. The locations of the detected corners are then evaluated with a Laplacian filter at the scales immediately above and below. Interest points correspond to corners with a maximal response of the Laplacian filter.
– The Hessian-Laplace detector works in a similar way to the Harris-Laplace detector. The main difference is that, instead of the Harris operator, it uses the determinant of the Hessian matrix to penalize very long structures.
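The relation between DoG and LoG mentioned in the list above can be checked numerically. The sketch below is not taken from the paper: it assumes an isotropic 2-D Gaussian kernel and uses the standard identity that the difference of two Gaussians at scales kσ and σ is approximately (k - 1)σ² times the Laplacian of the Gaussian when k is close to 1. All function names are illustrative.

```python
import math


def gaussian(x, y, sigma):
    """Isotropic 2-D Gaussian kernel value at (x, y)."""
    r2 = x * x + y * y
    return math.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def laplacian_of_gaussian(x, y, sigma):
    """Analytic Laplacian (d2/dx2 + d2/dy2) of the Gaussian kernel at (x, y)."""
    r2 = x * x + y * y
    return gaussian(x, y, sigma) * (r2 - 2.0 * sigma ** 2) / sigma ** 4


def difference_of_gaussians(x, y, sigma, k):
    """Gaussian at scale k*sigma minus Gaussian at scale sigma."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)


def scaled_log(x, y, sigma, k):
    """(k - 1) * sigma^2 * LoG, the term that the DoG approximates."""
    return (k - 1.0) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)


if __name__ == "__main__":
    # Compare both quantities at a few points for a scale ratio close to 1.
    for point in [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]:
        d = difference_of_gaussians(point[0], point[1], 2.0, 1.01)
        s = scaled_log(point[0], point[1], 2.0, 1.01)
        print(point, d, s)
```

The approximation degrades as k grows, which is one reason why the two detectors need not return identical extrema in practice.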
On the other hand, SURF is an interesting detector because of its low computational cost. However, SURF is quite similar to Hessian-Laplace in its outline and results, and for this reason we do not include it in the comparison. Another interesting detector is MSER. It produces good results in comparison with other detectors, but it is not analyzed in this paper because of its poor performance on blurred images [10]. The salient regions detector is not evaluated either, because of its excessive computational cost. However, the salient regions detector has been shown to be a good discriminative feature extractor [11], and we have designed our detector along the same lines. Gaussian curvature can be understood as a saliency measure, given that it measures the normal curvatures around each point. On images, normal curvatures measure the variations of the gray level around blobs.

This paper is organized as follows. In Section 2 the method to detect interest points by means of curvature analysis is introduced. In Section 3 our scale invariant interest point detector is described and, finally, in Section 4 we present experimental results.
2 Curvature analysis on images
The behavior of an image in a local neighborhood of a point, where it is assumed to be continuous and differentiable, can be approximated by the Taylor formula. This means that the image can be simplified locally into an expansion of functions. The Taylor expansion in a local neighborhood of a point $\vec{x}_0$ is

$$ f(\vec{x}_0 + \vec{h}) = f(\vec{x}) = f(\vec{x}_0) + \nabla f(\vec{x}_0)\,\vec{h} + \frac{1}{2!}\,\vec{h}^T \nabla^2 f(\vec{x}_0)\,\vec{h} + r(\vec{h}) \qquad (1) $$
where $r(\vec{h})$ is a residual. Differential geometry, in general, allows properties and features of points to be extracted by means of the first and second order terms of the Taylor expansion. First order terms contain information about the gradient distribution in a local neighborhood. Second order terms contain information about the local shape.

Each image point is represented by three coordinates $P = (x, y, f(x, y))$. Thus, the first order terms of the Taylor expansion form the so-called Jacobian matrix,

$$ J = \begin{pmatrix} \frac{\partial P}{\partial x} \\ \frac{\partial P}{\partial y} \end{pmatrix} = \begin{pmatrix} 1 & 0 & f_x \\ 0 & 1 & f_y \end{pmatrix} \qquad (2) $$

The first fundamental form $I$ on an image $f$ is defined as

$$ I = J J^T = \begin{pmatrix} 1 + f_x^2 & f_x f_y \\ f_y f_x & 1 + f_y^2 \end{pmatrix} \qquad (3) $$
The normal vector $\vec{N}$ associated to each point on an image $f$ is defined as

$$ \vec{N} = \frac{\partial P}{\partial x} \times \frac{\partial P}{\partial y} = \begin{pmatrix} -f_x \\ -f_y \\ 1 \end{pmatrix} \qquad (4) $$

The second order terms of the Taylor expansion form the Hessian matrix $H$. For images, $H$ is equivalent to the second fundamental form $II$. The general equation of $II$ is derived from the normal curvature calculation [8], and it is calculated using the second partial derivatives and the normal vector $\vec{N}$ of each neighborhood. Thus, the second fundamental form $II$ for images is defined as

$$ II = \begin{pmatrix} \frac{\partial^2 P}{\partial x^2} \cdot \vec{N} & \frac{\partial^2 P}{\partial x \partial y} \cdot \vec{N} \\ \frac{\partial^2 P}{\partial x \partial y} \cdot \vec{N} & \frac{\partial^2 P}{\partial y^2} \cdot \vec{N} \end{pmatrix} = \begin{pmatrix} (0, 0, f_{xx}) \cdot \vec{N} & (0, 0, f_{xy}) \cdot \vec{N} \\ (0, 0, f_{xy}) \cdot \vec{N} & (0, 0, f_{yy}) \cdot \vec{N} \end{pmatrix} = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} = H \qquad (5) $$

The first and second fundamental forms of a surface determine an important differential-geometric invariant, the Gaussian curvature $K$. The Gaussian curvature of a point on a surface is the product of its principal curvatures, $K = k_1 k_2$. Gaussian curvature can be expressed as the ratio of the determinants of the second and first fundamental forms,

$$ K = k_1 k_2 = \frac{\det(II)}{\det(I)} = \frac{f_{xx} f_{yy} - f_{xy}^2}{1 + f_x^2 + f_y^2} \qquad (6) $$

The sign of $K$ at a point determines the shape of the surface near that point [8]: for $K > 0$ the surface is locally convex (blob regions) and the point is called elliptic, while for $K < 0$ the surface is saddle shaped (i.e. corners) and the point is called hyperbolic. The points at which $K$ is zero (i.e. contours) are called parabolic. Beyond the type of region given by its sign, the value of $K$ offers information about saliency: on images it is a measure of the variation of the gray level. An important property of $K$ is that it is not singular.
Fig. 1. Saddle surface with its normal curvatures k1 and k2 .
Fig. 1 shows the meaning of curvature. Given the normal vector $\vec{N}$ at the point $\vec{x}_0$ and its two principal curvatures $k_1$ and $k_2$, the Gaussian curvature is positive if both curvatures have the same sign, negative if they have different signs, and zero if either curvature is zero. $I$ is positive definite, hence its determinant is positive everywhere. Therefore, the sign of $K$ coincides with the sign of the determinant of $II$.

Assuming that the point $\vec{x}_0$ is a critical point (the gradient $\nabla f(\vec{x}_0)$ vanishes), the Gaussian curvature of the surface at $\vec{x}_0$ is the determinant of $II$. So, in this case it is not necessary to calculate $I$ to estimate the Gaussian curvature at $\vec{x}_0$. In the other cases, when the gradient $\nabla f(\vec{x}_0)$ does not vanish, it can be shown that $\det(II)$ is close to the Gaussian curvature. This is because $\det(II)$ usually decreases while $\det(I)$ increases. Despite this observation, our method uses the complete equation (6), because we focus on curvature to find the best locations. It is important to remark that calculating $K$ with (6) does not significantly increase the computational cost of the method. This is because the Hessian matrix can be approximated from the first derivatives.
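As an illustration of equation (6), the following sketch estimates K at one pixel with central finite differences and classifies the point by the sign of K. The finite-difference scheme, the tolerance `eps` and all function names are assumptions made for this example; the paper does not specify how the derivatives are discretized. Note also that the textbook curvature of a graph surface, computed with a unit normal, has the denominator squared; the sketch keeps the form printed in (6), and the sign of K is the same either way.

```python
def derivatives(img, x, y):
    """Central finite differences at the interior pixel (x, y); img[y][x] is the gray level."""
    fx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    fy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    fxx = img[y][x + 1] - 2.0 * img[y][x] + img[y][x - 1]
    fyy = img[y + 1][x] - 2.0 * img[y][x] + img[y - 1][x]
    fxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return fx, fy, fxx, fyy, fxy


def gaussian_curvature(img, x, y):
    """K = det(II) / det(I) as written in equation (6)."""
    fx, fy, fxx, fyy, fxy = derivatives(img, x, y)
    det_second = fxx * fyy - fxy ** 2      # det(II) = det(H)
    det_first = 1.0 + fx ** 2 + fy ** 2    # det(I), always positive
    return det_second / det_first


def classify(k, eps=1e-9):
    """Region type from the sign of K (eps is an assumed tolerance)."""
    if k > eps:
        return "elliptic"      # blob
    if k < -eps:
        return "hyperbolic"    # corner or saddle
    return "parabolic"         # contour or plane region


def sample(func):
    """Sample func on the 3x3 grid x, y in {-1, 0, 1}; returns img[y][x]."""
    return [[func(x, y) for x in (-1, 0, 1)] for y in (-1, 0, 1)]


if __name__ == "__main__":
    blob = sample(lambda x, y: -(x * x + y * y))
    saddle = sample(lambda x, y: x * x - y * y)
    ridge = sample(lambda x, y: x * x)
    for name, img in [("blob", blob), ("saddle", saddle), ("ridge", ridge)]:
        k = gaussian_curvature(img, 1, 1)
        print(name, k, classify(k))
```

At the centre pixel of these three synthetic patches the gradient vanishes, so K reduces to det(II), in line with the critical-point remark above.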
3 Discriminant blob detection using trajectories
In this section we propose a new scale invariant interest point detector called Trajectories of Discriminative Blobs (ToDB), based on the analysis of the Gaussian curvature of the image along the scale-space representation. Moreover, to obtain more stable interest points, the trajectory of each one is extracted.
Fig. 2. Trajectories of some blobs along scales (blue lines). Blob movements and blob fusions can be seen. Green points show all the extrema found. Red points are the extrema selected as interest points.
The evolution of blobs along scales was studied in depth in [12]. Traditionally, the analysis of the behavior of blobs has presented severe complications, since it implied a detailed description of the image. However, for our purposes we do not need a precise description, and one of the most important contributions of our work is to reduce the detail of the analysis, since we only need an approximation of the movement of blobs. The outline of the algorithm is:

1. Build the scale-space representation with the Gaussian curvature $K$ for pre-selected scales $\sigma_n = \sigma_0 \epsilon^n$, where $\epsilon$ is the scale factor between successive levels and $\sigma_0$ is the initial scale.
2. At each level of the representation, extract the interest points by detecting the local maxima in the 8-neighborhood of a point $x$.
3. Extract the relations between interest points at consecutive levels. For each interest point $x_{ij}$, where $j$ is the level of the representation:
   (a) Project $x_{ij}$ from level $j$ to level $j + 1$ and search there, using a gradient ascent algorithm, for the most plausible relation $r$ for the interest point $i$.
   (b) Save the existing relations $R = \{ r_{ik} \mid \text{exists GradientAscent}(x_{ij}, x_{k,j+1}) \}$.
4. Build pipes $P$ by concatenating related relations $R$.
5. Each pipe contains the Gaussian curvature of each point that forms it.
6. Finally, select the interest points as the local maxima along the pipes $P$.
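The steps above can be sketched as follows. This is only one possible reading of the outline, not the authors' implementation: it assumes that the curvature maps of step 1 are already available as a list `levels` of 2-D lists (one per scale), it interprets the gradient ascent of step 3 as a discrete hill climb over the 8-neighborhood of level j + 1, and when two blobs fuse it lets only the first one continue its pipe. All names are hypothetical.

```python
def scales(sigma0, eps, n_levels):
    """Step 1 (scale ladder only): sigma_n = sigma0 * eps**n."""
    return [sigma0 * eps ** n for n in range(n_levels)]


def local_maxima(level):
    """Step 2: interior pixels strictly greater than their 8 neighbours."""
    h, w = len(level), len(level[0])
    peaks = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = level[y][x]
            if all(v > level[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1) if dy or dx):
                peaks.append((x, y))
    return peaks


def gradient_ascent(level, x, y):
    """Step 3a: hill-climb from the projected point (x, y) on the next level."""
    h, w = len(level), len(level[0])
    while True:
        bx, by = x, y
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and level[ny][nx] > level[by][bx]:
                    bx, by = nx, ny
        if (bx, by) == (x, y):
            return x, y          # no neighbour is higher: a maximum was reached
        x, y = bx, by


def build_pipes(levels):
    """Steps 3-5: chain related maxima into pipes of (level, x, y, K) entries."""
    peaks = [set(local_maxima(level)) for level in levels]
    pipes, open_pipes = [], {}   # open_pipes maps a point of level j to its pipe
    for j, level_peaks in enumerate(peaks):
        next_open = {}
        for p in sorted(level_peaks):
            pipe = open_pipes.get(p)
            if pipe is None:     # blob appears at this level: start a new pipe
                pipe = []
                pipes.append(pipe)
            pipe.append((j, p[0], p[1], levels[j][p[1]][p[0]]))
            if j + 1 < len(levels):
                q = gradient_ascent(levels[j + 1], p[0], p[1])
                # Step 3b: keep the relation only if q is a detected maximum;
                # on a fusion, the first pipe to arrive continues (assumption).
                if q in peaks[j + 1] and q not in next_open:
                    next_open[q] = pipe
        open_pipes = next_open
    return pipes


def select_interest_points(pipes):
    """Step 6: keep the local maxima of K along each pipe."""
    selected = []
    for pipe in pipes:
        ks = [entry[3] for entry in pipe]
        for i, k in enumerate(ks):
            left = ks[i - 1] if i > 0 else float("-inf")
            right = ks[i + 1] if i + 1 < len(ks) else float("-inf")
            if k > left and k >= right:
                selected.append(pipe[i])
    return selected
```

With the experimental settings reported later in the paper, `scales(sigma0, 1.25, 12)` would give the twelve levels; the value of `sigma0` is not stated in the text and has to be chosen by the user.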
4 Experimental Results
IR images are thermal images with a low signal-to-noise ratio and a lack of contrast, so blurred images are obtained. We have compared our method with
four typical interest point detectors that, according to the literature, produce good results in feature extraction tasks: LoG [1], DoG [2], Harris-Laplace [3] and Hessian-Laplace [3]. The experimental results are obtained using an IR database that contains 198 car images. The car images are distributed in 4 classes (4 car models) with 5 different views (rear, rear-lateral, lateral, frontal-lateral and frontal). Fig. 3 shows a subset of the images of the database.
Fig. 3. Example of database images.
To perform a fair comparison between the detectors, the same scales have been used to construct the scale-space. Specifically, in this example we have used 12 scales and a scale factor of 1.25. The mean size of the car images is 236 x 143 pixels. To evaluate the detectors an entropy criterion has been used, because entropy measures the disorder of a local region and is therefore a good measure of discrimination. Thus, entropy allows a non-typical analysis of the detected interest points.

Table 1 shows that our detector (ToDB) outperforms all the other detectors in terms of discrimination, even DoG. Specifically, the entropy per interest point obtained by our method is 186% of the mean entropy of all the feasible interest points that could be found on an image. Oddly, Hessian-Laplace and LoG obtain a smaller entropy than all the feasible interest points. An explanation could be the noise sensitivity of these detectors. ToDB avoids the noise sensitivity problem thanks to the trajectories, which force a continuity of each blob along the scales and thereby select the stable blobs.

Another interesting result, although expected, is that DoG finds quite good interest points compared with LoG, Hessian-Laplace or Harris-Laplace. The main idea of DoG is to obtain discriminative regions taking into account only the neighborhood information. Table 1 shows that by using higher level information such as trajectories it is possible to improve on its results (ToDB discrimination is 42% higher than DoG discrimination).
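The text does not detail how the entropy of a local region is computed. One common choice, shown here purely as an assumption, is the Shannon entropy of the gray-level histogram inside a square patch around each interest point, averaged over all points of a detector. The number of bins, the logarithm base, the patch shape and the function names are all illustrative and may differ from what was used to produce Table 1.

```python
import math
from collections import Counter


def patch_entropy(img, x, y, radius, bins=16, max_level=255):
    """Shannon entropy (in bits) of the gray-level histogram of a square patch.

    img[y][x] holds gray levels in [0, max_level]; the patch is clipped
    at the image border.
    """
    h, w = len(img), len(img[0])
    values = [img[j][i]
              for j in range(max(0, y - radius), min(h, y + radius + 1))
              for i in range(max(0, x - radius), min(w, x + radius + 1))]
    # Quantize each gray level into one of `bins` histogram bins.
    hist = Counter(min(bins - 1, int(v * bins / (max_level + 1))) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in hist.values())


def mean_entropy(img, points):
    """Average patch entropy over interest points given as (x, y, radius)."""
    return sum(patch_entropy(img, x, y, r) for x, y, r in points) / len(points)


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]                # uniform region
    half = [[0, 0, 255, 255] for _ in range(4)]        # two equally likely levels
    print(patch_entropy(flat, 1, 1, 1))
    print(patch_entropy(half, 1, 1, 2))
```

A uniform patch carries no information under this measure, while a patch split evenly between two gray levels carries one bit, which is the sense in which higher entropy indicates a more discriminative region.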
Fig. 4. Example of interest points found using different detectors. (a) Hessian-Laplace (b) Harris-Laplace (c) LoG (d) DoG (e) ToDB

Detector            Number of          Entropy per       Discrimination %      % outperform
                    interest points    interest point    (ToDB as base)        (entropy/0.7932)
All points+scale    694384             0.7932            54%                   100%
Hessian-Laplace     5077               0.6138            41%                   77%
Harris-Laplace      1025               0.9107            62%                   115%
LoG                 2763               0.7559            51%                   95%
DoG                 145                1.0410            70%                   131%
ToDB                118                1.4764            100%                  186%

Table 1. Discrimination index of different detectors.
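The two percentage columns of Table 1 are ratios of the entropy column, so they can be recomputed directly. The short sketch below does this using only the entropy values printed in the table; the dictionary and function names are illustrative. Recomputing from the four-decimal entropies reproduces the last column after rounding and the ToDB-based column to within one percentage point.

```python
# Entropy per interest point, copied from Table 1.
ENTROPY = {
    "All points+scale": 0.7932,
    "Hessian-Laplace": 0.6138,
    "Harris-Laplace": 0.9107,
    "LoG": 0.7559,
    "DoG": 1.0410,
    "ToDB": 1.4764,
}


def percent_of(entropy, base):
    """Entropy of every detector as a percentage of the entropy of `base`."""
    return {name: 100.0 * value / entropy[base] for name, value in entropy.items()}


def gain_over(entropy, detector, reference):
    """How much higher (in %) the entropy of `detector` is than that of `reference`."""
    return 100.0 * (entropy[detector] / entropy[reference] - 1.0)


if __name__ == "__main__":
    vs_todb = percent_of(ENTROPY, "ToDB")             # "ToDB as base" column
    vs_all = percent_of(ENTROPY, "All points+scale")  # "entropy/0.7932" column
    for name in ENTROPY:
        print(f"{name:18s} {vs_todb[name]:6.1f}% {vs_all[name]:6.1f}%")
    print("ToDB over DoG:", round(gain_over(ENTROPY, "ToDB", "DoG")), "%")
```

The last line corresponds to the statement that ToDB discrimination is 42% higher than that of DoG.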
The LoG, Hessian-Laplace and Harris-Laplace detectors are more oriented towards obtaining many interest points, leaving discrimination as a secondary concern. A typical analysis for these detectors is repeatability [3]. We have carried out a very simple comparison of the analyzed detectors on the boat images of [3]. The results show that LoG, Hessian-Laplace and Harris-Laplace obtain very high levels of repeatability (50%-70%), while DoG and ToDB obtain levels of around 35%-40%. Finally, the results shown in Fig. 4 suggest that ToDB obtains more perceptually meaningful interest points than the other detectors, even DoG.
5 Conclusions
We have presented a powerful mechanism to detect the most stable and discriminative locations of blobs by estimating their trajectories along scales. By means of these trajectories the best locations and scales for each point can easily be selected. Moreover, by using the Gaussian curvature we classify regions on images in a simple way.
By comparing the analyzed detectors with our ToDB detector, we show that our algorithm clearly outperforms all the others in terms of discrimination. ToDB opens future research lines around blob trajectories along scales and Gaussian curvature analysis using the first and second fundamental forms. Moreover, an extension to the detection of affine blobs could be made by analyzing in depth the normal and geodesic curvature generated around each interest point. Finally, we want to remark that our detector has been tested mainly on IR images. However, tests carried out on gray level images have produced similar results.
Acknowledgements

This work was produced thanks to the support of the Universitat Autònoma de Barcelona and the Centro de Investigación y Desarrollo de la Armada (CIDA). Thanks are also due to Tecnobit S.L. for providing the car sequence images.
References

1. Lindeberg, T.: Feature detection with automatic scale selection. International Journal of Computer Vision (IJCV) 30 (1998) 77–116
2. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60 (2004) 91–110
3. Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. International Journal of Computer Vision (IJCV) 60 (2004) 63–86
4. Kadir, T., Brady, M.: Saliency, scale and image description. International Journal of Computer Vision (IJCV) 45(2) (November 2001) 83–105
5. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: British Machine Vision Conference. (2002) 384–393
6. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Computer Vision and Image Understanding 110(3) (2008) 346–359
7. Crowley, J.L.: A representation for visual information with application to machine vision. PhD thesis (1982)
8. do Carmo, M.P.: Differential Geometry of Curves and Surfaces. Prentice-Hall (1976)
9. Lenz, R.: Group theoretical feature extraction: Weighted invariance and texture analysis. In: Theory & Applications of Image Analysis: Selected Papers from the 7th Scandinavian Conference on Image Analysis. (1992) 63–70
10. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L.: A comparison of affine region detectors. International Journal of Computer Vision (IJCV) 65 (2005) 43–72
11. Kadir, T., Zisserman, A., Brady, M.: An affine invariant salient region detector. In: European Conference on Computer Vision (ECCV). (2004) 228–241
12. Lindeberg, T.: Scale-Space Theory in Computer Vision (The International Series in Engineering and Computer Science). Springer (December 1993)