Multiscale keypoint detection using the Dual-Tree Complex Wavelet Transform
Julien Fauqueur, Nick Kingsbury, Ryan Anderson
University of Cambridge, Signal Processing Group
{jf330,ngk,raa37}@cam.ac.uk
We show that the Dual-Tree Complex Wavelet Transform (DTCWT) [1] is a well-suited basis for detecting salient keypoints in images because it is directionally selective, smoothly shift invariant, optimally decimated at coarse scales, invertible (no loss of information) and fast to compute. It is therefore more suitable than the Discrete Wavelet Transform for content analysis, and especially for fast and accurate keypoint detection.

The DTCWT: The DTCWT decomposition of an $n \times n$ image is a decimated dyadic decomposition into $m$ scales $i = 1, \dots, m$, where scale $i$ is of dimension $n/2^i \times n/2^i$. At each decimated location of each scale, we have a set $S$ of 6 complex coefficients, $S = \{\rho_1 e^{i\theta_1}, \dots, \rho_6 e^{i\theta_6}\}$, corresponding to the responses of the 6 subband orientations: 15°, 45°, 75°, 105°, 135°, 165°.

Determining the keypoint energies: The keypoint features we are interested in (blobs, corners, junctions) create energy in non-adjacent subbands, whereas edges create energy in adjacent subbands only. We propose the following energy measure to detect the presence of such keypoints:

$$E(S) = \rho_1\rho_3 + \rho_1\rho_4 + \rho_1\rho_5 + \rho_2\rho_4 + \rho_2\rho_5 + \rho_2\rho_6 + \rho_3\rho_5 + \rho_3\rho_6 + \rho_4\rho_6.$$

Every product in $E(S)$ is between the magnitudes of two non-adjacent subbands. Subbands 1 and 6 (15° and 165°) count as adjacent because orientations wrap around at 180°, which is why the product $\rho_1\rho_6$ does not appear. Unlike Difference-of-Gaussians detectors (as in SIFT [2]), which rely on isotropic filtering, the directional filtering of the DTCWT allows us to detect keypoints directly while rejecting edges. We produce $m$ decimated energy maps $M_1, \dots, M_m$ by computing $E(S)$ at every location of every scale of the DTCWT decomposition.

The multiscale energy map: Detecting the maxima directly in each decimated energy map would give poor keypoint localisation in space and scale. Instead, we upsample each map to the original image size using Gaussian kernel interpolation and accumulate the results into $m-1$ new energy maps $A_1, \dots, A_{m-1}$:

$$A_i = \mathrm{upsample}(M_m) + \dots + \mathrm{upsample}(M_i), \quad i = 1, \dots, m-1.$$
Each map $A_i$ is thus the sum of the upsampled energy maps from scale $i$ and all coarser scales. Figure 2 illustrates such a multiscale energy map for the test image of figure 1.

Maxima detection: In each map $A_i$, keypoints are detected by selecting the locations whose energy is maximal over a 3×3 neighbourhood.

Results: Figure 1 shows a test image containing corners and blobs of different sizes and blur factors. The detected keypoints are shown as coloured circles in figure 3; each circle is centred on the keypoint location and its radius is proportional to the keypoint scale. All the features are detected, with precise localisation, up to the scale that matches their size, while edge features are ignored. Blobs tend to produce concentric circles across scales, whereas corners produce circles that are slightly shifted along the bisector of the corner. Figure 4 shows another example, on a 512×512 natural aerial image. The keypoints pick up perceptually consistent local saliencies, such as small cars at fine scale and building corners at different scales.

We also performed tests with increasing levels of noise, up to the point where fine-scale saliencies become hard to distinguish from noise. Noise first appears at the finest scales. Since each map $A_i$ sums the upsampled maps of scale $i$ and all coarser scales (scales $i$ to $m$), maxima detection is less sensitive to noise than if the individual upsampled energy maps $\mathrm{upsample}(M_i)$ were used. Furthermore, for very noisy images, scale 1 (the finest scale) can be ignored by considering only the keypoints detected in maps $A_2$ to $A_{m-1}$.

Our code runs in Matlab on a 3 GHz PC. On a 512×512 image, the multiscale keypoint detection takes 2.9 s. For comparison, the SIFT [2] program (C++ code, version 4, downloaded from David Lowe's homepage) takes 4.5 s for detection and description of keypoints.
The speed of our code is due to the computational efficiency of the DTCWT and the simplicity of our detection scheme.
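The three steps described above (energy measure, accumulation across scales, maxima detection) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the Matlab implementation timed above. It takes the DTCWT coefficients of each scale as given, as a complex array of shape (h, w, 6). The function names, the use of pixel replication followed by Gaussian smoothing as a stand-in for the Gaussian kernel interpolation, the choice sigma = factor / 2, and the strict 3×3 maximum test are all assumptions made for the example.

```python
import numpy as np

# 0-based index pairs of the non-adjacent subbands among the 6 orientations
# (15, 45, 75, 105, 135, 165 degrees): the 9 products of E(S).
NON_ADJACENT = [(0, 2), (0, 3), (0, 4), (1, 3), (1, 4), (1, 5),
                (2, 4), (2, 5), (3, 5)]


def keypoint_energy(coeffs):
    """E(S) at every location of one scale; coeffs has shape (h, w, 6)."""
    rho = np.abs(coeffs)  # subband magnitudes
    return sum(rho[..., a] * rho[..., b] for a, b in NON_ADJACENT)


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def upsample(energy, factor):
    """Assumed stand-in for Gaussian kernel interpolation:
    pixel replication by `factor`, then separable Gaussian smoothing."""
    big = np.kron(energy, np.ones((factor, factor)))
    k = gaussian_kernel(factor / 2.0)  # assumed smoothing width
    r = len(k) // 2

    def smooth(v):
        # Edge padding keeps the output the same length as the input.
        return np.convolve(np.pad(v, r, mode="edge"), k, mode="valid")

    big = np.apply_along_axis(smooth, 0, big)
    return np.apply_along_axis(smooth, 1, big)


def accumulate(energy_maps):
    """energy_maps = [M_1, ..., M_m], finest first, M_i of size n/2^i.
    Returns [A_1, ..., A_{m-1}] with A_i = upsample(M_m) + ... + upsample(M_i)."""
    ups = [upsample(M, 2 ** (i + 1)) for i, M in enumerate(energy_maps)]
    running = ups[-1]                        # upsample(M_m)
    acc = []
    for i in range(len(ups) - 2, -1, -1):    # scales m-1 down to 1
        running = running + ups[i]
        acc.append(running)
    return acc[::-1]


def local_maxima(A):
    """Boolean mask of pixels strictly greater than their 8 neighbours."""
    h, w = A.shape
    p = np.pad(A.astype(float), 1, mode="constant", constant_values=-np.inf)
    mask = np.ones((h, w), dtype=bool)
    for dy in range(3):
        for dx in range(3):
            if (dy, dx) != (1, 1):
                mask &= A > p[dy:dy + h, dx:dx + w]
    return mask
```

Given a list of per-scale coefficient arrays ordered from finest to coarsest, the energy maps would be obtained with keypoint_energy, passed together to accumulate, and the keypoints at scale i read off as the True entries of local_maxima applied to the i-th accumulated map. The strict comparison discards flat plateaus; a non-strict test would keep them at the cost of duplicate detections.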
Conclusion: We have presented a novel way to perform fast and accurate multiscale keypoint detection using the DTCWT. Its directional selectivity makes it suitable for distinguishing keypoints from edges, unlike Difference-of-Gaussians detectors. We showed that accurate detection can be obtained from this decimated decomposition by means of the accumulated keypoint energy maps. Our first results show that the detected keypoints are perceptually consistent with the visual content of the image. This preliminary work is promising, and we are now investigating ways to characterise these keypoints in a multiscale manner, which should be possible without much extra processing.

References:
1. N.G. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Journal of Applied and Computational Harmonic Analysis, vol. 10, no. 3, pp. 234–253, 2001.
2. D.G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
Figure 1: 256×256 input test image with features of different size and nature.
Figure 2: The $A_1$ map of accumulated energies across scales 1 to 5. Its smoothness guarantees robust detection.
Figure 3: Detected keypoints. The circle sizes are proportional to the keypoint scales. 49 keypoints are detected across 4 scales.
Figure 4: Detected keypoints on a 512×512 natural image. 888 keypoints are detected across 5 scales.