ROTATION INVARIANT CURVELET FEATURES FOR TEXTURE IMAGE RETRIEVAL

Md Monirul Islam, Dengsheng Zhang, and Guojun Lu
Gippsland School of Information Technology, Monash University, VIC 3842, Australia
E-mail: {md.monirul.islam, dengsheng.zhang, guojun.lu}@infotech.monash.edu.au

ABSTRACT
An effective texture feature is an essential component of any content based image retrieval system. In the past, spectral features, such as Gabor and wavelet features, have shown better retrieval performance than many statistical and structural features. Recent research on multi-resolution analysis has found that the curvelet transform captures texture properties, such as curves, lines, and edges, more accurately than Gabor filters. However, the texture feature extracted using the curvelet transform is not rotation invariant. This can degrade retrieval performance significantly, especially when the database contains many similar images with different orientations. This paper analyses the curvelet transform and derives a useful approach to extract rotation invariant curvelet features. Experimental results show that the new rotation invariant curvelet feature outperforms the curvelet feature without rotation invariance.

Index Terms— Curvelet transform, CBIR
1. INTRODUCTION
Texture is one of the most important features used in content based image retrieval (CBIR) [1, 2]. A significant number of techniques have been proposed in the literature to extract texture features; they can be broadly divided into spatial and spectral techniques. Spatial techniques measure image texture using low order statistics of image grey levels and are sensitive to noise. Furthermore, spatial features are not robust, and the number of useful features is small. So far, spectral features, such as Gabor [3] and wavelet [4] features, have shown better retrieval performance than features calculated using spatial methods, such as statistical and structural techniques. Recently, research on multi-resolution analysis has shown that the curvelet transform has significant advantages over the Gabor transform, because curvelets are more effective in capturing curvilinear properties, such as lines and edges [5, 6]. The curvelet transform was originally proposed for image de-noising [6] and has shown promising results in character recognition [7] and image retrieval [8]. Recently, Sumana et al. [9] showed that the curvelet transform significantly outperforms the widely used Gabor transform on the standard Brodatz texture database. However, the curvelet features extracted in that work are not rotation invariant, so they will not be able to retrieve images with different orientations. For image retrieval, this means expensive online shifting of the feature vectors in all directions to find the best match between the query and example images. Instead of such shift matching, the feature vector can be normalized before indexing, so that online matching remains simple [10].
In this paper, we analyse curvelet features and propose an effective and efficient technique to extract rotation invariant curvelet features. The method normalizes each curvelet descriptor so as to avoid expensive online matching. We show that the rotation invariant curvelet feature performs better than the rotation variant curvelet feature in retrieving both man-made and natural textures. The rest of this paper is organized as follows. Section 2 briefly introduces the curvelet transform, while Section 3 describes the rotation invariant curvelet feature extraction technique. Experimental results and comparisons are presented in Section 4. Section 5 concludes the paper.
2. THE CURVELET TRANSFORM AND FEATURE EXTRACTION
This section briefly describes the curvelet transform and texture feature extraction based on it.
2.1. Curvelet transform
The curvelet transform has been developed as an extension of the two-dimensional ridgelet transform, so the continuous 2D ridgelet transform is defined first. Given an image f(x, y), its continuous ridgelet transform at scale a, translation b, and orientation θ is defined as

CRT_f(a, b, θ) = ∫∫ ψ_{a,b,θ}(x, y) f(x, y) dx dy    (1)

where the 2D ridgelet function ψ_{a,b,θ}(x, y) is generated from a univariate wavelet function ψ(x) which has vanishing mean and sufficient decay:

ψ_{a,b,θ}(x, y) = a^{−1/2} ψ((x cos θ + y sin θ − b) / a)    (2)
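Equations (1) and (2) can be made concrete with a small sketch. The paper leaves the mother wavelet ψ unspecified; the Mexican-hat (Ricker) wavelet is assumed here, since it has vanishing mean and sufficient decay, and the function name and parameter defaults are illustrative only.

```python
import numpy as np

def ridgelet(x, y, a=1.0, b=0.0, theta=0.0):
    """Evaluate the ridgelet of Eq. (2) at point (x, y).

    The univariate wavelet psi is assumed to be the Mexican-hat
    (Ricker) wavelet, which has vanishing mean and fast decay; the
    paper does not fix a particular psi.
    """
    t = (x * np.cos(theta) + y * np.sin(theta) - b) / a
    psi = (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)  # Ricker mother wavelet
    return a ** -0.5 * psi
```

Because t depends on (x, y) only through x cos θ + y sin θ, the ridgelet is constant along every line x cos θ + y sin θ = const, which is exactly the line-singularity behaviour discussed below.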
Fig. 1. Visualization of waveforms of (a) ridgelet, (b) wavelet, and (c) Gabor functions.

A ridgelet is a wavelet-type function that is constant along the lines x cos θ + y sin θ = const. Fig. 1(a) shows a typical ridgelet [5]. A ridgelet is much sharper than a sinusoidal wavelet. Fig. 1(b) shows the 3D view of a wavelet function. In contrast to the wavelet, which is efficient in detecting points, a ridgelet is efficient in detecting lines. The sharp peak in Fig. 1(b) indicates that a wavelet can localize a point singularity, whereas the sharp edge in Fig. 1(a) means that a ridgelet can localize a line singularity. Mathematically, the point parameters of a wavelet are replaced by the line parameters of a ridgelet [5]. This means that, like Gabor filters, ridgelets can be tuned to different scales and orientations to create curvelets. However, unlike a Gabor filter, whose waveform has an oval shape as shown in Fig. 1(c), a curvelet is linear in the edge direction (Fig. 1(a)). Therefore, a curvelet can capture lines and edges more accurately than Gabor. Furthermore, because of the oval shapes of Gabor filters, the frequency spectrum covered by a set of Gabor filters is not complete: Fig. 2(a) shows that there are many holes between the ovals in the frequency plane of the Gabor filters [3]. In contrast, the curvelet transform covers the entire frequency spectrum, as shown by the frequency tiling of a 4-scale curvelet decomposition in Fig. 2(b) [11], where si means scale i and 1, 2, 3, etc. are the subband or orientation numbers.
D^2 = Σ_{i=1}^{2n} (Q_i − T_i)^2    (6)
3. ROTATION INVARIANT CURVELET FEATURE
Fig. 2. Frequency spectrum coverage of (a) Gabor and (b) curvelet.
2.2. Curvelet feature extraction
Given a digital image f[m, n] of dimension M × N, its discrete curvelet transform CT^D(a, b, θ) is obtained as

CT^D(a, b, θ) = Σ_{0≤m<M} Σ_{0≤n<N} f[m, n] ψ^D_{a,b,θ}[m, n]    (3)
Equation (3) is implemented in the frequency domain and can be expressed as

CT^D(a, b, θ) = IFFT(FFT(f[m, n]) × FFT(ψ^D_{a,b,θ}[m, n]))    (4)
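Equation (4) amounts to pointwise multiplication in the frequency domain. The following sketch (not the authors' implementation; a real curvelet transform uses wedge-shaped frequency windows [11]) illustrates the FFT/IFFT pipeline with an arbitrary filter of the same size as the image; all names are hypothetical.

```python
import numpy as np

def fft_filter(f, psi):
    """Sketch of Eq. (4): IFFT(FFT(f) x FFT(psi)).

    In the real transform each psi is a wedge-shaped curvelet window
    in the frequency plane [11]; here psi is an arbitrary spatial
    filter of the image's size, making this plain circular convolution."""
    return np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(psi))

# Filtering with a unit impulse returns the image itself (sanity check).
f = np.random.rand(8, 8)
delta = np.zeros((8, 8))
delta[0, 0] = 1.0
out = fft_filter(f, delta)
```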
where µsibj and σsibj are the mean and standard deviation calculated from subband bj at scale si. Each image in the database is represented and indexed using this feature vector. During retrieval, an image is given as a query, and the feature vector of the query image is compared with the feature vectors of all database images using the L2 distance measure. The distance D between a query feature vector Q and a target feature vector T is given by Eq. (6).
Finally, database images are ranked based on the distance measures and displayed to the users.
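The distance computation and ranking can be sketched as follows, assuming feature vectors are plain arrays and the database is a dictionary from hypothetical image ids to vectors.

```python
import numpy as np

def rank_database(query, db):
    """Rank database images by the L2 distance of Eq. (6), most
    similar first.  `query` is a feature vector; `db` maps image ids
    (hypothetical names) to feature vectors of the same length."""
    q = np.asarray(query, dtype=float)
    dists = {img: float(np.linalg.norm(np.asarray(v, dtype=float) - q))
             for img, v in db.items()}
    return sorted(dists, key=dists.get)

# Toy database of three 2-D "feature vectors".
db = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
order = rank_database([0.0, 0.0], db)
```

With this toy data, "a" (distance 0) ranks before "c" (distance √2) and "b" (distance 5).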
f = {µs1b1, σs1b1, µs2b1, σs2b1, µs2b2, σs2b2, ..., µs2b8, σs2b8, µs3b1, σs3b1, µs3b2, σs3b2, ..., µs3b16, σs3b16, µs4b1, σs4b1} (5)
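Assuming the subband coefficients are available as a list of 2-D arrays in the fixed order above, the feature extraction of Eq. (5) can be sketched as below; the use of coefficient magnitudes is an assumption (curvelet coefficients are complex in general), and the helper name is illustrative.

```python
import numpy as np

def curvelet_feature(subbands):
    """Build the 2n-dimensional texture feature of Eq. (5).

    `subbands` is a list of 2-D coefficient arrays, one per kept
    subband, in the fixed order of Fig. 2 (1 + 8 + 16 + 1 = 26
    subbands for a 4-level decomposition).  Mean and standard
    deviation of the coefficient magnitudes are stored per subband."""
    feats = []
    for c in subbands:
        m = np.abs(c)
        feats.extend([m.mean(), m.std()])
    return np.array(feats)

# 26 dummy subbands of a 4-level decomposition -> 52-dimensional vector
subbands = [np.random.randn(4, 4) for _ in range(26)]
fv = curvelet_feature(subbands)
```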
are used. Therefore, for a 4-level decomposition, a total of 26 (= 1 + 8 + 16 + 1) subbands of curvelet coefficients are used, and a feature vector of 52 (= 2 × 26) dimensions is created for each image. To make the features of different images comparable, feature elements from different subbands are organized in the same order in the feature vector, following the subband numbering of Fig. 2. Accordingly, the feature vector f for the 0° oriented image of Fig. 3(a) is organized as,
The curvelet features described in the previous section are not rotation invariant: the feature vector changes significantly when an image is rotated. Consider the three images in Fig. 3. Though these images have different orientations, their textures are similar. However, Fig. 4(a) shows that their feature vectors are quite different. Fig. 4(a) plots a portion of the curvelet feature values of the three images for a 4-level analysis; the mean energies of different subbands are shown. The maximum and second maximum mean energies, which identify the dominant directions of these images, appear at different positions in the vectors. Therefore, the feature distances between these images will be large, and the images will not be treated as similar during retrieval, though they actually are. Thus, rotation invariant curvelet features are needed, so that similar images with different orientations have similar feature vectors.
A detailed description of the implementation of Equation (4) can be found in [10]. After the coefficients CT^D(a, b, θ) are obtained, the mean and standard deviation are calculated from each set of curvelet coefficients. Therefore, if n curvelets are used, a feature vector of dimension 2n represents an image. This feature extraction is applied to each database image. Each image is decomposed into 4 or 5 scale levels using the curvelet transform. The numbers of subbands at different scales are different: for a 4-level decomposition, there are 1, 16, 32, and 1 subbands at decomposition levels 1, 2, 3, and 4, respectively, giving 50 (= 1 + 16 + 32 + 1) subbands of curvelet coefficients. However, because a curvelet oriented at an angle θ produces the same coefficients as a curvelet oriented at an angle π + θ, only half of the subbands at levels 2 and 3
Fig. 3. Three similar images with orientation (a) 0°, (b) 30°, and (c) 60°.

In the following, we propose an efficient technique to solve the rotation variance problem. The idea is to rearrange the feature values based on their dominant orientation: the feature elements which show the dominant direction are placed first in the feature vector, and the other elements are shifted circularly relative to the maximum element. This is done for each scale
separately, because the numbers of subbands at different scales are different. Analysing the energy distribution among the feature elements in Fig. 4(a) shows that the maximum and the second maximum elements always appear together. This is because the energy of the dominant orientation of an image usually spreads between two neighboring subbands. Therefore, when reorganizing the feature elements, these two maximum elements are kept together in the reorganized feature vector, preserving their original relative order. For example, consider the feature elements of the 0° oriented image of Fig. 3(a) at scale 2. The maximum and the second maximum mean energies are found at subbands 3 and 2, respectively. Therefore, the mean energies at scale 2 are rearranged from {µs2b1, µs2b2, ..., µs2b7, µs2b8} to {µs2b2, µs2b3, ..., µs2b8, µs2b1}.
4. EXPERIMENTAL RESULTS
This section compares the texture retrieval performance of curvelet features with and without rotation invariance. In this experiment, we use the widely used Brodatz texture database, which consists of 112 images of size 640 × 640 pixels. As the performance of a rotation invariant feature is tested, a database is needed which contains a sufficient number of similar images rotated to different orientations. Therefore, each of the original 112 images is rotated to 0°, 30°, 60°, 90°, ..., and 330°. Each rotated image is then cut into a number of sub-images of size 128 × 128. Each original image thus produces 188 sub-images oriented at different angles and is regarded as the ground truth for all of its sub-images. In total, 21,056 images of size 128 × 128 are created from the 112 original Brodatz texture images; this generated database is used in the experiment. We apply both the rotation variant and the rotation invariant curvelet feature extraction process to each database image, so each image is represented and indexed by the two sets of features. The conventional precision-recall curve is used to evaluate retrieval performance. Precision is the ratio of the number of relevant images retrieved to the total number of retrieved images. Recall is the ratio of the number of relevant images retrieved to the total number of relevant images. As the ground truth of each database image is known, each image is used as a query. For each query, precision percentages are measured at 10 levels of recall percentage. The average precisions are calculated over all 21,056 queries at each recall level. Fig. 5 shows the comparison between the retrieval performance of the rotation variant and invariant curvelet features.
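The precision-at-recall-level computation described above can be sketched as follows. This is an illustrative, simplified version (it records precision at the first rank where each recall level is reached), with hypothetical image ids; the paper does not specify its exact interpolation.

```python
def precision_at_recall_levels(retrieved, relevant, levels=range(10, 101, 10)):
    """Precision (%) at fixed recall levels (%) for one query.

    `retrieved` is the ranked result list, `relevant` the ground-truth
    set of similar images.  Precision is recorded at the first rank
    where each recall level is reached (0.0 if it is never reached)."""
    relevant = set(relevant)
    hits = 0
    best = {}  # recall level -> precision when that level was first reached
    for k, img in enumerate(retrieved, start=1):
        if img in relevant:
            hits += 1
            recall = 100.0 * hits / len(relevant)
            precision = 100.0 * hits / k
            for lvl in levels:
                if recall >= lvl and lvl not in best:
                    best[lvl] = precision
    return [best.get(lvl, 0.0) for lvl in levels]

# Two relevant images, found at ranks 1 and 3 of a toy result list.
res = precision_at_recall_levels(["a", "x", "b"], {"a", "b"})
```

In the toy run, recall levels 10-50% are reached at rank 1 (precision 100%) and levels 60-100% at rank 3 (precision 2/3).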
Fig. 4. Energy distribution of different images at different subbands: (a) before rotation invariance, (b) after rotation invariance.

Note that the first two maximum means of scale 2 appear together at the first two positions in the new organization, and µs2b2 appears before µs2b3 to maintain their original order. The mean energies at the other scales are reorganized in the same way. As the mean energies determine the dominant orientation of an image, the standard deviations of the different subbands are rearranged in the same order as the means. After everything is restructured, the final rotation invariant curvelet feature vector is given as

f = {µs1b1, σs1b1, µs2b2, σs2b2, ..., µs2b8, σs2b8, µs2b1, σs2b1, µs3b4, σs3b4, ..., µs3b16, σs3b16, µs3b1, σs3b1, ..., µs3b3, σs3b3, µs4b1, σs4b1}    (7)

Fig. 4(b) shows the rearranged energies of the subbands of scales 2 and 3. It is clear that the feature values of the different images are more similar in Fig. 4(b) than in Fig. 4(a).
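The per-scale rearrangement behind Eq. (7) can be sketched as below. The rule implemented (start the circular shift one subband earlier when the second-largest mean is the circular predecessor of the largest, so the dominant pair stays together in its original order) follows the description in Section 3; the function name and dummy energies are illustrative only.

```python
import numpy as np

def normalize_scale(means, stds):
    """Circularly shift one scale's subband means so the dominant
    orientation comes first (sketch of the Section 3 rearrangement).

    If the second-largest mean lies in the circularly preceding
    subband, the shift starts there, keeping the two dominant subbands
    together in their original relative order.  The standard
    deviations follow exactly the same permutation as the means."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    n = len(means)
    if n < 2:                                  # scales s1, s4: one subband
        return means, stds
    i = int(np.argmax(means))                  # dominant subband
    j = int(np.argsort(means)[-2])             # second-largest mean
    start = j if (i - j) % n == 1 else i       # keep the pair together
    perm = [(start + k) % n for k in range(n)]
    return means[perm], stds[perm]

# Scale-2 example of the paper: maximum at subband 3, second maximum at
# subband 2, so the rearranged vector starts with the (b2, b3) pair.
means = [1, 5, 9, 2, 1, 1, 1, 1]   # dummy mean energies for b1..b8
m, s = normalize_scale(means, range(8))
```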
Fig. 5. Average retrieval performance of rotation invariant and rotation variant curvelet features.

Fig. 5 clearly shows that the curvelet feature with rotation normalization significantly outperforms the curvelet feature without it. The reason is that the rotation variant curvelet feature fails to identify similar images with orientations different from the query, whereas the rotation invariant curvelet feature can. This is verified by all the examples in Fig. 6, which shows a few retrieval results with both the rotation invariant (top snapshots) and the rotation variant curvelet feature (bottom snapshots). In each case, the top left image is the query, and the first 30 retrieved images are shown. In Fig. 6(a), the query image is a man-made texture. All of the first 30 images retrieved by the rotation invariant feature are similar but oriented in different directions. In contrast, the rotation variant feature retrieves images oriented only in the same direction as the query and fails to retrieve
other similar images with different orientations. The difference between the performances of the two features is even more significant for natural textures. Fig. 6(b-d) use natural texture images as queries, which are difficult to retrieve. The results show that the rotation invariant feature gives significantly better retrieval results in Fig. 6(b-d), while the rotation variant feature retrieves only a few similar images in each case: where the rotation variant curvelet feature fails to retrieve natural texture images, the rotation invariant curvelet feature successfully retrieves them. All these examples demonstrate that the rotation invariant curvelet feature has better retrieval performance than the rotation variant curvelet feature.
5. CONCLUSION
Rotation invariance is one of the key issues for any texture descriptor. Texture features extracted from spectral transforms are usually not rotation invariant due to their scale and subband distribution. This paper has proposed an efficient and effective way of normalizing curvelet features to extract rotation invariant texture features. The method can also be used to normalize other multi-resolution spectral features. It has twofold advantages. First, it avoids the expensive online matching used in MPEG. Second, the retrieval performance is significantly improved. The experimental results show that the rotation invariant feature considerably outperforms the rotation variant feature; in particular, the rotation invariant curvelet feature is very promising for retrieving natural texture images. Therefore, this feature has good potential for the retrieval of real world images, which are rich in natural textures. Currently, we are investigating the application of rotation invariant curvelet features to region based retrieval and semantic learning of natural images. The scale invariance issue remains unsolved and will be addressed in our future research work.
(a) D47 (Woven brass) as the query
(b) D15 (Straw) as the query
6. REFERENCES
[1] F. Long et al., "Fundamentals of Content-based Image Retrieval," in Multimedia Information Retrieval and Management, D. Feng et al., Eds., Springer, 2003.
[2] M. Tuceryan and A. K. Jain, "Texture Analysis," in The Handbook of Pattern Recognition and Computer Vision, 2nd Ed., World Scientific Publishing Co., 1998.
[3] B. S. Manjunath et al., Introduction to MPEG-7, John Wiley & Sons Ltd., 2002.
[4] S. Bhagavathy and K. Chhabra, "A Wavelet-based Image Retrieval System," Technical Report ECE278A, Vision Research Laboratory, University of California, Santa Barbara, 2007.
[5] M. N. Do, "Directional Multiresolution Image Representations," PhD Thesis, EPFL, 2001.
[6] J. Starck et al., "The Curvelet Transform for Image Denoising," IEEE Trans. on Image Processing, 11(6), 670-684, 2002.
[7] A. Majumdar, "Bangla Basic Character Recognition Using Digital Curvelet Transform," Journal of Pattern Recognition Research, 1: 17-26, 2007.
[8] L. Ni and H. C. Leng, "Curvelet Transform and Its Application in Image Retrieval," 3rd Int. Symp. on Multispectral Image Processing and Pattern Recognition, Proceedings of SPIE, vol. 5286, 2003.
[9] I. J. Sumana et al., "Content based image retrieval using curvelet transform," to appear in Proc. of Int. Workshop on MMSP, Oct. 2008.
[10] D. Zhang et al., "Content-based image retrieval using Gabor texture features," in Proc. of First IEEE PCM, pp. 392-395, Sydney, Australia, Dec. 2000.
[11] E. Candes et al., "Fast Discrete Curvelet Transforms," Multiscale Modeling and Simulation, 5(3), 861-899, 2006.
(c) D22 (Reptile skin) as the query
(d) D37 (Water) as the query
Fig. 6. First 30 retrieved images for different queries. Top and bottom snapshots are the results of the rotation invariant and rotation variant curvelet features, respectively. Images are organized left to right and top to bottom in order of increasing distance from the query.