Texture Retrieval by Gabor Filters Using Tuned Parameters - CiteSeerX

Effects of Different Gabor Filter Parameters on Image Retrieval by Texture

Lianping Chen, Guojun Lu, Dengsheng Zhang
Gippsland School of Computing and Information Technology, Monash University, Churchill, Victoria, 3842, Australia
Email: {Lianping.Chen, Guojun.Lu, Dengsheng.Zhang}@infotech.monash.edu.au

Abstract
The Gabor filter is widely used to extract texture features from images for image retrieval. A number of parameters (the number of scales, the number of orientations and the filter mask size) are used in the Gabor filter. In the work reported so far, these parameters seem to be chosen without proper explanation. In this paper, we investigate the effects of different Gabor filter parameters on texture retrieval.

1. Introduction
Texture is an important feature of images. In recent years, multichannel Gabor decomposition has become very popular for texture analysis. The Gabor filter resembles the characteristics of simple visual cortical cells [1, 2] and is widely used to extract texture features from images for either texture segmentation [3, 4, 5] or image retrieval [6, 7, 8]. Among many others, the most successful results are reported by Ma & Manjunath [7, 8], who showed that image retrieval using Gabor features outperforms retrieval using pyramid-structured wavelet transform (PWT) features, tree-structured wavelet transform (TWT) features and multiresolution simultaneous autoregressive model (MR-SAR) features. Their contribution has also been adopted by MPEG-7 as one of its texture descriptors [9].

A number of parameters are used in the Gabor filter. However, in the work reported so far, these parameters seem to be chosen without proper explanation. Perona [10] just lists the numbers of scales and orientations used in a variety of systems; they run from four to eleven scales and from two to eighteen orientations. Moreover, we have found no research so far on how to select the filter mask size. In this paper, we investigate the effects of different Gabor filter parameters on texture retrieval. In practice, choosing the number of filters and the filter mask size is a compromise that balances the effectiveness and efficiency of texture retrieval.

The rest of the paper is organized as follows: Section 2 introduces the fundamentals of Gabor filters, and Section 3 presents the experimental results and analysis. In Section 4, a summary is given.

2. Gabor filter
2.1 Fundamentals

Gabor filters are a group of wavelets. A set of filtered images is obtained by convolving the given image with Gabor filters. Each of these images represents the image information at a certain scale and at a certain orientation. From each filtered image, Gabor features can be calculated and used to retrieve images. For a given image I(x, y) with size P×Q, its discrete Gabor wavelet transform is given by a convolution:

$$G_{mn}(x, y) = \sum_{s}\sum_{t} I(x - s,\, y - t)\,\psi^{*}_{mn}(s, t)$$

where s and t are the filter mask size variables, and $\psi^{*}_{mn}$ is the complex conjugate of $\psi_{mn}$, which is a class of self-similar functions generated from dilation and rotation of the following mother wavelet:

$$\psi(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right]\cdot\exp(j2\pi Wx)$$

where W is called the modulation frequency. ψ(x, y) is a Gaussian modulated by a complex sinusoid [5]. The self-similar Gabor wavelets are obtained through the generating function:

$$\psi_{mn}(x, y) = a^{-m}\,\psi(\tilde{x}, \tilde{y})$$

where m and n specify the scale and orientation of the wavelet respectively, with m = 0, 1, …, M-1 and n = 0, 1, …, N-1; M is the number of scales and N is the number of orientations.

$$\tilde{x} = a^{-m}(x\cos\theta + y\sin\theta), \qquad \tilde{y} = a^{-m}(-x\sin\theta + y\cos\theta)$$

where a > 1 and θ = nπ/N. The variables in the above equations are defined as follows:

$$a = (U_h/U_l)^{\frac{1}{M-1}}, \qquad W_{m,n} = a^{m}U_l,$$

$$\sigma_{x,m,n} = \frac{(a + 1)\sqrt{2\ln 2}}{2\pi a^{m}(a - 1)U_l},$$

$$\sigma_{y,m,n} = \frac{1}{2\pi\tan\left(\frac{\pi}{2N}\right)\sqrt{\frac{U_h^{2}}{2\ln 2} - \left(\frac{1}{2\pi\sigma_{x,m,n}}\right)^{2}}}$$

where a is a scale factor, Uh is the highest centre frequency and Ul is the lowest centre frequency of interest. The values of σx and σy characterize the spatial extent and bandwidth of the filter in the x and y directions respectively.

2.2 Texture representation
After applying Gabor filters to the image at different orientations and different scales, we obtain an array of magnitudes:

$$E(m, n) = \sum_{x}\sum_{y} |G_{mn}(x, y)|, \qquad m = 0, 1, \ldots, M-1;\; n = 0, 1, \ldots, N-1$$

These magnitudes represent the energy content of the image at different scales and orientations. The main purpose of texture-based retrieval is to find images or regions with similar texture. The following mean µmn and standard deviation σmn of the magnitude of the transformed coefficients are used to represent the texture feature of the region:

$$\mu_{mn} = \frac{E(m, n)}{P \times Q}$$

$$\sigma_{mn} = \sqrt{\frac{\sum_{x}\sum_{y}\left(|G_{mn}(x, y)| - \mu_{mn}\right)^{2}}{P \times Q}}$$

A feature vector f (texture representation) is created using µmn and σmn as the feature components [6, 8]. M scales and N orientations are used and the feature vector is given by:

f = (µ00, σ00, µ01, σ01, …, µ(M-1)(N-1), σ(M-1)(N-1)).

We follow [6] for rotation normalization and similarity measurement.

2.3 Parameters
In the above equations, the following parameters have to be selected. M and N are the numbers of scales and orientations respectively. The filter mask dimension is s*t. The filter mask need not be square, but this is usually the case [11]. If the filter is to be centred on a pixel, it must have odd dimensions. We shall assume this to be the case in our discussion. Filters with Ul, the lowest centre frequency, and Uh, the highest centre frequency, are centred in the frequency domain at distances Ul and Uh from the origin, respectively. The upper limit for Uh is 0.5 [12], and the lower limit for Ul is 0. In reality, images that contain only the maximum or minimum frequency are very rare. Moreover, as far as the cost of computation and storage space is concerned, the choice of centre frequencies makes no difference. We follow [6, 8] and choose Uh = 0.4 and Ul = 0.05. In this work, we focus on three parameters, the number of scales, the number of orientations and the filter mask size, to determine the effects of different Gabor filter parameters on texture retrieval.

3. Experimental results
3.1 Setup
All images used for this experiment are from Brodatz [13]. There are 112 Brodatz texture images of size 512*512. Each is cut into 16 sub-textures of size 128*128 to create a database of 1792 textures; some deliberately rotated textures are added, giving a total of 1852 textures. Every texture is used as a query, and the average precision-recall is used as the overall performance measurement.

3.2 System structure
The complete system structure used in our test is outlined in Figure 1. It consists of the following steps: parameter selection, generation of Gabor filters and Gabor features, rotation normalization, and image retrieval.
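The filter generation and feature extraction steps can be sketched in NumPy as follows. This is a minimal illustration that transcribes the equations of Sections 2.1 and 2.2 literally, not the authors' implementation. The function names `gabor_bank` and `gabor_features` are hypothetical, and several details are assumptions made only for the example: x runs along columns and y along rows, the image border is zero-padded, the convolution is evaluated through the FFT, and σmn is taken as the population standard deviation.

```python
import numpy as np

def gabor_bank(M=6, N=4, size=13, Uh=0.4, Ul=0.05):
    """Complex Gabor masks, shape (M, N, size, size); needs M >= 2 and odd size."""
    a = (Uh / Ul) ** (1.0 / (M - 1))                 # scale factor
    half = size // 2
    # Assumption: x runs along columns, y along rows, origin at the mask centre.
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    bank = np.zeros((M, N, size, size), dtype=complex)
    for m in range(M):
        W = a**m * Ul                                # modulation frequency W_mn
        sx = (a + 1) * np.sqrt(2 * np.log(2)) / (2 * np.pi * a**m * (a - 1) * Ul)
        sy = 1.0 / (2 * np.pi * np.tan(np.pi / (2 * N))
                    * np.sqrt(Uh**2 / (2 * np.log(2)) - (1 / (2 * np.pi * sx))**2))
        for n in range(N):
            theta = n * np.pi / N
            xr = a**(-m) * (x * np.cos(theta) + y * np.sin(theta))
            yr = a**(-m) * (-x * np.sin(theta) + y * np.cos(theta))
            envelope = np.exp(-0.5 * (xr**2 / sx**2 + yr**2 / sy**2)) / (2 * np.pi * sx * sy)
            bank[m, n] = a**(-m) * envelope * np.exp(2j * np.pi * W * xr)
    return bank

def gabor_features(image, bank):
    """Feature vector (mu_00, sigma_00, mu_01, sigma_01, ...) of length 2*M*N."""
    M, N, size, _ = bank.shape
    P, Q = image.shape
    half = size // 2
    shape = (P + size - 1, Q + size - 1)             # full linear-convolution size
    image_f = np.fft.fft2(image, s=shape)
    feats = []
    for m in range(M):
        for n in range(N):
            kernel_f = np.fft.fft2(np.conj(bank[m, n]), s=shape)
            full = np.fft.ifft2(image_f * kernel_f)
            G = full[half:half + P, half:half + Q]   # keep the central P x Q part
            mag = np.abs(G)
            feats.extend([mag.mean(), mag.std()])    # mu_mn and sigma_mn
    return np.array(feats)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    texture = rng.random((128, 128))                 # stand-in for a 128*128 sub-texture
    f = gabor_features(texture, gabor_bank(M=6, N=4, size=13))
    print(f.shape)
```

With 6 scales and 4 orientations the vector has 6*4*2 components, matching the "scale*orientation*2" length shown in the system structure.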

[Figure 1: flowchart. Parameters selection; Gabor filter (for every scale and orientation); convolving the filter with the image; Gabor feature (µmn, σmn); feature vector (scale*orientation*2); rotation normalization (circular shift); image retrieval.]

Figure 1. System structure used in our test
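The last two stages of the system can be illustrated with a short sketch. The paper defers the details of rotation normalization to [6], so the rule used here (circularly shift the orientation axis so that the orientation with the largest summed mean energy comes first) is an assumption, as are the hypothetical function names `rotation_normalize` and `precision_recall`. The precision-recall helper computes one curve for a single query from a ranked list of relevance flags; averaging such curves over all queries gives the average precision-recall used as the performance measure.

```python
import numpy as np

def rotation_normalize(f, M, N):
    """Circularly shift the orientation axis of f = (mu_00, sigma_00, mu_01, ...).

    Assumption: the dominant orientation is the one with the largest mean
    energy summed over all scales, and it is moved to position n = 0.
    """
    f3 = np.asarray(f, dtype=float).reshape(M, N, 2)   # [scale, orientation, (mu, sigma)]
    dominant = int(np.argmax(f3[:, :, 0].sum(axis=0)))
    return np.roll(f3, -dominant, axis=1).reshape(-1)

def precision_recall(relevant_flags, total_relevant):
    """Precision and recall after each retrieved item for one query.

    relevant_flags[k] is True when the item at rank k+1 is relevant to the query.
    """
    flags = np.asarray(relevant_flags, dtype=bool)
    hits = np.cumsum(flags)
    ranks = np.arange(1, len(flags) + 1)
    return hits / ranks, hits / total_relevant

if __name__ == "__main__":
    M, N = 2, 3
    f = np.arange(2 * M * N, dtype=float)
    # The same texture seen with its orientation channels shifted by one position.
    shifted = np.roll(f.reshape(M, N, 2), 1, axis=1).reshape(-1)
    print(np.array_equal(rotation_normalize(f, M, N),
                         rotation_normalize(shifted, M, N)))
    precision, recall = precision_recall([True, False, True, True], total_relevant=3)
    print(precision, recall)
```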

3.3 Performance for different scales and orientations
A Gabor filter is a frequency- and orientation-selective filter with a Gaussian envelope. The set of scale channels can be configured to capture specific bands of frequency components from an image, and the set of orientation channels is used to extract directional features. The number of channels (filters) is the product of the number of scales and the number of orientations. We vary the numbers of scales and orientations while keeping the filter mask size unchanged. In Figure 2, the curve labelled brof13s6o4 denotes a filter mask size of 13*13, 6 scales and 4 orientations (24 filters) on the Brodatz image database.

The first observation is that retrieval performance, even with the same number of filters, is affected by the combination of scales and orientations. The experimental results also suggest that more scales and orientations, or more precisely more filters, do not always give better performance, as shown in Figure 2: brof13s6o16 has almost the same performance as brof13s6o4 but is computationally more expensive. At the same time, Figure 2 shows that a small number of filters does not give better performance either: brof13s6o4 performs better than brof13s4o3. Therefore, for filter mask size 13*13, the best performance was achieved with 6 scales and 4 orientations. The possible reasons are summarized below.

Daugman [14] showed that for two-dimensional Gabor functions, the uncertainty relations ∆x∆u ≥ 1/(4π) and ∆y∆v ≥ 1/(4π) limit the joint resolution in the 2D space and 2D frequency domains, where [∆x, ∆y] gives the resolution in the space domain and [∆u, ∆v] gives the resolution in the frequency domain. Raising the resolution in the space domain therefore diminishes the resolution in the frequency domain. Such filters can negotiate this inescapable trade-off in different ways, for example with sharp spatial resolution in the y direction (at the expense of orientation selectivity) or sharp spatial resolution in the x direction (at the expense of spatial-frequency or scale selectivity). Furthermore, we can also design a filter for greater resolution in the spatial domain by reducing the standard deviation of the 2D Gaussian envelope along both spatial dimensions. However, decreasing the effective spatial area of the filter inevitably increases its effective area in the frequency domain, thereby decreasing its spatial-frequency (i.e. scale) and orientation selectivity. Such a division of labor among filters permits the extraction of different spatial-spectral information from the image. Therefore, scale and orientation selectivity have to be considered simultaneously. This is discussed in detail in [14].

The number of filters has to be reasonable. The more filters we have, the more detailed, but also the more redundant, the representation of the image becomes. This may not result in better retrieval performance, because similar features may be captured by different filters. In contrast, too few filters cannot give a sufficiently detailed representation of the images. Thus, the number of filters should be neither too big nor too small. The same observation can be made from Figures 3-5: the combination of 6 scales and 4 orientations is also the best choice for filter mask sizes 9*9, 33*33 and 61*61 respectively, considering both retrieval effectiveness and computational cost.
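As a small worked example of the bookkeeping behind these comparisons, the sketch below decodes the curve labels used in the figures and computes the number of filters (scales times orientations) and the feature vector length (two components, µmn and σmn, per filter). The helper names `parse_run_label` and `bank_size` are hypothetical and introduced only for illustration; the label layout follows the description of brof13s6o4 given in the text.

```python
import re

def parse_run_label(label):
    """Split a curve label such as 'brof13s6o4' into (mask_size, scales, orientations)."""
    match = re.fullmatch(r"brof(\d+)s(\d+)o(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised label: {label}")
    mask_size, scales, orientations = (int(group) for group in match.groups())
    return mask_size, scales, orientations

def bank_size(scales, orientations):
    """Number of filters and feature vector length for a given configuration."""
    filters = scales * orientations
    feature_length = 2 * filters          # one mean and one standard deviation per filter
    return filters, feature_length

if __name__ == "__main__":
    for label in ["brof13s4o3", "brof13s8o3", "brof13s6o4", "brof13s6o16"]:
        mask_size, scales, orientations = parse_run_label(label)
        filters, feature_length = bank_size(scales, orientations)
        print(f"{label}: mask {mask_size}*{mask_size}, "
              f"{filters} filters, {feature_length} feature components")
```

The configurations 8*3 and 6*4 both use 24 filters, which is why they allow a comparison at equal filter count.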

[Figure 2: precision (vertical axis) versus recall (horizontal axis) curves for brof13s4o3, brof13s8o3, brof13s6o4 and brof13s6o16.]

Figure 2. Average precision-recall with filter mask size 13*13 and 12, 24, 24 and 96 filters in scale*orientation combinations 4*3, 8*3, 6*4 and 6*16 respectively.

[Figure 3: precision (vertical axis) versus recall (horizontal axis) curves for brof09s4o3, brof09s8o3, brof09s6o4 and brof09s6o16.]

Figure 3. Average precision-recall with filter mask size 9*9 and 12, 24, 24 and 96 filters in scale*orientation combinations 4*3, 8*3, 6*4 and 6*16 respectively.

[Figure 4: precision (vertical axis) versus recall (horizontal axis) curves for brof33s4o3, brof33s8o3, brof33s6o4 and brof33s6o16.]

Figure 4. Average precision-recall with filter mask size 33*33 and 12, 24, 24 and 96 filters in scale*orientation combinations 4*3, 8*3, 6*4 and 6*16 respectively.

[Figure 5: precision (vertical axis) versus recall (horizontal axis) curves for brof61s4o3, brof61s8o3, brof61s6o4 and brof61s6o16.]

Figure 5. Average precision-recall with filter mask size 61*61 and 12, 24, 24 and 96 filters in scale*orientation combinations 4*3, 8*3, 6*4 and 6*16 respectively.

A series of further experiments was carried out to test the performance of other parameter combinations, and the results agree with the observations above.

3.4 Performance for different filter mask size

The experimental results obtained by changing only the filter mask size are presented in Figure 6. The best performance was achieved with filter mask size 13*13, whereas the worst was obtained with filter mask size 61*61. This can be explained as follows. In convolution, the value computed at a pixel is a weighted sum of the grey levels in a neighbourhood surrounding that pixel. The weights are the coefficients of a matrix, the convolution kernel; in our case, the coefficients of the Gabor filter form the kernel. The kernel's dimension, i.e. the filter mask size, therefore defines the size of the neighbourhood in which the calculation takes place. As the filter mask size increases, the convolution value at a point is determined by a larger neighbourhood of image pixels. If the neighbourhood is too large, it may include unrelated image pixels, especially for non-homogeneous patterns, and retrieval becomes inaccurate. Likewise, if the neighbourhood is too small, some related image pixels may be missed, and effective retrieval cannot be achieved either. Therefore, the filter mask size should be neither too large nor too small. Figure 6 suggests that brof09s6o4 performs nearly the same as brof13s6o4, but the precision of the former drops sharply when the recall reaches 0.8. The filter mask size 13*13 is the best selection for our database. The filter mask size also influences the computational cost.
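The role of the mask size as the neighbourhood size can be made concrete with a sketch of the convolution at a single pixel. It is only an illustration: the function name `convolve_at` is hypothetical, the mask is assumed to be square with odd dimensions, and pixels outside the image are assumed to be zero.

```python
import numpy as np

def convolve_at(image, kernel, row, col):
    """Convolution value at one pixel: a weighted sum over a k*k neighbourhood."""
    k = kernel.shape[0]
    if kernel.shape != (k, k) or k % 2 == 0:
        raise ValueError("kernel must be square with odd dimensions")
    radius = k // 2
    # Assumption: pixels outside the image count as zero.
    padded = np.pad(image, radius, mode="constant")
    # The k*k neighbourhood centred on (row, col) in the original image.
    patch = padded[row:row + k, col:col + k]
    # Convolution flips the kernel in both directions before weighting.
    return np.sum(patch * kernel[::-1, ::-1])

if __name__ == "__main__":
    image = np.arange(25, dtype=float).reshape(5, 5)
    for k in (3, 5):
        kernel = np.ones((k, k)) / (k * k)       # simple averaging mask
        value = convolve_at(image, kernel, 0, 0)
        print(f"mask {k}*{k}: {k * k} weighted pixels, value at (0, 0) = {value}")
```

Each output value costs k*k multiply-adds per filter, so both the neighbourhood and the work per pixel grow with the square of the mask dimension.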

[Figure 6: precision (vertical axis) versus recall (horizontal axis) curves for brof61s6o4, brof33s6o4, brof13s6o4 and brof09s6o4.]

Figure 6. Average precision-recall on Brodatz images with filter mask sizes 61*61, 33*33, 13*13 and 9*9, using 24 filters (6 scales and 4 orientations).

3.5 Overall effects of different scales and orientations and filter mask size
Based on the discussion in Sections 3.3 and 3.4, the best combination of parameters is 6 scales and 4 orientations (24 filters) with filter mask size 13*13. Figure 7 compares the results on Brodatz images obtained with our selected parameters and with the parameters used in [9] (filter size 81*81, 5 scales and 6 orientations). The presented results show that none of the other combinations tested performs better than the one we selected.

[Figure 7: precision (vertical axis) versus recall (horizontal axis) curves for brof13s6o4 and brof81s5o6.]

Figure 7. Average precision-recall on Brodatz images with our optimal parameters (filter size 13*13, 6 scales and 4 orientations) and with the parameters selected in [9] (filter size 81*81, 5 scales and 6 orientations).

3.6 Consideration of computation and storage space requirements

In order to get the Gabor features, images must be convolved with all the filters. This convolution is time-consuming because of the large image size and the computational complexity. To speed it up for large images, convolution in the spatial domain can be replaced by multiplication in the frequency domain. The general process requires applying the FFT (Fast Fourier Transform) to the source image and the Gabor filters, and the IFFT (Inverse Fast Fourier Transform) to their product [15, 16]. The time complexity of the FFT and IFFT is $O(S^2 \log_2 S)$, where S*S is the image size. For each filter, the bigger the filter mask (the set of filter coefficients), the more time is needed to compute it. Furthermore, the more Gabor filters are used, the more computation and storage space are needed for the coefficients and the feature vectors. Parameter selection is therefore important, and our findings are reasonable with regard to computation and storage cost.
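The equivalence between spatial convolution and frequency-domain multiplication can be checked with a minimal NumPy sketch. The function names `fft_convolve_full` and `direct_convolve_full` are hypothetical, and zero-padding both arrays to the full linear-convolution size is an assumption of this example rather than a detail given in the paper.

```python
import numpy as np

def fft_convolve_full(image, kernel):
    """Linear 2D convolution via FFT: transform, multiply, inverse transform."""
    P, Q = image.shape
    p, q = kernel.shape
    shape = (P + p - 1, Q + q - 1)            # pad so circular wrap-around cannot occur
    image_f = np.fft.fft2(image, s=shape)
    kernel_f = np.fft.fft2(kernel, s=shape)
    return np.fft.ifft2(image_f * kernel_f)

def direct_convolve_full(image, kernel):
    """Reference spatial convolution: out[x, y] = sum_s sum_t kernel[s, t] * image[x-s, y-t]."""
    P, Q = image.shape
    p, q = kernel.shape
    out = np.zeros((P + p - 1, Q + q - 1), dtype=complex)
    for s in range(p):
        for t in range(q):
            out[s:s + P, t:t + Q] += kernel[s, t] * image
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((16, 16))
    kernel = rng.random((5, 5)) + 1j * rng.random((5, 5))   # complex, like a Gabor mask
    print(np.allclose(fft_convolve_full(image, kernel),
                      direct_convolve_full(image, kernel)))
```

The direct version performs one pass over the image per mask coefficient, so its cost grows with the mask area, whereas the FFT cost depends on the padded image size.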

4. Summary
The success of effective and efficient texture image retrieval using Gabor filters depends essentially on the proper choice of
• a suitable filter mask size, and
• an appropriate number of filters with a proper combination of the numbers of scales and orientations.

Very little work has been done so far on the filter mask size in particular, yet it substantially affects both the performance and the computational cost. However, the selection of the filter parameters depends heavily on the characteristics of the textures in the database. Our test data are based on Brodatz textures. The best combination of parameters, filter mask size 13*13 with 6 scales and 4 orientations (24 filters), might not work very well on every individual image, but it gives better results when all images in the database are considered. Since most research on texture is conducted on the Brodatz texture collection, we believe that our findings are representative and useful for further texture analysis, such as texture segmentation.

References
[1] J. G. Daugman, "Two-dimensional spectral analysis of cortical receptive field profiles," Vision Research, vol. 20, pp. 847-856, 1980.
[2] S. Marcelja, "Mathematical description of the responses of simple cortical cells," J. Opt. Soc. Amer., vol. 70, no. 11, pp. 1297-1300, 1980.
[3] A. K. Jain and F. Farrokhnia, "Unsupervised texture segmentation using Gabor filters," Pattern Recognition, vol. 24, no. 12, pp. 1167-1186, 1991.
[4] D. Dunn and W. E. Higgins, "Optimal Gabor filters for texture segmentation," IEEE Transactions on Image Processing, vol. 4, no. 7, pp. 947-964, Jul. 1995.
[5] D. Dunn, W. E. Higgins and J. Wakeley, "Texture segmentation using 2-D Gabor elementary functions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 130-149, Feb. 1994.
[6] D. Zhang and G. Lu, "Content-based image retrieval using Gabor texture features," in Proc. of First IEEE Pacific-Rim Conference on Multimedia (PCM'00), pp. 1-9, Fargo, ND, USA, June 1-3, 2001.
[7] P. Wu, B. S. Manjunath, S. D. Newsam and H. D. Shin, "A texture descriptor for image retrieval and browsing," Computer Vision and Pattern Recognition Workshop, Fort Collins, CO, USA, June 1999.
[8] B. S. Manjunath and W. Y. Ma, "Texture features for browsing and retrieval of image data," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 18, no. 8, pp. 837-42, Aug. 1996.
[9] B. S. Manjunath, P. Salembier and T. Sikora, Introduction to MPEG-7: Multimedia Content Description Interface, Chichester; Milton (Qld.): Wiley, 2002.
[10] P. Perona, "Deformable kernels for early vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 5, pp. 488-499, May 1995.
[11] N. Efford, Digital Image Processing: A Practical Introduction Using Java, Harlow; New York: Addison-Wesley, 2000.
[12] Y. Rubner and C. Tomasi, Perceptual Metrics for Image Database Navigation, Boston, Mass.; London: Kluwer Academic, c2001.
[13] P. Brodatz, Textures: A Photographic Album for Artists and Designers, New York: Dover, 1966.
[14] J. G. Daugman, "Uncertainty relation for resolution in space, spatial-frequency, and orientation optimized by two-dimensional visual cortical filters," Journal of the Optical Society of America, vol. 2, pp. 1160-1169, 1985.
[15] T. N. Tan and A. G. Constantinides, "Texture analysis based on a human visual model," Proc. ICASSP90, pp. 2137-2140, 1990.
[16] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd edition, Upper Saddle River, N.J.: Prentice Hall, c2002.