Texture Analysis Experiments with Meastex and Vistex Benchmarks
S. Singh and M. Sharma
PANN Research, Department of Computer Science, University of Exeter, Exeter, UK

Abstract. The analysis of texture in images is an important area of study. Image benchmarks such as Meastex and Vistex have been developed so that researchers can compare their experiments on common texture data. In this paper we compare five different texture analysis methods on these benchmarks in terms of their recognition ability. Since these benchmarks are limited in content, we have divided each image into n sub-images and performed our analysis on the resulting larger data set. We investigate how well the following texture extraction methods perform: autocorrelation, co-occurrence matrices, edge frequency, Law's, and primitive length. We aim to determine whether some of these methods outperform others by a significant margin, and whether combining them into a single feature set has a significant impact on the overall recognition performance. For our analysis we have used the linear and nearest neighbour classifiers.
1. Texture Benchmarks
Performance evaluation of texture analysis algorithms is of fundamental importance in image analysis. The ability to rank algorithms by how well they recognise the surface properties of an image region is crucial to selecting optimal feature extraction methods. However, one must concede that a given texture analysis algorithm may have inherent strengths that are only evident when applied to a specific data set, i.e. no single algorithm is the best for all applications. This does not imply that benchmark evaluation studies are not useful. As synthetic benchmarks are generated to reflect naturally found textures, algorithm performances on these can be analysed to gain an understanding of where the algorithms are more likely to work well. For our study the objective is to compare texture algorithms from a feature extraction perspective, and therefore the recognition rate of a classifier trained on these features is an appropriate measure of how well the texture algorithms perform.
Singh, S. and Sharma, M. Texture Analysis Experiments with Meastex and Vistex Benchmarks, Proc. International Conference on Advances in Pattern Recognition, Lecture Notes in Computer Science no. 2013, S. Singh, N. Murshed and W. Kropatsch (Eds.), Springer, Rio (11-14 March, 2001).
Texture benchmark evaluation is not a new area of work; however, previous work has either compared too few algorithms or used a very small number of benchmark images, which makes it difficult to generalise results (see [15] for a criticism of various studies on performance evaluation). Texture methods can be categorised as: statistical, geometrical, structural, model-based and signal processing features [17]. Van Gool et al. [18] and Reed and Buf [13] present detailed surveys of the various texture methods used in image analysis studies. Randen and Husøy [12] conclude that most studies deal with statistical, model-based and signal processing techniques. Weszka et al. [20] compared the Fourier spectrum, second order gray level statistics, co-occurrence statistics and gray level run length statistics, and found the co-occurrence statistics to be the best. Similarly, Ohanian and Dubes [8] compared Markov Random Field parameters, multi-channel filtering features, fractal based features and co-occurrence matrix features, and the co-occurrence method performed the best. The same conclusion was drawn by Conners and Harlow [2] when comparing run-length difference, gray level difference density and power spectrum features. Buf et al. [1], however, report that several texture features have roughly the same performance when evaluating co-occurrence features, fractal dimension, transform and filter bank features, number of gray level extrema per unit area, and curvilinear integration features. Strand and Taxt [14] report that co-occurrence based features are better than filtering features [12]; however, some other studies have supported exactly the reverse. Pichler et al. [10] compare wavelet transforms with adaptive Gabor filtering feature extraction and report superior results using the Gabor technique. However, its computational requirements are much larger than those of the wavelet transform, and in certain applications accuracy may be traded for a faster algorithm.
Ojala et al. [9] compared a range of texture methods using nearest neighbour classifiers, including the gray level difference method, Law's measures, center-symmetric covariance measures and local binary patterns, applying them to Brodatz images. The best performance was achieved by the gray level difference method. Law's measures are criticised for not being rotationally invariant, for which reason other methods performed better. In this paper we analyse the performance of five popular texture methods on the publicly available Meastex database [7,15] and Vistex database [19]. For each database we extract five feature sets and train a classifier. The performance of the classifier is evaluated using leave-one-out cross-validation. The paper is organised as follows. We first present details of the Meastex and Vistex databases. Next, we describe our texture measures for data analysis and then present the experimental details. The results are discussed for the linear and nearest neighbour classifiers. Some conclusions are drawn in the final section.
1.1 Meastex Benchmark Meastex is a publicly available texture benchmark. Each image has a size of 512x512 pixels and is distributed in raw PGM format. We split each image into 16 sub-images to increase the number of samples available for each class. The textures are available for classes asphalt, concrete, grass and rock. Finally we get a total of 944 images from which texture features are extracted. Table 1 shows the number of features extracted for each texture method. Table 2 shows the composition of the Meastex database.
Feature extraction method    No. of features
Autocorrelation              99
Co-occurrence                14
Edge frequency               70
Law's                        125
Primitive length             5

Table 1. The number of features for each texture algorithm

Label    Class       Samples
1        Asphalt     64
2        Concrete    192
3        Grass       288
4        Rock        400

Table 2. Details of Meastex data

We find that the data for these classes overlaps no matter which feature extraction method is employed. Therefore, their classification is not a trivial task.
1.2 Vistex Benchmark
All images in the Vision Texture database are stored as raw ppm (P6) files with a resolution of 512x512 pixels. The analysis of Vistex data is more complicated than that of Meastex, for several reasons. First, a larger number of classes is involved. An increase in the number of classes does not always increase the complexity of the classification problem, provided that the class data distributions are non-overlapping. However, in our case we find that the Vistex class distributions are overlapping, and the classification problem is by no means solvable using linear techniques alone. Second, Vistex has far fewer samples for each class, and it is expected that the imbalance between samples across different classes will make the classification more difficult. Third, and of most concern, is the significant variability across samples of the same class in the Vistex benchmark. The original Vistex database consists of 19 classes. Classes with fewer than 5 sample images have been removed from our analysis, leaving 7 classes: bark, fabric, food, metal, sand, tile, and water. Each image is divided into 4 images to increase the number of available samples.

Label    Class     Samples
1        Bark      36
2        Fabric    80
3        Food      48
4        Metal     24
5        Sand      28
6        Tile      32
7        Water     32

Table 3. Details of Vistex data
In Table 3 we summarise data details for Vistex analysis. Fig. 1 shows some of the samples of Meastex and Vistex benchmark data.
Fig. 1 (a) Samples of Meastex data including asphalt, concrete, grass and rock; (b) samples of Vistex data including bark, fabric, food, metal, sand, tile and water. [Figure images not reproduced.]

We next present the details of how our texture features are computed.
2. Texture Features
Each texture extraction algorithm is based on capturing the variability in gray scale images. The different methods capture how coarse or fine a texture is in their own ways. The textural character of an image depends on the spatial size of its texture primitives [5]. Large primitives give rise to coarse texture (e.g. a rock surface) and small primitives give fine texture (e.g. a silk surface). To capture these characteristics, it has been suggested that spatial methods are superior to spectral approaches. The five feature extraction methods used in this study are accordingly based on spatial information rather than on the frequency domain content of the given images. The autocorrelation method is based on finding the linear spatial relationships between primitives. If the primitives are large, the function decreases slowly with increasing distance, whereas it decreases rapidly if the texture consists of small primitives. If the primitives are periodic, the autocorrelation increases and decreases periodically with distance. The set of autocorrelation coefficients is computed by estimating the relationship between all pixel pairs f(x,y) and f(x+p, y+q), where the upper limit on the values of p and q is set by the user. The co-occurrence approach is based on the joint probability distribution of pixels in an image [3]. A co-occurrence matrix records the joint probability of occurrence of
gray levels i and j for two pixels with a defined spatial relationship in an image. The spatial relationship is defined in terms of a distance d and an angle θ. If the texture is coarse and the distance d is small compared to the size of the texture elements, pairs of points at distance d should have similar gray levels. Conversely, for a fine texture, if the distance d is comparable to the texture element size, then the gray levels of points separated by distance d should often be quite different, so that the values in the co-occurrence matrix are spread out relatively uniformly. Hence, a good way to analyse texture coarseness is to compute, for various values of distance d, some measure of the scatter of the co-occurrence matrix values around the main diagonal. Similarly, if the texture is directional, i.e. coarser in one direction than another, then the degree of spread of the values about the main diagonal should vary with the angle θ. Texture directionality can thus be analysed by comparing spread measures of co-occurrence matrices constructed at various distances d and angles θ. From co-occurrence matrices, a variety of features may be extracted; the original investigation into co-occurrence features was pioneered by Haralick et al. [4]. From each matrix, 14 statistical measures are extracted. For the edge frequency method, we compute the gradient difference between a pixel f(x,y) and its neighbours at a distance d. For a given distance, the gradient differences are summed over the whole image. For different values of d (in our case 1 ≤ d ≤ 70), we obtain different feature measurements for the same image. For Law's method, a total of 25 masks are convolved with the image to detect different features such as linear elements, ripples, etc. These masks were proposed by Laws [6]. We compute five amplitude features for each convolution, namely the mean, standard deviation, skewness, kurtosis, and energy.
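As an illustration of the first two extractors, the following is a minimal NumPy sketch (our own code, not the implementation used in the paper): a normalised autocorrelation coefficient for one displacement (p, q), and a grey-level co-occurrence matrix with two of Haralick's 14 measures. The quantisation to a small number of grey levels and the single offset are simplifying assumptions.

```python
import numpy as np

def autocorrelation(img, p, q):
    """Normalised autocorrelation coefficient for displacement (p, q), p, q >= 0."""
    f = img.astype(float)
    h, w = f.shape
    num = (f[:h - p, :w - q] * f[p:, q:]).mean()   # average of f(x,y) * f(x+p, y+q)
    return num / (f ** 2).mean()

def cooccurrence_matrix(img, d=(0, 1), levels=8):
    """Normalised grey-level co-occurrence matrix for offset d = (rows, cols)."""
    g = (img.astype(float) * levels / (img.max() + 1)).astype(int)  # quantise grey levels
    dy, dx = d
    h, w = g.shape
    P = np.zeros((levels, levels))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            P[g[y, x], g[y + dy, x + dx]] += 1      # count each co-occurring pair
    return P / P.sum()

def haralick_subset(P):
    """Two of Haralick's 14 measures: energy and contrast."""
    i, j = np.indices(P.shape)
    return {"energy": (P ** 2).sum(), "contrast": ((i - j) ** 2 * P).sum()}
```

A coarse texture concentrates the mass of P near the main diagonal at small d, keeping contrast low; a fine texture spreads mass away from the diagonal, raising contrast.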
Finally, for primitive length features, we count runs of pixels that share the same gray level. Coarse textures are represented by a large number of neighbouring pixels with the same gray level, whereas a small number represents fine texture. A primitive is a maximal contiguous set of pixels in a given direction that have the same gray level; each primitive is defined by its gray level, length and direction. Five statistical features defining the characteristics of these primitives are used as our features. Detailed algorithms for all of these methods are presented by Sonka et al. [16].
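The primitive (run) extraction for one direction can be sketched as follows. This is illustrative code, not the paper's implementation; `short_run_emphasis` is one common run-length statistic, named here for illustration only.

```python
import numpy as np
from itertools import groupby

def horizontal_primitives(img):
    """Lengths of maximal runs of equal grey level along each row (0 degree direction)."""
    runs = []
    for row in img:
        # groupby collects consecutive equal pixels into one primitive
        runs.extend(sum(1 for _ in group) for _, group in groupby(row))
    return runs

def short_run_emphasis(runs):
    """Emphasises short primitives; large values indicate fine texture."""
    r = np.asarray(runs, dtype=float)
    return float((1.0 / r ** 2).mean())
```

Analogous scans at 45, 90 and 135 degrees give the other directions, and statistics over the run lengths give the five features.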
3. Experimental Details and Results
In this paper we present the experimental details for Meastex and Vistex separately. The texture feature sets have been derived from the methods discussed above. In addition, we also generate a combined feature set that contains all features from the five methods. There are a total of 944 samples for the Meastex data and 280 samples for the Vistex data. We use the leave-one-out method of cross-validation for exhaustively testing the data. In this method, for N samples, a total of N trials are conducted. In each trial one sample is removed from the data set and kept for testing, while the others are used for training. In each trial, therefore, we have a different training set and a different test sample. The recognition performance is averaged across all trials. This methodology is superior to randomly partitioning the data into training and test sets, where the resultant performance of the system may not reflect its true texture recognition ability.
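The leave-one-out procedure above can be sketched as follows for a k-nearest-neighbour classifier. This is our own illustrative code; Euclidean distance and majority voting are assumptions, as the paper does not state these details.

```python
import numpy as np

def leave_one_out_accuracy(X, y, k=1):
    """Hold each sample out once, classify it by a majority vote among its
    k nearest neighbours in the remaining data, and average the outcome."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    correct = 0
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                      # exclude the held-out sample itself
        labels = y[np.argsort(dist)[:k]]      # labels of the k nearest neighbours
        values, counts = np.unique(labels, return_counts=True)
        if values[counts.argmax()] == y[i]:
            correct += 1
    return correct / len(X)
```

With N trials every sample serves as test data exactly once, so the averaged rate uses all of the data without committing to a single fixed train/test split.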
3.1 Meastex Results
The results for Meastex data are shown in Table 4.

Texture Method      Linear       Best   kNN Classifier     Best   kNN Classifier
                    Classifier   k      (original data)    k      (PCA data)
Autocorrelation     76.1%        5      79.4%              7      86.1%
Co-occurrence       79.2%        5      86.8%              5      93.5%
Edge Frequency      63.4%        3      70.7%              5      74.9%
Law's               82.8%        7      75.1%              7      69.3%
Primitive Length    43.1%        7      54.1%              7      55.9%
Combined            87.5%        5      83.3%              5      83.6%
Table 4. The performance of the linear and nearest neighbour classifiers on Meastex

We have used classical linear discriminant analysis and the k nearest neighbour method. For the nearest neighbour method, the best performance has been selected over k = 1, 3, 5 and 7 neighbours. We find that the best results are produced by the linear classifier on the combined feature set. For individual feature sets, performance with the linear classifier can be ranked as: Law's, co-occurrence, autocorrelation, edge frequency and primitive length. Similarly, performance with the nearest neighbour classifier can be ranked as: co-occurrence, autocorrelation, Law's, edge frequency and primitive length. For the nearest neighbour method, the combined feature set does not produce better results than the individual best performance of the co-occurrence method. We find that, except on the Law's and combined feature sets, the nearest neighbour classifier is the better classifier, with an improvement of nearly 3 to 10% in recognition. On the whole, most methods perform reasonably well on this benchmark. A close evaluation of the confusion matrices shows that rock samples are by far the most difficult to classify. There is a considerable improvement in performance once PCA data is used. One reason for this is that the PCA scores give low weight to features that are not very variable, thereby reducing their effect. As a result, nearest neighbour distance computations on the PCA data are more discriminatory across different class distributions than on the original data. The improvements are nearly 7% for autocorrelation, nearly 7% for co-occurrence, nearly 4% for edge frequency, nearly 2% for primitive length, and less than 1% for the combined features. The only inferior performance is that of the Law's feature set, where the recognition rate falls by nearly 6%.
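The PCA projection applied before the nearest-neighbour distance computation can be sketched as follows. This is illustrative code; the number of components retained is an assumption, as the paper does not state it.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centred data onto its leading principal components,
    so that directions of low variance contribute little to kNN distances."""
    Xc = X - X.mean(axis=0)
    # eigh returns eigenvalues in ascending order for the symmetric covariance
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]   # largest variance first
    return Xc @ eigvecs[:, order]
```

Distances computed on these scores are dominated by the high-variance directions, which is consistent with the improvement observed for most feature sets.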
3.2 Vistex Results
The results for Vistex data are shown in Table 5. On Vistex data, the nearest neighbour classifier is once again superior on all data sets except the Law's and combined feature sets, with an improvement of up to 13%. The most noticeable result using the linear classifier is with the combined feature set, which gives a recognition performance of 94.6%. This is a considerable improvement on any single feature set performance. The same improvement is, however, not noticeable for the nearest neighbour classifier. In ranked order of how well the feature sets perform with the linear classifier, we have: co-occurrence, Law's, autocorrelation, edge frequency and primitive length. For the nearest neighbour classifier, the ranks in descending order are: co-occurrence, autocorrelation, edge frequency, Law's and primitive length.

Texture Method      Linear       Best   kNN Classifier     Best   kNN Classifier
                    Classifier   k      (original data)    k      (PCA data)
Autocorrelation     66.7%        1      75.3%              1      91.4%
Co-occurrence       73.9%        1      80.7%              1      93.6%
Edge Frequency      53.2%        3      66.8%              3      76.1%
Law's               68.8%        7      56.1%              5      53.2%
Primitive Length    34.8%        7      42.4%              1      56.1%
Combined            94.6%        5      61.3%              3      85.0%
Table 5. The performance of the linear and nearest neighbour classifiers on Vistex

As before, PCA data gives a considerable improvement in the nearest neighbour classifier results. The results improve by nearly 16% on autocorrelation features, by nearly 13% on co-occurrence matrices, by nearly 10% on edge frequency features, by nearly 14% on the primitive length feature set, and by nearly 24% on the combined feature set. Once more, the results on the Law's feature set are slightly inferior, by nearly 3%. On the whole, all methods demonstrate good performance on this database.
4. Conclusion
We find that for both the Meastex and Vistex data, excellent results are obtained with most of the texture analysis methods. The performance of the linear classifier is very good on both benchmarks, especially when we use a combined feature set. For most feature sets, the performance can be further improved by using a nearest neighbour classifier. We also find that the ranked order of texture methods is similar for the Meastex and Vistex benchmarks: for example, co-occurrence matrix features are the best and primitive length features the worst for recognition. This ranked order is classifier dependent, as we find that the order changes when we switch from a linear classifier to the nearest neighbour method.
REFERENCES
[1] J.M.H. Buf, M. Kardan and M. Spann, Texture feature performance for image segmentation, Pattern Recognition, 23(3/4):291-309, 1990.
[2] R.W. Conners and C.A. Harlow, A theoretical comparison of texture algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2(3):204-222, 1980.
[3] J.F. Haddon and J.F. Boyce, Co-occurrence matrices for image analysis, IEE Electronics and Communications Engineering Journal, 5(2):71-83, 1993.
[4] R.M. Haralick, K. Shanmugam and I. Dinstein, Textural features for image classification, IEEE Transactions on Systems, Man, and Cybernetics, 3:610-621, 1973.
[5] K. Karu, A.K. Jain and R.M. Bolle, Is there any texture in the image?, Pattern Recognition, 29(9):1437-1446, 1996.
[6] K.I. Laws, Textured image segmentation, PhD Thesis, University of Southern California, Electrical Engineering, January 1980.
[7] Meastex database: http://www.cssip.elec.uq.edu.au/~guy/meastex/meastex.html
[8] P.P. Ohanian and R.C. Dubes, Performance evaluation for four classes of texture features, Pattern Recognition, 25(8):819-833, 1992.
[9] T. Ojala and M. Pietikainen, A comparative study of texture measures with classification based on feature distributions, Pattern Recognition, 29(1):51-59, 1996.
[10] O. Pichler, A. Teuner and B.J. Hosticka, A comparison of texture feature extraction using adaptive Gabor filtering, pyramidal and tree structured wavelet transforms, Pattern Recognition, 29(5):733-742, 1996.
[11] W.K. Pratt, Digital image processing, John Wiley, New York, 1991.
[12] T. Randen and J.H. Husøy, Filtering for texture classification: a comparative study, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):291-310, 1999.
[13] T.R. Reed and J.M.H. Buf, A review of recent texture segmentation and feature extraction techniques, Computer Vision, Image Processing and Graphics, 57(3):359-372, 1993.
[14] J. Strand and T. Taxt, Local frequency features for texture classification, Pattern Recognition, 27(10):1397-1406, 1994.
[15] G. Smith and I. Burns, Measuring texture classification algorithms, Pattern Recognition Letters, 18:1495-1501, 1997.
[16] M. Sonka, V. Hlavac and R. Boyle, Image processing, analysis and machine vision, PWS Publishing, San Francisco, 1999.
[17] M. Tuceyran and A.K. Jain, Texture analysis, in Handbook of Pattern Recognition and Computer Vision, C.H. Chen, L.F. Pau and P.S.P. Wang (Eds.), chapter 2, 235-276, World Scientific, Singapore, 1993.
[18] L. van Gool, P. Dewaele and A. Oosterlinck, Texture analysis, Computer Vision, Graphics and Image Processing, 29:336-357, 1985.
[19] Vistex database: http://www-white.media.mit.edu/vismod/imagery/VisionTexture/vistex.html
[20] J.S. Weszka, C.R. Dyer and A. Rosenfeld, A comparative study of texture measures for terrain classification, IEEE Transactions on Systems, Man and Cybernetics, 6:269-285, 1976.