Expert Systems with Applications 36 (2009) 5822–5829

Applying wavelets transform, rough set theory and support vector machine for copper clad laminate defects classification

Te-Sheng Li, Department of Industrial Engineering and Management, Minghsin University of Science and Technology, Hsinchu, Taiwan

Keywords: Copper clad laminate; Discrete wavelet transform; Inverse discrete wavelet transform; Feature extraction; Rough set theory; Support vector machine

Abstract

In this paper, we present a multi-resolution approach for the inspection of local defects embedded in homogeneous copper clad laminate (CCL) surfaces. The proposed method does not rely solely on the extraction of local textural features on a spatial basis. It is based mainly on images reconstructed by applying the wavelet transform and inverse wavelet transform to the smooth subimage and detail subimages, with proper selection of the wavelet basis and the number of decomposition levels. The restored image removes regular, repetitive texture patterns and enhances only local anomalies. Based on these local anomalies, feature extraction methods can then be used to discriminate between defective and homogeneous regions in the restored image. Rough set feature selection algorithms are employed to select the features: rough set theory can deal with vagueness and uncertainty in image analysis and can efficiently reduce the dimensionality of the feature space. Real samples with four classes of defects were classified using a multi-class classifier, the support vector machine (SVM). The effects of the different sampling approaches, kernel functions, and parameter settings used for SVM classification are thoroughly evaluated and discussed. The experimental results were also compared with an error back-propagation neural network classifier to demonstrate the efficacy of the proposed method.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Visual inspection plays a vital role in quality control in manufacturing systems. Manual inspection is subjective and highly dependent on the inspector's expertise. In this study, we use the wavelet transform combined with a rough set feature selector and a support vector machine classifier to detect and classify copper clad laminate (CCL) surface defects. Most CCL surface defects are tiny, comprising faulty items such as pinholes, stains, scratches, and strips, as well as other ill-defined defects. These unanticipated defects are small in size and refractive in light, and they cannot be described by explicit measures, which makes automatic defect detection difficult. Because CCL defects resemble texture patterns, the inspection task in this study can be classified as texture detection and classification. In this setting, one must solve the problem of detecting small surface defects that break the local homogeneity of a texture pattern. Most defect detection methods for textured surfaces involve computing a set of textural features in a sliding window and searching for significant local deviations in the feature values. The most difficult task in this approach is the extraction and selection of features that completely embody the texture information in the image. There is no universal rule for selecting and judging the appropriate features to use.

Therefore, selecting an adequate feature set for a new texture in the training process can be very time-consuming, requiring the help of expert knowledge. The more features extracted from the texture image, the more sophisticated the classifier, such as Bayes (Gonzalez & Woods, 1992), maximum likelihood (Cohen, 1992), or neural network (Van Hulle & Tollenaere, 1993) classifiers, needed to discriminate the texture classes. Numerous methods have been proposed to extract textural features either directly from the spatial domain or from the spectral domain. In the spatial domain, the more reliable and commonly used features are second-order statistics derived from spatial gray-level co-occurrence matrices (Haralick, Shanmugam, & Dinstein, 1973). They have been applied to wood inspection (Conners, McMillin, Lin, & Vasquez-Espinosa, 1983), carpet wear assessment (Siew & Hogdson, 1988), and roughness measurements for machined surfaces (Ramana & Ramamoorthy, 1996). In the past, the Fourier transform and Gabor filters have been two of the spectral-domain approaches used to find textural features that are less sensitive to noise and intensity variation. The Fourier transform is a global approach that characterizes only the spatial-frequency distribution; it does not consider this information in the spatial domain. The Gabor filters are well regarded as a joint spectral representation for analyzing textured images with highly specific frequency and orientation characteristics. This technique extracts features by filtering the textured image with a set of Gabor filter banks characterized by the


frequency, the sinusoid orientation, and the Gaussian function scale. However, obtaining the optimal Gabor filter design is difficult because human intervention is required to select the appropriate filter parameters for the texture under study. If the center frequency of a selected Gabor filter does not match any of the important harmonics in the textured image, only noisy information is produced (Pichler, Teuner, & Hosticka, 1996). Recently, multi-resolution decomposition schemes based on the discrete wavelet transform (DWT) have received considerable attention as alternatives for textural feature extraction. The multi-resolution wavelet representation allows an image to be decomposed into a hierarchy of localized subimages at different spatial frequencies (Chen & Lee, 1997). It divides the 2-D frequency spectrum of an image into a lowpass (smooth) subimage and a set of highpass (detail) subimages. Textural features are then extracted from the decomposed subimages in different frequency channels and at different resolution levels. Chen and Kuo (1993) proposed a tree-structured wavelet transform for texture classification: an energy criterion selects the subimage for further decomposition, a set of textural features is derived from the energy values of the dominant channels, and distance measures are then employed to discriminate between the texture classes. Lambert and Bock (1997) proposed a feature extraction approach for texture defect detection in which textural features are derived from the wavelet packet decomposition coefficients, and neural network and Bayes classifiers evaluate the feature vector. Wavelet-based feature extraction methods have been applied to industrial material inspection, such as LSI wafers (Lee, Choi, Choi, & Choi, 1997; Maruo, Shibata, Yamaguchi, Ichikawa, & Ohmi, 1999), cold rolled strips (Sari-Sarraf & Goddard, 1998), and woven fabrics (Tsai & Hsieh, 1999). Tsai and Hsieh (1999), Tsai and Hsiao (2001), and Tsai and Huang (2003) proposed a global approach based on a Fourier image reconstruction scheme for inspecting surface defects in textures. Their method does not depend on local texture features but detects defects via the high-energy frequency components in the Fourier spectrum. In the restored image of a textured surface, the regular region with periodic lines in the original image has an approximately uniform gray level, whereas the defective region is distinctly preserved. Rough set theory (RST), introduced by Pawlak in the early 1980s (Pawlak, 1982), is a mathematical tool for handling uncertainty and vagueness. It focuses on the discovery of patterns in inconsistent data (Pawlak, 1996; Slowinski & Stefanowski, 1989) and can serve as a basis for formal reasoning under uncertainty, machine learning, and rule discovery (Yao, Wong, & Lin, 1997). The basic philosophy behind rough set theory rests on equivalence relations, or indiscernibility, in the classification of objects (Walczak & Massart, 1999). It needs no a priori knowledge, such as probability distributions in statistics, basic probability assignments in the Dempster–Shafer theory of evidence, or membership grades in fuzzy set theory. Used as a data mining tool, rough set theory can explore the hidden patterns in a data set and derive decision rules. That is, rough set theory can be used for (a) reduction of feature spaces, (b) finding hidden patterns, and (c) generation of decision rules (Kusiak, 2001; Pawlak, 1982).
In the past decades, rough set theory has been successfully applied to many real-world problems in medicine, pharmacology, engineering, banking, and financial and market analysis (Ahna, Cho, & Kim, 2000; Kusiak, 2001; for the underlying methodology see Pawlak (1982, 1996)). From Bayes classifiers to neural networks, there are many possible choices of classifier. Among these, support vector machines (SVMs) appear to be a good candidate because of their ability to generalize in high-dimensional spaces, such as the spaces spanned by texture patterns. The appeal of SVMs rests on their strong connection to the underlying statistical learning theory: an SVM is an approximate implementation of the structural risk minimization (SRM) method (Vapnik, 1995, 1998). For several pattern classification applications, SVMs have already been shown to provide better generalization performance than traditional techniques such as neural networks (Schölkopf et al., 1997).

The aim of this paper is to illustrate the potential of the DWT, RST, and SVM in CCL defect detection and classification. The proposed method incorporates the DWT, the inverse DWT, and RST with the SVM classifier, feeding the selected features directly into the SVM. As a result, the features obtained from the restored defect image are nonlinearly mapped into the SVM architecture. Since the SVM was originally developed for two-class problems, its basic scheme is extended to multi-texture classification by adopting the one-against-all and one-against-one decomposition methods, which apply SVMs to separate one class from all the other classes, or to separate each pair among all classes. Thereafter, feature detection and classification are both evaluated according to a single criterion, the classification rate.

This paper is organized as follows. Section 2 describes the DWT and inverse DWT for images, the rough set theory used for defect feature selection, and the SVM approach used to classify the multi-class defects. Section 3 presents experimental results on four categories of CCL surface defects using the SVM classifier. Section 4 discusses the effects of the proposed method. The paper is concluded in Section 5.

2. Methodology

2.1. Wavelet transform and reconstruction

Wavelets are functions generated from a single function by dilations and translations. The basic idea of the wavelet transform is to represent an arbitrary function as a superposition of wavelets. Any such superposition decomposes the given function into different scale levels, where each level is further decomposed with a resolution adapted to that level (Mallat, 1989). This process continues until some final scale is reached. The transformed coefficients in the approximation and detail (sub-band) images are the essential features, useful for texture discrimination and segmentation. Since textures, whether micro or macro, have non-uniform gray-level variations, they are statistically characterized by the values in the DWT sub-band images or by features derived from these sub-band images or their combinations. In other words, the features derived from the approximation and detail sub-band images uniquely characterize a texture. The 2-D wavelet analysis operation consists of filtering and down-sampling horizontally, applying the 1-D lowpass filter L and highpass filter H to each row of the image, which produces the coefficient matrices f_L(x, y) and f_H(x, y). Vertical filtering and down-sampling follow, applying L and H to each column of f_L and f_H, producing the four subimages f_LL, f_LH, f_HL, and f_HH of a one-level decomposition. The 2-D pyramid algorithm iterates on the smooth subimage f_LL to obtain four coefficient matrices at the next decomposition level. For images, there exists an algorithm similar to the 1-D case, with 2-D wavelets and scaling functions obtained from the 1-D ones by tensor products. This 2-D DWT decomposes the approximation coefficients at level j into four components: the approximation at level j + 1 and the details in three orientations (horizontal, vertical, and diagonal).
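To make the decomposition concrete, the following minimal sketch in Python (assuming the PyWavelets package; the input image is a random placeholder, not a CCL sample) performs the one-level and pyramid decompositions just described:

import numpy as np
import pywt  # PyWavelets

# Placeholder for a 256 x 256 gray-level CCL image.
image = np.random.rand(256, 256)

# One-level 2-D DWT: row/column lowpass (L) and highpass (H) filtering with
# down-sampling yields the smooth subimage f_LL and the three detail
# subimages f_LH, f_HL, f_HH (horizontal, vertical, diagonal details).
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')

# The pyramid algorithm iterates on the smooth subimage; wavedec2 performs
# the full multi-level decomposition in one call ('db2' is the 4-tap
# Daubechies wavelet, i.e., the paper's D4 under PyWavelets naming).
coeffs = pywt.wavedec2(image, 'db2', level=3)
print(coeffs[0].shape)  # coarsest-level approximation subimage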


The inverse DWT performs a single-level 2-D wavelet reconstruction with respect to either a particular wavelet or particular wavelet reconstruction filters specified by the user. Fig. 1 describes the basic reconstruction steps for images.

2.2. Support vector machine classifier

SVMs (Chapelle, Haffner, & Vapnik, 1999; El-Naqa, Yongyi, Wernick, Galatsanos, & Nishikawa, 2002; Kim, Jung, Park, & Kim, 2002; Kim, Jung, & Kim, 2003; Liyang, Yongyi, Nishikawa, & Wernick, 2005a; Liyang, Yongyi, Nishikawa, & Yulei, 2005b; Song, Hu, & Xie, 2002; Vapnik, 1995) have recently become popular tools for learning from experimental data, as they are often more effective than conventional nonparametric classifiers (e.g., neural networks and k-nearest-neighbor classifiers) in terms of classification accuracy, computational time, and stability to parameter settings. The SVM obtains high generalization performance without the need for a priori knowledge, even when the dimension of the input space is high. Moreover, it allows a more accurate formal assessment of generalization performance, which fits well within the framework of this study. Below we assess the generalization performance of the supervised network and provide a methodology for SVM model selection (i.e., model order selection). First, however, some notation and key concepts must be defined.

2.2.1. Optimal separating hyperplane

Let S = \{(x_i, y_i), i = 1, \ldots, N\} be a training example set, where each example x_i \in R^n belongs to a class labeled by y_i \in \{-1, 1\}. The goal is to define a hyperplane that divides S such that all points with the same label lie on the same side of the hyperplane, while maximizing the distance between the two classes. This means finding a pair (w, b) such that

y_i (w \cdot x_i + b) > 0, \quad i = 1, \ldots, N, \qquad (1)

where w \in R^n and b \in R. The pair (w, b) defines a separating hyperplane

w \cdot x + b = 0. \qquad (2)

If there exists a hyperplane satisfying Eq. (1), the set S is said to be linearly separable, and we can rescale w and b so that

y_i (w \cdot x_i + b) \geq 1, \quad i = 1, \ldots, N. \qquad (3)

Accordingly, the minimal distance between the closest point and the hyperplane is 1/\|w\|. Among the separating hyperplanes, the OSH is the one for which the distance to the closest point is maximal. Hence, to find the OSH we must minimize \|w\|^2 under constraint (3). Since \|w\|^2 is convex, we can minimize it under constraint (3) by the classical method of Lagrange multipliers. If we denote by \alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N) the N nonnegative Lagrange multipliers associated with constraint (3), the problem of finding the OSH is equivalent to the maximization of the function

W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j, \qquad (4)

where \alpha_i \geq 0, under the constraint \sum_{i=1}^{N} y_i \alpha_i = 0. Once the solution vector \bar{\alpha} = (\bar{\alpha}_1, \bar{\alpha}_2, \ldots, \bar{\alpha}_N) has been found, the OSH (\bar{w}, \bar{b}) has the expansion

\bar{w} = \sum_{i=1}^{N} \bar{\alpha}_i y_i x_i, \qquad (5)

while \bar{b} can be determined from \bar{\alpha} and the Kühn–Tucker conditions (Vapnik, 1995)

\bar{\alpha}_i \left( y_i (\bar{w} \cdot x_i + \bar{b}) - 1 \right) = 0, \quad i = 1, 2, \ldots, N. \qquad (6)

The training examples (x_i, y_i) with nonzero coefficients \bar{\alpha}_i are called support vectors. Finally, the decision function for classifying a new data point x can be written as

f(x) = \mathrm{sgn}\left( \sum_{i=1}^{N} \bar{\alpha}_i y_i \, x_i \cdot x + \bar{b} \right). \qquad (7)

2.2.2. Non-linear support vector machines

The training example set that we want to classify is usually not linearly separable. To achieve better generalization performance, the input data can first be mapped into a high-dimensional feature space, and the OSH is then constructed in that feature space. If \Phi(x) denotes the mapping function, Eq. (4) becomes

W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \, \Phi(x_i) \cdot \Phi(x_j). \qquad (8)

Now, letting K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j), we can rewrite the above equation as

Fig. 1. 2-D discrete wavelet reconstruction steps: the approximation CA_{j+1} and the detail coefficients CD_{j+1}^{(h)} (horizontal), CD_{j+1}^{(v)} (vertical), and CD_{j+1}^{(d)} (diagonal) are upsampled (inserting zeros at odd-indexed rows or columns), convolved along rows and columns with the lowpass (Lo_R) and highpass (Hi_R) reconstruction filters, and summed to yield the approximation CA_j at the next finer level.


W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j), \qquad (9)

where K is called a kernel function and must satisfy Mercer's theorem (Vapnik, 1995). Finally, the decision function becomes

f(x) = \mathrm{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b \right). \qquad (10)

Typical kernel functions are the following:

Linear: K(x, z) = x \cdot z \qquad (11)

Polynomial: K(x, z) = (\gamma \, x \cdot z + \mathrm{coef})^{d}, \qquad (12)

where \gamma and coef are constants and d is the polynomial degree.

Gaussian radial basis: K(x, z) = \exp(-\gamma \, \|x - z\|^{2}), \qquad (13)

where \gamma is a constant.

Sigmoid: K(x, z) = \tanh(\gamma \, x \cdot z + \mathrm{coef}), \qquad (14)

where \gamma and coef are constants.
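Written out in Python/NumPy, the four kernels of Eqs. (11)–(14) are simply the following (the parameter values are arbitrary illustrative defaults, not settings from the study):

import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)                           # Eq. (11)

def polynomial_kernel(x, z, gamma=1.0, coef=1.0, d=3):
    return (gamma * np.dot(x, z) + coef) ** d     # Eq. (12)

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))  # Eq. (13)

def sigmoid_kernel(x, z, gamma=0.5, coef=0.0):
    return np.tanh(gamma * np.dot(x, z) + coef)   # Eq. (14)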

2.3. Multi-class support vector machines

Two main approaches have been suggested for applying SVMs to multi-class classification. One is the one-against-all strategy, which classifies between each class and all the remaining ones; the other is the one-against-one strategy, which classifies between each pair of classes. In both, the underlying idea is to reduce the multi-class problem to a set of binary problems, enabling the basic SVM approach to be used. In the one-against-all approach, a set of binary classifiers is trained, each separating one class from the rest, and the input vector is allocated to the class for which the largest decision value is obtained (Hsu & Lin, 2002). The ith SVM is trained on the training samples with the examples of the ith class given "+1" labels and the examples of all other classes given "−1" labels. Specifically, with this approach, where n is the number of classes, the n decision functions are

(w^{i})^{T} \Phi(x) + b^{i}, \quad i = 1, \ldots, n. \qquad (15)

The data point x then belongs to the class for which the decision function attains the largest value:

\text{class of } x \equiv \arg\max_{i=1,\ldots,n} \left( (w^{i})^{T} \Phi(x) + b^{i} \right). \qquad (16)

The second method of reducing a multi-class problem to a series of binary ones is the one-against-one approach. Here, a binary classifier is trained for each pair of classes, and the most commonly predicted class label is kept for each input. This method requires n(n − 1)/2 classifiers, one per pair of classes, together with a strategy to handle instances in which an equal number of votes is obtained for more than one class (Hsu & Lin, 2002). Once all n(n − 1)/2 classifiers have been trained, the max-win strategy is followed: if \mathrm{sgn}((w^{jl})^{T} \Phi(x) + b^{jl}) assigns x to the jth class, the vote for the jth class is incremented by one; otherwise the vote for the lth class is incremented. Finally, the data vector x is predicted to belong to the class with the maximum number of votes.
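This max-win voting scheme can be sketched with scikit-learn's binary SVC as follows (the feature matrix and labels are random placeholders, and class labels are assumed to be 0, ..., n−1; note that scikit-learn's SVC already implements one-against-one internally):

import numpy as np
from itertools import combinations
from sklearn.svm import SVC

X = np.random.rand(120, 8)           # placeholder features
y = np.random.randint(0, 4, 120)     # four defect classes, labeled 0..3

# One-against-one: n(n-1)/2 binary machines, one per pair of classes.
classes = np.unique(y)
machines = {}
for j, l in combinations(classes, 2):
    mask = (y == j) | (y == l)
    machines[(j, l)] = SVC(kernel='rbf', gamma=0.5, C=10).fit(X[mask], y[mask])

def predict(x):
    votes = np.zeros(len(classes))
    for (j, l), m in machines.items():
        winner = m.predict(x.reshape(1, -1))[0]  # the vote goes to j or l
        votes[winner] += 1
    return int(np.argmax(votes))     # class with the maximum number of votes

print(predict(X[0]))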


3. Experimental results and discussion

In this section, we present experimental results, following the approach of the previous section, on a variety of surface defects found in real samples, in order to evaluate the performance of the proposed defect detection and classification method. All experiments were implemented on a personal computer, and the images are 256 × 256 pixels with 8-bit gray levels. The proposed approach is divided into four steps: (1) preprocessing the original images; (2) wavelet transform and inverse wavelet transform; (3) feature extraction and selection; and (4) support vector machine classification. A comparison with the error back-propagation neural network classifier is also discussed in this section.

3.1. Defect classification and preprocessing

The current inspection of the CCL surface depends mainly upon the inspector's visual examination and sense of touch through a glove. This inspection activity can be very subjective and highly dependent on the experience of the human inspectors. Among the 250 collected samples, there are four major classes of defects on the CCL surface: convex dots, pinholes, flashes, and oxidation. Fig. 2 shows the four defect categories and the number of defects belonging to each class. The major preprocessing steps in this study are smooth filtering, Sobel filtering, and morphological closing. The main objective of preprocessing is to enhance and smooth the source image, accentuate or extract image edges, and remove noise, separating the image into its low and high spatial-frequency components. The preprocessing results are shown in Fig. 3.
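A minimal sketch of such a preprocessing chain in Python with OpenCV follows; the filter sizes and the structuring element are illustrative assumptions, not the exact settings of the study, and the input is a random stand-in for an image loaded with cv2.imread:

import cv2
import numpy as np

# Stand-in for a grayscale CCL image (normally loaded via cv2.imread).
img = (np.random.rand(256, 256) * 255).astype(np.uint8)

smoothed = cv2.GaussianBlur(img, (5, 5), 0)              # smooth filtering

# Sobel filtering to accentuate edges (gradient magnitude).
gx = cv2.Sobel(smoothed, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(smoothed, cv2.CV_32F, 0, 1, ksize=3)
edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))

# Morphological closing to fill small gaps in the edge map.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)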

3.2. Wavelet transformation and inverse wavelet transformation

After obtaining the preprocessed defect images, we conduct the 2-D wavelet decomposition. In this experiment, we evaluate five orthogonal wavelet bases, namely Haar, Daubechies D4 and D8, and Symmlets S8 and S20, at different resolution levels. Appropriate wavelet bases and resolution levels effectively highlight the local anomalies in the homogeneous surface. If the number of multi-resolution levels is too small, defects cannot be sufficiently separated from the repetitive texture pattern; if it is too large, the anomalies fuse together, which may result in false detections. In these experiments, multi-resolution levels 3 and 4 are the most appropriate for enhancing defects in the restored image (Tsai & Hsiao, 2001). The experiment evaluates three selective detail-subimage combinations, namely horizontal + diagonal, vertical + diagonal, and vertical + horizontal, on a repetitive horizontal line pattern defect (Fig. 4). The effects of the selective subimages on CCL defect image reconstruction are similar; however, reconstructing from the horizontal and diagonal detail subimages most effectively enhances the defect regions in the restored image and efficiently separates defects from the background in the corresponding binary images. Therefore, features are preferably extracted from the defects in the horizontal and diagonal detail subimages. Fig. 5 shows that the four images reconstructed from the horizontal and diagonal detail subimages remove all non-defect areas in the original images and preserve only the defects on the CCL surface. The defects in the restored image can be efficiently separated from the background using a simple binary thresholding technique such as the one proposed by Otsu (1979).
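The restoration step can be sketched with PyWavelets as follows: zero the smooth subimage and the vertical details, keep the horizontal and diagonal details, and apply the inverse transform. The wavelet base, decomposition level, and input image below are placeholder assumptions:

import numpy as np
import pywt
from skimage.filters import threshold_otsu

image = np.random.rand(256, 256)     # placeholder CCL image

# 3-level decomposition with Symmlet S8 ('sym8' in PyWavelets naming).
coeffs = pywt.wavedec2(image, 'sym8', level=3)

# Keep only the horizontal (cH) and diagonal (cD) details: zero the smooth
# subimage and the vertical details before the inverse transform.
coeffs[0] = np.zeros_like(coeffs[0])
coeffs[1:] = [(cH, np.zeros_like(cV), cD) for (cH, cV, cD) in coeffs[1:]]

restored = pywt.waverec2(coeffs, 'sym8')

# Separate defects from the background by simple Otsu thresholding.
binary = np.abs(restored) > threshold_otsu(np.abs(restored))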

3.3. Feature extraction and selection

The features used in this study are mainly geometric features obtained via blob analysis of the restored images. Twenty features are extracted, including area, perimeter, elongation, convex perimeter, compactness, number of holes, roughness, Euler number, and invariant moments derived from the geometric features.
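A sketch of the blob-analysis step using scikit-image is given below; only a subset of the 20 features is computed, and the property names follow scikit-image rather than the attribute names in Table 1:

import numpy as np
from skimage.measure import label, regionprops

# 'binary' is the thresholded restored image from the previous step.
binary = np.zeros((256, 256), dtype=bool)
binary[100:120, 80:130] = True                  # placeholder defect blob

features = []
for region in regionprops(label(binary)):
    features.append([
        region.area,                            # area
        region.perimeter,                       # perimeter
        region.euler_number,                    # Euler number
        region.major_axis_length / max(region.minor_axis_length, 1e-9),  # elongation
        region.perimeter ** 2 / (4 * np.pi * region.area),               # compactness
        *region.moments_hu,                     # invariant moments
    ])
X = np.asarray(features)                        # one row per detected blob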


Fig. 2. Original defect images: (a) pinhole (139 samples), (b) convex dot (8 samples), (c) flash (53 samples), and (d) oxidation (50 samples).

Fig. 3. Preprocessing results: (a) pinhole, (b) convex dot, (c) flash, and (d) oxidation.

Fig. 4. Restored images from selective subimages: (a) the original image, (b) horizontal + diagonal subimage, (c) vertical + diagonal subimage, and (d) horizontal + vertical subimage.

Fig. 5. 2-D inverse DWT reconstruction images: (a) pinhole, (b) convex dot, (c) flash, and (d) oxidation.

Based on these 20 features as the full attribute set, we apply rough set theory to feature selection. The rough set approach to feature selection can be based on the minimal description length principle (Rissanen, 1978) and on tuning the parameters of the approximation spaces to obtain high-quality classification from the selected features. In this study, rough set feature selection is applied in a closed loop, searching first for short reducts or reduct approximations. We implemented the feature selection using the ROSETTA software (see http://www.idi.ntnu.no./~aleks/rosetta/rosetta.html). ROSETTA supports six feature-selection algorithms: Genetic algorithm, Johnson's algorithm, Dynamic reducts (RSES), Exhaustive calculation (RSES), Johnson's algorithm (RSES), and Genetic algorithm (RSES). Each method has its own selection criterion, based on a genetic algorithm with a fitness function, a simple greedy algorithm, or the most frequently occurring reducts across the decision table. The most frequently occurring reduct subsets obtained with the six feature-selection algorithms are listed in Table 1; they consist of 18, 14, and 8 features, respectively.
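The reduct search itself is performed inside ROSETTA. Purely as an illustration of the idea behind Johnson's greedy algorithm, the following toy sketch repeatedly picks the feature that discerns the most remaining object pairs with different decisions; a small discretized decision table is assumed, and this is not the ROSETTA implementation:

import numpy as np
from itertools import combinations

def johnson_reduct(X, y):
    """Greedy approximation of a short reduct on a discretized table."""
    # All object pairs with different decisions that must be discerned.
    pairs = [(i, j) for i, j in combinations(range(len(y)), 2) if y[i] != y[j]]
    reduct = []
    while pairs:
        # Count, per feature, how many remaining pairs it discerns.
        counts = [sum(X[i, f] != X[j, f] for i, j in pairs)
                  for f in range(X.shape[1])]
        best = int(np.argmax(counts))
        if counts[best] == 0:        # remaining pairs are indiscernible
            break
        reduct.append(best)
        pairs = [(i, j) for i, j in pairs if X[i, best] == X[j, best]]
    return reduct

X = np.array([[1, 0, 2], [1, 1, 2], [0, 1, 0], [0, 0, 0]])  # toy table
y = np.array([0, 0, 1, 1])
print(johnson_reduct(X, y))          # [0]: feature 0 already discerns all pairs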

These reduct subsets are then fed into the SVM classifier for CCL surface defect detection and classification.

3.4. Classification results using SVM

After obtaining the features extracted from the defect subimages, the support vector machine is used to classify the four main defect categories on the CCL surface. The SVMs are constructed using the LIBSVM version 2.6 software (Chang & Lin, 2001) to validate the classification capability of the proposed feature extraction and selection approach, i.e., inverse wavelet reconstruction and rough sets. To evaluate the performance of the proposed approach, the holdout and tenfold cross-validation random sampling methods were adopted in the SVM classification. Four kernel functions for the SVM classifier are considered, namely the RBF, Linear, Polynomial, and Sigmoid functions, with the original feature set and the reduced 18-, 14-, and 8-feature sets, respectively. The 250 collected samples of four classes were divided into 2/3 for the training set (167 samples) and 1/3 for the testing set (83 samples). In our evaluation, we used the accuracy rate of


Table 1
Results of feature selection

No.   18 Features                14 Features                8 Features
1     Area                       Area                       Perimeter
2     Perimeter                  Perimeter                  Feret maximum diameter
3     Feret minimum diameter     Feret minimum diameter     Feret mean diameter
4     Feret minimum angle        Feret maximum diameter     Convex perimeter
5     Feret maximum diameter     Feret mean diameter        Length
6     Feret maximum angle        Convex perimeter           Elongation
7     Convex perimeter           Compactness                Number of runs
8     Compactness                Feret elongation           Secondary axis (Bin.)
9     Number of holes            Roughness                  –
10    Feret elongation           Length                     –
11    Roughness                  Breadth                    –
12    Euler number               Elongation                 –
13    Length                     Number of runs             –
14    Breadth                    Secondary axis (Bin.)      –
15    Elongation                 –                          –
16    Number of runs             –                          –
17    Minimum pixel              –                          –
18    Principal axis (Bin.)      –                          –

classification as a figure of merit, defined as the number of correctly classified samples divided by the total number of samples classified. Table 2 summarizes the results for the holdout method using the four kernel functions with the original and reduced feature sets. Although the generalization ability of the SVM is relatively robust to variations in the parameter settings, the method used to set the kernel parameter γ and the cost parameter C for the RBF kernel ensures that high accuracy is obtained. Accordingly, the parameters are searched over grid combinations in the ranges γ = [0.001, 200] and C = [0.1, 50,000] when testing the classification accuracy. The remaining parameters for the other kernels are set as follows: coef0 from 0.1 to 100, and kernel order d = 3, 4 for the Polynomial kernel. The results indicate that the performance of the SVM classifier is not very sensitive to the model parameter settings; the software automatically searches for the parameter setting that yields the best classification accuracy. The training time of the RBF and Linear kernels is around 20 s faster than that of the Polynomial and Sigmoid kernels (implemented on a Pentium IV 1.8 GHz PC). Note that, as expected, the accuracy with the original feature set is higher than with the reduced feature sets. The best classification result is obtained with the Gaussian RBF kernel, whose accuracy rate exceeds 80%, whereas the other kernel functions are close to 80% accuracy. In addition, the 18-feature reduced set obtains better results than the 14- and 8-feature sets, respectively.
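This parameter search can be reproduced, for example, with scikit-learn's grid search. In the sketch below the grid endpoints follow the ranges quoted above, while the grid spacing and the data are illustrative placeholders:

import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X = np.random.rand(250, 18)                  # placeholder: 250 samples, 18 features
y = np.random.randint(0, 4, 250)             # four defect classes

# 2/3 training, 1/3 testing, as in the holdout experiment.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=2/3, random_state=0)

grid = GridSearchCV(
    SVC(kernel='rbf'),
    param_grid={'gamma': np.logspace(-3, np.log10(200), 6),   # gamma in [0.001, 200]
                'C': np.logspace(-1, np.log10(50000), 6)},    # C in [0.1, 50000]
    cv=10)                                    # tenfold cross-validation
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))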


The SVM classifier, with the same parameter settings as in the holdout method, is then further trained using the tenfold cross-validation method with the four kernel functions. Table 3 summarizes the cross-validation results: the classification accuracies are almost the same across the four kernel functions, ranging between 68% and 81%. The best result is obtained using the 14 reduced features with the Linear kernel, and the Sigmoid kernel takes the longest training time. The classification accuracies with the reduced 14- and 18-feature sets are not significantly different from those with the original feature set. Consequently, the holdout method performs better than the cross-validation method, with classification accuracies of 74–85% versus 68–81%; the holdout method is also more efficient. To benchmark the proposed approach, we also implemented an error back-propagation neural network (BPN) classifier, the most commonly used feed-forward neural network, on the same classification problem. As described above, the SVM parameters are identical among the four kernel functions. The data set was divided into 2/3 (167 samples) for training and 1/3 (83 samples) for testing with the four feature sets, both for the SVM and for the back-propagation neural classifier. A single hidden layer (5 hidden neurons), a learning rate of 0.01, and a momentum of 0.8 were used to classify the defects on the CCL

Table 2
Performance of holdout method (Accuracy %)

No. of     RBF              Linear           Polynomial       Sigmoid
features   Train    Test    Train    Test    Train    Test    Train    Test
20         85.03    84.34   79.64    83.13   83.83    81.93   76.66    83.13
           88.02    84.34   90.42    81.93   84.43    81.93   74.25    78.31
           88.62    85.54   91.02    80.72   83.23    81.93   77.84    78.31
Average    87.22    84.74   87.03    81.93   83.83    81.93   76.25    79.92
18         88.62    84.34   70.66    72.29   78.44    81.93   81.44    81.93
           86.83    81.93   81.43    81.93   78.44    79.52   78.44    80.72
           86.83    84.34   88.62    83.13   80.24    79.52   74.85    74.70
14         89.82    79.52   88.02    77.11   89.82    77.11   84.43    79.52
           92.81    83.13   89.22    79.52   88.62    80.72   83.23    78.31
           90.42    81.93   88.62    77.11   89.22    79.52   84.43    79.52
8          86.83    78.31   83.83    75.90   84.43    75.90   76.66    75.90
           87.43    79.52   81.44    78.31   83.23    74.70   74.85    74.69
           91.02    81.93   82.04    77.11   86.23    75.90   73.65    74.70
Average    88.96    81.66   83.76    78.05   84.30    78.31   79.11    77.78


Table 3
Performance of cross-validation method (Accuracy %)

No. of     RBF              Linear           Polynomial       Sigmoid
features   Train    Test    Train    Test    Train    Test    Train    Test
20         84.40    80.40   87.60    81.20   86.40    80.00   78.80    74.80
           84.00    80.00   88.00    81.60   86.80    80.00   78.40    75.20
           85.20    80.00   88.40    79.60   86.80    79.60   78.80    76.00
Average    84.53    80.13   88.00    80.80   86.67    79.87   78.67    75.33
18         87.60    80.40   88.80    80.40   86.80    80.80   81.60    77.20
           87.20    80.00   87.20    80.80   87.60    80.80   83.20    79.20
           87.20    80.40   87.60    80.40   87.20    80.40   79.60    76.40
14         83.60    79.60   86.80    81.60   84.40    76.80   77.60    76.00
           84.80    79.20   86.40    81.20   82.80    76.80   77.60    78.40
           84.00    80.00   87.20    82.40   82.80    78.80   79.60    76.00
8          83.60    76.00   80.80    78.80   79.60    77.60   77.60    74.00
           81.60    77.60   80.00    76.80   83.60    79.60   72.80    68.80
           80.40    75.20   80.80    78.00   79.60    76.80   71.60    69.60
Average    84.44    78.71   85.07    80.04   83.82    78.71   77.91    75.07

Table 4
Comparison of the SVM and BPN classifiers (Accuracy %)

           SVM                                                              BP
No. of     RBF             Linear          Polynomial      Sigmoid
features   Train   Test    Train   Test    Train   Test    Train   Test    Train   Test
20         87.22   84.74   87.03   81.93   83.83   81.93   76.25   79.92   85.04   87.55
18         87.43   83.54   80.24   78.31   79.04   80.32   78.24   79.12   83.13   80.12
14         91.02   81.54   88.62   77.91   89.22   79.12   84.03   79.12   80.74   81.73
8          88.43   79.92   82.44   77.11   84.63   75.50   77.05   75.10   81.74   78.32
Average    88.525  82.43   84.583  78.81   84.18   79.21   78.893  78.31   82.663  81.93

surface based on the four different feature sets. The classification accuracies of the four SVM classifiers and the BPN are shown in Table 4. Clearly, the more features the classifier uses, the higher the classification accuracy; moreover, the simpler the network structure, the shorter the training time (the 18-5-1 network takes 45 s versus 30 s for the 8-5-1 network). The best test result among the SVMs is obtained with the RBF kernel, whose average of 82.43% exceeds those of the other three SVM classifiers and the BPN classifier, which average 78.81%, 79.21%, 78.31%, and 81.93%, respectively. In addition, the classification accuracies of the reduced feature sets obtained via rough set feature selection are above or near 80% for both the SVM classifiers and the BPN classifier. It is noteworthy that the SVM classifier with the RBF kernel, combined with rough set theory, can reduce the feature set and select the significant features for classification, since most of its classification accuracies are higher than those of the BPN classifier. In the SVM classifier, we can control the trade-off between the complexity of the decision rule and the training error rate by searching over combinations of C and γ.
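For reference, the BPN baseline under the stated settings (a single hidden layer of 5 neurons, learning rate 0.01, momentum 0.8) can be sketched with scikit-learn's MLPClassifier; the data below are placeholders, so this reproduces the setup rather than the reported numbers:

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X = np.random.rand(250, 18)                  # placeholder feature set
y = np.random.randint(0, 4, 250)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=2/3, random_state=0)

bpn = MLPClassifier(hidden_layer_sizes=(5,),  # single hidden layer, 5 neurons
                    solver='sgd', learning_rate_init=0.01,
                    momentum=0.8, max_iter=2000, random_state=0)
bpn.fit(X_tr, y_tr)
print(bpn.score(X_te, y_te))                 # accuracy rate of classification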

4. Discussion

Based on the experimental results and our experience with CCL defect classification problems, we highlight the following:

1. The high classification accuracies of the multi-class SVM and BPN classifiers give insight into the features selected to define the CCL defect signals. The results demonstrate that the inverse wavelet reconstruction and the geometric features represent the CCL defect signals well, and that with the features selected via rough sets a good distinction between classes can be obtained.

2. During SVM training, most of the computational effort is spent on finding the optimal parameters for the different kernel functions in order to obtain high classification accuracy. The SVM maps the features to a higher-dimensional space and then applies an optimal hyperplane in the mapped space. This implies that although the original features carry adequate information for good classification, mapping to a higher-dimensional feature space can provide discriminatory clues that are not present in the original feature space. The suitability of a kernel function and the performance of the SVM cannot be known until different parameters have been tried and searched.

3. Proper selection of the multi-resolution level and subimage combination can appropriately enhance defects in the restored image. Experimental results on a variety of CCL defect images confirm that three or four multi-resolution levels are generally sufficient for CCL defect detection. Furthermore, proper selection of the decomposed subimage combination for reconstruction can effectively enhance defects with high directionality. The effects of the selective subimages on CCL defect image reconstruction are similar; however, reconstructing from the horizontal and diagonal detail subimages reveals the defect regions in the restored image and efficiently separates defects from the background in the corresponding binary images. Therefore, features are preferably extracted from the defects in the horizontal and diagonal detail subimages.

5. Conclusion

In this paper, we have addressed the problem of detecting and classifying CCL defects embedded in a homogeneous texture using


the inverse wavelet transform, a rough set selector, and an SVM classifier. The proposed method does not only work on local textural features in the spatial domain; it is also based on an image restoration scheme using multi-resolution wavelet transforms, with the advantage of computational savings. With proper selection of the smooth and detail subimages at different multi-resolution levels for image restoration, the global repetitive texture pattern can be effectively removed and local anomalies can be enhanced in the restored image. The rough set selector is then used to perform feature selection on the original feature set of the restored image. The SVM classifier, with its high generalization performance and no need for prior knowledge, maps the lower-dimensional feature space into a higher-dimensional one, increasing the classification accuracy. The experimental results reveal that the SVM with the RBF kernel function obtains higher classification accuracy than the other kernel functions. The classification accuracies with the feature sets reduced via rough set feature selection are above or near an 80% accuracy rate; the SVM classifier with the various kernel functions, combined with rough set theory, can thus reduce the feature set and select the significant features for classification. Moreover, the reduced feature sets preserve the capability of the SVM while maintaining suitable classification accuracy. The proposed method can improve inspection accuracy and avoid human intervention in defect inspection of homogeneous CCL surfaces.

References

Ahna, B. S., Cho, S. S., & Kim, C. Y. (2000). The integrated methodology of rough set theory and artificial neural network for business failure prediction. Expert Systems with Applications, 18, 65–74.
Chang, C. C., & Lin, C. J. (2001). LIBSVM: A library for support vector machines.
Chapelle, O., Haffner, P., & Vapnik, V. N. (1999). Support vector machines for histogram-based image classification. IEEE Transactions on Neural Networks, 10(5), 1055–1064.
Chen, T., & Kuo, C. C. J. (1993). Texture analysis and classification with tree-structured wavelet transform. IEEE Transactions on Image Processing, 2, 429–441.
Chen, C. H., & Lee, G. G. (1997). On digital mammogram segmentation and microcalcification detection using multiresolution wavelet analysis. Graphical Models and Image Processing, 6, 3606–3609.
Cohen, F. S. (1992). Maximum likelihood unsupervised textured image segmentation. CVGIP: Graphical Models and Image Processing, 54, 239–251.
Conners, R. W., McMillin, C. W., Lin, K., & Vasquez-Espinosa, R. E. (1983). Identifying and locating surface defects in wood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5, 573–583.
El-Naqa, I., Yongyi, Y., Wernick, M. N., Galatsanos, N. P., & Nishikawa, R. M. (2002). A support vector machine approach for detection of microcalcifications. Medical Imaging, 21(12), 1552–1563.
Gonzalez, R. C., & Woods, R. E. (1992). Digital image processing. MA: Addison-Wesley.
Haralick, R. M., Shanmugam, K., & Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics, 3, 610–621.
Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2), 415–425.
Kim, K. I., Jung, K., & Kim, J. H. (2003). Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12), 1631–1639.
Kim, K. I., Jung, K., Park, S. H., & Kim, J. H. (2002). Support vector machines for texture classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(11), 1542–1550.
Kusiak, A. (2001). Rough set theory: A data mining tool for semiconductor manufacturing. IEEE Transactions on Electronics Packaging Manufacturing, 24(1), 44–50.
Lambert, G., & Bock, F. (1997). Wavelet method for texture defect detection. In IEEE international conference on image processing (pp. 201–204), Santa Barbara, CA.
Lee, C. S., Choi, C. H., Choi, J. Y., & Choi, S. H. (1997). Surface defect inspection of cold rolled strips with features based on adaptive wavelet packets. IEICE Transactions on Information System, E80-E, 594–604.
Liyang, W. Y., Yongyi, R. M., Nishikawa, M. N., & Wernick, A. E. (2005a). Relevance vector machine for automatic detection of clustered microcalcifications. IEEE Transactions on Medical Imaging, 24(10), 1278–1285.
Liyang, W., Yongyi, Y., Nishikawa, R. M., & Yulei, J. (2005b). A study on several machine-learning methods for classification of malignant and benign clustered microcalcifications. IEEE Transactions on Medical Imaging, 24(3), 371–380.
Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), 674–693.
Maruo, K., Shibata, T., Yamaguchi, T., Ichikawa, M., & Ohmi, T. (1999). Automatic defect pattern detection on LSI wafers using image processing techniques. IEICE Transactions on Electronics, E82-C, 1003–1012.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9(1), 62–66.
Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Science, 11(5), 341–356.
Pawlak, Z. (1996). Why rough sets. In Proceedings of the IEEE international conference on fuzzy systems. Piscataway, NJ.
Pichler, O., Teuner, A., & Hosticka, B. J. (1996). A comparison of texture feature extraction using adaptive Gabor filtering, pyramidal and tree structured wavelet transforms. Pattern Recognition, 29, 733–742.
Ramana, K. V., & Ramamoorthy, B. (1996). Statistical methods to compare the texture features of machined surfaces. Pattern Recognition, 29, 1447–1459.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.
Sari-Sarraf, H., & Goddard, J. S. (1998). Robust defect segmentation in woven fabrics. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 938–944).
Schölkopf, B., Sung, K., Burges, C. J. C., Girosi, F., Niyogi, P., Poggio, T., et al. (1997). Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Transactions on Signal Processing, 45(11), 2758–2765.
Siew, L. H., & Hogdson, R. M. (1988). Texture measures for carpet wear assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10, 92–105.
Slowinski, K., & Stefanowski, J. (1989). Rough classification in incomplete information systems. Mathematical and Computer Modelling, 12(10–11), 1347–1357.
Song, Q., Hu, W., & Xie, W. (2002). Robust support vector machine with bullet hole image classification. IEEE Transactions on Systems, Man and Cybernetics, Part C, 32(4), 440–448.
Tsai, D. M., & Hsiao, B. (2001). Automatic surface inspection using wavelet reconstruction. Pattern Recognition, 34, 1285–1305.
Tsai, D. M., & Hsieh, C. Y. (1999). Automated surface inspection for directional textures. Image and Vision Computing, 18, 49–62.
Tsai, D. M., & Huang, T. Y. (2003). Automated surface inspection for statistical textures. Image and Vision Computing, 21, 307–323.
Van Hulle, M. M., & Tollenaere, T. (1993). A modular artificial neural network for texture processing. Neural Networks, 6, 7–32.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Vapnik, V. (1998). Statistical learning theory. New York: John Wiley.
Walczak, B., & Massart, D. L. (1999). Rough set theory. Chemometrics and Intelligent Laboratory Systems, 47, 1–16.
Yao, Y. Y., Wong, S. K. M., & Lin, T. Y. (1997). A review of rough sets models. In Rough sets and data mining: Analysis for imprecise data (pp. 47–76). Boston, MA: Kluwer.