Neural Comput & Applic (2014) 25:1241–1247 DOI 10.1007/s00521-014-1596-y

ORIGINAL ARTICLE

Wavelet twin support vector machine

Shifei Ding · Fulin Wu · Zhongzhi Shi

Received: 11 February 2014 / Accepted: 7 April 2014 / Published online: 23 April 2014
© Springer-Verlag London 2014

Abstract Twin support vector machine (TWSVM) has become a research hot spot in the field of machine learning in recent years. Although its performance is better than that of the traditional support vector machine (SVM), the choice of kernel function still directly affects the performance of TWSVM. Wavelet analysis has the characteristics of multivariate interpolation and sparse change, and it is suitable for the analysis of local signals and the detection of transient signals. The wavelet kernel function, which is based on wavelet analysis, can approximate arbitrary nonlinear functions. Motivated by these properties of the wavelet kernel and by the kernel selection problem, this paper proposes the wavelet twin support vector machine (WTWSVM), which introduces the wavelet kernel function into TWSVM and thus combines wavelet analysis techniques with TWSVM. The experimental results indicate that WTWSVM is feasible and that it significantly improves the classification accuracy and generalization ability of TWSVM.

Keywords SVM · TWSVM · Wavelet kernel function · WTWSVM

S. Ding (corresponding author) · F. Wu
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
e-mail: [email protected]

S. Ding · F. Wu · Z. Shi
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

1 Introduction

First proposed by Vapnik et al., the support vector machine (SVM) is a machine learning method for solving binary classification problems [1–3]. It has attracted wide attention from scholars and has been applied in many fields in recent years [4–8]. With the development of wavelet analysis, researchers have sought to apply it to SVM because of its excellent properties. In 2004, Zhang et al. [19] successfully combined wavelet analysis with SVM and proposed the wavelet support vector machine (WSVM). Their experimental results show that the wavelet kernel outperforms the Gaussian kernel because the wavelet kernel can approximate arbitrary functions. WSVM immediately attracted wide attention; for example, Khatibinia et al. [9] applied a wavelet weighted least squares support vector machine to the seismic reliability assessment of RC structures including soil–structure interaction in 2013, and their numerical results show that the algorithm offers better efficiency and computational advantages in seismic reliability assessment.

Although the introduction of the wavelet kernel greatly improves the performance of SVM, SVM still has some deficiencies and limitations. To overcome these shortcomings, Fung and Mangasarian [10] proposed the proximal support vector machine (PSVM) in 2001. To simplify the computation, PSVM replaces the inequality constraints of traditional SVM with equality constraints. Building on PSVM, Mangasarian and Wild [11] proposed the generalized eigenvalue proximal support vector machine (GEPSVM) in 2006. GEPSVM drops the PSVM constraint that the two hyperplanes must be parallel; it makes the sample points of each class as close as possible to their own hyperplane and as far away as possible from the sample points of the other class.


Thereafter, building on PSVM and GEPSVM, Jayadeva et al. [12] proposed the twin support vector machine (TWSVM). TWSVM solves for one hyperplane per class, with no parallelism constraint between the two hyperplanes, and it converts the binary classification problem into two smaller quadratic programming problems (QPPs). Because TWSVM has a solid theoretical foundation and is effective at solving problems, many scholars have contributed to its study, and there have been many achievements. For example, Wang et al. [13] proposed a GA-based model selection for the smooth twin parametric-margin support vector machine (STPMSVM) in 2013. They increased the efficiency of TPMSVM in two ways: first, by introducing a quadratic function, they directly optimized a pair of QPPs of TPMSVM in the primal space, which clearly improves the training speed without loss of generalization; second, they suggested a genetic algorithm (GA)-based model selection for STPMSVM in the primal space. In 2013, Chen et al. [14] proposed the Laplacian smooth twin support vector machine (Lap-STSVM) for semi-supervised classification. Rather than solving two QPPs in the dual space, they converted the primal constrained QPPs of Lap-TSVM into unconstrained minimization problems (UMPs); a smoothing technique then makes these UMPs twice differentiable, and a fast Newton–Armijo algorithm solves them. In 2013, Qi et al. [15] proposed a new structural twin support vector machine (S-TWSVM). Unlike existing methods based on structural information, S-TWSVM uses two hyperplanes to decide the category of new data, where each model considers only one class's structural information. Peng et al. [16] proposed a robust minimum class variance twin support vector machine (RMCV-TWSVM), which effectively overcomes a shortcoming of TWSVM by introducing a pair of uncertain class variance matrices into its objective functions. Huang et al. [17] proposed the primal least squares twin support vector regression (PLSTSVR) in 2013. In 2014, Ding et al. [18] formulated a nonlinear version of the recently proposed LSPTSVM for binary nonlinear classification by introducing a nonlinear kernel into LSPTSVM, leading to a novel nonlinear algorithm called NLSPTSVM.

However, just as with SVM, the kernel function selection still directly affects the performance of TWSVM. The common kernel function for TWSVM is currently the Gaussian radial basis kernel function. Many experimental results demonstrate that it works well in TWSVM, but its generalization ability is relatively poor. Wavelet analysis has the characteristics of multivariate interpolation and sparse change, and it is suitable for the analysis of local signals and the detection of transient signals.


The wavelet kernel function based on wavelet analysis can approximate arbitrary nonlinear functions. From the application of the wavelet kernel in SVM, we can see that it performs well and has been applied effectively in many fields. Therefore, combining TWSVM with the wavelet analysis technique is worth studying. Based on this, we propose the wavelet twin support vector machine (WTWSVM), in which the wavelet kernel function replaces the original Gaussian radial basis kernel function. As shown in the experimental part of this paper, WTWSVM is feasible: it greatly improves the classification accuracy and generalization ability of TWSVM, and it also expands the range of kernel functions available to TWSVM.

The rest of this paper is organized as follows: Section 2 briefly describes the mathematical model of TWSVM. Section 3 describes the wavelet kernel function and proposes WTWSVM. Section 4 analyzes the experimental results. Finally, we summarize and conclude the paper.

2 Twin support vector machine

2.1 The mathematical model of TWSVM

Assume there are $l$ training samples in $\mathbb{R}^n$, each with $n$ attributes, of which $m_1$ belong to the positive class and $m_2$ to the negative class. We represent them by the matrices $A \in \mathbb{R}^{m_1 \times n}$ and $B \in \mathbb{R}^{m_2 \times n}$, respectively. In the linearly separable case, solving TWSVM amounts to finding two non-parallel hyperplanes in $\mathbb{R}^n$:

$$x^T w_1 + b_1 = 0 \quad \text{and} \quad x^T w_2 + b_2 = 0. \tag{1}$$

In the nonlinearly separable case, we introduce a kernel function $K(x^T, C^T)$. The two hyperplanes of TWSVM then become

$$K(x^T, C^T)\, w_1 + b_1 = 0 \quad \text{and} \quad K(x^T, C^T)\, w_2 + b_2 = 0. \tag{2}$$

The solutions are constructed from the following pair of QPPs:

$$\min_{w_1, b_1, \xi} \ \frac{1}{2}\left\| K(A, C^T) w_1 + e_1 b_1 \right\|^2 + c_1 e_2^T \xi \tag{3}$$

$$\text{s.t.} \quad -\left( K(B, C^T) w_1 + e_2 b_1 \right) + \xi \ge e_2, \quad \xi \ge 0, \tag{4}$$

$$\min_{w_2, b_2, \xi} \ \frac{1}{2}\left\| K(B, C^T) w_2 + e_2 b_2 \right\|^2 + c_2 e_1^T \xi \tag{5}$$

$$\text{s.t.} \quad \left( K(A, C^T) w_2 + e_1 b_2 \right) + \xi \ge e_1, \quad \xi \ge 0. \tag{6}$$

In the above formulas, $C^T = [A\;B]^T$; $e_1$ is the column vector of ones with the same number of rows as $K(A, C^T)$, and $e_2$ is the column vector of ones with the same number of rows as $K(B, C^T)$; $\xi$ is the slack vector; $A = [x_1^{(1)}, x_2^{(1)}, \ldots, x_{m_1}^{(1)}]^T$ and $B = [x_1^{(2)}, x_2^{(2)}, \ldots, x_{m_2}^{(2)}]^T$, where $x_j^{(i)}$ denotes the $j$th sample of the $i$th class; and $c_1$ and $c_2$ are two penalty parameters. The distance between a test sample and the two hyperplanes determines its class: if

$$\left| K(x^T, C^T)\, w_r + b_r \right| = \min_{l = 1, 2}\, \left| K(x^T, C^T)\, w_l + b_l \right|, \tag{7}$$

then $x$ belongs to the $r$th class, $r \in \{1, 2\}$. In the two-dimensional case, Fig. 1 visually expresses the basic idea of the twin support vector machine: the two lines represent the two classification hyperplanes, and the purple and green dots represent the training points of class 1 and class $-1$, respectively.

Fig. 1 The basic idea of TWSVM
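To make the model above concrete, the following is a minimal sketch of the standard dual approach to problems (3)–(6) and the decision rule (7), in the spirit of [12]. It is an illustration under stated assumptions rather than the authors' implementation: a small ridge term eps is added so the matrix inversions are well defined, NumPy and the cvxopt QP solver are assumed available, and all function and variable names are ours.

```python
import numpy as np
from cvxopt import matrix, solvers

def _solve_dual(S, R, c, eps=1e-5):
    # Dual of one TWSVM problem: min_a 0.5 a^T R (S^T S)^{-1} R^T a - e^T a
    # subject to 0 <= a <= c.  The ridge eps*I keeps S^T S invertible.
    StS_inv = np.linalg.inv(S.T @ S + eps * np.eye(S.shape[1]))
    m = R.shape[0]
    P = matrix(R @ StS_inv @ R.T)
    q = matrix(-np.ones(m))
    G = matrix(np.vstack([-np.eye(m), np.eye(m)]))   # bounds 0 <= a <= c
    h = matrix(np.hstack([np.zeros(m), c * np.ones(m)]))
    solvers.options['show_progress'] = False
    alpha = np.array(solvers.qp(P, q, G, h)['x']).ravel()
    return StS_inv, alpha

def train_twsvm(K_A, K_B, c1, c2):
    # K_A = K(A, C^T), K_B = K(B, C^T); append the ones columns e1, e2.
    S = np.hstack([K_A, np.ones((K_A.shape[0], 1))])
    R = np.hstack([K_B, np.ones((K_B.shape[0], 1))])
    StS_inv, alpha = _solve_dual(S, R, c1)
    u1 = -StS_inv @ R.T @ alpha        # u1 = [w1; b1] from problem (3)-(4)
    RtR_inv, gamma = _solve_dual(R, S, c2)
    u2 = RtR_inv @ S.T @ gamma         # u2 = [w2; b2] from problem (5)-(6)
    return u1, u2

def predict_twsvm(K_X, u1, u2):
    # Decision rule (7): assign each sample to the nearer hyperplane.
    Xe = np.hstack([K_X, np.ones((K_X.shape[0], 1))])
    return np.where(np.abs(Xe @ u1) <= np.abs(Xe @ u2), 1, -1)
```

Here K_A, K_B, and K_X are kernel matrices computed against the combined matrix $C$; Sect. 3 simply swaps the kernel used to build them, which is all that distinguishes WTWSVM from TWSVM in this sketch.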

3 Wavelet kernel function and WTWSVM

3.1 Wavelet kernel function

Lemma 1 [19] Let $\psi(x)$ be a mother wavelet, and let $a$ and $b$ denote the dilation and translation factors, respectively. If $x, z \in \mathbb{R}^d$, then the dot-product wavelet kernel is

$$K(x, z) = \prod_{i=1}^{d} \psi\!\left(\frac{x_i - b_i}{a_i}\right)\psi\!\left(\frac{z_i - b_i}{a_i}\right), \tag{8}$$

and the translation-invariant wavelet kernel is

$$K(x, z) = \prod_{i=1}^{d} \psi\!\left(\frac{x_i - z_i}{a_i}\right). \tag{9}$$

In order to construct a translation-invariant wavelet kernel function, we select the Mexican Hat wavelet function as the mother wavelet. It is

$$\psi(x) = \left(1 - x^2\right)\exp\!\left(-\frac{x^2}{2}\right). \tag{10}$$

Lemma 2 [20] The Mexican Hat wavelet kernel function that satisfies the translation-invariant kernel conditions is

$$K(x, z) = \prod_{i=1}^{d}\left[1 - \left(\frac{x_i - z_i}{a_i}\right)^2\right]\exp\!\left(-\frac{1}{2}\left(\frac{x_i - z_i}{a_i}\right)^2\right). \tag{11}$$

Theorem 1 The following formula (12) is also a wavelet kernel function that satisfies the translation-invariant kernel conditions:

$$K(x, z) = \left[d - \sum_{i=1}^{d}\left(\frac{x_i - z_i}{a_i}\right)^2\right]\exp\!\left(-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i - z_i}{a_i}\right)^2\right). \tag{12}$$

A translation-invariant kernel is an admissible support vector kernel if and only if its Fourier transform is nonnegative.

Proof of Theorem 1 Write $K(x, z) = K(x - z)$ and take a common dilation $a_i = a$. According to formula (12), the Fourier transform is

$$F[K](\omega) = (2\pi)^{-d/2}\int_{\mathbb{R}^d} \exp\{-j\langle\omega, x\rangle\}\, K(x)\, dx \tag{13}$$

$$= (2\pi)^{-d/2}\int_{\mathbb{R}^d} \exp\{-j\langle\omega, x\rangle\}\left[d - \sum_{i=1}^{d}\left(\frac{x_i}{a}\right)^2\right]\exp\!\left(-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i}{a}\right)^2\right) dx.$$

Using the one-dimensional transforms

$$(2\pi)^{-1/2}\int_{\mathbb{R}} e^{-j\omega_i x_i}\, e^{-x_i^2/(2a^2)}\, dx_i = a\, e^{-a^2\omega_i^2/2}$$

and

$$(2\pi)^{-1/2}\int_{\mathbb{R}} e^{-j\omega_i x_i}\left(\frac{x_i}{a}\right)^2 e^{-x_i^2/(2a^2)}\, dx_i = a\left(1 - a^2\omega_i^2\right) e^{-a^2\omega_i^2/2},$$

we obtain

$$F[K](\omega) = d\, a^d \exp\!\left(-\frac{a^2}{2}\sum_{i=1}^{d}\omega_i^2\right) - a^d \sum_{i=1}^{d}\left(1 - a^2\omega_i^2\right)\exp\!\left(-\frac{a^2}{2}\sum_{i=1}^{d}\omega_i^2\right) = a^{d+2}\left(\sum_{i=1}^{d}\omega_i^2\right)\exp\!\left(-\frac{a^2}{2}\sum_{i=1}^{d}\omega_i^2\right).$$

If $a \ge 0$, then $F[K](\omega) \ge 0$, so $K(x, z)$ is an admissible support vector kernel. □
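As a concrete illustration of (11) and (12), here is a small sketch of the two Mexican Hat wavelet kernels. The function names are ours, NumPy is assumed, and the dilation argument a may be a scalar or a per-dimension array (broadcasting handles $a_i$).

```python
import numpy as np

def mexican_hat_kernel_prod(x, z, a):
    # Product-form wavelet kernel, Eq. (11): prod_i (1 - u_i^2) exp(-u_i^2 / 2)
    u = (np.asarray(x, float) - np.asarray(z, float)) / a
    return float(np.prod((1.0 - u**2) * np.exp(-0.5 * u**2)))

def mexican_hat_kernel_sum(x, z, a):
    # Summed-form wavelet kernel, Eq. (12): (d - sum_i u_i^2) exp(-0.5 sum_i u_i^2)
    u = (np.asarray(x, float) - np.asarray(z, float)) / a
    s = float(np.sum(u**2))
    return (u.size - s) * np.exp(-0.5 * s)

print(mexican_hat_kernel_prod([0.1, 0.2], [0.1, 0.2], a=2.0))  # 1.0 at x = z
print(mexican_hat_kernel_sum([0.1, 0.2], [0.1, 0.2], a=2.0))   # 2.0, i.e., d at x = z
```

At $x = z$ the product form (11) evaluates to 1 while the summed form (12) evaluates to $d$; for $d = 1$ the two coincide, which is the case plotted in Figs. 2 and 3 below.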


Figures 2 and 3 show the Mexican Hat wavelet kernel function at the test point 0.1 for $d = 1$ and different values of $a$.

Fig. 2 The Mexican Hat wavelet kernel function at the test point 0.1 ($d = 1$; smaller values $a$ = 1, 2, 3, 4, 5, 6)

Fig. 3 The Mexican Hat wavelet kernel function at the test point 0.1 ($d = 1$; larger values $a$ = 5, 10, 20, 25, 35, 45)

From Fig. 2, we can see that the learning ability of the Mexican Hat wavelet kernel function is very good, especially for smaller values of $a$. From Fig. 3, we can also see that the larger the value of $a$, the smoother the curve, which indicates a better generalization ability for larger values of $a$. Determining the parameters for actual data is therefore very important, and $a$ will take different values for different data sets. In summary, the wavelet kernel function has both good learning ability and good generalization ability if we select appropriate parameters.

3.2 Wavelet twin support vector machine

The traditional TWSVM uses the Gaussian radial basis kernel function. That is, $K(x^T, C^T)$ in the mathematical model of TWSVM is built from

$$K(x, x_i) = \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right). \tag{14}$$

Although the learning ability of the Gaussian radial basis kernel function is very strong, its generalization ability is relatively weak, which directly affects the performance of TWSVM. Many algorithms optimize the parameters of the Gaussian radial basis kernel function to improve its performance, but they cannot fundamentally solve the kernel function selection problem in TWSVM. Wavelet analysis has the characteristics of multivariate interpolation and sparse change, and it is suitable for the analysis of local signals and the detection of transient signals. The wavelet kernel function based on wavelet analysis can approximate any nonlinear function, so the combination of the wavelet analysis technique and TWSVM is significant and worth studying. Based on this, we propose WTWSVM. The essence of WTWSVM is to use the wavelet kernel function instead of the Gaussian radial basis kernel function of traditional TWSVM. That is, $K(x^T, C^T)$ in the mathematical model of TWSVM is built from

$$K(x, z) = \left[d - \sum_{i=1}^{d}\left(\frac{x_i - z_i}{a_i}\right)^2\right]\exp\!\left(-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i - z_i}{a_i}\right)^2\right). \tag{15}$$

This algorithm introduces the wavelet kernel function into TWSVM, combining wavelet analysis techniques with TWSVM. WTWSVM fundamentally improves the performance of TWSVM, and it also expands the range of kernel functions available to TWSVM, which promotes the further development of TWSVM.
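The kernel swap described above only changes how the kernel matrices are built. Below is a hedged sketch of the two alternatives (helper names are ours, NumPy is assumed, and a common dilation $a$ is used for simplicity); either output can serve as K_A, K_B, or K_X in the TWSVM sketch of Sect. 2.

```python
import numpy as np

def gaussian_kernel_matrix(X, C, sigma):
    # Eq. (14): K[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def wavelet_kernel_matrix(X, C, a):
    # Eq. (15) with a_i = a: K[i, j] = (d - s_ij) * exp(-s_ij / 2),
    # where s_ij = sum_k ((x_ik - c_jk) / a)^2.
    s = (((X[:, None, :] - C[None, :, :]) / a) ** 2).sum(axis=-1)
    return (X.shape[1] - s) * np.exp(-0.5 * s)
```

In this sketch, calling wavelet_kernel_matrix instead of gaussian_kernel_matrix when forming $K(A, C^T)$ and $K(B, C^T)$ is the entire difference between WTWSVM and TWSVM.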

3.3 The flow of WTWSVM

The algorithm steps of WTWSVM are as follows (a code sketch of this flow appears after Fig. 4):

Step 1 Import the data sets and divide each data set into two parts at random: 80 % of the data for training and the remaining 20 % for testing.

Step 2 Initialize the relevant parameter values of the algorithm.

Step 3 Train on the 80 % portion. To solve for the classification planes, the wavelet kernel function maps the data into a high-dimensional feature space where they become linearly separable. Determine the value of $a$ and the values of $c_1$ and $c_2$ in TWSVM by the grid searching method.

Step 4 Calculate the classification accuracy with the parameter values from Step 3.

Step 5 Determine whether this classification accuracy is the global optimum. If it is, jump to Step 6; if it is not, jump to Step 7.

Step 6 Update the global optimum value and record these parameter values as the optimal parameter values.

Step 7 Determine whether the end condition of the grid cycle is reached. If it is not, jump to Step 3; if it is, jump to Step 8.

Step 8 Bring the optimal parameters obtained from training into TWSVM; this determines the final model of WTWSVM.

Step 9 With the model determined, test on the remaining 20 % of the data to obtain the test classification accuracy.

Step 10 Stop.

Fig. 4 The algorithm flowchart of WTWSVM
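The following is a compact sketch of Steps 1–10 under simplifying assumptions: it reuses the hypothetical helpers from the earlier sketches (wavelet_kernel_matrix, train_twsvm, predict_twsvm), ties $c_1 = c_2$ for brevity, assumes labels in {+1, -1}, and scores candidate parameters by training accuracy, as in the flow above.

```python
import numpy as np
from itertools import product

def wtwsvm_flow(X, y, a_grid, c_grid, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                      # Step 1: random 80/20 split
    cut = int(0.8 * len(X))
    Xtr, ytr = X[idx[:cut]], y[idx[:cut]]
    Xte, yte = X[idx[cut:]], y[idx[cut:]]
    A, B = Xtr[ytr == 1], Xtr[ytr == -1]
    C = np.vstack([A, B])                              # rows of C^T = [A B]^T
    best_acc, best = -1.0, None                        # Step 2: initialization
    for a, c in product(a_grid, c_grid):               # Steps 3-7: grid cycle
        K_A = wavelet_kernel_matrix(A, C, a)
        K_B = wavelet_kernel_matrix(B, C, a)
        u1, u2 = train_twsvm(K_A, K_B, c, c)
        pred = predict_twsvm(wavelet_kernel_matrix(Xtr, C, a), u1, u2)
        acc = float(np.mean(pred == ytr))              # Step 4: accuracy
        if acc > best_acc:                             # Steps 5-6: keep the optimum
            best_acc, best = acc, (a, u1, u2)
    a, u1, u2 = best                                   # Step 8: final model
    pred_te = predict_twsvm(wavelet_kernel_matrix(Xte, C, a), u1, u2)
    return best_acc, float(np.mean(pred_te == yte))    # Step 9: test accuracy
```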

Fig. 4 shows the algorithm flowchart of WTWSVM. It expresses the ten steps described above and gives an intuitive view of the overall process of the algorithm proposed in this paper.

4 The analysis of experimental results

This paper selects nine common data sets from the UCI machine learning repository to test and validate the proposed algorithm. In each case, 80 % of the data are used for training and the remaining 20 % for testing. Since we want to verify that the wavelet kernel function improves the performance of TWSVM, we conduct only nonlinear experiments. The nine data sets are Ionosphere, Australian, Pima-Indian, Sonar, Votes, Haberman, Bupa, Wisconsin breast cancer and German. The experiments are run in MATLAB on a PC. The parameter values are determined by the grid searching method, and they differ across data sets. The characteristics of the nine data sets are shown in Table 1.


Table 1 The data characteristics of the data sets

Data set                   Number of samples   Number of attributes
Sonar                      208                 60
Ionosphere                 351                 34
Votes                      435                 16
Haberman                   306                 4
Bupa                       345                 6
German                     1,000               24
Pima-Indian                768                 8
Australian                 690                 14
Wisconsin breast cancer    699                 10

Table 2 The experimental results (classification accuracy)

Data set                   WTWSVM (%)   TWSVM (Gaussian kernel) (%)
Ionosphere                 97.18        92.96
Haberman                   74.19        70.97
Votes                      97.73        92.05
Sonar                      93.02        86.05
Bupa                       73.91        60.87
Wisconsin breast cancer    97.87        95.04
German                     76.5         70
Pima-Indian                74.68        73.74
Australian                 79.14        75.8

Fig. 5 The effect diagram of the experimental results (classification accuracy of WTWSVM and TWSVM (Gaussian kernel) on the nine data sets)

In this paper, we run experiments with the Gaussian radial basis kernel function and the wavelet kernel function, respectively, and then compare their results. The comparison shows that the WTWSVM proposed by this paper is feasible. The experimental results are shown in Table 2 and, for visual comparison, graphed in Fig. 5. From Fig. 5, we can see that the classification accuracy curve of WTWSVM lies above the curve of TWSVM (Gaussian kernel) on every data set.


This clearly shows that the classification accuracy of WTWSVM is higher than that of TWSVM (Gaussian kernel). Therefore, from Table 2 and Fig. 5, we can draw the following conclusion: WTWSVM is feasible, and it improves the performance of TWSVM significantly. The reason for this good effect is that WTWSVM uses the wavelet kernel function. Wavelet analysis has the characteristics of multivariate interpolation and sparse change, and it is suitable for the analysis of local signals and the detection of transient signals. The introduction of wavelet technology improves the classification performance of TWSVM, and the generalization ability of TWSVM is also improved.

This experiment still has a shortcoming: the grid searching method used to find the optimal parameters is relatively inefficient and often fails to find the optimal parameters. This shortcoming is worth addressing in further studies. Compared with the advantages of the proposed algorithm, however, this problem is minor. The successful use of the wavelet kernel function in WTWSVM not only improves the classification accuracy and performance of TWSVM but also expands the range of kernel functions available to TWSVM. This is beneficial to the further development of TWSVM.

5 Conclusion

Classification algorithms have developed rapidly in recent years, and the twin support vector machine, as an excellent machine learning method, is a research hot spot in the field of machine learning. However, TWSVM still has some problems and shortcomings. To address the kernel function selection problem in TWSVM, this paper proposes WTWSVM.


This algorithm makes use of the features of wavelet analysis and applies wavelet analysis techniques to the twin support vector machine. It greatly improves the performance of TWSVM, and it also expands the range of kernel functions available to TWSVM, broadening the research directions of TWSVM and promoting its further development. However, the parameters of the algorithm in this paper are determined by the grid searching method, whose efficiency is relatively low and which has difficulty finding the optimal parameters. Therefore, to further improve the performance of the algorithm, future work can start from this point and continue to optimize the algorithm to further improve the classification accuracy of TWSVM.

Acknowledgments This work is supported by the National Natural Science Foundation of China (No. 61379101) and the National Key Basic Research Program of China (No. 2013CB329502).

References

1. Cristianini N, Shawe-Taylor J (2004) An introduction to support vector machines and other kernel-based learning methods (trans: Li G, Wang M, Zeng H). Electronic Industry Press, Beijing
2. Ding S, Qi B, Tan H (2011) An overview on theory and algorithm of support vector machines. J Univ Electron Sci Technol China 40(1):2–10
3. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(2):273–297
4. Anton JCA, Nieto PJG, Viejo CB, Vilan JAV (2013) Support vector machines used to estimate the battery state of charge. IEEE Trans Power Electron 28(12):5919–5926
5. Nascimbem LBLR, Rubini BR, Poppi RJ (2013) Determination of quality parameters in moist wood chips by near infrared spectroscopy combining PLS-DA and support vector machines. J Wood Chem Technol 33(4):247–257
6. Deng SG, Xu YF, Li L et al (2013) A feature-selection algorithm based on support vector machine-multiclass for hyperspectral visible spectral analysis. J Food Eng 119(1):159–166
7. Hu LS, Lu SX, Wang XZ (2013) A new and informative active learning approach for support vector machine. Inf Sci 244:142–160
8. Yaman S, Pelecanos J (2013) Using polynomial kernel support vector machines for speaker verification. IEEE Signal Process Lett 20(9):901–904
9. Khatibinia M, Javad Fadaee M, Salajegheh J, Salajegheh E (2013) Seismic reliability assessment of RC structures including soil–structure interaction using wavelet weighted least squares support vector machine. Reliab Eng Syst Saf 110:22–33
10. Fung G, Mangasarian OL (2001) Proximal support vector machine classifiers. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining, pp 77–86
11. Mangasarian OL, Wild EW (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74
12. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
13. Wang Z, Shao YH, Wu TR (2013) A GA-based model selection for smooth twin parametric-margin support vector machine. Pattern Recognit 46(8):2267–2277
14. Chen WJ, Shao YH, Hong N (2013) Laplacian smooth twin support vector machine for semi-supervised classification. Int J Mach Learn Cybern, pp 1–10
15. Qi Z, Tian Y, Shi Y (2013) Structural twin support vector machine for classification. Knowl Based Syst 43:74–81
16. Peng X, Xu D (2013) Robust minimum class variance twin support vector machine classifier. Neural Comput Appl 22(5):999–1011
17. Huang H, Ding S, Shi Z (2013) Primal least squares twin support vector regression. J Zhejiang Univ Sci C 14(9):722–732
18. Ding S, Hua X (2014) Recursive least squares projection twin support vector machines. Neurocomputing 130:3–9
19. Zhang L, Zhou W, Jiao L (2004) Wavelet support vector machine. IEEE Trans Syst Man Cybern B Cybern 34(1):34–39
20. Zhang X, Gao D, Zhang X, Ren S (2005) Robust wavelet support vector machine for regression estimation. Int J Inf Technol 11(9):35–45
