Soft Comput DOI 10.1007/s00500-012-0954-x
METHODOLOGIES AND APPLICATION
Large data sets classification using convex–concave hull and support vector machine

Asdrúbal López Chau · Xiaoou Li · Wen Yu

A. López Chau, X. Li: Departamento de Computación, CINVESTAV-IPN, Mexico City 07360, Mexico
W. Yu (corresponding author): Departamento de Control Automático, CINVESTAV-IPN, Mexico City 07360, Mexico; e-mail: [email protected]

Communicated by V. Loia.
Springer-Verlag Berlin Heidelberg 2012
Abstract The standard support vector machine (SVM) is not suitable for the classification of large data sets because of its high training complexity. The convex hull can simplify SVM training; however, the classification accuracy becomes lower when there exist inseparable points. This paper introduces a novel method for SVM classification, called convex–concave hull SVM (CCH-SVM). After a grid pre-processing step, the convex hull is used to find the extreme points. Then, we use the Jarvis march method to determine the concave (non-convex) hull for the inseparable points. Finally, the vertices of the convex–concave hull are used for SVM training. The proposed CCH-SVM classifier has distinctive advantages in dealing with large data sets. We apply the proposed method to several benchmark problems. Experimental results demonstrate that our approach has good classification accuracy while the training is significantly faster than that of other SVM classifiers. Compared with the other convex hull SVM methods, the classification accuracy is also higher.
1 Introduction

Support vector machine (SVM) is a highly desirable classification method because it offers a hyperplane that represents the largest separation (or margin) between the two classes (Hsu et al. 2010). The hyperplane maximizes the distance to the nearest data points of each class. However, this kind of maximum-margin hyperplane may not exist because of mislabeled examples (the linearly or nonlinearly inseparable case). The soft margin SVM can choose a hyperplane that splits the examples as cleanly as possible by introducing slack variables (Vapnik 1995). Another problem of SVM classification is that it needs to solve a quadratic programming (QP) problem, which causes an intensive computational complexity. The normal SVM is computationally infeasible on large data sets, because it has O(m^3) time and O(m^2) space complexity, where m is the training set size (Cristianini and Shawe-Taylor 2000).
Many researchers have tried to find possible methods to apply SVM to large data set classification. Existing approaches can generally be divided into two types: modifying the SVM classifier and reducing the training data set. Sequential minimal optimization (SMO) (Platt 1998) breaks the large QP problem into a series of smallest possible QP problems. The projected conjugate gradient (PCG) scales somewhere between linearly and cubically in the training set size (Collobert and Bengio 2001). Clustering is an effective tool to reduce the data set size, for example, hierarchical clustering (Yu et al. 2003) and parallel clustering (Pizzuti and Talia 2003). Another method of reducing the training data is to use the geometric properties of SVM (Franc and Hlavac 2003). In the separable case, finding the maximum-margin hyperplane is equivalent to finding the nearest points of the convex hulls of the two classes (Bennett and Bredensteiner 2000b). The nearest point problem (NPP) has been reformulated to solve SVM classification (Keerthi et al. 2001). Gilbert's algorithm (1966) is one of the first algorithms for solving the minimum norm problem (MNP) in NPP.
The MDM algorithm suggested by Mitchell et al. (1971) works faster than Gilbert's algorithm. The NPP method has also been extended to solve the SMO-SVM optimization problem of image processing (Keerthi and Gilbert 2002). Since the support vectors are usually located at or near local extrema (Bennett and Bredensteiner 2000a), using the extreme examples of the full training set allows the training set to be reduced (Guo and Zhang 2007). The key problem is how to find the border of a data set. The K-nearest neighbor (KNN) method is used to find boundary points (Xia et al. 2006; Li 2011), where the local data distribution and a local geometrical analysis are considered. The minimum enclosing ball (MEB) method from computational geometry is applied to SVM classification in Cervantes et al. (2008). Approximately optimal solutions of the SVM are obtained by the core vector machine (CVM) algorithm via MEB and the idea of core sets in Tsang et al. (2005). The Schlesinger–Kozinec (SK) algorithm (1981) uses the geometric interpretation of SVM and finds the maximal margin classifier with a given precision for separable data.
The convex hull is widely applied to reduce the training data for SVM classification. In computational geometry, a number of algorithms are known for computing the convex hull of a finite set of points. The Graham scan (1972) finds all vertices of the convex hull ordered along the boundary by computing the direction of the cross product of two vectors. The Jarvis march (1973) identifies the convex hull by angle comparisons, wrapping a string around the point set. The divide-and-conquer method (Preparata and Hong 1977) is applicable to the multi-dimensional case. The incremental convex hull (Kallay 1984) and quick hull (Eddy 1977) algorithms consist of eliminating some useless points.
In many real applications, the data are not perfectly separated, and kernel methods are not powerful enough for nonlinear separation. The closest points of the convex hulls are then no longer support vectors. In this case, the soft margin optimization of SVM can be applied directly to the inseparable sets of the convex hulls. The penalty parameter affects the optimal performance of SVM: the optimization becomes a trade-off between a large margin and a small error penalty (Cristianini and Shawe-Taylor 2000). With the reduced convex hull (Mavroforakis and Theodoridis 2006), the intersection of the convex hulls disappears, so that the inseparable case becomes a separable one. The key disadvantage of the reduced convex hull is that the convex hull has to be recalculated in each reduction step (Bennett and Bredensteiner 2000b). The ν-SVM variant (Crisp and Burges 2000) achieves maximal separation between the two convex hulls by a choice of parameters such that the intersection of the convex hulls is empty. The ν-SVM is similar to the reduced convex hull, and its computation process is complex.
The concave–convex procedure (CCP) (Yuille and Rangarajan 2003) separates the energy function into a convex function and a concave (non-convex) function. By using a non-convex loss function, it forms a non-convex SVM. However, some good properties of SVM, for example the maximum margin, cannot be guaranteed (Collobert et al. 2006), because the intersection parts of the data sets do not satisfy the convexity conditions.
In this paper, we propose a new algorithm to search for the border points, called the convex–concave hull (CCH). Using the Jarvis march method (1973), we first find the convex hulls of the data set. Then, concave hulls are formed from the vertices of the convex hull. In this way, the misleading points in the intersection become vertices of the concave hulls. The classification accuracy is increased considerably compared with the above methods, whereas the training time is decreased considerably. Another contribution of this paper is that we extend our two-dimensional convex hull SVM to the multi-dimensional case using a projection method. The experimental results show that the accuracy obtained by our convex–concave hull SVM (CCH-SVM) is very close to that of classic SVM methods, while the training time is significantly shorter.
2 Convex–concave hull

2.1 Vertices of convex–concave hull

Normally, a border point is defined as follows (Xia et al. 2006):
1. A border point is not an outlier, in the sense that its density is bigger than a threshold d, i.e., if x is a border point then Density(x) > d. The border points should be close to many other points, while the outliers are not.
2. There exists a region R2 near to a border point R1 such that Density(R2) > Density(R1). This means there is at least one region (named the inside region) near the border point whose density is bigger than the density of the border point.
The above definitions are useful to understand the concept of boundaries. However, they do not help to directly build a method to detect the border points. In this paper, to apply the convex–concave hull we use different definitions. The border points B(X) of a set X ⊂ R² are the vertices of a "convex–concave hull" if they hold the following properties.
1. The vertices of a convex–concave hull B(X) are the vertices of a convex hull (CH) and the points that are close to the edges of CH:
Fig. 1 The convex hull and a convex–concave hull of a set of points
B(X) = V_CH(X) ∪ Close_CH    (1)

where V_CH(X) is the vertex set of the convex hull and Close_CH is the set of points that are close to the edges of the convex hull. The "closeness" to an edge of CH(X) differs from point to point, see Fig. 1, so there is no unique set of border points. If the interior angles at the vertices of CH(X) are strictly convex, these vertices are called extreme points.
2. B(X) is a non-convex hull. A set S ⊆ R^n is said to be convex if x_1, x_2 ∈ S implies

\alpha x_1 + (1 - \alpha) x_2 \in S, \quad \text{for all } \alpha \in (0, 1)    (2)

Generally, a polygon defined by (1) does not satisfy (2).
3. The border points must be part of the data set, B(X) ⊆ X, and

\max |B(X)| = |X| \quad \forall X    (3)

4. The minimum size of B(X) is defined by

\min |B(X)| = |\text{vertices of } CH(X)|    (4)
Fig. 2 An example of convex–concave hull searching
Fig. 3 Computing convex–concave hull on two partitions of a set of points
The convex hull CH(X) is the minimum convex set containing the points:

CH(X):\; w = \sum_{i=1}^{n} a_i x_i, \quad a_i \ge 0, \quad \sum_{i=1}^{n} a_i = 1, \quad x_i \in X    (5)

In the following subsection, we describe a method to detect the vertices of a convex–concave hull (Fig. 2).
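Since the search described in the next subsection starts from the convex hull vertices, the following is a minimal, self-contained sketch of computing them in the plane with gift wrapping (the Jarvis march, one of the methods cited in Sect. 1). It is only an illustration under our own class and method names, not the authors' implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal Jarvis-march sketch: returns the convex hull vertices of a 2-D point set. */
public final class JarvisMarch {

    /** Cross product of (a->b) x (a->c); > 0 means c lies to the left of the line a->b. */
    static double cross(double[] a, double[] b, double[] c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }

    /** Gift wrapping: start at the lowest point and repeatedly pick the next hull vertex. */
    public static List<double[]> convexHull(List<double[]> pts) {
        if (pts.size() < 3) return new ArrayList<>(pts);
        double[] start = pts.get(0);
        for (double[] p : pts)                      // lowest y (then lowest x) is always a hull vertex
            if (p[1] < start[1] || (p[1] == start[1] && p[0] < start[0])) start = p;

        List<double[]> hull = new ArrayList<>();
        double[] current = start;
        do {
            hull.add(current);
            double[] next = pts.get(0);
            for (double[] candidate : pts) {
                if (candidate == current) continue;
                // take 'candidate' if it lies to the right of the current tentative edge
                if (next == current || cross(current, next, candidate) < 0) next = candidate;
            }
            current = next;
        } while (current != start);                 // stop when the wrap returns to the start
        return hull;                                // hull vertices ordered along the boundary
    }

    public static void main(String[] args) {
        List<double[]> pts = new ArrayList<>();
        pts.add(new double[]{0, 0}); pts.add(new double[]{1, 0});
        pts.add(new double[]{1, 1}); pts.add(new double[]{0, 1});
        pts.add(new double[]{0.5, 0.5});            // interior point, not a hull vertex
        System.out.println("hull size = " + convexHull(pts).size()); // prints 4
    }
}
```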
2.2 Convex–concave hull search

Given a set of points X = {x ∈ R²}, we need to find its extreme points by computing CH(X). For two adjacent vertices v1 and v2 of the convex hull CH(X), it is clear that all points of X must be located on one side of the line v1–v2. Considering an edge E formed by two adjacent vertices v1 and v2 of CH(X), we use them as reference points to detect the concave hull B(X). The procedure is summarized in the following three steps:
1. Compute the convex hull CH(X).
2. Search the points close to each edge of CH(X).
3. Repeat Step 1 and Step 2 until all local neighbors are included.
We use Algorithms 1 and 2 to detect the points close to the edges of CH(X). Figure 3 shows an example of the convex–concave hull B(X) computed via Algorithm 1 and Algorithm 2. It is possible to obtain a different set B(X) by changing the parameter K in Algorithm 2 (an illustrative sketch of the edge-closeness test is given below).
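Algorithms 1 and 2 themselves are not reproduced here. As a hedged illustration of the Close_CH idea in (1) — collecting the points close to an edge formed by two adjacent extreme points v1 and v2 — the sketch below keeps every point whose perpendicular distance to the segment v1–v2 is below a threshold. The threshold eps and all names are ours; the authors' Algorithm 2 additionally restricts the search to the K local neighbours.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of the "points close to a convex hull edge" step used to grow B(X) from CH(X). */
public final class CloseToEdge {

    /** Perpendicular distance from p to the segment v1-v2 (projection clamped to the segment). */
    static double distToSegment(double[] p, double[] v1, double[] v2) {
        double dx = v2[0] - v1[0], dy = v2[1] - v1[1];
        double len2 = dx * dx + dy * dy;
        double t = len2 == 0 ? 0 : ((p[0] - v1[0]) * dx + (p[1] - v1[1]) * dy) / len2;
        t = Math.max(0, Math.min(1, t));            // clamp the projection onto the segment
        double px = v1[0] + t * dx - p[0], py = v1[1] + t * dy - p[1];
        return Math.sqrt(px * px + py * py);
    }

    /** Points of X within 'eps' of the edge v1-v2, ordered by their projection along the edge. */
    public static List<double[]> closePoints(List<double[]> X, double[] v1, double[] v2, double eps) {
        List<double[]> close = new ArrayList<>();
        for (double[] p : X)
            if (p != v1 && p != v2 && distToSegment(p, v1, v2) <= eps) close.add(p);
        double dx = v2[0] - v1[0], dy = v2[1] - v1[1];
        close.sort(Comparator.comparingDouble(p -> (p[0] - v1[0]) * dx + (p[1] - v1[1]) * dy));
        return close;                               // candidates for the concave part between v1 and v2
    }
}
```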
Remark 1 Algorithm 2 shows how the space between two consecutive extreme points is explored. The underlying idea is similar to Jarvis' march (1973), where a set of points is wrapped. However, Jarvis' march considers all points, whereas Algorithm 2 uses only local neighbors. Algorithm 2 runs until the second extreme point in its argument is reached.

Remark 2 The convex–concave hull has the following advantages over the KNN concave hull (Moreira and Santos 2007): (1) The starting and stopping points of our algorithms are fixed beforehand by the convex hull method, which ensures that all vertices of the convex hull are always among the vertices of the convex–concave hull. (2) The angles in Moreira and Santos (2007) depend entirely on the previously computed ones; in our method the extreme points are used to compute the angles, which makes a concurrent implementation easy. (3) Our algorithm does not use recursive invocations; the stop point is added to the set of border points when no more points can be found, which saves detection time.

2.3 Properties of convex–concave hull

In this section, we propose three interesting properties of our convex–concave hull method. These properties can be used as the basis of a simple pre-processing step, which allows a faster computation of the exterior boundaries of the points X. The properties of the convex–concave hull are as follows:

Property 1 The set of vertices of the convex–concave hull is a super-set of the vertices of the convex hull, V_CH(X) ⊆ B(X).
Explanation The searching algorithm first computes CH(X) and then uses two extreme points v1, v2 of CH(X) to search the concave points of B(X). At the last iteration, B(X) contains all vertices of CH(X).

Remark 3 Property 1 ensures that the optimal separating hyperplane is obtained in linearly separable cases if B(X) is used for SVM training.

Property 2 There exists a nonnegative integer K such that B(X) = V_CH(X).
Explanation The parameter K ∈ N controls the number of candidates in the convex–concave hull. If K is chosen large enough, all elements of X are in B(X), and a global search of candidates is needed. We start from a point v1 ∈ CH(X). If it does not intersect with the current hull, there must exist a nearest extreme point v2 in X. So a large enough K guarantees this property.

Property 3 A super-set of the convex–concave hull of X can be obtained by detecting the border points B(X) on partitions of X, i.e., B(X) ⊆ ∪_i B(X_i), where ∩_i CH(X_i) = ∅ and ∪_i X_i = X.
Explanation Let us create j partitions of the input space X ⊂ R², i.e., ∩_i X_i = ∅ and ∪_i X_i = X. Suppose that each partition X_i, i = 1, …, j, contains at least three points that belong to X. It is not difficult to see that the vertices of the convex hull V_CH(X) are a subset of ∪_i V_CH(X_i).
We use Fig. 3 to explain the idea. The convex hull CH(X), shown in Fig. 3b, is computed from the original data set X in Fig. 3a. The two partitions are defined as CH(X1) and CH(X2). It is clear that V_CH(X) ⊆ V_CH(X1) ∪ V_CH(X2). Applying Algorithms 1 and 2 on X1 and X2 produces two convex–concave hulls, which are shown in Fig. 3c. Obviously, their union is a super-set of the convex–concave hull of X, i.e., B(X) ⊆ B(X1) ∪ B(X2).

Remark 4 All extreme points of X must be included in B(X). If the convex–concave hull search algorithm is applied on the disjoint partitions of X, the resulting data set must contain all the vertices of the convex–concave hull determined from the whole of X.

2.4 Data set pre-process with grid algorithm

2.4.1 Grid algorithm

The convex–concave hull searching algorithms work well if the data set X has a uniform distribution. However, the distribution of a data set is unknown in advance. In order to avoid this problem, we partition the original data and then apply the convex–concave hull search on each partition; the distribution in each partition is more uniform than in the whole data set. Instead of using costly algorithms to detect optimal partitions, we only use the groups of adjacent cells of the grid. The grid method to pre-process the data consists of the following steps:
1. Create a number of partitions X_i.
2. Repetitively apply the convex–concave hull search on each partition X_i.
3. Join the points recovered in the previous step.
Properties 1 to 3 assure that all border points B(X) will be captured. In the linearly separable case, Property 3 tells us that the support vectors obtained after this grid pre-processing are included among the detected vertices, V_CH(X) ⊆ V_B(X). In addition, the points in the intersection of the convex hulls are also included in B(X), which is helpful for SVM classification in linearly inseparable cases. In the next subsection, we explain how the grid is created.

2.4.2 Partitions

The first step of the data pre-processing should be fast to avoid a bottleneck in the border point detection. Here, we use a binary tree to partition the data; a small code sketch of the whole grid pre-processing is given below.
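The sketch below is a rough illustration of steps 1–3 above, not the authors' Algorithms: it maps each point to a grid cell obtained by halving every feature hg times, runs a border-point detector per cell, and joins the results. The detector borderPointsOf2DCell stands for the convex–concave hull search of Sect. 2.2 and is passed in as an assumption; the remaining names are ours.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Sketch of the grid pre-processing: partition X into cells, detect border points per cell, join. */
public final class GridPreprocess {

    /** Cell index of point p on a grid with 2^hg bins per (normalized) feature. */
    static String cellOf(double[] p, double[] min, double[] max, int hg) {
        int bins = 1 << hg;                         // hg = 0 keeps all points in a single cell
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < p.length; i++) {
            double range = Math.max(max[i] - min[i], 1e-12);
            int b = (int) Math.min(bins - 1, Math.floor((p[i] - min[i]) / range * bins));
            key.append(b).append(',');
        }
        return key.toString();
    }

    /** Union of the border points detected in every cell (Property 3: nothing relevant is lost). */
    public static List<double[]> borderPoints(List<double[]> X, double[] min, double[] max, int hg,
                                              Function<List<double[]>, List<double[]>> borderPointsOf2DCell) {
        Map<String, List<double[]>> cells = new HashMap<>();
        for (double[] p : X)
            cells.computeIfAbsent(cellOf(p, min, max, hg), k -> new ArrayList<>()).add(p);
        List<double[]> joined = new ArrayList<>();
        for (List<double[]> cell : cells.values())
            if (cell.size() >= 3) joined.addAll(borderPointsOf2DCell.apply(cell));
            else joined.addAll(cell);               // tiny cells are kept as they are
        return joined;
    }
}
```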
All points in X are first mapped into the grid G by a binary tree with height hg, see Fig. 4. The height of the tree is the granularity of G: it decides how many times each dimension is halved. The grid mapping from X into G also reduces repeated points: if a set of points in X are very close to each other, they are mapped into the same cell of G. A two-dimensional grid G is shown in Fig. 5, where the features X1 and X2 are each halved twice, so the granularity is 2. The corresponding binary tree data structure is shown in Fig. 4, where the leaves represent the cells of the grid G.

Fig. 4 Grid G (hg = 2) in a binary tree
Fig. 5 Structure of grid G with granularity = 2
Fig. 6 Result of data set pre-process, hg = 0
Fig. 7 Result of data set pre-process, hg = 1

Now, we use two examples to show how our grid method works. Figures 6 and 7 show the results of pre-processing with granularity = 9.
Figure 6 shows that the convex–concave hull search uses the whole data set (hg = 0), and Fig. 7 shows the result of the search with hg = 1. The border point joining is realized by the union set operation.

2.5 SVM classification via convex–concave hull

Basically, SVM classification can be grouped into two cases: linearly separable and linearly inseparable. The nonlinearly separable case can be transformed into the linear case via a suitable kernel.
In the linearly separable case,

CH(X+) ∩ CH(X−) = ∅,

where X+ and X− represent the elements of X with label +1 and −1, respectively. It has been demonstrated that if the data set is linearly separable, then the support vectors must be vertices of CH(X+) and CH(X−) (Berg et al. 2008). Although our convex–concave hull algorithms are intended for the linearly inseparable case, they also successfully deal with the linearly separable case by Properties 1–3, see Fig. 8.

Fig. 8 Linearly separable case, hg = 0

In the linearly inseparable case, the convex hulls CH(X+) and CH(X−) intersect, see Fig. 9. Because the support vectors are generally located on the exterior boundaries of the data distribution, they are not vertices of CH(X+) and CH(X−) (Berg et al. 2008). On the other hand, the vertices of the convex–concave hull are the border points, and they are likely to be support vectors (Moreira and Santos 2007).

Fig. 9 Linearly inseparable case, hg = 0

After the vertices of the convex–concave hull B(X) = V_CH(X) ∪ V_NCH(X) have been detected, they are used to train an SVM classifier. Finding the optimal hyperplane is equivalent to solving the following QP problem (primal problem):

\min_{w,b} J(w) = \frac{1}{2} w^T w + C \sum_{k=1}^{n} \xi_k \quad \text{subject to } y_k\,[w^T \varphi(x_k) + b] \ge 1 - \xi_k    (6)

where ξ_k is a slack variable that tolerates mis-classifications, ξ_k > 0, k = 1…n, C > 0 is the penalty factor, ξ_k measures the distance from x_k to the hyperplane w^T φ(x_k) + b = 0 when x_k is misclassified, and φ(x_k) is a nonlinear function.

Remark 5 In our convex–concave hull SVM classification, the penalty factor C can be very small, because almost all misleading points are removed by the convex–concave hull search algorithms. This is why the classification accuracy is improved with our CCH-SVM.

A kernel which satisfies the Mercer condition (Vapnik 1995) is K(x_k, x_i) = φ(x_k)^T φ(x_i). Problem (6) is equivalent to the following dual QP problem with Lagrange multipliers a_k ≥ 0:

\max_{a} J(a) = -\frac{1}{2} \sum_{k,j=1}^{n} y_k y_j K(x_k, x_j)\, a_k a_j + \sum_{k=1}^{n} a_k \quad \text{subject to } \sum_{k=1}^{n} a_k y_k = 0,\; 0 \le a_k \le C    (7)

It is well known that most of the solutions a_k are zero, i.e., the solution vector is sparse. The x_k with non-zero a_k are called support vectors. Let V be the index set of the support vectors; then the optimal hyperplane is

\sum_{k \in V} a_k y_k K(x_k, x) + b = 0    (8)

where b is determined by the Kuhn–Tucker conditions. The resulting classifier is

y(x) = \mathrm{sign}\Big[\sum_{k \in V} a_k y_k K(x_k, x) + b\Big].
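For reference, the following is a small self-contained sketch of evaluating the decision function above with an RBF kernel (the kernel used later in the experiments). The support vectors, multipliers a_k, labels y_k and bias b are assumed to come from any QP solver for (7); all class and field names are ours.

```java
/** Sketch of the SVM decision function y(x) = sign( sum_k a_k y_k K(x_k, x) + b ), cf. (7)-(8). */
public final class CchSvmDecision {
    private final double[][] supportVectors;    // x_k, k in V
    private final double[] alphaY;              // products a_k * y_k
    private final double bias;                  // b from the Kuhn-Tucker conditions
    private final double c;                     // RBF width, K(x,z) = exp(-||x-z||^2 / (2 c^2))

    public CchSvmDecision(double[][] sv, double[] alphaY, double bias, double c) {
        this.supportVectors = sv; this.alphaY = alphaY; this.bias = bias; this.c = c;
    }

    /** RBF kernel value between two vectors. */
    double rbf(double[] x, double[] z) {
        double s = 0;
        for (int i = 0; i < x.length; i++) { double d = x[i] - z[i]; s += d * d; }
        return Math.exp(-s / (2 * c * c));
    }

    /** Returns the predicted label, +1 or -1. */
    public int predict(double[] x) {
        double f = bias;
        for (int k = 0; k < supportVectors.length; k++)
            f += alphaY[k] * rbf(supportVectors[k], x);
        return f >= 0 ? 1 : -1;
    }
}
```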
2.6 Multi-dimensional case

In order to extend the convex–concave hull to more than two dimensions, a dimension reduction is necessary. Principal component analysis (PCA) is a common method to reduce the dimension (Jolliffe 2002); however, it is not suitable for large data sets, so we do not use it. In this paper, we use a clustering technique: the border samples found in the two-dimensional subsets are joined into one set and then projected back to their original dimension with respect to the fixed centers. The detailed algorithm is shown in Algorithm 3. Here, we use our previous on-line algorithm (Yu and Li 2008) to compute the clusters. The basic idea of the on-line clustering is that when the distance from a sample to the center of a group is less than a previously defined distance L, the sample belongs to this group. When new data are obtained, the center and the group also change. The Euclidean distance at time k is defined as

d_{k,x} = \Big[\sum_{i=1}^{n} \Big(\frac{x^i(k) - \bar{x}^i_j}{x^i_{\max} - x^i_{\min}}\Big)^2\Big]^{1/2}    (9)

where n is the dimension of the sample x, \bar{x}_j is the center of the jth cluster, x^i_{\max} = \max_k\{x^i(k)\}, and x^i_{\min} = \min_k\{x^i(k)\}. The center of each cluster can be recursively computed as

\bar{x}^i_{j,k+1} = \frac{k-1}{k}\,\bar{x}^i_{j,k} + \frac{1}{k}\, x^i(k)    (10)

A small sketch of this on-line clustering is given at the end of this subsection. Algorithm 3 shows how to partition each dimension of a multi-dimensional data set. In this way, the multi-dimensional data set is divided into several two-dimensional training data sets, and then Algorithm 1 and Algorithm 2 are used to search the convex–concave hull. The main advantage of Algorithm 3 is that the training time is linear in the size of the training data. Figure 10 shows an example of the convex–concave hull in three dimensions.

Fig. 10 Convex–concave hull in three dimensions
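Algorithm 3 itself is not reproduced here. As a hedged illustration of the on-line clustering it relies on, the sketch below follows (9)–(10): a sample joins the nearest existing cluster if its normalized distance is below L, otherwise it starts a new cluster. The per-feature ranges x^i_max − x^i_min are assumed to be known in advance, and the class and method names are ours.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the on-line clustering of Sect. 2.6: distance (9), recursive center update (10). */
public final class OnlineClustering {
    static final class Cluster {
        double[] center;
        int count;                                  // number of samples assigned so far
        Cluster(double[] first) { center = first.clone(); count = 1; }
    }

    private final List<Cluster> clusters = new ArrayList<>();
    private final double[] range;                   // x^i_max - x^i_min, per feature
    private final double L;                          // distance threshold of Algorithm 3

    public OnlineClustering(double[] range, double L) { this.range = range; this.L = L; }

    /** Normalized Euclidean distance (9) between a sample and a cluster center. */
    double distance(double[] x, Cluster c) {
        double s = 0;
        for (int i = 0; i < x.length; i++) {
            double r = Math.max(range[i], 1e-12);
            double d = (x[i] - c.center[i]) / r;
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Assigns x to the closest cluster (creating one if none is within L) and updates its center. */
    public void add(double[] x) {
        Cluster best = null; double bestDist = Double.MAX_VALUE;
        for (Cluster c : clusters) {
            double d = distance(x, c);
            if (d < bestDist) { bestDist = d; best = c; }
        }
        if (best == null || bestDist > L) { clusters.add(new Cluster(x)); return; }
        best.count++;
        int k = best.count;                          // recursive mean update (10)
        for (int i = 0; i < x.length; i++)
            best.center[i] = ((k - 1.0) / k) * best.center[i] + x[i] / k;
    }

    public List<Cluster> clusters() { return clusters; }
}
```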
3 Experiments

We use three examples to compare our algorithm with four other SVM classification methods: SMO (Platt 1998), LibSVM (Chang and Lin 2001), clustering-based SVM (CSVM) (Cervantes et al. 2008), and the reduced convex hull SVM (RCH-SVM) (Mavroforakis and Theodoridis 2006). The SMO, LibSVM, CSVM, and RCH-SVM are trained with the original training set. The CSVM uses clustering and applies SVM twice; the SMO and LibSVM use a decomposition approach; the RCH-SVM method finds the closest points within a reduced convex hull. Our convex–concave hull SVM (CCH-SVM) trains the SVM with LibSVM using the detected border points.
All experiments are run on a computer with the following features: Core 2 Duo 1.66 GHz processor, 2.5 GB RAM, Linux Fedora 15 operating system. The algorithms were implemented in the Java language.
The maximum amount of random access memory given to the Java virtual machine is set to 1.9 GB. In all cases the corresponding training times and achieved accuracies are measured and compared. The training and testing sets used in the experiments were created by randomly splitting each data set into 70 and 30 %, respectively. The kernel used in all experiments is a radial basis function (RBF), chosen as

f(x, z) = \exp\Big(-\frac{(x - z)^T (x - z)}{2 c^2}\Big)    (11)

where c = 0.055. The reported results are the average of 30 runs of each experiment. The parameter L in Algorithm 3 was set to L = 1/10 of the range of feature m.
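The CCH-SVM runs below train on the detected border points with LibSVM. The following is a hedged sketch of that training call, assuming the LibSVM Java bindings (package libsvm); the setting gamma = 1/(2c^2) reproduces kernel (11), while the data-loading part and the exact penalty value C are placeholders of ours.

```java
import libsvm.*;

/** Sketch: train an RBF-kernel SVM on the border points B(X) with the LibSVM Java API. */
public final class TrainOnBorderPoints {

    static svm_node[] toNodes(double[] x) {
        svm_node[] row = new svm_node[x.length];
        for (int i = 0; i < x.length; i++) {
            row[i] = new svm_node();
            row[i].index = i + 1;                   // LibSVM uses 1-based feature indices
            row[i].value = x[i];
        }
        return row;
    }

    public static svm_model train(double[][] borderPoints, double[] labels) {
        svm_problem prob = new svm_problem();
        prob.l = borderPoints.length;
        prob.y = labels;                            // +1 / -1 labels of the border points
        prob.x = new svm_node[prob.l][];
        for (int i = 0; i < prob.l; i++) prob.x[i] = toNodes(borderPoints[i]);

        svm_parameter param = new svm_parameter();
        param.svm_type = svm_parameter.C_SVC;
        param.kernel_type = svm_parameter.RBF;
        double c = 0.055;                           // RBF width of (11)
        param.gamma = 1.0 / (2 * c * c);            // exp(-gamma * ||x - z||^2) matches (11)
        param.C = 1.0;                              // penalty factor; can be small (Remark 5)
        param.eps = 1e-3;
        param.cache_size = 100;                     // MB

        return svm.svm_train(prob, param);          // predict later with svm.svm_predict(model, nodes)
    }
}
```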
3.1 Experiment 1

In this experiment, a version of the checkerboard data set was used (Ho and Kleinberg 1996). This version of the checkerboard overlaps the classes and contains more points than the original data set, see Fig. 11.

Fig. 11 The checkerboard data set

Table 1 Classification results of checkerboard data set (× 10^4)

Data set   hg  K   Tbp    Ttr     #SV    #BS     Acc
SMO(1)     –   –   –      928     171    –       99.8
B(1)       7   25  587    241     101    768     99.7
SMO(5)     –   –   –      12,589  1,495  –       98.6
B(5)       2   15  2,635   2,887   626    2,924   98.4
SMO(10)    –   –   –      43,443  1,252  –       99.8
B(10)      7   25  3,557   3,094   770    5,319   99.4
SMO(20)    –   –   –      53,643  1,173  –       99.7
B(20)      9   40  6,719   2,001   627    4,632   98.3
SMO(25)    –   –   –      69,009  6,839  –       98.7
B(25)      4   15  6,501   7,181   1,237  13,171  98.4
Table 2 Classification results for checkerboard (40 × 10^3)

Method   hg  K  Tbp  Ttr     #SV  #BS    Acc
SMO      –   –  –    11,243  648  –      97.3
LIBSVM   –   –  –    7,130   540  –      97.8
CSVM     –   –  –    2,830   310  –      97.6
RCHSVM   –   –  –    19,325  350  –      90.6
CCHSVM   4   7  437  1,990   253  2,420  96.2

Table 3 Classification results of spheres (× 10^4)

Method     hg  K   Tbp    Ttr     #SV    #BS     Acc
SMO(25)    –   –   –      9,575   1,523  –       98.6
B(25)      2   15  1,957  2,001   626    2,924   98.4
SMO(225)   –   –   –      47,931  7,325  –       98.7
B(225)     4   15  6,501  711     925    13,171  98.4
SMO(31)    –   –   –      991     275    –       99.8
B(3)       7   25  541    212     156    657     99.7
SMO(35)    –   –   –      9,241   511    –       99.9
B(5)       –   25  2,552  1,661   360    4,590   99.5
SMO(310)   –   –   –      38,438  1,952  –       99.8
B(310)     –   25  4,771  3,914   569    3,210   99.4
SMO(420)   –   –   –      49,398  1,235  –       99.7
B(420)     –   40  5,929  2,230   440    3,531   98.3
In order to see how the training data size affects the training time and classification accuracy of the CCH-SVM, we use checkerboard data sets of size 10 × 10^3, 50 × 10^3, 100 × 10^3, 200 × 10^3, and 250 × 10^3 to train CCH-SVM and SMO.
The comparison results are shown in Table 1. Here, SMO(m × 10^3) and B(m × 10^3) mean that m × 1,000 points have been used to train SMO and CCH-SVM, respectively. The parameters hg and K are the CCH-SVM algorithm parameters defined in Sect. 2, Tbp is the time to compute the border points B(X), Ttr is the training time, #SV is the number of support vectors, #BS is the number of vertices of the convex–concave hull, and Acc is the classification accuracy. It can be seen that CCH-SVM spends less time than SMO to train the SVM, while the classification accuracy is almost the same as that obtained with SMO. When the data size is increased, the training time increases dramatically with SMO, while with CCH-SVM it only increases a little. Although the classification accuracy cannot be improved significantly when the data size is very large, the testing accuracy is still acceptable.
Now, we compare our CCH-SVM with SMO (Platt 1998), LibSVM (Chang and Lin 2001), CSVM (Cervantes et al. 2008), and RCH-SVM (Mavroforakis and Theodoridis 2006) on the checkerboard set with 40 × 10^3 points. The comparison results are shown in Table 2. Compared with the other SVM training methods, CCH-SVM achieves good classification accuracy while the training is significantly faster.

3.2 Experiment 2

In order to get general conclusions, we use the Spheres n data set, which is synthetic and consists of several high-dimensional balls with random radii and centers.
Fig. 12 Three-dimensional spheres n data set
Table 4 Classification results of spheres n data set (4, 4 × 10^4)

Method   hg  K  Tbp    Ttr     #SV  #BS  Acc
SMO      –   –  –      9,143   931  –    90.3
LIBSVM   –   –  –      7,112   760  –    89.8
CSVM     –   –  –      4,467   450  –    87.5
RCHSVM   –   –  –      57,872  110  637  86.2
CCHSVM   5   8  1,618  1,469   236  324  89.9
Table 5 Classification results of SVMGuide1 data set

Method   hg  K  Tbp  Ttr    #SV  #BS  Acc
SMO      –   –  –    3,243  648  –    95.4
LIBSVM   –   –  –    2,502  571  –    96.4
CSVM     –   –  –    731    350  –    93.7
RCHSVM   –   –  –    5,372  330  734  94.5
CCHSVM   4   7  437  548    253  894  95.0
The hyperspheres can overlap by no more than 10 %. Figure 12 shows an example of a three-dimensional Spheres n data set. The size of the data set also varies from 10 K up to 250 K. We repeat Experiment 1 with the Spheres n data set; now the data set becomes high dimensional. The classification results are shown in Table 3. Here SMO(2, 50 × 10^3) means training SMO with two-dimensional balls and 50,000 samples, and B(4, 200 × 10^3) means training CCH-SVM with four-dimensional balls and 200,000 samples. The comparison results with SMO, LibSVM, CSVM, and RCH-SVM on the 40,000-point four-dimensional spheres data set are shown in Table 4. Similar results as in Experiment 1 are obtained. However, for higher-dimensional classification the training speed of our CCH-SVM does not improve as much.

3.3 Experiment 3

In this experiment, we use another benchmark data set, svmguide1, with 3,089 training samples and 4,000 testing samples. The comparison results with SMO, LibSVM, CSVM, and RCH-SVM are shown in Table 5. It can be seen that even when the data set size is less than 10 K, our algorithm is competitive in this scenario.
4 Conclusions

This paper proposes a novel data reduction method for SVM classification, CCH-SVM. It overcomes the slow training problem of SVM and the low classification accuracy problem of geometric methods by using a grid pre-processing method and the convex–concave hull. CCH-SVM computes the convex hull in two dimensions and uses it to detect border points. A pre-processing method is used to detect the border points of the separated partitions. For the multi-dimensional case, we apply an on-line clustering algorithm to reduce the dimensions; the detected border points are projected back to their original dimensions. Experimental results demonstrate that our approach has good classification accuracy while the training is significantly faster than with other SVM training methods. The proposed method is efficient for dimensions less than four, although it can be applied to higher-dimensional data sets.
References

Bennett KP, Bredensteiner EJ (2000a) Geometry in learning. In: Gorini C (ed) Geometry at work. Mathematical Association of America, pp 132–145
Bennett KP, Bredensteiner EJ (2000b) Duality and geometry in SVM classifiers. In: Proceedings of the 17th International Conference on Machine Learning, San Francisco
Berg M, Cheong O, Kreveld M, Overmars M (2008) Computational geometry: algorithms and applications. Springer, Berlin
Cervantes J, Li X, Yu W, Li K (2008) Support vector machine classification for large data sets via minimum enclosing ball clustering. Neurocomputing 71:611–619
Chang C-C, Lin C-J (2001) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm
Collobert R, Bengio S (2001) SVMTorch: support vector machines for large regression problems. J Mach Learn Res 1:143–160
Collobert R, Sinz F, Weston J, Bottou L (2006) Trading convexity for scalability. In: Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, pp 201–208
Crisp DJ, Burges CJC (2000) A geometric interpretation of ν-SVM classifiers. NIPS 12:244–250
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge
Eddy W (1977) A new convex hull algorithm for planar sets. ACM Trans Math Softw 3(4):398–403
Franc V, Hlavac V (2003) An iterative algorithm learning the maximal margin classifier. Pattern Recogn Lett 36:1985–1996
Gilbert EG (1966) An iterative procedure for computing the minimum of a quadratic form on a convex set. SIAM J Control Optim 4(1):61–79
Graham RL (1972) An efficient algorithm for determining the convex hull of a finite planar set. Inf Process Lett 1:132–133
Guo G, Zhang J-S (2007) Reducing examples to accelerate support vector regression. Pattern Recognit Lett 28:2173–2183
Ho TK, Kleinberg EM (1996) Checkerboard dataset. http://www.cs.wisc.edu/
Hsu C-W, Chang C-C, Lin C-J (2010) A practical guide to support vector classification. Bioinform Biol Insights 1(1):1–16
Jarvis RA (1973) On the identification of the convex hull of a finite set of points in the plane. Inf Process Lett 2:18–21
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, Berlin
Kallay M (1984) The complexity of incremental convex hull algorithms. Inf Process Lett 19(4):197–212
Keerthi SS, Gilbert EG (2002) Convergence of a generalized SMO algorithm for SVM classifier design. Mach Learn 46:351–360
Keerthi SS, Shevade SK, Bhattacharyya C, Murthy KRK (2001) A fast iterative nearest point algorithm for support vector machine classifier design. IEEE Trans Neural Netw 11(12):124–137
Li Y (2011) Selecting training points for one-class support vector machines. Pattern Recognit Lett 32:1517–1522
Mavroforakis ME, Theodoridis S (2006) A geometric approach to support vector machine (SVM) classification. IEEE Trans Neural Netw 17(3):671–682
Mitchell BF, Dem'yanov VF, Malozemov VN (1971) Finding the point of a polyhedron closest to the origin. Vestnik Leningrad Gos Univ 13:38–45
Moreira A, Santos MY (2007) Concave hull: a k-nearest neighbours approach for the computation of the region occupied by a set of points. In: GRAPP (GM/R), pp 61–68
Pizzuti C, Talia D (2003) P-AutoClass: scalable parallel clustering for mining large data sets. IEEE Trans Knowl Data Eng 15(3):629–641
Platt J (1998) Fast training of support vector machines using sequential minimal optimization. In: Advances in kernel methods: support vector learning. MIT Press, Cambridge
Preparata FP, Hong SJ (1977) Convex hulls of finite sets of points in two and three dimensions. Commun ACM 20(2):87–93
Schlesinger MI, Kalmykov VG, Suchorukov AA (1981) Comparative analysis of algorithms synthesising linear decision rule for analysis of complex hypotheses. Automatika 1:3–9
Tsang IW, Kwok JT, Cheung PM (2005) Core vector machines: fast SVM training on very large data sets. J Mach Learn Res 6:363–392
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Xia C, Hsu W, Lee ML, Ooi BC (2006) BORDER: efficient computation of boundary points. IEEE Trans Knowl Data Eng 18(3):289–303
Yu W, Li X (2008) On-line fuzzy modeling via clustering and support vector machines. Inf Sci 178:4264–4279
Yu H, Yang J, Han J (2003) Classifying large data sets using SVMs with hierarchical clusters. In: Proceedings of the 9th ACM SIGKDD, Washington, DC
Yuille AL, Rangarajan A (2003) The concave-convex procedure. Neural Comput 15(4):915–936