Detecting RNA Sequences Using Two-Stage SVM Classifier

Xiaoou Li (1) and Kang Li (2)

(1) Departamento de Computación, CINVESTAV-IPN, A.P. 14-740, Av. IPN 2508, México D.F., 07360, México
(2) School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Ashby Building, Stranmillis Road, Belfast, BT9 5AH, UK
[email protected]

Abstract. RNA sequence detection is time-consuming because of the huge size of the data sets involved. Although SVMs have proved useful for this task, a standard SVM is not suitable for classifying large data sets because of its high training complexity. This paper introduces a two-stage SVM classification approach for fast classification of large data sets. Experimental results on several RNA sequence detection tasks demonstrate that the proposed approach is promising for such applications.
1  Introduction
RNA plays many important biological roles other than as a transient carrier of amino acid sequence information [14]. It catalyzes peptide bond formation, participates in protein localization, serves in immunity, catalyzes intron splicing and RNA degradation, and serves in dosage compensation. It is also an essential subunit of telomeres, guides RNA modification, controls development, and has an abundance of other regulatory functions [29].

Non-coding RNAs (ncRNAs) are transcripts that have function without being translated to protein [12]. The number of known ncRNAs is growing quickly, and their significance has been severely underestimated in classic models of cellular processes. It is desirable to develop high-throughput methods for the discovery of novel ncRNAs, both for greater biological understanding and for discovering candidate drug targets. However, novel ncRNAs are difficult to detect in conventional biochemical screens: they are frequently short, often not polyadenylated, and might only be expressed under specific cellular conditions. Experimental screens have found many ncRNAs, but have also demonstrated that no single screen is capable of discovering all known ncRNAs of an organism. A more effective approach, demonstrated in previous studies [2,28], may be to first detect ncRNA candidates computationally and then verify them biochemically. Considering the number of available whole genome sequences, SVM can be applied to a large and diverse data set and has massive potential for novel ncRNA discovery [21,27]. However, long training
time is needed. Therefore, it is impossible to repeat the SVM classification on an updated data set within an acceptable time when new data are added frequently or continuously. Many researchers have tried to find ways to apply SVM classification to large data sets. Generally, these methods can be divided into two types: 1) modify the SVM algorithm so that it can be applied to large data sets, and 2) select representative training data from the large data set so that a conventional SVM can handle it.

For the first type, a standard projected conjugate gradient (PCG) chunking algorithm scales somewhere between linear and cubic in the training set size [8,16]. Sequential Minimal Optimization (SMO) is a fast method to train SVMs [23,7]. Training an SVM requires the solution of a QP optimization problem; SMO breaks this large QP problem into a series of smallest possible QP problems and is faster than PCG chunking. [10] introduced a parallel optimization step in which block diagonal matrices are used to approximate the original kernel matrix, so that the SVM classification can be split into hundreds of subproblems. A recursive and computationally superior mechanism referred to as adaptive recursive partitioning was proposed in [17], where the data are recursively subdivided into smaller subsets. Genetic programming is able to deal with large data sets that do not fit in main memory [11]. Neural network techniques can also be applied to SVMs to simplify the training process [15].

For the second type, clustering has proved to be an effective way of collaborating with SVMs on classifying large data sets, for example hierarchical clustering [31,1], k-means clustering [4] and parallel clustering [7]. Clustering-based methods can reduce the computational burden of the SVM, but the clustering algorithms themselves are still complicated for large data sets. Rocchio bundling is a statistics-based data reduction method [25]. The Bayesian committee machine has also been used to train SVMs on large data sets, where the large data set is divided into m subsets of the same size and m models are derived from the individual sets [26]. However, it has a higher error rate than a normal SVM, and the sparse property does not hold.

Falling into the second type, a two-stage SVM classification approach was proposed in our previous work [4,5,18]. First, we select representative training data from the original data set using the results of clustering, and these selected data are used to train the first-stage SVM. The first-stage SVM is not precise enough because of the great reduction of the original data set, so we use a second-stage SVM to refine the classification. The support vectors obtained by the first-stage SVM are used to select data for the second-stage SVM by recovering their cluster-mates (we call this process de-clustering). Finally, the second-stage SVM is applied to the de-clustered data. Our experimental results show that the accuracy obtained by our approach is very close to that of classic SVM methods, while the training time is significantly shorter. Furthermore, the proposed approach can be applied to huge data sets regardless of their dimensionality. In this paper, we apply our approach to several RNA sequence data sets.

The rest of the paper is organized as follows. Section 2 introduces our two-stage
SVM classifier. Section 3 presents the experimental results on RNA sequence detection, with comparisons against other well-known classifiers. Conclusions are given in Section 4.
2  Two-Stage SVM Classifier
By the sparse property of SVMs, data samples that are not support vectors do not contribute to the optimal hyperplane. Input data that are far away from the decision hyperplane should therefore be eliminated, while data that are possibly support vectors should be kept. In this paper, we select the cluster centers and the data of mix-labeled clusters as training data for the first-stage SVM; we believe these data are the most useful and representative part of a large data set for finding support vectors. Note that the training data set of the first-stage SVM classification is only a small percentage of the original data. The data of the clusters near the hyperplane are not used in their entirety for training, since we only select the cluster centers. This may affect the classification precision, i.e., the obtained decision hyperplane may not be precise enough; however, it at least gives us a reference on the data distribution. According to the above analysis, we make the following modifications to the training data set:

1) Remove the data far from the hyperplane, because they will not contribute to finding the support vectors.
2) Retain the data of the mix-labeled clusters, since they are more likely to be support vectors.
3) Additionally, add the data of the clusters whose centers are support vectors of the first-stage SVM.

In general, our approach consists of the four steps shown in Figure 1: 1) data selection, 2) the first-stage SVM classification, 3) de-clustering, and 4) the second-stage SVM classification. The following subsections explain each step in detail.
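To make the four steps concrete, a minimal sketch of the overall pipeline is given below. It is not the implementation used in this paper: scikit-learn's k-means stands in for the FCM/MEB clustering discussed in Section 2.1, SVC (an SMO-type solver) stands in for both SVM stages, and the helper names select_training_data and decluster are ours (they are sketched in the following subsections).

```python
# Minimal sketch of the four-step, two-stage SVM pipeline (illustrative only).
# Assumptions: scikit-learn's KMeans stands in for the clustering step and
# SVC (an SMO-type solver) for both SVM stages.
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def two_stage_svm(X, y, n_clusters=100, kernel="rbf", C=1.0):
    # Step 1: cluster the data and select representative training points
    # (cluster centers plus all points of mix-labeled clusters).
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    X1, y1, mixed = select_training_data(X, y, km)   # hypothetical helper, Sect. 2.1

    # Step 2: first-stage SVM on the reduced set.
    svm1 = SVC(kernel=kernel, C=C).fit(X1, y1)

    # Step 3: de-clustering - recover all members of clusters whose
    # centers are support vectors of the first-stage SVM.
    X2, y2 = decluster(X, y, km, svm1, X1, mixed)    # hypothetical helper, Sect. 2.3

    # Step 4: second-stage SVM on the recovered data.
    svm2 = SVC(kernel=kernel, C=C).fit(X2, y2)
    return svm2
```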
2.1  Selecting Training Data
The goal of clustering is to separate a finite number of unlabeled items into a finite and discrete set of "natural" hidden data structures, such that items in the same cluster are more similar to each other, while items in different clusters tend to be dissimilar, according to some measure of similarity or proximity. A large number of clustering methods have been developed, e.g., squared-error-based k-means [3], fuzzy C-means [22], and kernel-based clustering [13]. In our experience, fuzzy C-means (FCM) clustering, minimum enclosing ball (MEB) clustering and random selection have proved very effective for selecting training data for the first-stage SVM [4,5,18]. Let l be the number of clusters; the clustering process then finds l partitions (or clusters) Ω_i of the input data set X, i = 1, ..., l, l < n, such that Ω_i ≠ ∅ and ∪_{i=1}^{l} Ω_i = X.
Fig. 1. Two-stage SVM classification
Note that the data in a cluster may have the same label (all positive or all negative) or different labels (both positive and negative). The obtained clusters can therefore be classified into three types: 1) clusters with only positively labeled data, denoted Ω⁺, i.e., Ω⁺ = {∪Ω_i | y = +1}; 2) clusters with only negatively labeled data, denoted Ω⁻, i.e., Ω⁻ = {∪Ω_i | y = −1}; 3) clusters with both positively and negatively labeled data (mix-labeled), denoted Ω_m, i.e., Ω_m = {∪Ω_i | y = ±1}. Figure 2(a) illustrates the clusters after clustering: the clusters containing only red points are positively labeled (Ω⁺), the clusters containing only green points are negatively labeled (Ω⁻), and clusters A and B are mix-labeled (Ω_m).
Fig. 2. Data selection: (a) Clusters (b) The first stage SVM
We select not only the centers of the clusters but also all the data of the mix-labeled clusters as training data for the first SVM classification stage. Denote the sets of centers of the clusters in Ω⁺ and Ω⁻ by C⁺ and C⁻ respectively, i.e.,

    C⁺ = {∪C_i | y = +1}   (positively labeled centers)
    C⁻ = {∪C_i | y = −1}   (negatively labeled centers)

Then the data selected for the first-stage SVM classification are the union of C⁺, C⁻ and Ω_m, i.e., C⁺ ∪ C⁻ ∪ Ω_m. In Figure 2(b), the red points belong to C⁺ and the green points belong to C⁻. It is clear that the data in Figure 2(b) are all cluster centers, except for the data of the mix-labeled clusters A and B.
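A possible realization of this selection step is sketched below. It assumes a fitted clustering object exposing labels_ and cluster_centers_ (as scikit-learn's KMeans does, used here as a stand-in for FCM or MEB clustering); the helper name select_training_data is illustrative rather than taken from the original implementation.

```python
import numpy as np

def select_training_data(X, y, km):
    """Return C+ ∪ C- ∪ Ω_m: centers of pure clusters plus all points
    of mix-labeled clusters (illustrative sketch)."""
    X_sel, y_sel, mixed_clusters = [], [], []
    for i in range(km.n_clusters):
        members = np.where(km.labels_ == i)[0]
        if members.size == 0:
            continue
        labels = np.unique(y[members])
        if labels.size == 1:
            # Pure cluster: keep only its center, labeled like its members.
            X_sel.append(km.cluster_centers_[i])
            y_sel.append(labels[0])
        else:
            # Mix-labeled cluster (Ω_m): keep every point.
            X_sel.extend(X[members])
            y_sel.extend(y[members])
            mixed_clusters.append(i)
    return np.array(X_sel), np.array(y_sel), mixed_clusters
```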
2.2  The First Stage SVM Classification
We consider binary classification. Let (X, Y) be the training pattern set,

    X = \{x_1, \dots, x_n\}, \quad Y = \{y_1, \dots, y_n\}, \quad y_i = \pm 1, \quad x_i = (x_{i1}, \dots, x_{ip})^T \in R^p        (1)

The training task of SVM classification is to find, from the input X and the output Y, the optimal hyperplane which maximizes the margin between the classes. That is, training the SVM amounts to solving the following quadratic programming problem (primal problem):

    \min_{w,b} J(w) = \frac{1}{2} w^T w + c \sum_{k=1}^{n} \xi_k
    \text{subject to: } y_k \left[ w^T \varphi(x_k) + b \right] \ge 1 - \xi_k        (2)

where the ξ_k > 0, k = 1, ..., n, are slack variables that tolerate mis-classifications and measure the deviation of x_k from the hyperplane w^T φ(x_k) + b = 0, c > 0 is a regularization constant, and φ(x_k) is a nonlinear mapping. The kernel, which satisfies the Mercer condition [9], is K(x_k, x_i) = φ(x_k)^T φ(x_i). Problem (2) is equivalent to the following quadratic programming problem, a dual problem in the Lagrange multipliers α_k ≥ 0:

    \max_{\alpha} J(\alpha) = -\frac{1}{2} \sum_{k,j=1}^{n} y_k y_j K(x_k, x_j) \alpha_k \alpha_j + \sum_{k=1}^{n} \alpha_k
    \text{subject to: } \sum_{k=1}^{n} \alpha_k y_k = 0, \quad 0 \le \alpha_k \le c        (3)

Many solutions of (3) are zero, i.e., α_k = 0, so the solution vector is sparse and the sum is taken only over the non-zero α_k. An x_i corresponding to a non-zero α_i is called a support vector (SV). Let V be the index set of the SVs; then the optimal hyperplane is

    \sum_{k \in V} \alpha_k y_k K(x_k, x_j) + b = 0        (4)
The resulting classifier is

    y(x) = \operatorname{sign}\Big( \sum_{k \in V} \alpha_k y_k K(x_k, x) + b \Big)
where b is determined by the Kuhn–Tucker conditions.

Sequential minimal optimization (SMO) breaks the large QP problem into a series of smallest possible QP problems [23]. These small QP problems can be solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop. The memory required by SMO is linear in the training set size, which allows SMO to handle very large training sets [16]. One requirement in (3) is Σ_i α_i y_i = 0; it is enforced throughout the iterations and implies that the smallest number of multipliers that can be optimized at each step is two. At each step SMO chooses two elements α_i and α_j to optimize jointly; it finds the optimal values for these two parameters while all others are kept fixed. The choice of the two points is determined by a heuristic algorithm, while the optimization of the two multipliers is performed analytically. Experimentally, the performance of SMO is very good despite needing more iterations to converge: each iteration uses so few operations that the algorithm exhibits an overall speedup. Besides convergence time, SMO has other important features; for example, it does not need to store the kernel matrix in memory, and it is fairly easy to implement [23].

In the first-stage SVM classification, we use SVM classification with the SMO algorithm to obtain the decision hyperplane. Here the training data set is C⁺ ∪ C⁻ ∪ Ω_m, which was obtained in the last subsection. Figure 2(b) shows the result of the first-stage SVM classification.
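To connect the dual solution (3) with the classifier built from the support vectors, the snippet below trains an SVM with scikit-learn's SVC (whose underlying LIBSVM solver is SMO-based, used here as a stand-in for the SMO implementation described above) and rebuilds the decision values from the returned support vectors, the coefficients α_k y_k and the bias b. The toy data are not the RNA data sets.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Toy data just to exercise the solver (8 attributes, like the RNA records).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

gamma = 0.5
svm1 = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# Decision value sum_{k in V} alpha_k y_k K(x_k, x) + b, rebuilt by hand:
K = rbf_kernel(X, svm1.support_vectors_, gamma=gamma)   # K(x, x_k)
manual = K @ svm1.dual_coef_.ravel() + svm1.intercept_[0]

# It matches the library's own decision_function up to numerical error.
print("max |manual - library|:", np.abs(manual - svm1.decision_function(X)).max())
```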
2.3  De-clustering
We propose to recover data into the training data set by including the data of the clusters whose centers are support vectors of the first-stage SVM; we call this process de-clustering. In this way, more of the original data near the hyperplane can be found. The de-clustering results for the support vectors in Figure 2(b) are shown in Figure 3(a). The de-clustering process not only overcomes the drawback that only a small part of the original data near the support vectors is trained, but also enlarges the training data set of the second-stage SVM, which is good for improving accuracy.
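Continuing the illustrative sketch, de-clustering amounts to recovering every member of a cluster whose center turned out to be a support vector of the first-stage SVM; the helper below is our own assumption of how such a step could be coded, not the authors' code.

```python
import numpy as np

def decluster(X, y, km, svm1, X1, mixed_clusters):
    """Recover ∪_{C_i in V} Ω_i ∪ Ω_m: members of clusters whose centers are
    first-stage support vectors, plus the mix-labeled clusters (sketch)."""
    sv = X1[svm1.support_]          # support vectors of the first-stage SVM
    recovered = list(mixed_clusters)
    for i in range(km.n_clusters):
        center = km.cluster_centers_[i]
        # A cluster is recovered if its center coincides with a support vector.
        if np.any(np.all(np.isclose(sv, center), axis=1)):
            recovered.append(i)
    idx = np.isin(km.labels_, recovered)
    return X[idx], y[idx]
```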
2.4  The Second Stage SVM Classification
Taking the recovered data as the new training data set, we again use SVM classification with the SMO algorithm to obtain the final decision hyperplane

    \sum_{k \in V_2} \alpha^*_{2,k} y_k K(x_k, x) + b^*_2 = 0        (5)

where V_2 is the index set of the support vectors of the second stage. Generally, the hyperplane (4) is close to the hyperplane (5).
Fig. 3. (a) De-clustering (b) The second stage SVM
In the second-stage SVM, we use the following two types of data as training data: 1) the data of the clusters whose centers are support vectors, i.e., ∪_{C_i ∈ V} Ω_i, where V is the set of support vectors of the first-stage SVM; 2) the data of the mix-labeled clusters, i.e., Ω_m. The training data set is therefore ∪_{C_i ∈ V} Ω_i ∪ Ω_m. Figure 3(b) illustrates the second-stage SVM classification results. One can observe that the two hyperplanes in Figure 2(b) and Figure 3(b) are different but similar.
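Putting the pieces together, a usage sketch of the whole two-stage procedure (with the helpers sketched in Sections 2.1 and 2.3, k-means and SVC as stand-ins, and synthetic data in place of the RNA sets) might look as follows; the last lines estimate how closely the two stages agree with each other and with an SVM trained on all the data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Synthetic stand-in for an RNA data set: 5000 points, 8 attributes.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 8))
y = np.where(X @ rng.normal(size=8) > 0, 1, -1)

km = KMeans(n_clusters=200, n_init=10).fit(X)
X1, y1, mixed = select_training_data(X, y, km)   # Section 2.1 sketch
svm1 = SVC(kernel="rbf", C=1.0).fit(X1, y1)      # first-stage SVM
X2, y2 = decluster(X, y, km, svm1, X1, mixed)    # Section 2.3 sketch
svm2 = SVC(kernel="rbf", C=1.0).fit(X2, y2)      # second-stage SVM

print("selected:", len(X1), "recovered:", len(X2), "of", len(X))
full = SVC(kernel="rbf", C=1.0).fit(X, y)
print("stage-1 vs stage-2 agreement:", np.mean(svm1.predict(X) == svm2.predict(X)))
print("stage-2 vs full-SVM agreement:", np.mean(svm2.predict(X) == full.predict(X)))
```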
3  RNA Sequence Detection
We use three case studies to evaluate the two-stage SVM classification approach introduced in the last section. The first example shows the necessity of the second-stage SVM by comparing the accuracy of the two stages. The second example is not a large data set, but it shows that training time and accuracy can be tuned by adjusting the number of clusters. The third example is a truly large data set, on which we make a complete comparison with several well-known algorithms as well as between our two-stage SVM with different clustering methods.

Example 1. The training data are available at www.ghastlyfop.com/blog/tag index svm.html/. To train the SVM classifier, a training set containing every possible sequence pairing was constructed, resulting in 475,865 rRNA and 114,481 tRNA sequence pairs. The input features were computed for every sequence pair in the resulting training set of 486,201 data points. Each record has 8 attributes with continuous values between 0 and 1. In [27], an SVM-based method was proposed to predict the common structure of two RNA sequences on the basis of minimizing the folding free energy change: the total free energy change of an input sequence pair can either be compared with the total free energy changes of a set of control sequence pairs, or be used in combination with sequence length and nucleotide frequencies as input to a classification support vector machine.
Fig. 4. The first-stage SVM classification on the RNA sequence data set used in [27] with 10^3 data points
In our experiments, we obtained 12 clusters from 1,000 original data points using FCM clustering. In the first-stage SVM, 113 training data points, comprising the cluster centers and the data of the mix-labeled clusters, were obtained using the data selection process introduced in Section 2, and 23 support vectors were found. Figure 4 shows the result of the first-stage SVM. Following the de-clustering technique, 210 data points were recovered as training data for the second-stage SVM. In the second-stage SVM, we obtained 61 support vectors, see Figure 5.

Table 1 compares the training time and accuracy of the two SVM stages. The training times of our two-stage SVM and LIBSVM are compared first. For training 10^3 data points, our classifier needs 67 seconds while LIBSVM needs about 100 seconds. For training 10^4 data points, our classifier needs 76 seconds while LIBSVM needs about 1,000 seconds. And for 486,201 data points, our classifier needs only 279 seconds, while LIBSVM would need a very long time; it is not reported in [27] (we estimate it may be around 10^5 seconds). On the other hand, there is almost no difference between their accuracies. This implies that our approach offers a great advantage in training time. Next, the accuracies of the first-stage SVM and the two-stage SVM are compared. From the figures and Table 1 it is obvious that the accuracy of the two-stage SVM is much better than that of the first-stage SVM, which shows that the two stages are necessary.

Example 2. 3mer data set. The original work on string kernels (kernel functions defined on the set of sequences from an alphabet S rather than on a vector space [9]) came from the field of computational biology and was motivated by algorithms for aligning DNA and protein sequences.
Fig. 5. The two-stage SVM classification on the RNA sequence data set used in [27] with 10^3 data points
The recently presented k-spectrum (gap-free k-gram) kernel and the (k,m) mismatch kernel provide an alternative model of string kernels for biological sequences, designed in particular for the application of SVMs to protein classification. These kernels use counts of common occurrences of short k-length subsequences, called k-mers, rather than notions of pairwise sequence alignment, as the basis for sequence comparison. The k-mer idea still captures a biologically motivated model of sequence similarity, in that sequences that diverge through evolution are still likely to contain short subsequences that match or almost match. We use the SVM to classify proteins based on sequence data into homologous (evolutionarily similar) groups, in order to understand the structure and functions of proteins. The 3mer data set has 2,000 data points, and each record has 84 attributes with continuous values between 0 and 1. The data set contains 1,000 positive sequences and 1,000 negative sequences, and is available at noble.gs.washington.edu/proj/hs/.

In [21], the spectrum kernel was used as a feature set representing the distribution of every possible k-mer in an RNA sequence. The value of each feature is the number of times that particular feature appears in the sequence divided by the number of times any feature of the same length appears in the sequence. In our experiments, we used MEB clustering to select data for the first-stage SVM and k = 3 (i.e., 3mers, 2mers and 1mers are our features) to train our two-stage SVM classifier. Table 2 shows the accuracy and training time of our classifier and of LIBSVM; the accuracies are almost the same. There is also not much difference in training time, because the data set contains only 2,000 data points. However, we ran experiments with cluster numbers (l) of 400 and 100. We can see that when we use a smaller number of clusters the training time decreases, since the training data set is smaller, but the accuracy becomes worse.
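As an illustration of the spectrum features just described, the sketch below computes, for every k-mer up to length k = 3, the number of times it appears in a sequence divided by the total number of subsequences of the same length; over the alphabet {A, C, G, U} this yields 64 + 16 + 4 = 84 features, matching the 84 attributes of the 3mer data set. The function name and the toy sequence are ours.

```python
from itertools import product

def spectrum_features(seq, alphabet="ACGU", max_k=3):
    """Normalized k-mer counts for k = 1..max_k (sketch of the feature
    map described in the text; values lie between 0 and 1)."""
    feats = {}
    for k in range(1, max_k + 1):
        total = max(len(seq) - k + 1, 1)          # number of k-mers of length k
        for kmer in ("".join(p) for p in product(alphabet, repeat=k)):
            count = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)
            feats[kmer] = count / total
    return feats

# Example: features of a short toy RNA sequence.
f = spectrum_features("GCAUGGCAU")
print(f["GCA"], f["AU"], f["G"])
```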
Table 1. Accuracy and training time on the RNA sequence data set in [27]

Data set size   First-stage SVM      Two-stage SVM        LIBSVM
                T (s)    Acc (%)     T (s)    Acc (%)     T (s)        Acc (%)
10^3            31       67.2        76       88.9        10^2         87.4
10^4            70       76.9        159      92.7        10^3         92.6
486,201         124      81.12       279      98.4        ~10^5 (?)    98.3
Table 2. Accuracy and training time on the RNA sequence data set in [21]

Two-stage SVM                              LIBSVM
#       t (s)     Acc (%)    l             #       t (s)    Acc (%)
2000    17.18     75.9       400           2000    8.71     73.15
2000    7.81      71.7       100           —       —        —
Example 3. This RNA data set is available at http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1570369#top as Supplementary Material (additional file 7). The data set consists of 23,605 data points; each record has 8 attributes with continuous values between 0 and 1. The data set contains 3,919 ncRNAs and 19,686 negative sequences. We used subset sizes of 500, 1,000, 2,500, 5,000, 10,000 and 23,605 in our experiments. Experiments were run using MEB two-stage, RS two-stage, SMO, LIBSVM and Simple SVM. Table 3 shows our experimental results for different data sizes with MEB two-stage and RS two-stage, and Table 4 shows the comparisons between our approach and the other algorithms. In Tables 3 and 4, the notation is as follows: "#" is the data size; "t" is the training time of the whole classification, which includes the time for clustering, the first-stage SVM training, de-clustering and the second-stage SVM training; "Acc" is the accuracy; "l" is the number of clusters used in the experiment; "TrD2" is the number of training data points for the second-stage SVM; "SV1" is the number of support vectors obtained in the first-stage SVM; and "SV2" is the number of support vectors obtained in the second-stage SVM.

For example, in the experiment on 10,000 data points, we sectioned the data into 650 clusters using MEB clustering and random selection. In the first-stage classification of MEB two-stage, we obtained 199 support vectors. Following the de-clustering technique, 862 data points were recovered as training data for the second-stage SVM, which is much less than the original data size of 10,000. In the second-stage SVM, 282 support vectors were obtained. From Table 3 we can also see that MEB two-stage has slightly better accuracy than random selection (RS) two-stage, while its training time is longer.

Table 4 compares training time and accuracy between our two-stage classification and some other SVM algorithms, including SMO, Simple SVM and LIBSVM. For example, to classify 5,000 data points, LIBSVM is the fastest and SMO has the best accuracy; our two approaches are not better than them,
although their time and accuracy are similar. However, to classify 23,605 data points, Simple SVM and SMO have no better accuracy than the others, but their training times are tremendously longer. Compared with our two approaches, LIBSVM takes almost double the training time of MEB two-stage, and almost seven times that of RS two-stage, although it has the same accuracy as ours. This experiment implies that our approach has a great advantage on large data sets, since it reaches the same accuracy as the other algorithms in a much shorter training time.

Table 3. Two-stage SVM classification results on the RNA sequence data set

MEB two-stage
#        t        Acc     l       SV1     TrD2    SV2
500      4.71     85.3    350     87      397     168
1000     5.90     86.2    400     108     463     162
2500     15.56    86.3    450     124     529     209
5000     26.56    86.7    500     149     656     227
10000    69.26    86.9    650     199     862     282
23605    174.5    88.5    1500    278     1307    416

RS two-stage
#        t        Acc     l       SV1     TrD2    SV2
500      4.07     85.3    350     88      421     172
1000     4.37     85.7    400     97      453     153
2500     11.2     86.5    450     132     581     221
5000     15.8     86.1    500     146     637     211
10000    30.2     86.5    650     187     875     278
23605    65.7     88.3    1500    257     1275    381
Table 4. Training time and accuracy on the RNA sequence data set

         MEB two-stage     RS two-stage      LIBSVM            SMO               Simple SVM
#        t        Acc      t        Acc      t        Acc      t         Acc     t        Acc
500      4.71     85.3     4.07     85.3     0.37     86       1.56      87.7    2.78     86.7
1000     5.90     86.2     4.37     85.7     0.72     87.2     3.54      88.3    8.18     87.1
2500     15.56    86.3     11.21    86.5     3.06     87.4     4.20      87.7    561.3    88.1
5000     26.56    86.7     15.79    86.1     12.53    87.6     212.43    88.8    —        —
10000    69.26    87.9     30.22    86.5     48.38    88.2     1122.5    89.6    —        —
23605    174.5    88.2     65.7     88.3     298.3    88.6     —         —       —        —

4  Conclusions and Discussions
Our two-stage SVM classification approach is much faster than other SVM classifiers, without loss of accuracy, when the data set is large enough. The results of the experiments on the biological data sets in this work show that our approach is suitable for classifying large and huge biological data sets. Additionally, another promising application to genomics machine learning is under study.
References

1. Awad, M.L., Khan, F., Bastani, I., Yen, L.: An Effective Support Vector Machines (SVMs) Performance Using Hierarchical Clustering. In: Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, pp. 663–667. IEEE Computer Society Press, Los Alamitos (2004)
2. Axmann, I.M., Kensche, P., Vogel, J., Kohl, S., Herzel, H., Hess, W.R.: Identification of cyanobacterial non-coding RNAs by comparative genome analysis. Genome Biol. 6, R73 (2005)
3. Babu, G., Murty, M.: A near-optimal initial seed value selection in K-means algorithm using a genetic algorithm. Pattern Recognit. Lett. 14, 763–769 (1993)
4. Cervantes, J., Li, X., Yu, W.: Support Vector Machine Classification Based on Fuzzy Clustering for Large Data Sets. In: Gelbukh, A., Reyes-Garcia, C.A. (eds.) MICAI 2006. LNCS (LNAI), vol. 4293, pp. 572–582. Springer, Heidelberg (2006)
5. Cervantes, J., Li, X., Yu, W., Li, K.: Support vector machine classification for large data sets via minimum enclosing ball clustering. Neurocomputing (accepted for publication)
6. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
7. Chen, P.H., Fan, R.E., Lin, C.J.: A Study on SMO-Type Decomposition Methods for Support Vector Machines. IEEE Trans. Neural Networks 17, 893–908 (2006)
8. Collobert, R., Bengio, S.: SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research 1, 143–160 (2001)
9. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
10. Dong, J.X., Krzyzak, A., Suen, C.Y.: Fast SVM Training Algorithm with Decomposition on Very Large Data Sets. IEEE Trans. Pattern Analysis and Machine Intelligence 27, 603–618 (2005)
11. Folino, G., Pizzuti, C., Spezzano, G.: GP Ensembles for Large-Scale Data Classification. IEEE Trans. Evol. Comput. 10, 604–616 (2006)
12. Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S.R., Bateman, A.: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, 121–124 (2005)
13. Girolami, M.: Mercer kernel based clustering in feature space. IEEE Trans. Neural Networks 13, 780–784 (2002)
14. Hansen, J.L., Schmeing, T.M., Moore, P.B., Steitz, T.A.: Structural insights into peptide bond formation. Proc. Natl. Acad. Sci. 99, 11670–11675 (2002)
15. Huang, G.B., Mao, K.Z., Siew, C.K., Huang, D.S.: Fast Modular Network Implementation for Support Vector Machines. IEEE Trans. Neural Networks (2006)
16. Joachims, T.: Making large-scale support vector machine learning practical. In: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge (1998)
17. Kim, S.W., Oommen, B.J.: Enhancing Prototype Reduction Schemes with Recursion: A Method Applicable for Large Data Sets. IEEE Trans. Syst., Man, Cybern. B 34, 1384–1397 (2004)
18. Li, X., Cervantes, J., Yu, W.: Two Stages SVM Classification for Large Data Sets via Randomly Reducing and Recovering Training Data. In: IEEE International Conference on Systems, Man, and Cybernetics, Montreal, Canada (2007)
19. Lin, C.T., Yeh, C.M., Liang, S.F., Chung, J.F., Kumar, N.: Support-Vector-Based Fuzzy Neural Network for Pattern Classification. IEEE Trans. Fuzzy Syst. 14, 31–41 (2006)
20. Mavroforakis, M.E., Theodoridis, S.: A Geometric Approach to Support Vector Machine (SVM) Classification. IEEE Trans. Neural Networks 17, 671–682 (2006)
21. Noble, W.S., Kuehn, S., Thurman, R., Yu, M., Stamatoyannopoulos, J.: Predicting the in vivo signature of human gene regulatory sequences. Bioinformatics 21, 338–343 (2005)
22. Pal, N., Bezdek, J.: On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 3, 370–379 (1995)
23. Platt, J.: Fast Training of Support Vector Machines Using Sequential Minimal Optimization. In: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA (1998)
24. Prokhorov, D.: IJCNN 2001 neural network competition. Ford Research Laboratory (2001), http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf
25. Shih, L., Rennie, J.D.M., Chang, Y., Karger, D.R.: Text Bundling: Statistics-based Data Reduction. In: Proc. of the Twentieth Int. Conf. on Machine Learning, Washington, DC (2003)
26. Tresp, V.: A Bayesian Committee Machine. Neural Computation 12, 2719–2741 (2000)
27. Uzilov, A.V., Keegan, J.M., Mathews, D.H.: Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics 7, 173 (2006)
28. Washietl, S., Hofacker, I.L., Lukasser, M., Huttenhofer, A., Stadler, P.F.: Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. 23, 1383–1390 (2005)
29. Weilbacher, T., Suzuki, K., Dubey, A.K., Wang, X., Gudapaty, S., Morozov, I., Baker, C.S., Georgellis, D., Babitzke, P., Romeo, T.: A novel sRNA component of the carbon storage regulatory system of Escherichia coli. Mol. Microbiol. 48, 657–670 (2003)
30. Xu, R., Wunsch II, D.: Survey of Clustering Algorithms. IEEE Trans. Neural Networks 16, 645–678 (2005)
31. Yu, H., Yang, J., Han, J.: Classifying Large Data Sets Using SVMs with Hierarchical Clusters. In: Proc. of the 9th ACM SIGKDD (2003)