Semi-Supervised Learning Methods for Network Intrusion Detection
Chuanliang Chen (1), Yunchao Gong (2), and Yingjie Tian (3)
(1) Department of Computer Science, Beijing Normal University, Beijing 100875, China
(2) Software Institute, Nanjing University, Nanjing 210089, China
(3) Research Centre on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100080, China (corresponding author)
Abstract: Applying or developing specialized machine learning techniques has recently attracted growing interest in the intrusion detection community. Existing research shows that supervised algorithms deteriorate significantly if unknown attacks are present in the test data, whereas unsupervised algorithms exhibit no significant difference in performance between known and unknown attacks, but their performance is unsatisfactory. In this contribution, we propose using two semi-supervised classification methods, Spectral Graph Transducer and Gaussian Fields Approach, to detect unknown attacks, and one semi-supervised clustering method, MPCK-means, to improve the performance of traditional purely unsupervised clustering methods. Our empirical study shows that the semi-supervised classification methods perform much better than supervised classifiers, and that the semi-supervised clustering method improves markedly on purely unsupervised clustering methods.
Keywords: Semi-Supervised Learning, Transductive Learning, Intrusion Detection, Data Mining
I. INTRODUCTION
With the rapid increase in connectivity and accessibility of computer systems over the Internet, computer security has become a critical issue. According to a recent research survey by CERT/CC [2], cyber attacks have rapidly increased over the past decade. The high connectivity of computer systems is the direct cause of these frequent opportunities for intrusions and attacks. As the cost of information processing and Internet accessibility falls, more and more organizations are becoming vulnerable to a wide variety of cyber threats [1]. Therefore, Intrusion Detection Systems (IDS) are becoming very important for computer security [3]. Intrusion detection can be classified into two main categories: misuse detection and anomaly detection [5]. Misuse detection methods are intended to recognize known attack patterns. Anomaly detection focuses on finding unusual activity patterns in the observed data [6,11,12]. Though signature-based misuse detection techniques are currently the most widely used in practice [4], applying advanced machine learning techniques to detection has recently attracted many researchers in the intrusion detection community [14,15,16]. Although signature-based misuse IDS have proven to be quite effective, they are inherently unable to detect unknown attacks or even new variants of
1-4244-2384-2/08/$20.00 ©2008 IEEE
known attacks [17]. This problem also exists in IDS based on supervised learning methods. In [4], P. Laskov et al. present an experimental framework for the comparative analysis of supervised and unsupervised learning methods for intrusion detection. Their findings suggest that the problem of test data being drawn from a different distribution cannot be solved within purely supervised or unsupervised techniques [4], and they suggest that semi-supervised learning methods may provide promising intrusion detection ability, which is the most important motivation of our research work.
There has been some interesting work on applying semi-supervised learning methods to intrusion detection or on incorporating unlabeled data for intrusion detection [18,19]. Semi-supervised learning methods attempt to enhance the accuracy of classifiers by using labeled data together with unlabeled data. Many semi-supervised learning approaches have been proposed that incorporate unlabeled data points into the learning paradigm; [20] is a survey of the semi-supervised learning area. There are mainly four categories of semi-supervised learning algorithms: generative models, self-training, co-training, and graph-based learning methods.
In this paper, we propose using two semi-supervised classification methods, Spectral Graph Transducer [21] and Gaussian Fields Approach [22], and one semi-supervised clustering method, MPCK-means [23], to perform intrusion detection. Our empirical study shows that semi-supervised learning methods perform much better than purely supervised or unsupervised learning methods at detecting unknown attack types.
The remainder of this paper is organized as follows. Section 2 presents the two semi-supervised classification methods, Spectral Graph Transducer and Gaussian Fields Approach. We briefly review seven traditional supervised learning methods for intrusion detection in Section 3. MPCK-means is described in Section 4.
In Section 5, the results of our experiments and their analysis are presented. Finally, we conclude our study and propose some interesting future work in Section 6.
II. SEMI-SUPERVISED CLASSIFICATION METHOD
Two graph-based semi-supervised learning methods for classification are proposed to perform intrusion detection in
SMC 2008
this paper. They are the Spectral Graph Transducer proposed in [21] and the Gaussian Fields Approach proposed in [22]. From a different point of view, these two methods can be regarded as transductive learning methods. In fact, early graph-based semi-supervised learning methods are often transductive [20]. Note, however, that semi-supervised learning can be either transductive or inductive [20]. In the rest of this section, we briefly review these two methods; more details can be found in [21,22].
A. Graph-based Learning Methods
In graph-based semi-supervised learning methods, a graph is defined whose nodes are the labeled and unlabeled instances $x_i$ ($i = 1, \dots, n$) of the corpus and whose (weighted) edges reflect the similarity between instances. The connection strength from the $i$th node to the $j$th node is encoded in element $w_{ij}$ of an $n \times n$ symmetric weight matrix $W$. The edge weights are usually calculated by a Gaussian function of the Euclidean distance:
$$ w_{ij} = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{\sigma^2}\right) & i \sim j \\ 0 & \text{otherwise} \end{cases} $$
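As an illustration only, the weight matrix can be sketched in Python with NumPy, assuming a symmetrized k-nearest-neighbor graph. The function name gaussian_knn_graph, the rule "keep an edge if either endpoint selects the other", and the toy data are assumptions made for this sketch and are not taken from [21,22].

```python
import numpy as np

def gaussian_knn_graph(X, k=2, sigma=1.0):
    """Return a symmetric weight matrix W with Gaussian weights on k-NN edges."""
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of instances.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        # Indices of the k nearest neighbors of node i, excluding i itself.
        order = np.argsort(sq_dist[i])
        neighbors = [j for j in order if j != i][:k]
        W[i, neighbors] = np.exp(-sq_dist[i, neighbors] / sigma ** 2)
    # Symmetrize: keep an edge if either endpoint selected the other (assumption).
    return np.maximum(W, W.T)

# Toy data: two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
W = gaussian_knn_graph(X, k=1, sigma=1.0)
```

With k = 1, each point is linked only to its partner, so the toy graph has two disconnected components; a larger k or an ε-radius rule would be alternative connectivity choices.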
In the weight definition, $i \sim j$ denotes that the $i$th and $j$th nodes are connected by an edge, which can be established either by $k$ nearest neighbors or by $\varepsilon$-nearest neighbors, meaning that the Euclidean distance between the $i$th and $j$th nodes must be within a certain radius $\varepsilon$, $\|x_i - x_j\|_2 < \varepsilon$. Graph-based semi-supervised learning methods can be viewed as estimating a function $f$ on the graph [20]. The function $f$ should satisfy two requirements at the same time: 1) it should be close to the given labels $y_L$ on the labeled nodes, and 2) it should be smooth on the whole graph. Both requirements can be expressed in a regularization framework where the first term is a loss function and the second term is a regularizer [20]. In fact, many graph-based semi-supervised learning methods differ only in the particular choice of the loss function and the regularizer.
B. Two Methods Used in This Paper
Spectral Graph Transducer: Based on the framework stated in A, the Spectral Graph Transducer can be viewed as having the following loss function and regularizer [21,20]:
$$ \min_f \; f^T L f + c\,(f - \gamma)^T C\,(f - \gamma) \quad \text{s.t.} \quad f^T \mathbf{1} = 0 \ \text{ and } \ f^T f = n. $$
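To make the two terms of this objective concrete, the following Python sketch merely evaluates it for a candidate vector f; it is not the spectral solver of [21]. It assumes the combinatorial Laplacian L = D − W, unit misclassification costs on labeled nodes and zero cost on unlabeled nodes, and target values of sqrt(l−/l+) for positive and −sqrt(l+/l−) for negative labeled nodes following [21]. The function name sgt_objective and the toy chain graph are hypothetical.

```python
import numpy as np

def sgt_objective(W, f, y, c=1.0):
    """Evaluate f^T L f + c (f - gamma)^T C (f - gamma).

    y holds +1 / -1 for labeled nodes and 0 for unlabeled nodes.
    """
    y = np.asarray(y)
    f = np.asarray(f, dtype=float)
    L = np.diag(W.sum(axis=1)) - W           # combinatorial Laplacian (assumption)
    l_pos = np.sum(y == 1)
    l_neg = np.sum(y == -1)
    gamma = np.zeros(len(y))
    gamma[y == 1] = np.sqrt(l_neg / l_pos)    # target for positive labeled nodes
    gamma[y == -1] = -np.sqrt(l_pos / l_neg)  # target for negative labeled nodes
    C = np.diag((y != 0).astype(float))       # unit costs on labeled nodes (assumption)
    r = f - gamma
    return float(f @ L @ f + c * (r @ C @ r))

# Chain graph 0 - 1 - 2 - 3 with unit weights; the end nodes are labeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = [1, 0, 0, -1]
smooth = sgt_objective(W, [1, 1, -1, -1], y)   # cuts one edge
rough = sgt_objective(W, [1, -1, 1, -1], y)    # cuts every edge
```

Both candidate vectors satisfy the two constraints (they sum to zero and have squared norm n = 4), and the smoother labeling attains the lower objective value.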
In this objective, $\gamma_i = \sqrt{l_- / l_+}$ for positive labeled data and $\gamma_i = -\sqrt{l_+ / l_-}$ for negative labeled data [21], where $l_+$ is the number of positive labeled instances and $l_-$ the number of negative labeled instances. $L$ can be the combinatorial or normalized graph Laplacian, with a transformed spectrum. $C$ is a diagonal matrix of misclassification costs, and $c$ is a parameter that trades off training error against cut value. An implementation is available at http://sgt.joachims.org.
Gaussian Fields Approach: In this approach, the learning problem is formulated in terms of a Gaussian random field on the graph, where the mean of the field is characterized in terms of harmonic functions and is efficiently obtained using matrix methods or belief propagation [22]. It can be viewed as having a quadratic loss function with infinite weight, so that the labeled data are clamped (fixed at the given label values), and a regularizer based on the combinatorial graph Laplacian $\Delta$:
$$ \infty \sum_{i \in L} (f_i - y_i)^2 + \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2 = \infty \sum_{i \in L} (f_i - y_i)^2 + f^T \Delta f. $$
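As a hedged sketch of the matrix method, the harmonic solution of [22] on the unlabeled nodes can be written as $f_U = (D_{UU} - W_{UU})^{-1} W_{UL}\, y_L$, with $D$ the diagonal degree matrix of $W$ and the labeled values clamped to $y_L$. The Python code below implements this formula with NumPy; the function name harmonic_solution and the toy chain graph are assumptions for illustration.

```python
import numpy as np

def harmonic_solution(W, labeled_idx, y_labeled):
    """Clamp the labeled nodes and solve for the harmonic values elsewhere."""
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    y_labeled = np.asarray(y_labeled, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W   # D - W
    L_uu = laplacian[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    # Solve (D_UU - W_UU) f_U = W_UL y_L instead of forming an explicit inverse.
    f_u = np.linalg.solve(L_uu, W_ul @ y_labeled)
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    f[unlabeled_idx] = f_u
    return f

# Chain graph 0 - 1 - 2 - 3 with unit weights; node 0 is labeled 0 (normal)
# and node 3 is labeled 1 (attack).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
f = harmonic_solution(W, labeled_idx=[0, 3], y_labeled=[0.0, 1.0])
```

On this chain the unlabeled values interpolate linearly between the two clamped ends; thresholding them at 0.5 would be one simple way to obtain hard labels.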
In this formulation, $f_i \in \mathbb{R}$, which is the key relaxation of min-cut, and the matrix $\Delta$, called the graph Laplacian, is defined as $\Delta = D - W$, where $D = \mathrm{diag}(d_i)$ and $d_i = \sum_j w_{ij}$. This allows for a simple closed-form solution for the node marginal probabilities. Many other graph-based semi-supervised learning methods have been proposed in recent years, such as [28,29,30]. In this paper, we explore the intrusion detection ability of only these two.
III. OTHER SUPERVISED CLASSIFICATION METHODS FOR COMPARISON
In this paper, we also apply seven traditional supervised learning methods to intrusion detection. These supervised classifiers are Naïve Bayes, Bayes Network, Support Vector Machine (SVM), Random Forest, k Nearest Neighbor (kNN), C4.5, and RBF Network. In this section, we briefly describe them.
Naïve Bayes: For a given instance $x$, the Naïve Bayes classifier finds the class $c_i$ that maximizes the posterior probability $P(c_i \mid x; \theta')$ by applying Bayes' rule. The instance $x$ is then classified by evaluating Eq. 1:
$$ c_l = \arg\max_{c_i \in C} P(c_i \mid \theta')\, P(x \mid c_i; \theta'). \tag{1} $$
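Eq. 1 can be illustrated with a small Python sketch for categorical features. The class-conditional likelihood is factorized over features (the Naïve Bayes independence assumption) and estimated with Laplace smoothing; the smoothing constant alpha, the function names, and the toy connection records are assumptions of this sketch, not details of the experiments.

```python
import math
from collections import Counter

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    vocab = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for d, v in enumerate(row):
            value_counts[c][d][v] += 1
            vocab[d].add(v)
    return class_counts, value_counts, vocab, alpha

def predict(model, row):
    """Return the class maximizing log P(c) + sum_d log P(x_d | c)."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)            # log prior of class c
        for d, v in enumerate(row):
            num = value_counts[c][d][v] + alpha  # Laplace-smoothed count
            den = n_c + alpha * len(vocab[d])
            score += math.log(num / den)         # log likelihood of feature d
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy records: (protocol, connection flag) with their class labels.
rows = [("tcp", "SF"), ("tcp", "SF"), ("icmp", "SF"), ("icmp", "SF"), ("tcp", "S0")]
labels = ["normal", "normal", "smurf", "smurf", "neptune"]
model = train_naive_bayes(rows, labels)
pred_icmp = predict(model, ("icmp", "SF"))
pred_s0 = predict(model, ("tcp", "S0"))
```

Working in log space avoids numerical underflow when many features are multiplied together.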
Bayes Network: A Bayes Network (also called a Bayesian Network or Belief Network) estimates the probability density function governing a set of random variables by specifying a set of conditional independence statements together with a set of conditional probability functions. Unlike Naïve Bayes, a Bayes Network is able to capture the conditional dependencies among features.
SVM: The Support Vector Machine (SVM) is a classifier based on well-developed statistical learning theory [27]. SVM constructs a hyperplane of minimal 2-norm which separates the two classes of training examples. SVM can also construct a hyperplane in a feature space by applying a non-linear mapping.
Random Forest: Random Forest is an algorithm proposed in [25]. Each classification tree is built using a bootstrap sample of the data, and at each split the candidate set of variables is a random subset of the variables. Random Forest performs very well in pattern recognition tasks.
k Nearest Neighbor: k Nearest Neighbor (kNN) is an inductive, memory-based classification algorithm. It finds the k examples in the training data that are closest to the test example
and assigns the most frequent label among these examples to the new example.
C4.5: C4.5, proposed by Quinlan [24], is the most popular decision tree learning algorithm. It infers decision trees using a set of conditions over the observed features. New instances are classified by applying the inferred rules.
RBF Network: The Radial Basis Function (RBF) Network has excellent learning capacity [26]. In our experiments, the k-means clustering algorithm is used to provide the basis functions needed by the RBF Network, which then learns a logistic regression on top of them.
IV. SEMI-SUPERVISED CLUSTERING METHOD
There are two main approaches to utilizing labeled data in semi-supervised clustering methods [23]: 1) constraint-based methods that guide the clustering algorithm towards a better grouping of the data, and 2) distance-function learning methods that adapt the underlying similarity metric used by the clustering algorithm. Both of these techniques are integrated in a uniform, principled framework, MPCK-means. In this section, we briefly describe MPCK-means; more details can be found in [23].
Constraints: A small amount of available labeled data is used to aid the semi-supervised clustering process. The constraints consist of must-link and cannot-link constraints between pairs of instances [31]. Pairwise constraints represent the user's view of similarity in the domain and can be used to guide a clustering algorithm towards a better grouping.
Metric Learning: Pairwise constraints can also be used to adapt the underlying distance metric. Since the original data representation may not specify a space where clusters are sufficiently separated, modifying the distance metric warps the space to minimize distances between same-cluster objects while maximizing distances between different-cluster objects [23]. Clustering with learned metrics can find clusters that correspond more closely to the notion of similarity embodied in the supervision.
Integrating Constraints and Metric Learning: Both techniques can be combined in the following objective function, which attempts to minimize cluster dispersion under the learned metrics while reducing constraint violations. Let $X = \{x_1, \dots, x_N\}$, $x_i \in \mathbb{R}^m$, be the set of instances, $\{\mu_1, \dots, \mu_K\}$ the set of $K$ cluster centroids, $l_i \in \{1, \dots, K\}$ the cluster assignment of $x_i$, $M$ the set of must-link constraints, where $(x_i, x_j) \in M$ means that $x_i$ and $x_j$ should be in the same cluster, $C$ the set of cannot-link constraints, where $(x_i, x_j) \in C$ means that $x_i$ and $x_j$ should be in different clusters, and $W = \{w_{ij}\}$ and $\bar{W} = \{\bar{w}_{ij}\}$ the penalty costs for violating the constraints in $M$ and $C$ respectively. The objective is
$$ \min \sum_{x_i \in X} \left( \|x_i - \mu_{l_i}\|^2_{A_{l_i}} - \log(\det(A_{l_i})) \right) + \sum_{(x_i, x_j) \in M} w_{ij}\, f_M(x_i, x_j)\, \mathbb{1}[l_i \neq l_j] + \sum_{(x_i, x_j) \in C} \bar{w}_{ij}\, f_C(x_i, x_j)\, \mathbb{1}[l_i = l_j], $$
where $w_{ij}$ and $\bar{w}_{ij}$ are costs which provide a way of specifying the relative importance of the labeled versus unlabeled data while allowing individual constraint weights; $\mathbb{1}$ is the indicator function, with $\mathbb{1}[\text{true}] = 1$ and 0 otherwise; $\|\cdot\|$ represents the L2 norm; $A_{l_i}$ is a symmetric positive-definite matrix that defines the metric $\|x_i - \mu_{l_i}\|^2_{A_{l_i}} = (x_i - \mu_{l_i})^T A_{l_i} (x_i - \mu_{l_i})$; and $\log(\det(A_{l_i}))$ normalizes the constant of the $l_i$-th Gaussian with covariance matrix $A_{l_i}^{-1}$. The functions $f_M(x_i, x_j)$ and $f_C(x_i, x_j)$ are defined as follows:
$$ f_M(x_i, x_j) = \frac{1}{2} \|x_i - x_j\|^2_{A_{l_i}} + \frac{1}{2} \|x_i - x_j\|^2_{A_{l_j}}, $$
$$ f_C(x_i, x_j) = \|x'_{l_i} - x''_{l_i}\|^2_{A_{l_i}} - \|x_i - x_j\|^2_{A_{l_i}}. $$
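A minimal Python sketch of how the MPCK-means objective could be evaluated for a fixed assignment is given below. It is not the optimization procedure of [23]: it assumes uniform penalty weights w and w_bar instead of individual constraint weights, takes the cannot-link reference term to be the largest squared distance between any two instances under the violating cluster's metric, and finds that value by brute force. All function and variable names are hypothetical.

```python
import numpy as np

def sq_dist(u, v, A):
    """Squared distance (u - v)^T A (u - v) under metric A."""
    d = u - v
    return float(d @ A @ d)

def mpck_objective(X, labels, centroids, metrics, must, cannot, w=1.0, w_bar=1.0):
    total = 0.0
    # Cluster dispersion under each cluster's metric, minus the log-det term.
    for x, l in zip(X, labels):
        total += sq_dist(x, centroids[l], metrics[l]) - np.log(np.linalg.det(metrics[l]))
    # Penalty for must-link pairs placed in different clusters.
    for i, j in must:
        if labels[i] != labels[j]:
            f_m = 0.5 * sq_dist(X[i], X[j], metrics[labels[i]]) \
                + 0.5 * sq_dist(X[i], X[j], metrics[labels[j]])
            total += w * f_m
    # Penalty for cannot-link pairs placed in the same cluster.
    for i, j in cannot:
        if labels[i] == labels[j]:
            A = metrics[labels[i]]
            # Largest pairwise squared distance under A (brute force, assumption).
            max_sep = max(sq_dist(a, b, A) for a in X for b in X)
            total += w_bar * (max_sep - sq_dist(X[i], X[j], A))
    return total

X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
centroids = np.array([[0.0, 0.5], [4.0, 0.5]])
metrics = [np.eye(2), np.eye(2)]   # one identity metric per cluster
must, cannot = [(0, 1)], [(0, 2)]
good = mpck_objective(X, [0, 0, 1, 1], centroids, metrics, must, cannot)
bad = mpck_objective(X, [0, 1, 0, 1], centroids, metrics, must, cannot)
```

The assignment that respects both constraints scores far lower than the one that violates them, which is the behavior the penalty terms are meant to produce.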
In the definition of $f_C$, $(x'_{l_i}, x''_{l_i})$ is the maximally separated pair of instances in the data set according to the $l_i$-th metric.
V. RESULTS OF EXPERIMENTS AND ANALYSIS
A. Data Source and Preprocessing
The KDD Cup 1999 data set [7] is a common benchmark for evaluating intrusion detection techniques. 94% of the instances in this set (4,898,430 instances) are extracted from the DARPA 1998 IDS evaluation [8]. The remaining 6% (311,029 instances) are from the extended DARPA 1999 IDS evaluation [9]. More details about this data set can be found in [10,11]. The KDD Cup data set is too large to process on a personal computer, so for our experiments we use random sampling to produce a benchmark of suitable size. We select 5% of the instances of the original ten-percent version of the KDD Cup data set according to the class distribution, i.e., we maintain the original attack distribution while producing a random subsample. This subsample, which contains about 25,000 instances, is used to test the aforementioned semi-supervised clustering method.
TABLE I. SUMMARY OF DATA SET FOR CLASSIFICATION
Data set | Attack types | Num. of instances
Training data set | smurf. neptune. | 2427
Testing data set | satan. back. warezmaster. warezclient. pod. portsweep. ipsweep. teardrop. nmap. imap. rootkit. land. guess_passwd. ftp_write. perl. loadmodule. | 421
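The class-preserving random subsampling used for the clustering benchmark can be sketched in Python as follows. This is only an illustration: the function name stratified_subsample, the fixed seed, and the rule of keeping at least one instance per class are assumptions, and the toy labels stand in for the real KDD Cup records.

```python
import random
from collections import defaultdict

def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a random subsample that keeps class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label, indices in by_class.items():
        # Keep at least one instance so rare classes are not lost (assumption).
        k = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

# Toy label column with a skewed class distribution.
labels = ["smurf."] * 100 + ["normal."] * 60 + ["neptune."] * 40
subset = stratified_subsample(labels, fraction=0.05)
subset_labels = [labels[i] for i in subset]
```

Sampling within each class separately is what keeps the attack distribution of the subsample close to that of the full set.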
Producing a suitable subsample for evaluating the two semi-supervised classification methods is somewhat more complex. To measure how well semi-supervised learning methods detect unknown attacks drawn from a different distribution, we construct a special subsample of the KDD Cup data set. After careful observation, we find that instances of some infrequent attack types are precisely the instances that most traditional supervised classifiers predict wrongly. Therefore, we extract these
instances of infrequent attack types to compose the testing data set, and the instances of normal traffic or frequent attack types are used to train the classifiers. Table 1 summarizes the details of this special benchmark corpus. Please note that the data set used for classification is different from that used for clustering. The data set used to test the clustering methods is constructed by random sampling, whereas the one designed for classification is constructed manually. Their sizes also differ. Both data sets can be obtained from us by e-mail.
B. Results and Analysis of Classification Experiments
We report the performance of the Spectral Graph Transducer, the Gaussian Fields Approach, and the supervised learning methods described in Section 3 using accuracy as the evaluation metric. Because of different parameter settings, there are in fact eleven versions of purely supervised learning methods used for comparison. For SVM, we select four different kernel functions: the RBF kernel, the linear kernel, and the polynomial kernel with two different exponent values. For kNN, we set k to two different values: one (Nearest Neighbor) and three (3 Nearest Neighbor). Fig. 1 shows the performance of these thirteen classifiers on the data set described in Table 1.
[Figure 1: bar chart of Accuracy (vertical axis, 0% to 55%) for the classifiers SGT, GF, NB, BN, R-SVM, L-SVM, P2-SVM, P3-SVM, RF, NN, kNN, C4.5, and RBFN.]
Figure 1. Accuracy of two semi-supervised learning methods and seven traditional supervised learning methods with different parameters (eleven versions are obtained). SGT: Spectral Graph Transducer, GF: Gaussian Fields Approach, NB: Naïve Bayes, BN: Bayes Network, R-SVM: SVM with RBF kernel, L-SVM: SVM with linear kernel, P2-SVM: SVM with degree-2 polynomial kernel, P3-SVM: SVM with degree-3 polynomial kernel, RF: Random Forest, NN: Nearest Neighbor (kNN with k = 1), kNN: k Nearest Neighbor with k = 3, RBFN: RBF Network.
From Fig. 1, we can see that the two semi-supervised classifiers perform much better than the supervised learning methods used for comparison. The accuracies of Spectral Graph Transducer and Gaussian Fields Approach are 53.44% and 49.41% respectively, whereas the best accuracy among the supervised learning methods is 26.13%, achieved by SVM with the RBF kernel. The results shown in Fig. 1 raise several questions, which we discuss in turn below.
Why is the performance of the supervised learning methods so poor? The reason is the curse of differing distributions. Traditional supervised learning methods assume that the training and test data distributions are identical, but this assumption obviously does not hold for this data set. Therefore, these traditional classification methods perform worse.
Why is the performance of the two semi-supervised learning methods much better than that of the traditional supervised learning methods? Their better performance is due to the fact that they are transductive learning methods, for which the testing instances are already known when the classifier is trained. That is, in the transductive setting, the learner can observe the examples in the test set and potentially exploit structure in their distribution [21]. This setting somewhat alleviates the curse arising from the different distributions of the training and testing data sets.
Is the performance of the two semi-supervised learning methods satisfactory? Clearly not: the best accuracy is just 53.44%, which is of limited value in practice, although the way the testing data set was constructed makes their performance lower than it would be on a normal data set. All in all, we need to build stronger classifiers.
What is the next candidate? Our work has shown that semi-supervised learning methods (in their transductive version) perform much better than traditional supervised learning methods, but still not well enough. We consider transfer learning methods the next candidate for detecting unknown attack types in the intrusion detection domain, as they may perform better.
C. Performance Measure for Clustering
We use four performance measures in our experiments to evaluate the capability of MPCK-means: Pairwise F-Measure, Pairwise Precision, Mutual Information, and Pairwise Recall. We present their definitions and brief descriptions below.
Mutual Information: The mutual information is a measure of the additional information known about one set when given another [13], that is,
$$ MI(A, B) = H(A) + H(B) - H(A, B), $$
where $H(A)$ is the entropy of $A$, which can be calculated as
$$ H(A) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i). $$
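As a hedged illustration, the mutual information between a cluster assignment and the underlying class labels can be computed from empirical frequencies as below. The function names and the toy assignments are assumptions; the probabilities are simple relative frequencies, and the joint entropy H(A, B) is taken over the paired values.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(a, b):
    """MI(A, B) = H(A) + H(B) - H(A, B), with H(A, B) over paired values."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

clusters = [0, 0, 1, 1]
mi_match = mutual_information(clusters, ["normal", "normal", "attack", "attack"])
mi_indep = mutual_information(clusters, ["normal", "attack", "normal", "attack"])
```

A clustering that reproduces the classes exactly shares all of its information with them, while one that is independent of the classes shares none.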
Pairwise Precision, Pairwise Recall, and Pairwise F-Measure: Pairwise F-Measure is used to evaluate the clustering results based on the underlying classes; it relies on the traditional information retrieval measures, adapted for evaluating clustering by considering same-cluster pairs [23]. Their definitions are:
$$ \text{PPrecision} = \frac{\text{Num. pairs correctly predicted in same cluster}}{\text{Num. total pairs predicted in same cluster}}, $$
$$ \text{PRecall} = \frac{\text{Num. pairs correctly predicted in same cluster}}{\text{Num. total pairs in same cluster}}, $$
$$ \text{PF-measure} = \frac{2 \times \text{PPrecision} \times \text{PRecall}}{\text{PPrecision} + \text{PRecall}}. $$
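The three pairwise measures can be sketched in Python by enumerating all instance pairs. In this sketch the recall denominator, "total pairs in same cluster", is interpreted as the number of pairs that truly belong together, i.e. that share an underlying class; this reading, the function name, and the toy data are assumptions.

```python
from itertools import combinations

def pairwise_scores(clusters, classes):
    """Return (pairwise precision, pairwise recall, pairwise F-measure)."""
    same_cluster = same_class = both = 0
    for i, j in combinations(range(len(clusters)), 2):
        in_cluster = clusters[i] == clusters[j]   # predicted together
        in_class = classes[i] == classes[j]       # truly together
        same_cluster += in_cluster
        same_class += in_class
        both += in_cluster and in_class
    precision = both / same_cluster if same_cluster else 0.0
    recall = both / same_class if same_class else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

clusters = [0, 0, 0, 1]
classes = ["a", "a", "b", "b"]
p, r, f1 = pairwise_scores(clusters, classes)
```

Enumerating pairs is quadratic in the number of instances, so a benchmark of tens of thousands of instances would more practically be scored from a contingency table of cluster and class counts.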
[Figure 2: four panels (Pairs F-Measure, Pairs Precision, Mutual Information, Pairs Recall), each plotted against Number of Constraints (0 to 1000) for MPCK-Means-S, MPCK-Means-M, and K-Means.]
Figure 2. Performance of MPCK-means and K-means evaluated by four performance measures: Pairwise F-Measure, Pairwise Precision, Mutual Information, and Pairwise Recall. MPCK-means-M denotes MPCK-means with multiple metrics (M) parameterized by diagonal matrices; MPCK-means-S denotes MPCK-means with a single metric (S) parameterized by a diagonal matrix.
D. Results and Analysis of Clustering Experiments
In our clustering experiments, the results of MPCK-means and K-means were averaged over 50 runs of 5 folds to compensate for their randomized initialization. We test two versions of MPCK-means: MPCK-means with multiple metrics parameterized by diagonal matrices (MPCK-means-M), and MPCK-means with a single metric parameterized by a diagonal matrix (MPCK-means-S). Fig. 2 shows the results of both versions of MPCK-means and of K-means on the data set sampled as described above. We find that when only the metric learning technique is used, the performance of MPCK-means is unsatisfactory, or even worse than that of K-means under some performance measures. However, when a small number of constraints is added, the performance of both versions of MPCK-means improves greatly. That is, a small amount of labeled data is enough to make MPCK-means perform much better than purely unsupervised learning methods or even some supervised learning methods. This means that we can achieve a high detection rate with little labeling work and also escape the curse of unknown attacks.
VI. CONCLUSIONS AND FUTURE WORK
In this paper, we investigate the capabilities of semi-supervised learning methods, both semi-supervised classification methods and semi-supervised clustering methods, for intrusion detection. Our main contributions are as follows.
Firstly, two semi-supervised classification methods, Spectral Graph Transducer and Gaussian Fields Approach, are proposed for the task of detecting unknown attacks. A special data set is designed to test the capabilities of all classification methods (both supervised and semi-supervised) for detecting unknown attacks. Experiments comparing them with seven traditional supervised learning methods (eleven versions) are presented. The results show that they perform much better than the seven traditional supervised learning methods at detecting unknown attacks.
Secondly, one semi-supervised clustering method, MPCK-means, is introduced to improve on purely unsupervised clustering methods for intrusion detection. An experiment comparing two versions of MPCK-means with the unsupervised learning method K-means is presented. The results show that both versions of MPCK-means perform much better than K-means.
Finally, after carefully analyzing the experimental results, we propose a potential learning method, transfer learning, which may be more adept at detecting unknown attacks with the aid of accumulated records (training examples) of known attacks. We will report our work on applying
transfer learning methods to detect unknown attacks in another paper.
ACKNOWLEDGMENT
The research work described in this paper was supported by grants from the National Natural Science Foundation of China (Project No. 10601064, 70531040, 70621001).
REFERENCES
[1] A. Lazarevic, L. Ertöz, V. Kumar, et al., “A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection,” In Proc. of the 3rd SIAM International Conference on Data Mining, San Francisco, CA, USA, May 1-3, 2003.
[2] “Successful Real-Time Security Monitoring,” Riptech Inc. white paper, Sep. 2001.
[3] Symantec.com, “Symantec internet security threat report highlights rise in threats to confidential information,” Available at: http://www.symantec.com/press/2005/n050321.html. Accessed 2007.
[4] P. Laskov, P. Dussel, C. Schafer, et al., “Learning Intrusion Detection: Supervised or Unsupervised?” In Proc. of Image Analysis and Processing - ICIAP 2005, 13th International Conference, pp. 50-57, 2005.
[5] R. Bace, P. Mell, “NIST special publication on intrusion detection systems,” National Institute of Standards and Technology, 2001.
[6] E. Eskin, A. Arnold, M. Prerau, et al., “A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data,” In Proc. of Applications of Data Mining in Computer Security, Kluwer, 2002.
[7] S.J. Stolfo, F. Wei, W. Lee, et al., “KDD Cup – knowledge discovery and data mining competition,” 1999. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[8] R. Lippmann, R.K. Cunningham, D.J. Fried, et al., “Results of the DARPA 1998 offline intrusion detection evaluation,” In Proc. of RAID 1999, 1999. http://www.ll.mit.edu/IST/ideval/pubs/1999/RAID 1999a.pdf.
[9] R. Lippmann, J.W. Haines, D.J. Fried, et al., “The 1999 DARPA off-line intrusion detection evaluation,” Computer Networks, 34:579–595, 2000.
[10] W. Lee, S. Stolfo, “A framework for constructing features and models for intrusion detection systems,” In ACM Transactions on Information and System Security, 3:227–261, 2001.
[11] P. Laskov, C. Schafer, I. Kotenko, et al., “Intrusion detection in unlabeled data with quarter-sphere support vector machines (extended version),” Praxis der Informationsverarbeitung und Kommunikation, 27:228–236, 2004.
[12] L. Portnoy, E. Eskin, S. Stolfo, “Intrusion detection with unlabeled data using clustering,” In Proc. ACM CSS Workshop on Data Mining Applied to Security, 2001.
[13] A.J. Butte, I.S. Kohane, “Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements,” In Proc. Pacific Symposium on Biocomputing, pp. 415–426, Hawaii, 2000.
[14] S. Mukkamala, G. Janoski, A. Sung, “Intrusion detection using neural networks and support vector machines,” In Proc. of IEEE International Joint Conference on Neural Networks, pp. 1702–1707, 2002.
[15] W. Lee, S. Stolfo, K. Mok, “A data mining framework for building intrusion detection models,” In Proc. IEEE Symposium on Security and Privacy, pp. 120–132, 1999.
[16] R.F. Bie, X. Jin, C.L. Chen, et al., “Meta Learning Intrusion Detection in Real Time Network,” In Proc. of 17th International Conference, Porto, Portugal, pp. 809-816, 2007.
[17] J. McHugh, A. Christie, J. Allen, “Defending yourself: The role of intrusion detection systems,” IEEE Software, pp. 42–51, Sept./Oct. 2000.
[18] T. Lane, “A Decision-Theoretic, Semi-Supervised Model for Intrusion Detection,” University of New Mexico Technical Report TR-CS-200416, 2004.
[19] J. Aslam, S. Bratus, V. Pavlu, “Semi-supervised Data Organization for Interactive Anomaly Analysis,” In Proc. of the 5th International Conference on Machine Learning and Applications, pp. 55-62, 2006.
[20] X. Zhu, “Semi-supervised learning literature survey,” Tech. Report 1530, Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI, 2006.
[21] T. Joachims, “Transductive Learning via Spectral Graph Partitioning,” In Proc. of the 20th International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
[22] X. Zhu, Z. Ghahramani, J. Lafferty, “Semi-supervised learning using Gaussian fields and harmonic functions,” In Proc. of the 20th International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
[23] M. Bilenko, S. Basu, R.J. Mooney, “Integrating Constraints and Metric Learning in Semi-Supervised Clustering,” In Proc. of the 21st International Conference on Machine Learning, pp. 81-88, Banff, Canada, July 2004.
[24] R. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[25] L. Breiman, “Random Forests,” Machine Learning, 45:5-32, 2001.
[26] C. Lee, P. Chung, J. Tsai, et al., “Robust Radial Basis Function Neural Networks,” IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, Vol. 29, No. 6, Dec. 1999.
[27] C. Cortes, V. Vapnik, “Support-vector Networks,” Machine Learning, 20(3):273-297, 1995.
[28] C. Kemp, T. Griffiths, S. Stromsten, et al., “Semi-supervised learning with trees,” Advances in Neural Information Processing System 16, 2003.
[29] V. Sindhwani, P. Niyogi, M. Belkin, “Beyond the point cloud: from transductive to semi-supervised learning,” In Proc. of 22nd International Conference on Machine Learning, 2005.
[30] D. Zhou, O. Bousquet, T. Lal, et al., “Learning with local and global consistency,” Advances in Neural Information Processing System 16, 2004.
[31] K. Wagstaff, C. Cardie, S. Rogers, et al., “Constrained K-Means clustering with background knowledge,” In Proc. of 18th International Conference on Machine Learning, pp. 577–584, 2001.