Learning to Rank with Bayesian Evidence Framework

Learning to Rank with Bayesian Evidence Framework Zhang Yan, Li Zhoujun, Ma Dianfu, Xiong Zenggang

Zhang Yan (1), Li Zhoujun (2), Ma Dianfu (3), Xiong Zenggang (4)
(1, First Author) School of Computer Science and Engineering, BeiHang University, China; Faculty of Mathematics and Computer Science, Hubei University, China, [email protected]
(2, 3) School of Computer Science and Engineering, BeiHang University, China, [email protected], [email protected]
(4, Corresponding Author) School of Computer and Information Science, Xiaogan University, China, xzg@csnet1.cs.tsinghua.edu.cn

Abstract

The problem of ranking has recently gained attention in machine learning. The goal of ranking is to learn a real-valued ranking function that induces a ranking or ordering over an instance space. In this paper, we apply popular Bayesian techniques to the ranking support vector machine. We propose a novel differentiable loss function, called the trigonometric loss function, which has the desirable characteristic of natural normalization in the likelihood function, and then follow standard Gaussian process techniques to set up a Bayesian framework. In this framework, Bayesian inference is used to implement model adaptation while keeping the merits of ranking SVM. Experimental results on data sets indicate the usefulness of this approach.

Keywords: Machine Learning, Ranking, SVM, Trigonometric Loss Function

1. Introduction

Two decades ago, Valiant [1] proposed a theory of learnability for binary classification functions defined on Boolean domains. His Probably Approximately Correct (PAC) learning models have since been studied extensively, and have led to a rich set of theoretical results on classes of functions. Recently, a new learning problem, namely that of ranking, has gained attention in the learning community [2-4]. In ranking, one learns a real-valued function that assigns scores to instances, but the scores themselves do not matter; what is important is the relative ranking of instances induced by those scores. The goal is to learn a real-valued ranking function that induces a ranking or ordering over an instance space. This problem is distinct from both classification and regression, and it is natural to ask whether a similar theoretical understanding can be developed for it.

Moreover, in many data mining applications a classifier's accuracy is not enough, because it does not express how far off the prediction for each example is from its target. For such tasks, a mere classification of customers into buyers and non-buyers is insufficient; we often need a ranking of customers in terms of their likelihood of buying. Thus, a ranking is more desirable than just a classification.

In this paper, we introduce a novel loss function for ranking SVM, called the trigonometric loss function, with the purpose of integrating Bayesian inference with ranking SVM smoothly. The trigonometric loss function is smooth and naturally normalized in likelihood evaluation. Further, it possesses the desirable property of sparseness in sample selection. We follow standard Gaussian processes to set up a Bayesian framework. Experimental results on data sets indicate the usefulness of this approach.

The rest of this paper is organized as follows. We review related work in Section 2. Section 3 introduces the proposed ranking model: we first describe the basic concept of ranking SVM, and then describe the key factors of the new ranking model based on the Bayesian design method. We evaluate our approach and analyze the experimental results in Section 4. Section 5 concludes the paper.

2. Related Work

Advances in information Sciences and Service Sciences(AISS) Volume3, Number8, September 2011 doi : 10.4156/AISS.vol3.issue8.36


In learning to rank, a number of categories are given and a total order is assumed to exist over the categories. Labeled instances are provided. Each instance is represented by a feature vector, and each label denotes a rank. Existing methods fall into two categories, referred to in this paper as “point-wise training” and “pair-wise training”.

In point-wise training, each instance (and its rank) is used as an independent training example. The goal of learning is to correctly map instances into intervals. Crammer & Singer [4] propose a ranker based on the perceptron (“PRank”), which maps a feature vector $x \in \mathbb{R}^d$ to the reals with a learned $w \in \mathbb{R}^d$ such that the output of the mapping function is just $w \cdot x$. PRank also learns the values of N increasing thresholds $b_r$, $r = 1, \ldots, N$, and declares the rank of x to be $\min_r \{ r : w \cdot x - b_r < 0 \}$. PRank learns using one example at a time, which is held as an advantage over pair-based methods (e.g. Freund et al. [6]), since the latter must learn using $O(m^2)$ pairs rather than m examples. Harrington [7] has proposed a simple but very effective extension of PRank, which approximates finding the Bayes point by averaging over PRank models. RankProp [5] is a neural net ranking model. RankProp alternates between two phases: an MSE regression on the current target values, and an adjustment of the target values themselves to reflect the current ranking given by the net. The end result is a mapping of the data to a large number of targets which reflect the desired ranking, which performs better than just regressing to the original, scaled rank values.

In pair-wise training, each instance pair is used as a training example and the goal of training is to correctly find the differences between ranks of instance pairs. Herbrich et al. [3] cast the problem of learning to rank as ordinal regression, that is, learning the mapping of an input vector to a member of an ordered set of numerical ranks. They model ranks as intervals on the real line, and consider loss functions that depend on pairs of examples and their target ranks. The positions of the rank boundaries play a critical role in the final ranking function. RankBoost (Freund et al. [6]) is another ranking algorithm that is trained on pairs, and which is closer in spirit to our work since it attempts to solve the preference learning problem directly, rather than solving an ordinal regression problem. In [6], results are given using decision stumps as the weak learners. The cost is a function of the margin over reweighted examples. Dekel et al. [8] provide a very general framework for ranking using directed graphs, where an arc from A to B means that A is to be ranked higher than B. This approach can represent arbitrary ranking functions, in particular ones that are inconsistent. Joachims [9] proposed learning a ranking function for search as ordinal regression using clickthrough data, employing what he calls the Ranking SVM model for ordinal regression. More recently, Cao et al. [10] adapted this method to document retrieval.

Our method is based on pair-wise training and uses the Bayesian design method to extend Ranking SVM.

3. Ranking Model Based on Bayesian Design Method

3.1 Ranking SVM

Ranking SVM [9] is a method which formalizes learning to rank as learning for classification on pairs of instances and tackles the classification issue by using SVM.

Formally, assume that there exists an input space $X \subseteq \mathbb{R}^n$, where n denotes the number of features. There exists an output space of ranks (categories) represented by labels $Y = \{r_1, r_2, \ldots, r_q\}$, where q denotes the number of ranks. Further assume that there exists a total order between the ranks, $r_q \succ r_{q-1} \succ \cdots \succ r_1$, where $\succ$ denotes a preference relationship. A set of ranking functions $f \in F$ exists and each of them can determine the preference relations between instances:

$$ x_i \succ x_j \iff f(x_i) > f(x_j) \tag{1} $$

Suppose that we are given a set of ranked instances $S = \{(x_i, y_i)\}_{i=1}^{t}$ from the space $X \times Y$. The task here is to select the best function f' from F that minimizes a given loss function with respect to the given ranked instances. Herbrich et al. [3] propose formalizing the above learning problem as that of learning for classification on pairs of instances.


First, we assume that f is a linear function,

$$ f_w(x) = \langle w, x \rangle \tag{2} $$

where w denotes a vector of weights and $\langle \cdot, \cdot \rangle$ stands for an inner product. Plugging (2) into (1) we obtain

$$ x_i \succ x_j \iff \langle w, x_i - x_j \rangle > 0 \tag{3} $$

Note that the relation $x_i \succ x_j$ between instances $x_i$ and $x_j$ is expressed by a new vector $x_i - x_j$. Next, we take any instance pair and their relation to create a new vector and a new label. Let $x^{(1)}$ and $x^{(2)}$ denote the first and second instances, and let $y^{(1)}$ and $y^{(2)}$ denote their ranks; then we have

$$ \left( x^{(1)} - x^{(2)},\; z \right), \qquad z = \begin{cases} +1, & y^{(1)} \succ y^{(2)} \\ -1, & y^{(2)} \succ y^{(1)} \end{cases} \tag{4} $$

From the given training data set S, we create a new training data set S' containing m labeled vectors:

$$ S' = \left\{ \left( x_i^{(1)} - x_i^{(2)},\; z_i \right) \right\}_{i=1}^{m} \tag{5} $$
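The construction of S' in (4) and (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ranks are encoded as integers with a larger value meaning "preferred", every pair of instances with different ranks is used, and the helper name `make_pairs` is hypothetical rather than taken from the paper.

```python
from itertools import combinations


def make_pairs(instances, ranks):
    """Build labeled difference vectors (x1 - x2, z) as in Eqs. (4)-(5).

    instances: list of feature vectors (lists of floats)
    ranks: list of integer ranks; a larger value is assumed to mean "preferred"
    Pairs with equal ranks express no preference and are skipped.
    """
    pairs = []
    for i, j in combinations(range(len(instances)), 2):
        if ranks[i] == ranks[j]:
            continue
        diff = [a - b for a, b in zip(instances[i], instances[j])]
        z = 1 if ranks[i] > ranks[j] else -1
        pairs.append((diff, z))
    return pairs


# Three instances; the first is ranked above the other two, which are tied.
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
y = [3, 2, 2]
pairs = make_pairs(X, y)
```

With t instances this enumeration yields at most t(t-1)/2 pairs, which is the quadratic growth that the discussion of pair-based methods in Section 2 refers to.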

Next, we take S' as classification data and construct an SVM model that can assign either a positive label z = +1 or a negative label z = -1 to any vector $x^{(1)} - x^{(2)}$. Constructing the SVM model is equivalent to solving the following quadratic optimization problem [9]:

$$ \min_{w} \sum_{i=1}^{m} \left[ 1 - z_i \left\langle w,\; x_i^{(1)} - x_i^{(2)} \right\rangle \right]_{+} + \lambda \|w\|^{2} \tag{6} $$

where $[a]_{+} = \max(0, a)$.

The first term is the so-called empirical hinge loss and the second term is the regularizer. Suppose that $w^*$ is the weight vector in the SVM solution. Geometrically, $w^*$ forms a vector orthogonal to the hyperplane of Ranking SVM. We utilize $w^*$ to form a ranking function $f_{w^*}$ for ranking instances:

$$ f_{w^*}(x) = \langle w^*, x \rangle \tag{7} $$

When Ranking SVM is applied to document retrieval, an instance (feature vector) is created from one query-document pair. Each feature is defined as a function of query and document.
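To make the optimization in (6) and the ranking function in (7) concrete, the following sketch minimizes the hinge-loss objective with plain subgradient descent. This is a stand-in chosen for brevity, not the quadratic programming solver the paper refers to; the step size, epoch count, value of λ and the toy pairs are all illustrative assumptions.

```python
def train_ranking_svm(pairs, lam=0.01, lr=0.1, epochs=200):
    """Minimise sum_i [1 - z_i <w, d_i>]_+ + lam * ||w||^2 by subgradient descent.

    pairs: list of (d_i, z_i) with d_i = x_i^(1) - x_i^(2) and z_i in {+1, -1}.
    """
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [2.0 * lam * wk for wk in w]          # gradient of the regulariser
        for d, z in pairs:
            margin = z * sum(wk * dk for wk, dk in zip(w, d))
            if margin < 1.0:                         # hinge term is active
                for k in range(dim):
                    grad[k] -= z * d[k]
        # Averaged step; dividing by the number of pairs only rescales lr.
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


def score(w, x):
    """Ranking function f_w(x) = <w, x> of Eq. (7)."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Toy difference vectors: the sign of the first feature decides the preference.
pairs = [([1.0, 0.2], 1), ([-1.0, 0.1], -1), ([0.5, -0.3], 1), ([-0.8, -0.2], -1)]
w = train_ranking_svm(pairs)
```

Once `w` is learned, instances are ranked by sorting them on `score(w, x)`; only the ordering of the scores matters, not their magnitudes.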

3.2 Trigonometric Loss Function

In the probabilistic approach to binary classification, the logistic function is widely used as an approximation for the discontinuous Heaviside step function in likelihood evaluation [11]. The logistic function is defined as

$$ P(y_x \mid f_x) = \frac{1}{1 + \exp(-y_x \cdot f_x)} \tag{8} $$

where the input vector $x \in \mathbb{R}^n$, the class label $y_x \in \{+1, -1\}$ and $f_x$ denotes the latent function (discriminant function) at x. $-\ln P(y_x \mid f_x)$ is usually referred to as the loss function. The loss function associated with the shifted Heaviside step function in classical SVM is also called the hard margin loss function, which is defined as:

$$ l_h(y_x \cdot f_x) = \begin{cases} 0, & \text{if } y_x \cdot f_x \ge +1 \\ +\infty, & \text{otherwise} \end{cases} \tag{9} $$


The hard margin loss function is suitable for noise-free data sets. For other general cases, a soft margin loss function is popularly used in classical SVC, which is defined as:

$$ l_\rho(y_x \cdot f_x) = \begin{cases} 0, & \text{if } y_x \cdot f_x \ge +1 \\ \frac{1}{\rho} (1 - y_x \cdot f_x)^{\rho}, & \text{otherwise} \end{cases} \tag{10} $$

where ρ is a positive integer. The corresponding likelihood function in the probabilistic framework could be written as:

$$ P(y_x \mid f_x) = \frac{1}{\nu(f_x)} \exp\left( -l_\rho(y_x \cdot f_x) \right) \tag{11} $$

where $y_x \in \{+1, -1\}$. Notice that the normalizer $\nu(f_x)$ is dependent on the latent function $f_x$. This flaw precludes the solution of SVM from being directly used as the MAP estimate on function values in Bayesian inference [12].

The loss functions in SVM are special in that they give identical zero penalty to training samples that have satisfied the constraint $y_x \cdot f_x \ge +1$. These training samples are not involved in the Bayesian inference computations. This reduction of the computational burden is usually referred to as the sparseness property. The logistic function (8) does not enjoy this property since it contributes a positive penalty to all the training samples. On the other hand, the logistic function is attractive because it is naturally normalized in likelihood evaluation, i.e., the normalizer is a constant, a property that allows Bayesian techniques to be used smoothly.

Based on these observations, we generalize the desirable characteristics in these loss functions for classification: the loss should be naturally normalized in likelihood evaluation; it should possess a flat zero region that results in the sparseness property; and it should be smooth, with a first order derivative that is explicit and simple. Adhering to these requirements, we propose a novel loss function for binary classification, known as the trigonometric loss function. The trigonometric loss function is defined as

$$ l_t(\delta) = \begin{cases} +\infty, & \text{if } \delta \in (-\infty, -1] \\ 2 \ln \sec\left( \frac{\pi}{4} (1 - \delta) \right), & \text{if } \delta \in (-1, +1) \\ 0, & \text{if } \delta \in [+1, +\infty) \end{cases} \tag{12} $$

where $\delta = y_x \cdot f_x$. The trigonometric likelihood function is therefore written as

$$ P_t(y_x \mid f_x) = \begin{cases} 0, & \text{if } \delta \in (-\infty, -1] \\ \cos^2\left( \frac{\pi}{4} (1 - \delta) \right), & \text{if } \delta \in (-1, +1) \\ 1, & \text{if } \delta \in [+1, +\infty) \end{cases} \tag{13} $$
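As a quick check of the natural normalization property, the likelihood (13) can be summed over the two labels. The short derivation below assumes $|f_x| < 1$, so that both labels fall in the middle case; for $|f_x| \ge 1$ one label has likelihood 1 and the other 0, and the sum is again 1.

$$
\begin{aligned}
\sum_{y_x = \pm 1} P_t(y_x \mid f_x)
&= \cos^2\left( \tfrac{\pi}{4} (1 - f_x) \right) + \cos^2\left( \tfrac{\pi}{4} (1 + f_x) \right) \\
&= \cos^2\left( \tfrac{\pi}{4} (1 - f_x) \right) + \sin^2\left( \tfrac{\pi}{4} (1 - f_x) \right) = 1 ,
\end{aligned}
$$

where the second line uses $\tfrac{\pi}{4}(1 + f_x) = \tfrac{\pi}{2} - \tfrac{\pi}{4}(1 - f_x)$ and $\cos(\tfrac{\pi}{2} - a) = \sin a$. The normalizer therefore equals 1 regardless of $f_x$, in contrast to $\nu(f_x)$ in (11).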

The derivatives of the loss function are needed in the implementation of Bayesian methods. The first order derivative of (12) with respect to $f_x$ can be derived as

$$ \frac{\partial l_t(\delta)}{\partial f_x} = \begin{cases} -y_x \frac{\pi}{2} \tan\left( \frac{\pi}{4} (1 - \delta) \right), & \text{if } \delta \in (-1, +1) \\ 0, & \text{if } \delta \in [+1, +\infty) \end{cases} \tag{14} $$

and the second order derivative is

$$ \frac{\partial^2 l_t(\delta)}{\partial f_x^2} = \begin{cases} \frac{\pi^2}{8} \sec^2\left( \frac{\pi}{4} (1 - \delta) \right), & \text{if } \delta \in (-1, +1) \\ 0, & \text{if } \delta \in [+1, +\infty) \end{cases} \tag{15} $$

From the definitions above, it is easy to see that the normalizer $\nu(f_x)$ is a constant for any $f_x$. The trigonometric loss function also possesses a flat zero region, the same as that of the loss functions in classical SVM, but it requires that $y_x \cdot f_x > -1$ always hold.
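The definitions of the trigonometric loss, its likelihood and its two derivatives translate directly into code. The sketch below is a plain transcription under the reading that the first derivative carries a minus sign (as obtained by differentiating the loss); the function names are hypothetical, and the final lines evaluate the normalizer numerically at a few arbitrary latent values.

```python
import math


def trig_loss(delta):
    """Trigonometric loss l_t(delta), with delta = y_x * f_x."""
    if delta <= -1.0:
        return math.inf
    if delta >= 1.0:
        return 0.0
    return 2.0 * math.log(1.0 / math.cos(math.pi / 4.0 * (1.0 - delta)))


def trig_likelihood(y, f):
    """Likelihood P_t(y | f) = exp(-l_t(y * f))."""
    delta = y * f
    if delta <= -1.0:
        return 0.0
    if delta >= 1.0:
        return 1.0
    return math.cos(math.pi / 4.0 * (1.0 - delta)) ** 2


def trig_loss_grad(y, f):
    """First derivative of l_t with respect to f; defined only for y * f > -1."""
    delta = y * f
    if delta <= -1.0:
        raise ValueError("loss is infinite for y * f <= -1")
    if delta >= 1.0:
        return 0.0
    return -y * math.pi / 2.0 * math.tan(math.pi / 4.0 * (1.0 - delta))


def trig_loss_hess(y, f):
    """Second derivative of l_t with respect to f; defined only for y * f > -1."""
    delta = y * f
    if delta <= -1.0:
        raise ValueError("loss is infinite for y * f <= -1")
    if delta >= 1.0:
        return 0.0
    return math.pi ** 2 / 8.0 / math.cos(math.pi / 4.0 * (1.0 - delta)) ** 2


# Normaliser P_t(+1 | f) + P_t(-1 | f) at a few arbitrary latent values f.
normalisers = [trig_likelihood(1, f) + trig_likelihood(-1, f)
               for f in (-2.0, -0.7, 0.0, 0.3, 1.5)]
```

Samples with `y * f >= 1` contribute zero loss and zero derivatives, which is the flat region responsible for the sparseness property discussed above.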

3.3 Bayesian Inference

The evidence framework used in this paper is based on the Bayesian framework proposed by MacKay [16]. Computationally, it is equivalent to the type II maximum likelihood method in Bayesian statistics. The evidence framework has been applied successfully to the learning of feed-forward neural networks in both classification and regression problems. We also use the evidence framework in our ranking model.

Substituting the trigonometric loss function for the loss function in Ranking SVM, the trigonometric Ranking SVM (TRSVM) minimizes the following regularized functional in a reproducing kernel Hilbert space (RKHS):

$$ \min_{f \in \mathrm{RKHS}} R(f) = \sum_{i=1}^{n} l_t\left( y_{x_i} \cdot f_{x_i} \right) + \lambda \|f\|_{\mathrm{RKHS}}^{2} \tag{16} $$

where the regularization parameter λ is positive and $\|f\|_{\mathrm{RKHS}}$ is the norm in the RKHS. The function f can also be interpreted as a family of random variables in a Gaussian process, due to the duality between RKHS and stochastic processes.

Recently, Gaussian processes have provided a promising nonparametric Bayesian approach to classification problems [11], which can also be adapted to ranking. The important advantage of Gaussian process models over non-Bayesian models is the explicit probabilistic formulation. This not only makes it possible to infer model parameters in a Bayesian framework but also provides probabilistic class prediction. We follow the standard Gaussian process classifier to describe a Bayesian framework, in which we impose a Gaussian process prior distribution on the latent functions and employ the trigonometric loss function in likelihood evaluation. This ranking machine, TRSVM in the Bayesian framework, is referred to as Bayesian TRSVM (BTR-SVM).

The function f is usually assumed to be the realization of random variables indexed by the input vector $x_i$ in a stationary zero-mean Gaussian stochastic process. The Gaussian process can then be specified by giving the covariance matrix for any finite set of zero-mean random variables $\{f(x_i) \mid i = 1, 2, \ldots, n\}$. The covariance between the outputs corresponding to the inputs $x_i$ and $x_j$ could be defined as

$$ \mathrm{Cov}\left[ f(x_i), f(x_j) \right] = k_0 \exp\left( -\frac{1}{2} k \|x_i - x_j\|^{2} \right) + k_b \tag{17} $$

where $k_0 > 0$ and $k > 0$. With ARD parameters, the Gaussian covariance function could be enhanced as

$$ \mathrm{Cov}\left[ f(x_i), f(x_j) \right] = k_0 \exp\left( -\frac{1}{2} \sum_{l=1}^{d} k_l \left( x_i^{l} - x_j^{l} \right)^{2} \right) + k_b \tag{18} $$

where $x^{l}$ denotes the l-th entry of the input vector x, and $k_l$ is the ARD parameter that determines the relevance of the l-th input dimension to the target. We collect the parameters in the prior distribution as µ, the hyperparameter vector. Thus, for a given hyperparameter vector µ, the prior probability of the random variables $\{f(x_i)\}$ is a multivariate Gaussian.
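The two covariance functions can be written as one small routine, since the plain Gaussian kernel is the ARD kernel with the same weight on every input dimension. The sketch below is illustrative only: the function names are hypothetical, and the defaults reuse the initial hyperparameter values quoted in Section 4.2 (kb = 100, k = 1/d, k0 = 10).

```python
import math


def gaussian_cov(xi, xj, k0=10.0, k=None, kb=100.0, ard=None):
    """Gaussian covariance between f(xi) and f(xj).

    If `ard` (one weight per input dimension) is given, the ARD form is used;
    otherwise a single width k (default 1/d) is shared by all dimensions.
    """
    d = len(xi)
    if ard is None:
        if k is None:
            k = 1.0 / d
        ard = [k] * d
    sq = sum(kl * (a - b) ** 2 for kl, a, b in zip(ard, xi, xj))
    return k0 * math.exp(-0.5 * sq) + kb


def cov_matrix(X, **kwargs):
    """Covariance matrix of the prior over {f(x_i)} for a finite input set X."""
    return [[gaussian_cov(a, b, **kwargs) for b in X] for a in X]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = cov_matrix(X)                        # isotropic kernel, k = 1/2
K_ard = cov_matrix(X, ard=[1.0, 0.0])    # second input dimension switched off
```

Setting an ARD weight to zero removes that dimension from the distance, so in `K_ard` the first and third inputs are treated as identical; this is the sense in which the ARD parameters encode the relevance of each input dimension.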

4. Experiments and Results


4.1 Experiments configuration

The computer used for these numerical experiments is a P4 2.0 GHz PC with 512 MB RAM running Windows XP. We carried out Bayesian inference with a Gaussian kernel on our data sets. All algorithms are implemented within the Weka framework [19].

4.2 Parameters chosen

First, we apply the methods described in the previous section to artificial data in order to analyze their characteristics and select the best parameters. The data sets consisted of 1000 training and 1000 test examples. The training examples were labeled with a subset of the complete set of pair-wise preferences as imposed by the ranking in the data set. The subsets selected for the experiments are described one by one below.

In the numerical experiments, the initial states of the hyperparameters are chosen as kb = 100 and k = 1/d, where d is the input dimension. The initial value of k0 is chosen from {0.1, 1, 10, 100}; usually it is 10, for reasons discussed below. We used the proposed BTR-SVM in its default settings to learn a model. The predicted ranks were then compared with the actual ranks. Our primary evaluation measures were the error rate of the top rank (for comparing classifications) and the rank correlation coefficient (for comparing complete rankings). We report the training results of BTR-SVM on these data sets in Table 1.

Table 1. Parameter selection in our ranking machine

k0     prefs            error     Rank corr
0.1    Ranking          13.370    0.935
0.1    Classification   14.340    0.734
0.1    Complement       34.560    0.845
1      Ranking          14.540    0.958
1      Classification   15.980    0.774
1      Complement       36.430    0.895
10     Ranking          11.320    0.902
10     Classification   12.450    0.700
10     Complement       32.390    0.790
100    Ranking          17.550    0.975
100    Classification   18.780    0.813
100    Complement       42.650    0.907

As shown in Table 1, the error rate is lowest when k0 = 10. The results for the complementary setting show that the information of the top rank preferences is crucial: when this information is dropped and only those pairwise preferences that do not involve the top label are used, the error rate on the top rank increases considerably, and is much higher than the error rate for the classification setting. The optimal hyperparameters used throughout the training on all folds of a data set are determined by the average results of Bayesian inference on the first eight folds. We find that the generalization capability of our Bayesian approach is very competitive. Bayesian inference with the Gaussian kernel yields preferable evidence and reduces the test error rate. On the reduced data sets, the Gaussian covariance kernel still yields similar performance. One of the advantages of Bayesian design over the deterministic approach is that a large number of hyperparameters can be tuned systematically.

4.3 Comparison with other approaches

Based on the previous experiment, we conduct experiments to compare our algorithm (BTR-SVM) with a variety of other ranking techniques: decision-tree based ranking (R-DT) [18], the naive Bayesian ranker (R-NB) [17] and the ranking support vector machine (R-SVM) [10]. We use 10 datasets from the UCI repository [20], shown in Table 2.


Table 2. Description of the datasets used in the experiments

Dataset Number   Dataset           Size    Number of attributes   Missing value
1                Breast cancer     286     9                      Yes
2                Vote              435     16                     Yes
3                Chess             3196    36                     No
4                Mushroom          8124    22                     Yes
5                Credit Approval   690     15                     Yes
6                German Credit     1000    24                     No
7                Ionosphere        351     34                     No
8                Labor             57      16                     No
9                Sick              3772    30                     Yes
10               Sonar             208     61                     No

AUC is used as the evaluation criterion in the experiments. The area under the Receiver Operating Characteristic curve, or simply AUC, has received considerable attention as a measure for this purpose. AUC compares classifiers' performance across the entire range of class distributions and error costs and is a good “summary” for comparing two classifiers. For binary classification, AUC is equivalent to the probability that a randomly chosen example of class − will have a smaller estimated probability of belonging to class + than a randomly chosen example of class +. A simple approach to calculating the AUC of a classifier G is given below.

$$ \mathrm{AUC}(G) = \frac{S_0 - n_0 (n_0 + 1)/2}{n_0 n_1} \tag{19} $$

where $n_0$ and $n_1$ are the numbers of positive and negative examples respectively, and $S_0 = \sum r_i$, where $r_i$ is the rank of the i-th positive example in the list ranked by increasing estimated probability of belonging to class +. It is clear that AUC is essentially a measure of the quality of a ranking. For example, the AUC of a ranking is 1 (the maximum value of AUC) if there is no positive example preceding a negative example. Table 3 shows the AUC scores of the algorithms on each data set, and the average AUC on all data sets is summarized at the bottom of the table.
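The rank-sum calculation of AUC can be sketched as follows. The code takes n0 to be the number of positive examples, which is the reading under which subtracting n0(n0 + 1)/2 from the rank sum of the positives counts the correctly ordered positive-negative pairs; ranks start at 1 for the lowest score, and the average-rank treatment of ties is an added assumption not discussed in the text.

```python
def auc_from_ranks(scores, labels):
    """AUC via the rank-sum formula (S0 - n0 (n0 + 1) / 2) / (n0 n1).

    scores: estimated probability (or score) of the positive class
    labels: 1 for positive, 0 for negative
    Ranks run from 1 (lowest score) upwards; tied scores share their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2.0 + 1.0   # mean of 1-based ranks pos+1 .. end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg_rank
        pos = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    s0 = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (s0 - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


perfect = auc_from_ranks([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
mixed = auc_from_ranks([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In the second call one of the four positive-negative pairs is ordered incorrectly (the positive scored 0.35 sits below the negative scored 0.4), so three of four pairs are correct.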

Table 3. Experimental results on AUC

Dataset Number   R-DT          R-NB          R-SVM         BTR-SVM
1                58.22±8.99    70.56±13.92   73.65±13.72   78.74±6.98
2                100±0.00      94.12±6.33    96.52±7.83    96.76±3.54
3                94.65±7.54    100±0.00      96.42±6.54    100±0.00
4                95.23±3.67    93.39±2.93    96.45±5.76    92.43±7.87
5                85.34±3.23    91.56±4.18    89.56±5.56    93.54±6.76
6                67.34±5.23    78.73±5.82    79.65±4.42    81.54±5.77
7                92.45±10.45   91.34±4.34    93.43±5.65    88.43±9.54
8                72.23±20.12   95.34±15.67   93.43±19.45   95.43±18.14
9                98.34±2.44    93.23±2.18    95.36±4.34    96.34±2.44
10               81.23±8.45    80.23±9.34    82.43±9.65    89.76±10.80
Average          84.50         88.85         89.69         91.30

From our experiments, we have the following observations:
(1) BTR-SVM achieves better overall performance than R-DT, R-NB and R-SVM (6 wins and 4 losses overall).
(2) R-SVM achieves considerable improvement over R-NB in AUC (7 wins and 3 losses).
(3) R-SVM also performs better than R-DT (8 wins and 2 losses).
(4) R-DT does not achieve significant improvement over R-NB in AUC (5 wins and 5 losses). This indicates that representing the influence from only one attribute is not sufficient to produce accurate rankings.
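The win/loss counts in these observations can be reproduced from the mean AUC values of Table 3. The sketch below is a bookkeeping aid only: it compares means and ignores the standard deviations, it reads the entry printed as "93..39" as 93.39, and it interprets the "6 wins" of BTR-SVM as the number of datasets on which it attains the highest (possibly tied) mean AUC, which is an assumption about how observation (1) was counted.

```python
# Mean AUC values copied from Table 3 (one entry per dataset, 1 to 10).
auc = {
    "R-DT":    [58.22, 100.0, 94.65, 95.23, 85.34, 67.34, 92.45, 72.23, 98.34, 81.23],
    "R-NB":    [70.56, 94.12, 100.0, 93.39, 91.56, 78.73, 91.34, 95.34, 93.23, 80.23],
    "R-SVM":   [73.65, 96.52, 96.42, 96.45, 89.56, 79.65, 93.43, 93.43, 95.36, 82.43],
    "BTR-SVM": [78.74, 96.76, 100.0, 92.43, 93.54, 81.54, 88.43, 95.43, 96.34, 89.76],
}


def wins_losses(a, b):
    """Datasets on which method a has a higher / lower mean AUC than method b."""
    wins = sum(x > y for x, y in zip(auc[a], auc[b]))
    losses = sum(x < y for x, y in zip(auc[a], auc[b]))
    return wins, losses


def best_count(name):
    """Datasets on which `name` attains the highest (possibly tied) mean AUC."""
    return sum(auc[name][i] == max(col[i] for col in auc.values())
               for i in range(len(auc[name])))


averages = {name: round(sum(vals) / len(vals), 2) for name, vals in auc.items()}
```

Under these assumptions `wins_losses("R-SVM", "R-NB")`, `wins_losses("R-SVM", "R-DT")` and `wins_losses("R-DT", "R-NB")` correspond to observations (2) to (4), and `averages` matches the last row of Table 3.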

5. Conclusions

In this paper, we apply popular Bayesian techniques to the ranking support vector machine. We propose a novel differentiable loss function, called the trigonometric loss function, which has the desirable characteristic of natural normalization in the likelihood function, and then follow standard Gaussian process techniques to set up a Bayesian framework. In this framework, Bayesian inference is used to implement model adaptation while keeping the merits of ranking SVM. The results of the numerical experiments indicate that this approach performs well. Future work includes applying Bayesian techniques to other learning methods, such as decision trees or hierarchical model learning, and applying the proposed method in applications such as information retrieval and static analysis of source code.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant Nos. 90604007, 980818021.

References

[1] Leslie G. Valiant, “A theory of the learnable”, Communications of the ACM, pp.1134-1142, 1984.
[2] William W. Cohen, Robert E. Schapire, Yoram Singer, “Learning to order things”, Journal of Artificial Intelligence Research, no.10, pp.243-270, 1999.
[3] Ralf Herbrich, Thore Graepel, Klaus Obermayer, “Large margin rank boundaries for ordinal regression”, Advances in Large Margin Classifiers, pp.115-132, 2000.
[4] Koby Crammer, Yoram Singer, “Pranking with ranking”, In Dietterich, T.G., Becker, S., Ghahramani, Z., eds., Advances in Neural Information Processing Systems 14, MIT Press, pp.641-647, 2002.
[5] R. Caruana, Shumeet Baluja, Tom Mitchell, “Using the future to ‘sort out’ the present: Rankprop and multitask learning for medical risk evaluation”, In Proceedings of Advances in Neural Information Processing Systems, pp.959-965, 1996.
[6] Yoav Freund, Raj Iyer, Robert E. Schapire, Yoram Singer, “An efficient boosting algorithm for combining preferences”, Journal of Machine Learning Research, 4, pp.933-969, 2003.
[7] Edward F. Harrington, “Online ranking/collaborative filtering using the Perceptron algorithm”, In Proceedings of the Twentieth International Conference on Machine Learning, pp.250-257, 2003.
[8] Ofer Dekel, Christopher Manning, Yoram Singer, “Loglinear models for label-ranking”, In Proceedings of Advances in Neural Information Processing Systems 16, pp.200-208, 2004.
[9] Thorsten Joachims, “Optimizing search engines using clickthrough data”, In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, pp.133-142, 2002.
[10] Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon, “Adapting Ranking SVM to Document Retrieval”, In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.186-193, 2006.
[11] Christopher K.I. Williams, David Barber, “Bayesian classification with Gaussian processes”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 12, pp.1342-1351, 1998.
[12] Peter Sollich, “Bayesian methods for Support Vector Machines: Evidence and predictive class probabilities”, Machine Learning, 46, pp.21-52, 2002.
[13] J. Ross Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann, San Mateo, CA, 1993.
[14] William Hersh, Chris Buckley, T. J. Leone, David Hickam, “An interactive retrieval evaluation and new large test collection for research”, In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.192-201, 1994.
[15] Ramesh Nallapati, “Discriminative models for information retrieval”, In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.64-71, 2004.
[16] David J. C. MacKay, “Bayesian interpolation”, Neural Comput., Vol. 4, No. 3, pp.415-447, 1992.
[17] Harry Zhang, Jiang Su, “Naive Bayesian classifiers for ranking”, In Proceedings of the 15th European Conference on Machine Learning (ECML2004), pp.1-12, 2004.
[18] Foster Provost, Pedro Domingos, “Tree Induction for Probability-Based Ranking”, Machine Learning, 52(3), pp.199-215, 2003.
[19] Ian H. Witten, Eibe Frank, “Data Mining - Practical Machine Learning Tools and Techniques with Java Implementation”, Morgan Kaufmann Publishers, USA, 2000.
[20] Merz, C. Jazz, Murphy, Newmam J. Daha, “UCI repository of machine learning databases”, Dept of ICS, University of California, Irvine, 1997.
[21] Zeng-gang Xiong, Zheng-li Zhai, Xue-min Zhang, Xue-wen Xia, “Grid Workflow Service Composition Based on Colored Petri Net”, JDCTA: International Journal of Digital Content Technology and its Applications, Vol. 5, No. 5, pp.125-131, 2011.
[22] Liu Tingting, Liu Guangli, Liu Tong, “Ontology-based Grain Emergency System”, AISS: Advances in Information Sciences and Service Sciences, Vol. 3, No. 5, pp.27-35, 2011.
[23] Sheng Zhu, Jingjing Zhang, “European Option Pricing Model Based on Data Mining”, AISS: Advances in Information Sciences and Service Sciences, Vol. 3, No. 4, pp.21-28, 2011.
