Qualitative classification and evaluation in possibilistic decision trees

Nahla Ben Amor
Institut Supérieur de Gestion de Tunis, 41 Avenue de la liberté, 2000 Le Bardo, Tunis, Tunisia
E-mail: [email protected]

Salem Benferhat
CRIL - CNRS, Université d'Artois, Rue Jean Souvraz SP 18, 62307 Lens Cedex, France
E-mail: [email protected]

Zied Elouedi
Institut Supérieur de Gestion de Tunis, 41 Avenue de la liberté, 2000 Le Bardo, Tunis, Tunisia
E-mail: [email protected]

Abstract— This paper presents a method for classifying objects in an uncertain context using decision trees. The uncertainty bears on the attribute values of the objects to classify and is handled in a qualitative possibilistic framework. An evaluation method for judging the classification efficiency in such an uncertain context is then proposed.
I. INTRODUCTION

Decision trees are efficient methods used in classification problems. They consist of decision nodes for testing attributes, edges for branching on attribute values, and leaves for labeling classes [9], [7]. The decision tree technique is composed of two major procedures [2], [11]:

1) Building the tree: a decision tree is built from a given training set. Building consists in finding, for each decision node, the 'appropriate' test attribute using an attribute selection measure, and in defining the class labeling each leaf satisfying one of the stopping criteria.

2) Classifying objects: we start at the root of the decision tree and test the attribute specified by this node. According to the result of the test, we move down the branch relative to the attribute value of the given object. This process is repeated until a leaf is encountered; this leaf is labeled by a class (a minimal sketch of this walk is given at the end of this section).

As pointed out in several works [1], [4], [5], [6], [10], classical methods for inducing decision trees do not deal with uncertain data, and ignoring this uncertainty can degrade the classification results. In order to adapt decision trees to uncertainty and imprecision, we first propose different ways of classifying objects with uncertain or missing attribute values using qualitative possibility theory. Then, we propose a criterion for judging the efficiency of the classifier in an uncertain context. We illustrate our approach with the same running example, drawn from the intrusion detection area.

The paper is organized as follows: Section 2 presents an overview of possibility theory. Section 3 recalls the basics of possibilistic decision trees. In Section 4, we describe our leximin/leximax classification in possibilistic decision trees. In Section 5, the evaluation of the classification efficiency of possibilistic decision trees is detailed.
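To make the classification procedure of step 2 concrete, here is a minimal sketch of the classical (certain) classification walk; the `Node`/`Leaf` structure and the attribute names are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    label: str            # class labeling this leaf

@dataclass
class Node:
    attribute: str        # test attribute of this decision node
    children: Dict[str, Union["Node", "Leaf"]] = field(default_factory=dict)

def classify(tree: Union[Node, Leaf], instance: Dict[str, str]) -> str:
    """Walk down from the root, branching on the object's attribute values."""
    while isinstance(tree, Node):
        tree = tree.children[instance[tree.attribute]]
    return tree.label

# Hypothetical two-node tree over the running example's attributes:
tree = Node("flag", {"SF": Leaf("Normal"), "REJ": Leaf("DOS")})
assert classify(tree, {"flag": "REJ"}) == "DOS"
```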
II. POSSIBILITY THEORY

This section gives a brief reminder on possibility theory (for more details see [3]). Uncertainty is here assumed to be represented qualitatively by a finite and totally ordered scale of degrees, encoded in the following by values in the interval [0, 1] (with 0 the bottom and 1 the top of the scale). If D is a set of uncertainty degrees, we define max(D) (resp. min(D)) as the element d of D such that d ≥ d' (resp. d ≤ d') for every d' in D.

The basic concept of possibility theory, when uncertainty is represented qualitatively, is the notion of Qualitative Possibility Distribution (QPD), simply denoted by π. A QPD is a function which associates to each element ω of the universe of discourse Ω a degree π(ω) of the scale (π encodes our beliefs on the real world). By convention, π(ω) = 1 means that it is completely possible that ω is the real world, π(ω) = 0 means that ω cannot be the real world, and π(ω) ≥ π(ω') means that ω is at least as possible as ω' to be the real world. A QPD is said to be normalized if there exists at least one state which is totally possible (i.e., π(ω) = 1 for some ω in Ω).

We define the possibility measure of any event A ⊆ Ω by:

\Pi(A) = \max_{\omega \in A} \pi(\omega) \qquad (1)

This measure evaluates at which level the event A is consistent with our knowledge represented by π.
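As an illustration, the following sketch encodes a QPD as a plain mapping from worlds to degrees, with the qualitative scale approximated by numbers in [0, 1] as the paper itself does from Section III on; the worlds and degrees are made up.

```python
from typing import Dict, Iterable

def possibility_measure(pi: Dict[str, float], event: Iterable[str]) -> float:
    """Eq. (1): Pi(A) = max of pi(omega) over omega in A."""
    return max(pi[omega] for omega in event)

# A normalized QPD over three worlds (illustrative degrees):
pi = {"normal": 1.0, "dos": 0.7, "probing": 0.2}
assert possibility_measure(pi, ["dos", "probing"]) == 0.7
```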
III. POSSIBILISTIC DECISION TREES

In this section, we do not detail the construction of decision trees, which is based on a given training set where attribute values and classes are defined precisely (for more details see [11]). We are rather interested in how to classify objects characterized by uncertain attribute values, where the uncertainty is represented by qualitative possibility distributions. We assign to each attribute a possibility distribution expressing the uncertainty in a qualitative way, encoding it in the interval [0, 1].

Let A_1, ..., A_n be the different attributes of the problem. The instance to classify is described by a vector of possibility distributions (π_1, ..., π_n), one per attribute. An attribute A_i is precisely defined if there exists exactly one value v such that π_i(v) = 1, while π_i(v') = 0 for all other values v'. A missing value for an attribute A_i is represented by a uniform possibility distribution (i.e., π_i(v) = 1 for every value v in the domain of A_i).
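The three situations just described can be sketched as follows; the domain of service is taken from Example 1 below, and the degrees of the uncertain case are illustrative.

```python
# Each situation for an attribute value is a possibility distribution
# over the attribute's domain:
service_domain = ["http", "domain_u", "private"]

precise   = {"http": 1.0, "domain_u": 0.0, "private": 0.0}  # exactly one value
missing   = {v: 1.0 for v in service_domain}                # total ignorance
uncertain = {"http": 1.0, "domain_u": 0.4, "private": 0.0}  # normalized QPD
```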
TABLE I. Possibility distributions on the attribute values of the connection to classify (service: http, domain_u, private; flag: SF, REJ, RSTO).
In standard possibility theory, the basic operators min/max are used in order to choose the most plausible path in the tree. First, we compute the possibility degree of each path (from the root to a leaf class) by applying the minimum operator to the degrees of its attribute values. Then, the most plausible path is the one presenting the highest possibility degree; in other words, we apply the maximum operator to the paths' degrees. Hence the class of the object to classify is the one labeling the leaf of this path.

Example 1: In order to illustrate the different notions presented in this paper, we consider an example from the intrusion detection field, where we handle formatted connections corresponding to TCP/IP dump rows. Note that, for the sake of simplicity, each connection is described by only four attributes: service, flag, count and wrong_fragment, with service ∈ {http, domain_u, private} and flag ∈ {SF, REJ, RSTO} (see Table I). We also handle three classes, C = {Normal (N.), DOS (D.), Probing (P.)}, where Normal corresponds to a normal connection while DOS and Probing are relative to two categories of attacks.
Fig. 1. Example of a decision tree in the intrusion detection field (decision nodes test service, flag, count and wrong_fragment; leaves are labeled N., P. or D.).

Assume that the connection o to classify is characterized by the possibility distributions given in Table I. According to the decision tree (see Figure 1), we have nine paths P_1, ..., P_9; applying the minimum operator to the different degrees along each path gives its possibility degree. According to the maximum operator, the most plausible paths are P_3 and P_9, thus the connection will be classified as Probing or DOS with possibility degree 1. It is clear that the use of the maximum operator makes it difficult to choose between the equally plausible paths P_3 and P_9.
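A minimal sketch of this min/max selection follows; the path representation and the names used are assumptions, not the paper's notation.

```python
from typing import Dict, List, Tuple

# A path is the list of (attribute, value) pairs on its edges plus a leaf class.
Path = Tuple[List[Tuple[str, str]], str]

def min_max_classify(paths: List[Path],
                     pi: Dict[str, Dict[str, float]]) -> set:
    """Min over each path's degrees, then max over paths."""
    scored = [(min(pi[a][v] for a, v in edges), label)
              for edges, label in paths]
    best = max(degree for degree, _ in scored)
    # Every class labeling a maximally plausible path is returned, which is
    # exactly the ambiguity observed above between P3 and P9.
    return {label for degree, label in scored if degree == best}
```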
IV. LEXIMIN/LEXIMAX CLASSIFICATION IN POSSIBILISTIC DECISION TREES

The min-max combination mode is not satisfactory, since it is somewhat cautious: the number of candidate classes can be large, especially when the number of missing attributes is important. Furthermore, the min/max operators are not discriminating. Indeed, we can check that, for any attribute A_i and any value v such that π_i(v) < 1, replacing π_i(v) by another degree strictly lower than 1 does not change the selected candidate classes. This is explained by the fact that if the distributions π_i are normalized, then there exists at least one path from the root to a leaf class such that the possibility degree of each node's value on this path is equal to 1. Hence, with the min/max combination mode, only paths where the possibility degrees of attribute values are equal to 1 are considered.

One idea to overcome the drawbacks of the min/max combination is to extend these two operators using the leximin and leximax criteria, which are natural extensions of the minimum and maximum operators in the qualitative setting [8], defined as follows.
Definition 1: Let u = (u_1, ..., u_m) and v = (v_1, ..., v_m) be two vectors, and let σ and τ be two permutations of indices such that u_{σ(1)} ≤ ... ≤ u_{σ(m)} and v_{τ(1)} ≤ ... ≤ v_{τ(m)}. Then u is said to be leximin-preferred to v, denoted u ≻_lmin v, if and only if there exists i such that u_{σ(j)} = v_{τ(j)} for all j < i and u_{σ(i)} > v_{τ(i)}. Leximax preference, denoted ≻_lmax, is defined in the same way on the decreasing reorderings of u and v. u is said to be leximin-equal (resp. leximax-equal) to v, denoted u =_lmin v (resp. u =_lmax v), if and only if u_{σ(i)} = v_{τ(i)} for all i.

Let P = {P_1, ..., P_p} be the set of all different paths from the root to the leaves. For each class c, we associate a vector containing the paths having c as a leaf, ranked in leximin order. To apply this criterion, all paths should be described by the same attributes already defined in the training set. However, since paths are pruned, the idea is to assign the degree 1 to the missing values. The justification for adding the degree 1 is the following: if in some path P_k a class c is obtained without testing an attribute A_i, this in fact means that c can be obtained independently of the value of A_i. In other terms, c can be obtained from a path composed of P_k and the most plausible instance of A_i (namely a value having the degree 1, since only normalized distributions are considered).

Definition 2: Let V_c = (P^c_1, ..., P^c_k) and V_c' = (P^c'_1, ..., P^c'_l) be the vectors of paths leading to c and c', each ranked in decreasing leximin order. c is said to be leximin-leximax preferred to c', denoted c ≻_lmin-lmax c',
- if there exists i such that P^c_j =_lmin P^c'_j for all j < i and P^c_i ≻_lmin P^c'_i,
- or if P^c_j =_lmin P^c'_j for all j ≤ min(k, l) and k > l (i.e., c is supported by a greater number of paths than c').
c is said to be leximin-leximax equal to c', denoted c =_lmin-lmax c', if and only if k = l and P^c_i =_lmin P^c'_i for all i.

Definition 3: Let C = {c_1, ..., c_n} be a set of classes; a class c is leximin-leximax preferred iff there is no class c' ∈ C such that c' ≻_lmin-lmax c.

The selection mode based on the leximin/leximax operators proceeds in two steps (a sketch in code is given after these steps):
1) Establish a total pre-order over all paths using the leximin operator. Then, select a first set of candidate classes corresponding to the leaves of the best paths in this total pre-order. Let C* be this set of classes.
2) Refine C* by selecting its leximin-leximax preferred class(es) using Definition 3.
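The sketch below implements the two-step selection under the reconstruction above; Python's lexicographic comparison of sorted lists does the leximin/leximax bookkeeping. The representation of paths as already-completed degree vectors is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def leximin_key(degrees: List[float]) -> List[float]:
    """Increasingly sorted copy: lexicographic comparison of these keys
    realizes the leximin order of Definition 1 (equal-length vectors)."""
    return sorted(degrees)

def select_class(paths: List[Tuple[List[float], str]]) -> str:
    """paths: (degree vector, leaf class) pairs; degrees of attributes a
    pruned path does not test are assumed already completed with 1.0."""
    # Step 1: candidate classes = leaves of the leximin-best paths.
    best = max(leximin_key(d) for d, _ in paths)
    candidates = {c for d, c in paths if leximin_key(d) == best}
    # Step 2: rank each candidate by the decreasingly ordered list of its
    # paths' leximin keys; with equal keys, the class with more paths wins.
    support: Dict[str, List[List[float]]] = defaultdict(list)
    for d, c in paths:
        support[c].append(leximin_key(d))
    return max(candidates, key=lambda c: sorted(support[c], reverse=True))
```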
Example 2: Let us continue the previous example. According to the leximin criterion, we get a total pre-order of the different paths whose best elements are P_3 and P_9. Thus C* = {P., D.}, the classes labeling P_3 and P_9 respectively; in other terms, the connection will be classified as a Probing or a DOS attack. Then, applying the refinement step of Definition 3, P. is leximin-leximax preferred to D. Thus, it is possible to obtain a more precise result: the connection will be classified as a Probing attack.

V. EVALUATION OF POSSIBILISTIC DECISION TREES

When dealing with an uncertain context, the evaluation of a classifier, namely a possibilistic decision tree, is not so obvious.

A. Percent of Correct Classification

In the classical case, the Percent of Correct Classification (PCC) corresponds to the proportion of well classified objects among the whole set of objects. However, since within a possibilistic decision tree a new object may not be classified in a unique class, it is necessary to adapt the PCC to the uncertainty pervading the classes. The idea is to choose, for each object to classify, the class having the highest possibility degree. If more than one class is obtained, one of them is chosen randomly. The obtained class is considered as the class of the testing object. Hence, the PCC relative to the whole testing set is computed by comparing, for each testing instance, its real class (known by us) with the class obtained by the induced tree:

PCC = \frac{\text{number of well classified objects}}{\text{number of testing objects}} \qquad (2)

where the number of well classified objects is the number of testing objects for which the class obtained by the possibilistic decision tree (the most plausible class) is the same as their real class.
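A direct transcription of Eq. (2), with the random tie-breaking described above; the representation of the testing set is an assumption.

```python
import random
from typing import Dict, List, Tuple

def pcc(test_set: List[Tuple[Dict[str, float], str]]) -> float:
    """test_set: (possibility distribution over classes, real class) pairs."""
    hits = 0
    for pi_classes, real in test_set:
        top = max(pi_classes.values())
        # Random choice among the equally most plausible classes:
        chosen = random.choice([c for c, d in pi_classes.items() if d == top])
        hits += chosen == real
    return hits / len(test_set)
```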
B. Distance criterion

The limitation of the adapted PCC is that it ignores the order existing between the different classes of the object to classify: it only considers the most plausible class. So, we propose a criterion that takes into account the whole order of the classes characterizing the object to classify. More exactly, we propose to compare the ranking assigned to the classes with the real class of the given testing object. Such a comparison is based on a kind of distance.

At first, we define a qualitative possibility distribution π_o assigned to the object o as follows. Assume we handle n classes c_1, ..., c_n; then:
\pi_o(c_i) = \frac{n - rank(c_i) + 1}{n} if c_i appears in the ranking; \pi_o(c_i) = 0 if c_i does not appear in the order of the classes relative to the object to classify \qquad (3)

where rank(c_i) represents the position of c_i in the decreasing ranking of the classes of o.

Next, we define the distance criterion for a testing object o (whose possibility distribution is π_o) with respect to its real class, denoted c_r, as follows:

d(o) = \sqrt{\sum_{i=1}^{n} (\pi_o(c_i) - \pi_r(c_i))^2} \qquad (4)

where

\pi_r(c_i) = 1 if c_i = c_r, 0 otherwise.

This distance verifies the following property:

0 \le d(o) \le 2 \qquad (5)

When d(o) is close to 2, the classifier is bad, whereas when it falls to 0, it is considered a good classifier. In order to give this distance a signification closer to the PCC, we propose to transform it (the result will be denoted PCC_d), so that it satisfies the following property:

0 \le PCC_d(o) \le 1 \qquad (6)

PCC_d(o) = \frac{2 - d(o)}{2} \qquad (7)

Next, we compute the average of this criterion over all the classified testing instances, denoted Total_PCC_d:

Total\_PCC_d = \frac{\sum_{\text{classified objects } o} PCC_d(o)}{\text{number of classified objects}} \qquad (8)

Thus, Total_PCC_d will be considered as a calibrated PCC on the whole testing set.
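A sketch of this calibrated criterion, under the reconstruction of Eqs. (3)-(8) given above; in particular the Euclidean form of Eq. (4) is a reconstruction, so the code is illustrative rather than a definitive implementation.

```python
from math import sqrt
from typing import Dict, List

def pi_from_ranking(ranking: List[str], classes: List[str]) -> Dict[str, float]:
    """Eq. (3): degree (n - rank + 1)/n for ranked classes, 0 for the others."""
    n = len(classes)
    rank = {c: k for k, c in enumerate(ranking, start=1)}
    return {c: (n - rank[c] + 1) / n if c in rank else 0.0 for c in classes}

def pcc_d(pi_o: Dict[str, float], real: str) -> float:
    """Eqs. (4) and (7): distance to the real class's indicator distribution,
    then calibration to [0, 1]."""
    d = sqrt(sum((pi_o[c] - (1.0 if c == real else 0.0)) ** 2 for c in pi_o))
    return (2 - d) / 2
```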
C. Example

Let us continue with our example, where we deal with the three classes N., P. and D. To classify the connection o given in Example 2, we get (according to the induced tree) the following order: P. ≻ D. ≻ N. Hence, the corresponding possibility distribution π_o is obtained from Equation 3. Assume that the real class of the object o is the attack DOS (D.). Using Equation 4, we get the distance d(o), and Equation 7 then gives its calibrated form.
Thus, we get 39% of chance that the induced tree detects the real class of o, whereas applying the classical PCC directly on the most plausible class leads to an erroneous result (the most plausible class, Probing, differs from the real class DOS). Obviously, we can apply this distance criterion to the whole testing set using Equation 8.

VI. CONCLUSION

In this paper, we have presented two contributions. The first one concerns the classification, using decision trees, of objects characterized by uncertain attribute values, where uncertainty is represented in a qualitative possibilistic framework. To overcome the limitations of the standard min-max combination, we have proposed a leximin/leximax combination mode for the classification phase. In the second part, we have proposed a new criterion to judge the efficiency of classifiers in an uncertain context, namely for qualitative possibilistic decision trees. This criterion takes into account the total pre-order of the classes relative to each testing instance, and not only the best one as in the classical Percent of Correct Classification (PCC). A future work will be to introduce a semantic distance into this criterion, allowing to adjust the degree of similarity between classes.

REFERENCES

[1] Ben Amor, N., Benferhat, S., Elouedi, Z., Mellouli, K.: Decision trees and qualitative possibilistic inference: application to the intrusion detection problem. Proceedings of the European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty (ECSQARU'2003), 419-431, 2003.
[2] Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J.: Classification and regression trees. Monterey, CA, Wadsworth & Brooks, 1984.
[3] Dubois, D., Prade, H.: Possibility theory: An approach to computerized processing of uncertainty. Plenum Press, New York, 1988.
[4] Denoeux, T., Skarstein-Bjanger, M.: Induction of decision trees for partially classified data. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Nashville, USA, 2923-2928, 2000.
[5] Elouedi, Z., Mellouli, K., Smets, P.: Belief decision trees: theoretical foundations. International Journal of Approximate Reasoning 28, 91-124, 2001.
[6] Hüllermeier, E.: Possibilistic induction in decision-tree learning. ECML'02, 2002.
[7] Mitchell, T. M.: Decision tree learning. Chapter 3 of Machine Learning, co-published by the MIT Press and the McGraw-Hill Companies, Inc., 1997.
[8] Moulin, H.: Axioms for cooperative decision-making. Cambridge University Press, 1988.
[9] Quinlan, J. R.: Induction of decision trees. Machine Learning 1, 81-106, 1986.
[10] Quinlan, J. R.: Probabilistic decision trees. Machine Learning, Vol. 3, Chap. 5, Morgan Kaufmann, 267-301, 1990.
[11] Quinlan, J. R.: C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo, CA, 1993.