Coarsening Classification Rules on the Basis of Granular Computing

Proceedings of the 17th Canadian Conference on Artificial Intelligence (CAI’04), 578-579, 2004.

Yan Zhao
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2
E-mail: [email protected]

1 Problem statement

Classification, a well-studied task in machine learning and data mining, has two main purposes: describing the classification rules embodied in a training dataset, and predicting the classes of unseen instances in a testing dataset. Normally, one expects high accuracy for precise description. However, higher accuracy tends to produce longer rules, which are harder to understand and may overfit the training instances. This motivates our approach of coarsening the classification rules: sacrificing descriptive accuracy to a controlled degree in order to improve comprehensibility and predictive power. The framework of granular computing provides a formal and systematic methodology for doing so. In this paper, a modified PRISM classification algorithm is presented based on this framework.

2 The formal concepts of granular computing

The formal general granular computing model is summarized in [2]. Briefly, all available information and knowledge is stored in an information table. Definable granules are the basic logical units of description and discussion. The refinement (or coarsening) relation between two definable granules is a partial order: it is reflexive, antisymmetric and transitive. For classification tasks we are interested only in conjunctively definable granules, that is, granules formed by conjunctively connecting more than one definable granule. Partition and covering are two commonly used granulations of the universe. One can obtain a more refined partition by further dividing the equivalence classes of a partition. Similarly, one can obtain a more refined covering by further decomposing a granule of a covering. This naturally defines a refinement (coarsening) order over the partition lattice and the covering lattice.
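The refinement order on partitions can be made concrete with a small containment check. The following Python sketch is illustrative only and not part of the original paper; the function name refines and the toy universe are assumptions made for exposition.

```python
from typing import List, Set

def refines(finer: List[Set[str]], coarser: List[Set[str]]) -> bool:
    """True if every block of `finer` lies inside some block of `coarser`,
    i.e. `finer` is obtained by further dividing equivalence classes."""
    return all(any(block <= big for big in coarser) for block in finer)

coarse = [{"a"}, {"b", "c"}, {"d"}]
fine = [{"a"}, {"b"}, {"c"}, {"d"}]     # splits the class {b, c}
print(refines(fine, coarse))            # True:  fine refines coarse
print(refines(coarse, fine))            # False: {b, c} fits in no block of fine
```

The same granule-by-granule containment test extends to coverings, whose granules may overlap.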

Approaches for coarsening decision trees include pre-pruning methods and post-pruning methods. For pre-pruning methods, the stopping criterion is critical to classification performance: too low a threshold can terminate division too soon, before the benefits of subsequent splits become evident, while too high a threshold results in little simplification. Post-pruning methods require a nontrivial post-processing step after the complete tree has been generated; pre-pruning methods are therefore more time-efficient. Approaches for coarsening decision rules include only pre-pruning methods, which have the same advantages and disadvantages as when they are used for decision trees.

3 The modified-PRISM algorithm

The PRISM algorithm was proposed for inducing modular rules [1] under very restrictive assumptions. We modify PRISM to obtain a set of coarsened classification rules by means of a pre-pruning method. For each value d of the decision attribute:

1. Let χ = "".
2. Select the attribute-value pair αx for which p(d|αx) is the maximum. Let χ = χ ∧ αx. Create a subset of the training set comprising all the instances that contain the selected αx.
3. Repeat Step 2 on the current subset until p(d|χ) reaches the threshold, or no more subsets can be extracted.
4. If threshold = 1:
   4.1: Remove the d-labeled instances covered by χ from the training set.
   4.2: Repeat Steps 1-4.1 until all d-labeled instances have been removed.
   Else: /* for coarsened rules */
   4.1': Remove the attribute-value pairs used in χ from consideration.
   4.2': Repeat Steps 1-4.1' until no more attribute-value pairs are left.

In Step 2, if a single αx maximizes p(d|αx), it is selected. If more than one αx attains the maximum at the same time, the one that, conjoined with the previous χ, covers the larger number of instances is selected. When the accuracy threshold is one, the original PRISM algorithm is applied (Steps 4.1 and 4.2). When the accuracy threshold is less than one, we cannot simply remove the d-labeled instances from the training set; instead, we remove the "used" attribute-value pairs from consideration. This greedy method is given in Steps 4.1' and 4.2'.
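To make the procedure concrete, the following Python sketch implements the loop above for a single decision value d. It is illustrative only: the representation of instances as dictionaries with a "class" key, the helper covers, the tie-breaking tuple, and the safety guard at the end are assumptions made for this sketch, not part of the paper.

```python
from typing import Dict, List, Set, Tuple

Instance = Dict[str, str]   # attribute -> value; the label is stored under "class"
Pair = Tuple[str, str]      # one attribute-value pair (alpha_x in the paper)

def covers(instances: List[Instance], chi: List[Pair]) -> List[Instance]:
    """Return the instances that satisfy the conjunction chi."""
    return [x for x in instances if all(x.get(a) == v for a, v in chi)]

def modified_prism(train: List[Instance], d: str, threshold: float) -> List[List[Pair]]:
    """Induce (possibly coarsened) rules for decision value d."""
    rules: List[List[Pair]] = []
    instances = list(train)
    banned: Set[Pair] = set()   # pairs removed from consideration (Step 4.1')
    while True:
        chi: List[Pair] = []    # Step 1: start with the empty conjunction
        subset = instances
        while True:
            # Step 2: candidate pairs occurring in the current subset.
            candidates = {(a, v) for x in subset for a, v in x.items()
                          if a != "class" and (a, v) not in banned
                          and (a, v) not in chi}
            if not candidates:
                break           # no more subsets can be extracted (Step 3)
            def score(pair: Pair) -> Tuple[float, int]:
                cov = covers(subset, [pair])
                pos = sum(1 for x in cov if x["class"] == d)
                # Primary key: p(d | chi AND pair); tie-break: coverage size.
                return (pos / len(cov), len(cov))
            best = max(candidates, key=score)
            chi.append(best)
            subset = covers(subset, [best])
            pos = sum(1 for x in subset if x["class"] == d)
            if pos / len(subset) >= threshold:
                break           # Step 3: p(d | chi) reached the threshold
        if not chi:
            break               # nothing left to build a rule from
        rules.append(chi)
        if threshold >= 1.0:
            # Step 4.1: remove the d-labeled instances covered by chi.
            before = len(instances)
            instances = [x for x in instances
                         if not (x["class"] == d
                                 and all(x.get(a) == v for a, v in chi))]
            if not any(x["class"] == d for x in instances):
                break           # Step 4.2: all d-labeled instances removed
            if len(instances) == before:
                break           # safety guard: the rule removed nothing
        else:
            # Steps 4.1'-4.2': ban the used pairs and repeat.
            banned.update(chi)
    return rules
```

With threshold = 1.0 the sketch behaves like the original PRISM, removing the d-labeled instances a finished rule covers (Steps 4.1-4.2); with a lower threshold it instead bans the attribute-value pairs used in χ, as in Steps 4.1' and 4.2'.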

4 Conclusion

The purposes of coarsening classification rules are a better understanding of the rules and a higher accuracy of prediction. By placing classification within the granular computing framework, we can study the problem formally and systematically. We modify the existing PRISM algorithm to coarsen the classification rules.

References
1. Cendrowska, J. PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27, 349-370, 1987.
2. Yao, Y.Y. and Yao, J.T. Granular computing as a basis for consistent classification problems. Proceedings of PAKDD'02, 101-106, 2002.
