Attribute Reduction in Decision-Theoretic Rough Set Model: A Further Investigation

Huaxiong Li 1,2, Xianzhong Zhou 1, Jiabao Zhao 1, and Dun Liu 3

1 School of Management and Engineering, Nanjing University, Nanjing, Jiangsu, 210093, P.R. China
2 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, 210093, P.R. China
3 School of Economics and Management, Southwest Jiaotong University, Chengdu, 610031, P.R. China
{huaxiongli,zhouxz,jbzhao}@nju.edu.cn, [email protected]

Abstract. The monotonicity of the positive region in the Pawlak rough set model (PRS) and the decision-theoretic rough set model (DTRS) is comparatively discussed in this paper. Theoretical analysis shows that the positive region in the DTRS model may expand as attributes are deleted, which is essentially different from the PRS model and which leads to a new definition of attribute reduction in the DTRS model. A heuristic algorithm for the newly defined attribute reduction is proposed, in which the positive region is allowed to expand, rather than being required to remain unchanged, during the deletion of attributes. Experimental results are included to validate the theoretical analysis and to quantify the effectiveness of the proposed attribute reduction algorithm.

Keywords: decision-theoretic rough set, attribute reduction, positive region, monotonicity, heuristic algorithm.
1 Introduction
In recent decades, rough set theory has proven to be an effective mathematical methodology for dealing with vague or imprecise information. In the Pawlak rough set model (PRS) [13], set inclusion is required to be fully correct or certain, which makes the rough set approach very sensitive to the accuracy of the input data and unsuitable for noisy data. Researchers have therefore extended PRS and proposed probabilistic rough set models [2, 14, 15, 17, 22]. Based on a membership function of set inclusion with statistical information, Yao proposed the decision-theoretic rough set model (DTRS) [17] and Ziarko proposed the variable precision rough set model (VPRS) [22]. Both models introduce thresholds to control the degree of inclusion uncertainty, so that some extent of misclassification is allowed in the process of rule induction. In addition, Pawlak et al. proposed the 0.5-probabilistic rough set model [14]. Recent research on probabilistic rough set models can be found in [1, 3, 6, 15]. Compared with related probabilistic rough set models, DTRS provides a solution for
computing the threshold parameters for the inclusion degree by introducing Bayesian decision theory. In recent years, DTRS has received wide attention in the rough set research community [4, 5, 7–12, 16–18, 20, 21].

One of the most important topics in rough set theory is attribute reduction. Positive-region reduction has been widely studied, since it provides a typical framework for keeping the classification ability unchanged while deleting attributes. In the PRS model, a positive-region reduct is required to preserve the same classification ability as the whole attribute set, and there is no need to consider whether the positive region may expand as attributes are deleted: the monotonicity of the positive region with respect to attributes holds in the PRS model. In the DTRS model, however, this monotonicity does not hold [20]. In this case, there is no reason to require that the positive region remain unchanged; on the contrary, we should allow the positive region to expand as redundant attributes are deleted. Related studies on monotonicity properties in rough sets can be found in [1, 20].

In this paper, we investigate the monotonicity of the positive region in both the PRS and DTRS models and present a new definition of attribute reduction in the DTRS model. Based on this definition, we discuss the evaluation of attribute sets, define the significance of an attribute for classification, and propose a heuristic attribute reduction algorithm built on the new heuristic function. Experimental analysis is included to validate the theoretical analysis and to assess the performance of the heuristic algorithm.
2 Preliminaries
In this section, let us review some basic notions in DTRS [17–20]. A fundamental concept in DTRS is the pair of lower and upper approximations, in which thresholds are introduced to control the extent of misclassification. Suppose 0 ≤ β ≤ α ≤ 1. The (α, β) probabilistic lower and upper approximations are respectively defined by:

\underline{apr}_R^{(\alpha,\beta)}(X) = \{x \in U \mid P(X \mid [x]_R) \geq \alpha\}, \qquad \overline{apr}_R^{(\alpha,\beta)}(X) = \{x \in U \mid P(X \mid [x]_R) > \beta\},

where [x]_R is the equivalence class of x with regard to the equivalence relation R, and P(X|[x]_R) denotes the conditional probability of X given the description [x]_R, i.e., P(X|[x]_R) = |X ∩ [x]_R| / |[x]_R|. Based on the (α, β) probabilistic lower and upper approximations, the probabilistic positive, boundary and negative regions are respectively defined by:

POS_R^{(\alpha,\beta)}(X) = \underline{apr}_R^{(\alpha,\beta)}(X) = \{x \in U \mid P(X \mid [x]_R) \geq \alpha\},
BND_R^{(\alpha,\beta)}(X) = \overline{apr}_R^{(\alpha,\beta)}(X) - \underline{apr}_R^{(\alpha,\beta)}(X) = \{x \in U \mid \beta < P(X \mid [x]_R) < \alpha\},
NEG_R^{(\alpha,\beta)}(X) = U - \overline{apr}_R^{(\alpha,\beta)}(X) = \{x \in U \mid P(X \mid [x]_R) \leq \beta\}.
For a decision table S = {U, C ∪ D}, where equivalence relation R is determined by attribute set C, the relative probabilistic positive, boundary and negative
regions for condition attribute set C and decision attribute set D are respectively defined by:

POS_C^{(\alpha,\beta)}(D) = \bigcup_{X \in U/D} POS_C^{(\alpha,\beta)}(X), \qquad BND_C^{(\alpha,\beta)}(D) = \bigcup_{X \in U/D} BND_C^{(\alpha,\beta)}(X),
NEG_C^{(\alpha,\beta)}(D) = \bigcup_{X \in U/D} NEG_C^{(\alpha,\beta)}(X).
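To make these definitions concrete, the following minimal Python sketch (ours, not from the paper) computes the (α, β)-probabilistic regions of a toy decision table; the table contents, attribute names and threshold values (α = 0.7, β = 0.4) are illustrative assumptions. Since the D-relative regions are unions over the decision classes, they need not be disjoint for multi-class tables.

```python
from collections import defaultdict

def partition(U, table, attrs):
    """Partition U into equivalence classes by the values on attrs."""
    blocks = defaultdict(set)
    for x in U:
        blocks[tuple(table[x][a] for a in attrs)].add(x)
    return list(blocks.values())

def regions(U, table, cond_attrs, dec_attr, alpha, beta):
    """(alpha, beta)-probabilistic POS/BND/NEG regions of D w.r.t. cond_attrs,
    built as unions over the decision classes X in U/D, as defined above."""
    pos, bnd, neg = set(), set(), set()
    classes = partition(U, table, [dec_attr])
    for block in partition(U, table, cond_attrs):
        for X in classes:
            p = len(block & X) / len(block)   # P(X | [x])
            if p >= alpha:
                pos |= block
            elif p > beta:
                bnd |= block
            else:
                neg |= block
    return pos, bnd, neg

# Toy decision table: six objects, condition attributes 'a', 'b', decision 'd'.
table = {
    1: {'a': 0, 'b': 0, 'd': 'yes'},
    2: {'a': 0, 'b': 0, 'd': 'yes'},
    3: {'a': 0, 'b': 0, 'd': 'no'},
    4: {'a': 1, 'b': 0, 'd': 'no'},
    5: {'a': 1, 'b': 1, 'd': 'no'},
    6: {'a': 1, 'b': 1, 'd': 'no'},
}
U = set(table)
pos, bnd, neg = regions(U, table, ['a', 'b'], 'd', alpha=0.7, beta=0.4)
print(sorted(pos))  # [4, 5, 6]: only the pure blocks reach P >= 0.7
print(sorted(bnd))  # [1, 2, 3]: P(yes | block) = 2/3 falls in (0.4, 0.7)
```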
In the DTRS model, the threshold parameters (α, β) are determined by minimizing the expected risk of misclassification. Let Ω = {X_P = X, X_N = ¬X} be the set of decision classes comprising two complementary states, let A = {a_P, a_B, a_N} denote the three actions of classifying an object into POS^{(α,β)}(X), BND^{(α,β)}(X) and NEG^{(α,β)}(X), respectively, and let λ_PP (λ_PN), λ_BP (λ_BN) and λ_NP (λ_NN) denote the costs of taking actions a_P, a_B and a_N, respectively, when the true class is X (¬X). The expected loss of taking action a_i with respect to [x]_R is computed by:

L(a_i \mid [x]_R) = \sum_{j \in \{P, N\}} \lambda_{ij} P(X_j \mid [x]_R), \qquad i \in \{P, B, N\}.
Based on the minimization of the overall loss and the usual assumptions on the relations among the λ_{ij} [17–19], we obtain the following optimal decision rules:

If P(X|[x]_R) ≥ α, decide POS(X);
If β < P(X|[x]_R) < α, decide BND(X);
If P(X|[x]_R) ≤ β, decide NEG(X);

where the threshold parameters (α, β) are respectively determined by:

\alpha = \frac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}, \qquad \beta = \frac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})}.
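As a concrete illustration, here is a minimal Python sketch of this threshold computation; the six loss values are hypothetical (our choice, not from the paper) and merely satisfy the usual ordering assumptions λ_PP ≤ λ_BP < λ_NP and λ_NN ≤ λ_BN < λ_PN.

```python
def compute_thresholds(l_pp, l_bp, l_np, l_nn, l_bn, l_pn):
    """DTRS thresholds (alpha, beta) from the six classification losses."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta

# Hypothetical costs: correct classification is free, deferring to the
# boundary is cheap, misclassification is expensive.
alpha, beta = compute_thresholds(l_pp=0, l_bp=2, l_np=6, l_nn=0, l_bn=1, l_pn=4)
print(alpha, beta)  # 0.6 0.2, so 0 <= beta <= alpha <= 1 as required
```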
3 Attribute Reduction in DTRS Model
Attribute reduction is an important topic in rough set theory: it allows redundant information to be deleted and the essential knowledge hidden in a data set to be discovered. In general, an attribute reduct is a subset of attributes that is jointly sufficient and individually necessary for preserving a particular property of a given data set. In this paper, we mainly discuss positive-region reduction. First, let us review the definition of positive-region reduction and discuss the monotonicity of the positive region in the PRS model.
3.1 Monotonicity of Positive Region in PRS
Definition 1. [13] Suppose S = {U, C ∪ D}. An attribute set B ⊆ C is a Pawlak positive-region reduct iff POS_B(D) = POS_C(D) and POS_{B−{a}}(D) ≠ POS_B(D) for any a ∈ B, where

POS_C(D) = \bigcup_{Y \in U/D} \{x \mid [x]_C \subseteq Y\}.
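As a side remark, under the assumptions of the sketch in Section 2, the Pawlak positive region is simply the (α, β) = (1, 0) special case of the probabilistic one:

```python
# Pawlak positive region as the (1, 0) special case (reusing `regions`):
# a block contributes only if it is fully contained in a decision class.
pawlak_pos = regions(U, table, ['a', 'b'], 'd', alpha=1.0, beta=0.0)[0]
```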
For Pawlak positive-region reduction, the positive region of B is required to be exactly equal to that of the full attribute set C, i.e., POS_B(D) = POS_C(D). It is unnecessary to consider whether POS_B(D) might be a superset of POS_C(D), i.e., POS_B(D) ⊇ POS_C(D), or |POS_B(D)| > |POS_C(D)|. This is because the monotonicity of the positive region with respect to attributes holds in the PRS model, as stated precisely by the following theorem.

Theorem 1. Let S = {U, C ∪ D} be a decision table and let B ⊆ C be a subset of attributes. Then POS_B(D) ⊆ POS_C(D).

Proof: For any x ∈ POS_B(D), there exists Y_i ∈ U/D satisfying [x]_B ⊆ Y_i. Since B ⊆ C, it holds that [x]_C ⊆ [x]_B, thus [x]_C ⊆ Y_i, and hence x ∈ POS_C(D). Therefore, POS_B(D) ⊆ POS_C(D). □

In DTRS, however, as in all probabilistic rough set models, the monotonicity of the positive region with respect to attributes does not hold. The positive region may expand as attributes are deleted, and it may shrink as attributes are added; in other words, the accuracy of classification may increase with the deletion of some attributes. In what follows, we explain why this monotonicity fails.
3.2 Monotonicity of Positive Region in DTRS
In this subsection, we discuss the monotonicity of the positive region in DTRS and present a theoretical analysis of why it does not hold. The monotonicity of the positive region with respect to attribute sets fails in the DTRS model because some extent of misclassification is allowed for objects in the positive region, and the conditional probability of a given description being classified into a certain decision class may increase when some attributes are deleted. We use Fig. 1 for a detailed illustration. Fig. 1 presents the conditional probabilities of X given [x] with regard to an attribute set C and an attribute set B, where B ⊆ C and X stands for a decision class. Obviously [x]_C ⊆ [x]_B, since B ⊆ C; in other words, the equivalence class [x]_C expands when some attributes are deleted from C. Comparing the left and right parts of Fig. 1, we find that both [x] and the intersection of [x] and X have expanded, but the intersection expands on a larger scale than [x] itself, thus P(X|[x]_C) < P(X|[x]_B). Suppose a positive-region threshold α satisfies P(X|[x]_C) < α < P(X|[x]_B); then x ∈ POS_B^{(α,β)}(D) and x ∉ POS_C^{(α,β)}(D), therefore POS_B^{(α,β)}(D) ⊈ POS_C^{(α,β)}(D). On the contrary, it may even hold that POS_B^{(α,β)}(D) ⊃ POS_C^{(α,β)}(D), as in the counterexample above, which indicates that the monotonicity of the positive region with respect to attributes does not hold in the DTRS model.
Fig. 1. Changes of conditional probability w.r.t. attribute sets. (Figure: two panels comparing an equivalence class [x]_C with the larger class [x]_B for B ⊆ C, illustrating that P(X|[x]_C) = |[x]_C ∩ X| / |[x]_C| < |[x]_B ∩ X| / |[x]_B| = P(X|[x]_B) may hold.)
In this case, we should allow the positive region to expand, rather than requiring it to remain unchanged, in the process of attribute reduction, so that the reduced attribute set may have a higher classification ability than the full attribute set, or at least the same classification ability. Furthermore, since allowing the positive region to expand relaxes the conditions imposed on attribute reduction, we may obtain a reduct with fewer attributes.
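To complement Fig. 1, the following sketch (ours) exhibits this non-monotonicity numerically, reusing the regions helper from the Section 2 sketch; the table is toy data constructed purely to produce the effect, with assumed thresholds α = 0.7 and β = 0.4.

```python
# Illustrative counterexample: deleting attribute 'b' merges the impure
# block {1, 2, 3} with the pure block {4, 5, 6}, lifting P(yes | [x])
# from 2/3 to 5/6 and thereby *expanding* the positive region.
table = {
    1: {'a': 0, 'b': 0, 'd': 'yes'},
    2: {'a': 0, 'b': 0, 'd': 'yes'},
    3: {'a': 0, 'b': 0, 'd': 'no'},
    4: {'a': 0, 'b': 1, 'd': 'yes'},
    5: {'a': 0, 'b': 1, 'd': 'yes'},
    6: {'a': 0, 'b': 1, 'd': 'yes'},
}
U = set(table)
pos_C, _, _ = regions(U, table, ['a', 'b'], 'd', alpha=0.7, beta=0.4)  # C = {a, b}
pos_B, _, _ = regions(U, table, ['a'], 'd', alpha=0.7, beta=0.4)       # B = {a}
print(sorted(pos_C))  # [4, 5, 6]          : P(yes | {1,2,3}) = 2/3 < 0.7
print(sorted(pos_B))  # [1, 2, 3, 4, 5, 6] : P(yes | U) = 5/6 >= 0.7
```

Here |POS_{{a,b}}^{(α,β)}(D)| = 3 while |POS_{{a}}^{(α,β)}(D)| = 6: deleting b strictly enlarges the positive region, exactly the behaviour that Theorem 1 rules out for PRS.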
3.3 Attribute Reduction in DTRS
As mentioned in the last subsection, the positive region should be allowed to expand as attributes are deleted in the process of attribute reduction, and the definition of the independence of an attribute set should be modified accordingly. We therefore present a definition of attribute reduction in DTRS as follows.

Definition 2. Suppose S = {U, C ∪ D} is a decision table. An attribute set B ⊆ C is a positive-region reduct in the DTRS model iff the following two conditions are satisfied:

I) |POS_B^{(α,β)}(D)| ≥ |POS_C^{(α,β)}(D)|;
II) |POS_{B−{a}}^{(α,β)}(D)| < |POS_B^{(α,β)}(D)| for any a ∈ B.
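To close, here is a minimal sketch (ours, reusing the regions helper from Section 2 and reading condition II as completed above) of how Definition 2 could be checked for a candidate subset B; the exhaustive enumeration is exponential in |C| and intended only for tiny tables.

```python
from itertools import combinations

def is_dtrs_reduct(U, table, C, B, d, alpha, beta):
    """Check Definition 2 for a candidate subset B of C (sketch)."""
    def pos_size(attrs):
        return len(regions(U, table, sorted(attrs), d, alpha, beta)[0])
    cond1 = pos_size(B) >= pos_size(C)                   # I: jointly sufficient
    cond2 = all(pos_size(set(B) - {a}) < pos_size(B)     # II: individually
                for a in B)                              #     necessary
    return cond1 and cond2

def all_dtrs_reducts(U, table, C, d, alpha, beta):
    """Enumerate all subsets of C satisfying Definition 2 (tiny tables only)."""
    return [set(B)
            for r in range(len(C) + 1)
            for B in combinations(sorted(C), r)
            if is_dtrs_reduct(U, table, C, set(B), d, alpha, beta)]
```

In practice, the paper develops a heuristic algorithm rather than such an exhaustive search, since enumerating 2^{|C|} subsets is infeasible for realistic data.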