

Information Sciences 178 (2008) 2237–2261 www.elsevier.com/locate/ins

A systematic study on attribute reduction with rough sets based on general binary relations

Changzhong Wang a,*, Congxin Wu a, Degang Chen b

a Department of Mathematics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, PR China
b Department of Mathematics and Physics, North China Electric Power University, Beijing 102206, PR China

Received 21 March 2006; received in revised form 9 January 2008; accepted 9 January 2008

Abstract

Attribute reduction is considered an important preprocessing step for pattern recognition, machine learning, and data mining. This paper provides a systematic study on attribute reduction with rough sets based on general binary relations. We define a relation information system, a consistent relation decision system, and a relation decision system, together with their attribute reductions. Furthermore, we present a judgment theorem and a discernibility matrix associated with attribute reduction in each type of system; based on the discernibility matrix, we can compute all the reducts. Finally, the experimental results with UCI data sets show that the proposed reduction methods are an effective technique for dealing with complex data sets. © 2008 Elsevier Inc. All rights reserved.

Keywords: Attribute reduction; Discernibility matrix; Rough sets based on general binary relations; Relation information systems; Relation decision systems

1. Introduction

The theory of rough sets, proposed by Pawlak [27], is an extension of set theory for the study of information systems characterized by insufficient and imperfect data. It has been successfully applied in such artificial intelligence fields as machine learning, pattern recognition, decision analysis, process control, knowledge discovery in databases, and expert systems. One application of rough set theory is to approximate an arbitrary subset of a universe by two definable subsets called the lower and upper approximations [6,25,43,46,52,55]. Another application is to reduce the number of attributes in databases: given a data set with discrete attribute values, we can find a subset of attributes that is the most informative and has the same discernibility capability as the original attributes. Attribute reduction has been studied from the viewpoint of independence of knowledge [28]. The notion of a reduct was proposed as a minimal subset of attributes that induces the same indiscernibility relation as the whole

* Corresponding author. E-mail addresses: [email protected] (C.Z. Wang), [email protected] (D.G. Chen).

0020-0255/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2008.01.007


set of attributes. Many types of attribute reduction have been proposed in recent years [1,2,20,21,25,29,36–39,55]; in particular, much attention has been paid to attribute reduction in inconsistent decision systems. For example, possible rules and possible reducts have been proposed as a means to deal with inconsistency in inconsistent decision tables [20,29]. A possible rule covers only objects belonging to the upper approximation of the decision class determined by the rule's consequent. Approximation rules [39] are also used as an alternative to possible rules. The notions of α-reduct and α-relative reduct for decision tables were introduced in [25]; the α-reduct allows the occurrence of additional inconsistency, controlled by means of a parameter. In [37], a new concept of attribute reduction that preserves the class membership distribution was presented. It was shown in [38] that attribute reduction preserving the membership distribution is equivalent to attribute reduction preserving the value of a generalized inference measure function. A generalized attribute reduction was also introduced in [38] that allows the value of the generalized inference measure function after the reduction to differ from the original one by a user-specified threshold. The notion of dynamic reducts was described in [1]. Dynamic reducts are just the subset of all reducts derived from both the original decision table and the majority of randomly chosen decision sub-tables. In [2,21,55], the β-reduct based on the variable precision rough set (VPRS) model was introduced; this type of reduct can be used to overcome the problem of noise in data. Kryszkiewicz [20] investigated and compared five notions of attribute reduction in inconsistent systems. In fact, only two of them, the possible reduct and the μ-decision reduct, are essential, because each of the others is equivalent to one of them. In addition, some other reduction methods based on Pawlak's rough sets were proposed in [7,44].
It should be noted that all the above-mentioned reductions are performed within the framework of Pawlak's rough set theory. In other words, these attribute reductions are all based on equivalence relations. However, a partition, or equivalence relation, as the indiscernibility relation in Pawlak's rough set theory, is too restrictive for many applications. For example, incomplete information systems [18,19] and real-valued information systems [15,16] cannot be handled with Pawlak's rough sets. Several generalizations have therefore been proposed to solve these problems. One approach is to relax the partition to a cover. In [3–5,22,33–35,42,51,53,54], the concept of a cover of a universe was proposed to construct the upper and lower approximations of an arbitrary set. In [51], Zakowski mainly studied the structures of covers, while in [22] Mordeson examined the relationship between upper and lower approximations and some axioms satisfied by Pawlak's rough sets. In [54], Zhu and Wang claimed that they had studied the reduction of covering generalized rough sets, but their reduction does not coincide with the original idea of reduction, namely to remove the useless attributes in an information system or a decision system; their reduction of covering generalized rough sets merely removes the "redundant" members of a cover and finds the "smallest" cover that induces the same covering lower and upper approximations. Another important approach is to relax the equivalence relations [9–20,40,43,46–48]. Kryszkiewicz defined similarity relations in incomplete information systems and proposed a type of attribute reduction that eliminates only the information which is not essential from the viewpoint of classification or decision making [18]. Chen et al. suggested that an equivalence relation should be replaced by a fuzzy similarity relation and that crisp rough sets could be generalized to fuzzy rough sets.
They presented new definitions of lower and upper approximations on a complete completely distributive lattice, and proposed a unified framework which can both improve crisp generalizations of the upper approximation and put the crisp and fuzzy generalizations of rough sets into the same framework [9]. Hu, Yu et al. extended Shannon's entropy to measure the information quantity in a family of fuzzy sets [13] and applied the proposed measure to calculate the uncertainty in a fuzzy approximation space [14] and to reduce mixed data [15,16], where numerical attributes induce fuzzy equivalence relations. Greco et al. introduced the rough approximation based on preference relations and proposed a rough set methodology to analyze multi-criteria choice and ranking decision problems [11]. Slowinski and Vanderpooten proposed the substitution of an equivalence relation with a general relation where only reflexivity is required [43]. In [46,47], Yao investigated rough sets based on general binary relations, but his studies mainly concentrate on the constructive and axiomatic approaches to approximation operators. In comparison with the study of attribute reduction using Pawlak's rough sets, not much effort has so far been directed to the study of attribute reduction based on general binary relations. Since attribute reduction is an important issue in rough set theory and has been applied to solve many problems, a thorough study of this topic is of both theoretical and practical importance. In this paper, we define a relation information system, a consistent relation decision system and a relation decision system, study attribute reduction based on these systems, and find that the methods proposed to deal with attribute reduction based on general binary relations are applicable in classical systems as well, i.e., these

methods are natural generalizations of the reduction methods in Pawlak's rough set theory [41]. Finally, the experimental results show that the proposed reduction methods are an effective technique for dealing with complex data sets.

The rest of the paper is organized as follows. Section 2 presents the fundamentals of Pawlak's rough sets. Section 3 reviews some basic notions of rough sets based on general binary relations. In Section 4, we discuss attribute reduction in a relation information system. In Section 5, the concept of a relation decision system is introduced. In Section 6, we study attribute reduction in a consistent relation decision system. In Section 7, we study attribute reduction in a relation decision system. In Section 8, we show some experiments on six public data sets. Section 9 presents conclusions.

2. Basic notions related to information systems and rough sets

The following basic concepts about Pawlak's rough sets can be found in [27,41]. An information system is a pair A = (U, A), where U = {x1, ..., xn} is a nonempty finite set of objects and A = {a1, a2, ..., am} is a nonempty finite set of attributes. With every subset of attributes B ⊆ A we associate a binary relation IND(B), called the B-indiscernibility relation, defined as IND(B) = {(x, y) ∈ U × U : a(x) = a(y), ∀a ∈ B}. IND(B) is obviously an equivalence relation and IND(B) = ∩{IND({a}) : a ∈ B}. By [x]_B we denote the equivalence class of IND(B) containing x. For any subset X ⊆ U, B̲X = {x ∈ U : [x]_B ⊆ X} and B̄X = {x ∈ U : [x]_B ∩ X ≠ ∅} are called the B-lower and B-upper approximations of X in A, respectively. By M(A) we denote an n × n matrix (c_ij), called the discernibility matrix of A, such that c_ij = {a ∈ A : a(xi) ≠ a(xj)} for i, j = 1, 2, ..., n. A discernibility function f(A) for an information system A = (U, A) is a Boolean function of m Boolean variables ā1, ..., ām, corresponding to the attributes a1, ..., am respectively, defined as f(A)(ā1, ...
, ām) = ∧{∨(c_ij) : 1 ≤ j < i ≤ n}, where ∨(c_ij) is the disjunction of all variables ā such that a ∈ c_ij. An attribute a ∈ B ⊆ A is superfluous in B if IND(B) = IND(B − {a}); otherwise a is indispensable in B. The collection of all indispensable attributes in A is called the core of A. We say that B ⊆ A is independent in A if every attribute in B is indispensable in B. B ⊆ A is called a reduct in A if B is independent and IND(B) = IND(A). The set of all reducts in A is denoted RED(A). Let g(A) be the reduced disjunctive form of f(A), obtained from f(A) by applying the multiplication and absorption laws; then there exist l and Xk ⊆ A for k = 1, ..., l such that g(A) = (∧X1) ∨ ... ∨ (∧Xl), where each element in Xk appears only once. We have RED(A) = {X1, ..., Xl}.

A decision system is a pair A* = (U, A ∪ {a*}), where a* is the decision attribute and A is the condition attribute set. We say a ∈ B ⊆ A is relatively dispensable in B if POS_B(a*) = POS_{B−{a}}(a*); otherwise a is said to be relatively indispensable in B, where POS_B(a*) is the union of the B-lower approximations of all the equivalence classes induced by a*, i.e., POS_B(a*) = ∪{B̲X : X ∈ U/a*}. If every attribute in B is relatively indispensable in B, we say that B ⊆ A is relatively independent in A*. B ⊆ A is called a relative reduct in A* if B is relatively independent in A* and POS_B(a*) = POS_A(a*). The collection of all relatively indispensable attributes in A is called the relative core of A*. Suppose M(A) = (c_ij). We define a matrix M(A*) = (c*_ij) in the following way: (1) c*_ij = c_ij − {a*}, if (a* ∈ c_ij and xi, xj ∈ POS_A(a*)) or pos(xi) ≠ pos(xj); (2) c*_ij = ∅, otherwise. Here pos : U → {0, 1} is defined as pos(x) = 1 if and only if x ∈ POS_A(a*). All the relative reducts can be computed from M(A*) in a way analogous to computing the reducts from M(A).

3. Rough sets based on general binary relations

This section mainly reviews some basic notions of rough sets based on general binary relations and some statements to be used in the following sections. In the following discussion, the universe of discourse U is always considered to be finite and nonempty.


Definition 3.1 ([45,47]). Let U be a universe of discourse and R ⊆ U × U a binary relation on U. The relation R is said to be serial if for every x ∈ U there exists y ∈ U such that (x, y) ∈ R; R is said to be reflexive if (x, x) ∈ R for all x ∈ U; R is said to be symmetric if for all x, y ∈ U, (x, y) ∈ R implies (y, x) ∈ R; R is said to be transitive if for all x, y, z ∈ U, (x, y) ∈ R and (y, z) ∈ R imply (x, z) ∈ R.

Definition 3.2 [42]. Let U be a universe of discourse and R ⊆ U × U a binary relation on U. Rs : U → P(U) is a set-valued function, where Rs(x) = {y ∈ U : (x, y) ∈ R}, x ∈ U. Rs(x) is referred to as the successor neighborhood of x with respect to R. Obviously, the relation R and its corresponding successor neighborhoods Rs(x) are uniquely determined by each other, namely, xRy ⟺ (x, y) ∈ R ⟺ y ∈ Rs(x). The pair (U, R) is referred to as a generalized approximation space. For any set X ⊆ U, a pair of lower and upper approximations of X is defined as

apr′_R X = {x : Rs(x) ⊆ X},  apr̄′_R X = {x : Rs(x) ∩ X ≠ ∅}.  (1)

However, another pair of lower and upper approximations of X can be defined as

apr_R X = ∪{Rs(x) : Rs(x) ⊆ X},  apr̄_R X = ∪{Rs(x) : Rs(x) ∩ X ≠ ∅}.  (2)
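As a concrete illustration of Definition 3.2 and formulas (1) and (2), the following Python sketch (our own, with illustrative names not taken from the paper) computes successor neighborhoods and both pairs of approximations for a small general relation:

```python
def successor(R, U):
    """Successor neighborhoods R_s(x) = {y : (x, y) in R} (Definition 3.2)."""
    return {x: {y for (a, y) in R if a == x} for x in U}

def apr_1(R, U, X):
    """Formula (1): element-based lower and upper approximations of X."""
    Rs = successor(R, U)
    return ({x for x in U if Rs[x] <= X},      # R_s(x) subset of X
            {x for x in U if Rs[x] & X})       # R_s(x) meets X

def apr_2(R, U, X):
    """Formula (2): granule-based lower and upper approximations of X."""
    Rs = successor(R, U)
    lower = [Rs[x] for x in U if Rs[x] <= X]
    upper = [Rs[x] for x in U if Rs[x] & X]
    return (set().union(*lower) if lower else set(),
            set().union(*upper) if upper else set())

# A small relation that is neither reflexive, symmetric nor transitive.
U = {1, 2, 3}
R = {(1, 2), (2, 2), (3, 1), (3, 2)}
X = {2}
```

Here apr_1 and apr_2 return different sets for this R, consistent with the remark below that formulas (1) and (2) coincide only when R is an equivalence relation.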

Formulas (1) and (2) are equivalent if and only if R is an equivalence relation. The following theorem can easily be derived from Definition 3.2.

Theorem 3.1 [47]. Let R and S be binary relations on U; then we have the following properties: (1) R ⊆ S ⟺ ∀x ∈ U, Rs(x) ⊆ Ss(x); (2) (R ∩ S)s(x) = Rs(x) ∩ Ss(x); (3) (R ∪ S)s(x) = Rs(x) ∪ Ss(x).

If R is an equivalence relation on U, then (U, R) is a Pawlak approximation space, and Rs(x) is the equivalence class containing x. Suppose R = {R1, R2, ..., Rn} is a family of binary relations on U. In the subsequent discussion, we denote Int R = R1 ∩ R2 ∩ ... ∩ Rn.

Corollary 3.1. (Int R)s(x) = (R1)s(x) ∩ (R2)s(x) ∩ ... ∩ (Rn)s(x).

Remark 1. In the PhD thesis [24], five types of combination are used for defining five classes of tolerance relations. In this paper, we introduce the intersection for combining neighborhoods.

4. Attribute reduction in relation information systems

An attribute in a data set may induce a general binary relation on the universe, rather than an equivalence relation, due to missing data. In the following discussion, we denote an attribute by a binary relation. As we know, the equivalence relation is a key and primitive notion in Pawlak's rough set theory. Let R be a family of equivalence relations. For any P ⊆ R, IND(P) is also an equivalence relation. Suppose (U, R) is an information system. P ⊆ R is a reduct of R if and only if P is a minimal subset of R satisfying IND(P) = IND(R). Skowron [41] has shown how to obtain a reduct of such an information system. If all equivalence relations are replaced by arbitrary binary relations, how can we get a minimal subset P ⊆ R which satisfies Int R = Int P? On the other hand, if we can deal with attribute reduction with rough sets based on general binary relations, the following problems, which Pawlak's rough sets have difficulty dealing with, can be easily solved.

(1) Attribute reduction of a single-structural data set. All relations induced by attributes have the same properties in this kind of data set. These relations have one or two kinds of properties (reflexivity, symmetry and transitivity). For example, similarity relations [15,18,43], preference relations [11] and tolerance relations [24,33,34,42] belong to this case.


(2) Attribute reduction of a hybrid-structural data set. Heterogeneous attribute variables, such as symbolic and real-valued ones [12], coexist in this kind of data set. Symbolic variables induce equivalence relations [27–32], while real-valued variables induce similarity, preference, or tolerance relations [11,12,15,18,33,34,42,43].

Example 4.1. Table 1 shows the play tennis data set with mixed categorical and numerical attributes, where U = {x1, x2, ..., x14}, A = {outlook, T, humidity, windy}, D = {play}; the condition attributes outlook and windy are categorical, T and humidity are numerical, and play is the decision. According to classical rough sets, attributes T and humidity should be discretized before rough set analysis is performed [26,40,49]. Here we analyze the data directly with rough sets based on general binary relations. Attributes outlook and windy are symbolic variables, while attributes T and humidity are real-valued variables. Outlook and windy induce equivalence relations; T and humidity induce similarity or preference relations according to different requirements.

It can be seen from these observations that a thorough study of attribute reduction based on general binary relations is of both theoretical and practical importance. In this section we first introduce the concept of a relation information system. Then, we propose some theorems to characterize attribute reduction in a relation information system. Finally, we present an approach to compute all the reducts of the system based on the discernibility matrix.

Definition 4.1. Let U be a universe and R = {R1, R2, ..., Rn} a family of general binary relations on U. Then (U, R) is called a relation information system; R is called a conditional relation (attribute) set.

For arbitrary x ∈ U, if R = {R1, R2, ..., Rn} is a family of reflexive binary relations on U, then Int R is also reflexive, and the family {(Int R)s(x) : x ∈ U} forms a cover of the universe, i.e., ∪{(Int R)s(x) : x ∈ U} = U.
If R is a family of equivalence relations, then (Int R)s(x) is the equivalence class containing x, and the family {(Int R)s(x) : x ∈ U} forms a partition of the universe. Therefore, relation information systems are extensions of the information systems of Pawlak's rough set theory. For such a generalized information system, we always suppose ∪{(Int R)s(x) : x ∈ U} ≠ ∅, i.e., there exists at least one object x ∈ U such that (Int R)s(x) ≠ ∅.

Definition 4.2. Let (U, R) be a relation information system and Ri ∈ R. Ri is called superfluous in R if Int R = Int(R − {Ri}); otherwise, Ri is called indispensable in R. For any subset P ⊆ R, P is called a reduct of R if each element in P is indispensable in P and Int R = Int P. The collection of all indispensable elements in R is called the core of R, denoted Core(R).

Table 1
Play tennis data with heterogeneous attributes

Day   Outlook    T    Humidity   Windy   Play
x1    Sunny      85   85         False   No
x2    Sunny      80   90         True    No
x3    Overcast   83   86         False   Yes
x4    Rainy      70   96         False   Yes
x5    Rainy      68   80         False   Yes
x6    Rainy      65   70         True    No
x7    Overcast   64   65         True    Yes
x8    Sunny      72   95         False   No
x9    Sunny      69   70         False   Yes
x10   Rainy      75   80         False   Yes
x11   Sunny      75   70         True    Yes
x12   Overcast   72   90         True    Yes
x13   Overcast   81   75         False   Yes
x14   Rainy      71   91         True    No
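The relations alluded to in Example 4.1 can be induced mechanically. The sketch below (our own; the tolerance threshold eps is an illustrative choice, not a value from the paper) derives an equivalence relation from the categorical attribute outlook and a tolerance relation from the numerical attribute T of Table 1:

```python
# Table 1 columns for x1 ... x14 (index 0 corresponds to x1).
outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
T = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

U = range(14)

def equivalence(values):
    """A categorical attribute induces an equivalence relation (Pawlak case)."""
    return {(x, y) for x in U for y in U if values[x] == values[y]}

def tolerance(values, eps):
    """A numerical attribute induces a tolerance relation |v(x) - v(y)| <= eps:
    reflexive and symmetric, but in general not transitive."""
    return {(x, y) for x in U for y in U if abs(values[x] - values[y]) <= eps}

R_outlook = equivalence(outlook)
R_T = tolerance(T, eps=2)   # eps = 2 is an assumed, illustrative threshold
```

With eps = 2, R_T relates x1 (T = 85) to x3 (83) and x3 to x13 (81), but not x1 to x13, which shows why such attributes generally induce non-transitive relations.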


Definitions 4.1 and 4.2 are natural extensions of the corresponding concepts in Pawlak's rough set theory, obtained by substituting general binary relations for equivalence relations. It can easily be seen from the two definitions that the purpose of reduction of R is to find a minimal subset of R that keeps the relation Int R invariant. In the following, we study attribute reduction in a relation information system.

Proposition 4.1. Int R = Int P ⟺ (Int R)s(x) = (Int P)s(x), ∀x ∈ U.

Proposition 4.1 presents an equivalent condition for judging whether two relations are equal.

Theorem 4.1. Let R = {R1, R2, ..., Rn} be a family of binary relations on U, Ri ∈ R, x ∈ U. Then (Int R)s(x) ≠ (Int(R − {Ri}))s(x) if and only if there exists y ∈ U such that y ∉ (Int R)s(x) but y ∈ (Int(R − {Ri}))s(x).

Proof. Straightforward. □

Theorem 4.1 implies that an indispensable relation can be characterized by the relationship between two objects in the universe, according to Proposition 4.1 and Definition 4.2. The following Theorems 4.2 and 4.3 show that the original relationship between any two objects is invariant if superfluous attributes (relations) are removed.

Theorem 4.2. Let R = {R1, R2, ..., Rn} be a family of binary relations on U. For ∀x, y ∈ U, y ∉ (Int R)s(x) if and only if there is at least one binary relation Ri ∈ R such that y ∉ (Ri)s(x).

Proof. Straightforward. □

Theorem 4.3 (Judgment theorem of attribute reduction). Let R = {R1, R2, ..., Rn} be a family of binary relations on U, P ⊆ R. Then Int R = Int P if and only if for ∀x, y ∈ U, y ∉ (Int R)s(x) implies y ∉ (Int P)s(x).

Proof. Straightforward. □

It is easily seen from Theorems 4.2 and 4.3 that attribute reduction of a relation information system is essentially equivalent to finding the minimal subsets of the conditional relation set R that keep invariant the successor neighborhood of an arbitrary object in U with respect to Int R, i.e., that keep invariant the original relationship between any two objects. Next, we define the discernibility matrix of a relation information system via Theorems 4.2 and 4.3 as follows.

Definition 4.3. Let (U, R) be a relation information system. Suppose U = {x1, x2, ..., xn}; we denote by M(U, R) an n × n matrix (c_ij), called the discernibility matrix of (U, R), defined as c_ij = {R ∈ R : xj ∉ Rs(xi)} for xi, xj ∈ U.

Unlike a discernibility matrix based on equivalence relations, in our method one has to compute the successor neighborhood Rs(xi) of every object xi ∈ U and examine whether xj ∈ Rs(xi) for every R ∈ R in order to construct c_ij. Hu [15] pointed out that successor neighborhoods can be obtained in linear time O(n), while the complexity of the discernibility matrix is O(n²). Therefore, the time complexity of the proposed method is O(N · n²), where n and N are the numbers of samples and attributes, respectively. The following theorem is used to study the properties of the discernibility matrix.

Theorem 4.4. Let M(U, R) = (c_ij) be the discernibility matrix of (U, R) and R = {R1, R2, ..., Rl}. Then the following statements hold:

(1) R ∈ c_ij ⟺ xj ∉ Rs(xi);
(2) c_ii = ∅ for i = 1, 2, ..., n ⟺ every R ∈ R is reflexive;
(3) every Rk (k = 1, ..., l) is symmetric if and only if c_ij = c_ji (i ≠ j; i, j = 1, 2, ..., n);
(4) every Rk (k = 1, ..., l) is transitive if and only if c_ij ⊆ c_it ∪ c_tj (i ≠ j; i, j, t = 1, 2, ..., n).
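Definition 4.3 translates directly into code. The following sketch (our own wording; relation and variable names are illustrative) builds the matrix from successor neighborhoods, and its diagonal behaviour matches Theorem 4.4(2):

```python
def successors(R, U):
    """Successor neighborhoods R_s(x) of a relation given as a set of pairs."""
    return {x: {y for (a, y) in R if a == x} for x in U}

def discernibility_matrix(relations, U):
    """c_ij = {R : x_j not in R_s(x_i)} (Definition 4.3).
    `relations` maps each relation's name to its set of pairs."""
    neigh = {name: successors(R, U) for name, R in relations.items()}
    return {(i, j): {name for name in relations if j not in neigh[name][i]}
            for i in U for j in U}

# A toy system with two reflexive, symmetric relations on U = {0, 1}.
U = {0, 1}
rels = {"R1": {(0, 0), (1, 1)},
        "R2": {(0, 0), (0, 1), (1, 0), (1, 1)}}
M = discernibility_matrix(rels, U)
```

Since both relations are reflexive, the diagonal entries are empty (Theorem 4.4(2)); since both are symmetric, M is symmetric (Theorem 4.4(3)).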


Proof. (1) Straightforward.

(2) (⇐) For every R ∈ R, since R is reflexive, (xi, xi) ∈ R for any xi ∈ U, which implies xi ∈ Rs(xi). Thus R ∉ c_ii, so c_ii = ∅. (⇒) If c_ii = ∅ for i = 1, 2, ..., n, then xi ∈ Rs(xi) for any R ∈ R. Hence every R is reflexive.

(3) (⇒) Suppose Rk ∈ c_ij (c_ij ≠ ∅); then xj ∉ (Rk)s(xi) ⇒ (xi, xj) ∉ Rk ⇒ (xj, xi) ∉ Rk ⇒ xi ∉ (Rk)s(xj) ⇒ Rk ∈ c_ji. Hence c_ij ⊆ c_ji; similarly c_ji ⊆ c_ij, so c_ij = c_ji. (⇐) Denote R₁ = {R ∈ R : ∀xi, xj ∈ U, (xi, xj) ∈ R} and R₂ = R − R₁. Each R ∈ R₁ is reflexive, symmetric and transitive. For every R ∈ R₂, there must be xi₀, xj₀ ∈ U (i₀ ≠ j₀) such that (xi₀, xj₀) ∉ R. Hence xj₀ ∉ Rs(xi₀), which implies R ∈ c_i₀j₀. Since c_i₀j₀ = c_j₀i₀, we have R ∈ c_j₀i₀. Thus xi₀ ∉ Rs(xj₀), which implies (xj₀, xi₀) ∉ R. So R is symmetric.

(4) (⇒) Suppose Rk (k = 1, 2, ..., l) is transitive, i.e., (xi, xt) ∈ Rk and (xt, xj) ∈ Rk imply (xi, xj) ∈ Rk. This is equivalent to: xt ∈ (Rk)s(xi) and xj ∈ (Rk)s(xt) imply xj ∈ (Rk)s(xi), which means that Rk ∉ c_it and Rk ∉ c_tj imply Rk ∉ c_ij. So if Rk ∈ c_ij, then Rk ∈ c_it or Rk ∈ c_tj, namely c_ij ⊆ c_it ∪ c_tj. (⇐) By (3) above, each R ∈ R₁ is symmetric and transitive. Suppose R ∈ R₂; there must exist xi₀, xj₀ ∈ U (i₀ ≠ j₀) such that (xi₀, xj₀) ∉ R, which implies xj₀ ∉ Rs(xi₀). Hence R ∈ c_i₀j₀. Since c_i₀j₀ ⊆ c_i₀t ∪ c_tj₀, either R ∈ c_i₀t or R ∈ c_tj₀ holds. Thus xj₀ ∉ Rs(xi₀) implies xt ∉ Rs(xi₀) or xj₀ ∉ Rs(xt); namely, (xi₀, xj₀) ∉ R implies (xi₀, xt) ∉ R or (xt, xj₀) ∉ R. So if (xi₀, xt) ∈ R and (xt, xj₀) ∈ R, then (xi₀, xj₀) ∈ R. Hence R is transitive. □

Theorem 4.4 illustrates that the special properties of the discernibility matrix of a relation information system are determined by special binary relations. If R is a family of equivalence relations, M(U, R) is the discernibility matrix of the corresponding information system in Pawlak's rough set theory [41]. In this sense, the reduction method proposed for a relation information system is a generalization of the corresponding reduction method in Pawlak's rough set theory [41]. The following theorem characterizes the core of a relation information system.

Theorem 4.5. Core(R) = {R ∈ R : c_ij = {R} for some i, j ≤ n}.

Proof. Suppose R ∈ Core(R); then Int(R − {R}) ≠ Int R. By Theorem 4.1, it follows that there exist xi, xj ∈ U such that xj ∉ (Int R)s(xi) but xj ∈ (Int(R − {R}))s(xi). Obviously, R is the only element of R satisfying xj ∉ Rs(xi). By Definition 4.3, c_ij = {R}. Hence Core(R) ⊆ {R ∈ R : c_ij = {R} for some i, j ≤ n}. Conversely, if c_ij = {R} for some xi, xj ∈ U, then R ∈ Core(R) by Theorem 4.1. Hence Core(R) ⊇ {R ∈ R : c_ij = {R} for some i, j ≤ n}. Therefore Core(R) = {R ∈ R : c_ij = {R} for some i, j ≤ n}. □

Theorem 4.6. Let P ⊆ R; then Int P = Int R if and only if P ∩ c_ij ≠ ∅ for any c_ij ≠ ∅.

Proof. (⇒) Assume that there exist i₀, j₀ ≤ n with c_i₀j₀ ≠ ∅ but P ∩ c_i₀j₀ = ∅. Then for every R ∈ P we have xj₀ ∈ Rs(xi₀), which implies xj₀ ∈ (Int P)s(xi₀). Since Int P = Int R, by Theorem 4.3 we have xj₀ ∈ (Int R)s(xi₀). Thus xj₀ ∈ Rs(xi₀) for every R ∈ R, which implies c_i₀j₀ = ∅. This contradicts the assumption. So P ∩ c_ij ≠ ∅ for any c_ij ≠ ∅. (⇐) If P ∩ c_ij ≠ ∅ for all xi, xj ∈ U with c_ij ≠ ∅, suppose R ∈ P ∩ c_ij; then xj ∉ Rs(xi), which implies xj ∉ (Int R)s(xi) and xj ∉ (Int P)s(xi). By Theorem 4.3, Int P = Int R holds. □

By Definition 4.2 and Theorem 4.6 we immediately get the following corollary.

Corollary 4.1. Let P ⊆ R; then P is a reduct of R if and only if P is a minimal subset satisfying P ∩ c_ij ≠ ∅ for every c_ij ≠ ∅ (i, j ≤ n).
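Corollary 4.1 characterizes a reduct as a minimal subset hitting every nonempty matrix entry. A brute-force sketch (our own, adequate for small attribute sets; the example matrix is illustrative, chosen to mirror the distinct entries arising later in Example 4.2):

```python
from itertools import combinations

def reducts_from_matrix(M, names):
    """All reducts per Corollary 4.1: the minimal subsets P of the relation
    family with P ∩ c_ij ≠ ∅ for every nonempty entry c_ij of matrix M."""
    cells = [c for c in M.values() if c]
    # Every subset of `names` that intersects all nonempty cells ...
    hitting = [set(P) for r in range(len(names) + 1)
               for P in combinations(names, r)
               if all(set(P) & c for c in cells)]
    # ... kept only if no proper subset also hits all cells (minimality).
    return [P for P in hitting if not any(Q < P for Q in hitting)]

# Illustrative nonempty entries of a discernibility matrix.
M = {(0, 1): {"R3"}, (0, 2): {"R2"}, (1, 2): {"R1", "R4"}}
reducts = reducts_from_matrix(M, ["R1", "R2", "R3", "R4"])
```

Every reduct must contain R2 and R3 (singleton cells, matching the core characterization of Theorem 4.5) plus at least one of R1, R4.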

2244

C.Z. Wang et al. / Information Sciences 178 (2008) 2237–2261

Definition 4.4. Let R = {R1, R2, ..., Rn} be a family of binary relations on U. A Boolean variable R̄i (i ≤ n) is defined for each binary relation Ri ∈ R, and the Boolean function f(U, R) of (U, R), called the discernibility function or discernibility formula of (U, R), is defined as f(U, R)(R̄1, R̄2, ..., R̄n) = ∧{∨(c_ij) : i, j ≤ n, c_ij ≠ ∅}, where ∨(c_ij) represents the disjunction of the elements in c_ij.

By means of the discernibility function, we have the following theorem to compute all the reducts of a relation information system.

Theorem 4.7. Let (U, R) be a relation information system and M(U, R) = (c_ij : i, j ≤ n) the discernibility matrix of (U, R). The discernibility formula f(U, R) is defined as

f(U, R) = ∧_{i,j=1,...,n} (∨c_ij), (c_ij ≠ ∅).

If f(U, R) = ∨_{k=1,...,l} (∧Bk) (Bk ⊆ R) is obtained from f(U, R) by applying the multiplication and absorption laws, so that every element in Bk appears only once, then the set {Bk : k ≤ l} is the collection of all reducts of R, i.e., Red(R) = {Bk : k ≤ l}.

Proof. For each k = 1, ..., l, we have ∧Bk ≤ ∨c_ij for every c_ij ≠ ∅. By the disjunction and conjunction laws, Bk ∩ c_ij ≠ ∅ for any c_ij ≠ ∅. Since f(U, R) = ∨_{k=1,...,l}(∧Bk), it follows that, for an arbitrary Bk, if we remove an element R from Bk and let B′k = Bk − {R}, then f(U, R) ≠ ∨_{r=1,...,k−1}(∧Br) ∨ (∧B′k) ∨ (∨_{r=k+1,...,l}(∧Br)) and f(U, R) < ∨_{r=1,...,k−1}(∧Br) ∨ (∧B′k) ∨ (∨_{r=k+1,...,l}(∧Br)). If we still had B′k ∩ c_ij ≠ ∅ for each c_ij ≠ ∅ (i, j ≤ n), then ∧B′k ≤ ∨c_ij, which would imply f(U, R) ≥ ∨_{r=1,...,k−1}(∧Br) ∨ (∧B′k) ∨ (∨_{r=k+1,...,l}(∧Br)), a contradiction. Hence there exists c_i₀j₀ ≠ ∅ such that B′k ∩ c_i₀j₀ = ∅, which implies that Bk is a reduct of (U, R). For any X ∈ Red(R), we have X ∩ c_ij ≠ ∅ for any c_ij ≠ ∅ (i, j ≤ n). Thus f(U, R) ∧ (∧X) = (∧(∨c_ij)) ∧ (∧X) = ∧X. If Bk − X ≠ ∅ for each k, we can find Rk ∈ Bk − X. Rewriting f(U, R) = (∨_{k=1,...,l} R̄k) ∧ U, we have ∧X ≤ ∨_{k=1,...,l} R̄k. So there must be some Rk₀ such that ∧X ≤ R̄k₀, which implies Rk₀ ∈ X. This is a contradiction. So Bk₀ ⊆ X for some k₀. Since both X and Bk₀ are reducts, we have X = Bk₀. Hence Red(R) = {B1, ..., Bl}. □

The following example illustrates the idea of this section.

Example 4.2. Let (U, R) be a relation information system, where U = {x1, x2, ..., x9}, R = {R1, R2, R3, R4}, and

R1 = {(x1,x4), (x2,x1), (x2,x3), (x2,x4), (x4,x1), (x4,x2), (x4,x5), (x4,x6), (x5,x2), (x5,x6), (x5,x5), (x6,x3), (x6,x7), (x8,x4), (x8,x8), (x8,x9)};
R2 = {(x2,x1), (x2,x3), (x2,x7), (x2,x4), (x3,x2), (x4,x2), (x4,x5), (x4,x6), (x5,x2), (x5,x4), (x5,x6), (x6,x3), (x6,x7), (x8,x4), (x8,x8), (x8,x9)};
R3 = {(x2,x1), (x2,x3), (x2,x7), (x2,x8), (x4,x2), (x4,x5), (x5,x2), (x5,x3), (x5,x5), (x6,x3), (x6,x7), (x7,x6), (x8,x4), (x8,x8), (x8,x9)};
R4 = {(x2,x1), (x2,x3), (x2,x4), (x4,x2), (x4,x5), (x4,x6), (x5,x2), (x5,x5), (x5,x6), (x5,x8), (x6,x3), (x6,x6), (x6,x7), (x8,x4), (x8,x8), (x8,x9)}.

Then Int R = {(x2,x1), (x2,x3), (x4,x2), (x4,x5), (x5,x2), (x6,x3), (x6,x7), (x8,x4), (x8,x8), (x8,x9)}, so that (Int R)s(x1) = (Int R)s(x3) = (Int R)s(x7) = (Int R)s(x9) = ∅, (Int R)s(x2) = {x1, x3}, (Int R)s(x4) = {x2, x5}, (Int R)s(x5) = {x2}, (Int R)s(x6) = {x3, x7}, (Int R)s(x8) = {x4, x8, x9}.
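Example 4.2 can be checked mechanically. The following sketch (our own; objects xi are encoded as integers i) computes Int R, the successor neighborhoods, and all reducts by testing every subfamily against Definition 4.2:

```python
from itertools import combinations

# The four relations of Example 4.2, with xi written as the integer i.
R1 = {(1, 4), (2, 1), (2, 3), (2, 4), (4, 1), (4, 2), (4, 5), (4, 6), (5, 2),
      (5, 6), (5, 5), (6, 3), (6, 7), (8, 4), (8, 8), (8, 9)}
R2 = {(2, 1), (2, 3), (2, 7), (2, 4), (3, 2), (4, 2), (4, 5), (4, 6), (5, 2),
      (5, 4), (5, 6), (6, 3), (6, 7), (8, 4), (8, 8), (8, 9)}
R3 = {(2, 1), (2, 3), (2, 7), (2, 8), (4, 2), (4, 5), (5, 2), (5, 3), (5, 5),
      (6, 3), (6, 7), (7, 6), (8, 4), (8, 8), (8, 9)}
R4 = {(2, 1), (2, 3), (2, 4), (4, 2), (4, 5), (4, 6), (5, 2), (5, 5), (5, 6),
      (5, 8), (6, 3), (6, 6), (6, 7), (8, 4), (8, 8), (8, 9)}
family = {"R1": R1, "R2": R2, "R3": R3, "R4": R4}

def intersect(names):
    """Int P for the subfamily named in `names`."""
    return set.intersection(*[family[n] for n in names])

int_R = intersect(family)                       # Int R
succ = {x: {y for (a, y) in int_R if a == x} for x in range(1, 10)}

# A reduct is a minimal subfamily P with Int P = Int R (Definition 4.2).
subsets = [frozenset(P) for r in range(1, 5) for P in combinations(family, r)]
keeping = [P for P in subsets if intersect(P) == int_R]
reducts = [P for P in keeping if not any(Q < P for Q in keeping)]
core = frozenset.intersection(*reducts)
```

Running this reproduces the results stated below: Red(R) = {{R1, R2, R3}, {R2, R3, R4}} and Core(R) = {R2, R3}.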


The discernibility matrix of (U, R) is the 9 × 9 matrix (c_ij) with entries c_ij = {R ∈ R : xj ∉ Rs(xi)}; its distinct nonempty entries are {R1, R2, R3, R4}, {R2, R3, R4}, {R3}, {R1, R4}, {R1, R2, R4}, {R1, R3, R4}, {R2} and {R1, R2, R3},
and f ðU ; RÞðR1 ; R2 ; R3 ; R4 Þ ¼ ^f_ðcij Þg : i; j 6 9; cij 6¼ ;g ¼ ðR1 _ R2 _ R3 _ R4 Þ ^ ðR2 _ R3 _ R4 Þ ^ R3 ^ ðR1 _ R4 Þ ^ ðR1 _ R2 _ R4 Þ ^ ðR1 _ R3 _ R4 Þ ^ R2 ^ ðR1 _ R2 _ R3 Þ ¼ R3 ^ ðR1 _ R4 Þ ^ R2 ¼ ðR1 ^ R2 ^ R3 Þ _ ðR2 ^ R3 ^ R4 Þ so Red(R) = {{R1, R2, R3},{R2, R3, R4}}, Core(R) = {R2, R3}. 5. A relation decision system and its properties Although different relation-based rough set models may have different definitions of approximations, the forms of definition of lower approximation are the same as formula (1) in Section 3. For example, similarity relation-based rough sets [18,43], preference relation-based rough sets [11] and tolerance relation-based rough sets [33,34,42] can be considered as belonging to this case. As to upper approximation, we omit discussion on it because it is not irrelevant to our work. As mentioned in Section 3, for a given generalized approximation space, there may be two forms of definition of lower approximation. It raises a problem: is there a solid relationship between the two definition forms of lower approximation? If there is, how do we construct the relationship? To obtain the relationship between the two forms of definition of lower approximation, the concept of a relation decision system is first introduced by generalizing a decision system in Pawlak’ rough set theory. Some related classical concepts, such as lower approximation operators, positive and negative domains in the generalized approximation space, are redefined. By these revisions of the classical concepts, the relationship between the two forms of definition of lower approximation and some useful results for a relation decision system are then derived. Let us first introduce the concept of a relation decision system. Definition 5.1 Let U be a universe of discourse, R = {R1, R2, . . . 
, Rn} a family of general binary relations on U, D a decision equivalence relation on U, and U/D the decision partition relative to R on U. Then (U, R, D) is called a relation decision system, and R is called a condition relation (attribute) set. Let d(x) be a decision function from U to the value set V_d, i.e., d: U → V_d, defined by d(x) = d([x]_D).

From Definition 5.1, a relation decision system is a system in which the family {(Int R)_s(x) : x ∈ U} need not form a partition or a cover of U, while D generates a partition U/D of U. Of course, if R is a family of reflexive and transitive binary relations on U, the family {(Int R)_s(x) : x ∈ U} forms a cover of U, and the relation decision system becomes a type of covering decision system [8]. If R is a family of equivalence relations on U, then R induces a partition of U, and the relation decision system becomes a decision system of Pawlak's rough set theory [41]. Therefore, relation decision systems are natural extensions of the decision systems of Pawlak's rough set theory.

2246

C.Z. Wang et al. / Information Sciences 178 (2008) 2237–2261

Definition 5.2. Let (U, R, D) be a relation decision system. For any subset X ⊆ U, we redefine the lower approximation of X in two ways:

R′(X) = {x ∈ U : (Int R)_s(x) ≠ ∅, (Int R)_s(x) ⊆ X},
R(X) = ∪{(Int R)_s(x) : (Int R)_s(x) ≠ ∅, (Int R)_s(x) ⊆ X}.

Correspondingly, the positive domains of D relative to R are defined as POS′_R(D) = ∪_{X∈U/D} R′(X) and POS_R(D) = ∪_{X∈U/D} R(X), respectively.

Let P ⊆ R. We redefine the lower approximations of X with respect to P as

P′(X) = {x ∈ R′(X) : (Int P)_s(x) ≠ ∅, (Int P)_s(x) ⊆ X},
P(X) = ∪{(Int P)_s(x) : (Int P)_s(x) ≠ ∅, (Int P)_s(x) ⊆ X}.

Correspondingly, the positive domains of D relative to P are defined as POS′_P(D) = ∪_{X∈U/D} P′(R(X)) and POS_P(D) = ∪_{X∈U/D} P(R(X)), respectively.

To distinguish these concepts, we say that R(X), P(X), and POS_P(D) are the support domains (or the successor neighborhoods) of R′(X), P′(X), and POS′_P(D), respectively. We define two further notions, the null and negative domains of D relative to R, denoted Nul_R(D) and Neg_R(D):

Nul_R(D) = {x ∈ U : (Int R)_s(x) = ∅},

Neg_R(D) = U − POS′_R(D) − Nul_R(D).
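The two lower-approximation forms of Definition 5.2 can be made concrete with a small sketch. The code below is not from the paper, which gives no algorithms; it models each binary relation as a Python set of ordered pairs, and the helper names (succ, int_succ, lower_prime, lower_supp) and the toy data are our own.

```python
# Hedged sketch of Definition 5.2: binary relations as sets of ordered pairs.

def succ(R, x):
    """Successor neighborhood R_s(x) = {y : (x, y) in R}."""
    return {y for (u, y) in R if u == x}

def int_succ(relations, x):
    """(Int R)_s(x): intersection of the successor neighborhoods of x."""
    out = succ(relations[0], x)
    for R in relations[1:]:
        out &= succ(R, x)
    return out

def lower_prime(U, relations, X):
    """R'(X) = {x in U : (Int R)_s(x) is nonempty and contained in X}."""
    return {x for x in U
            if int_succ(relations, x) and int_succ(relations, x) <= X}

def lower_supp(U, relations, X):
    """R(X): union of the nonempty neighborhoods contained in X."""
    out = set()
    for x in U:
        n = int_succ(relations, x)
        if n and n <= X:
            out |= n
    return out

# A toy universe with two general (non-equivalence) relations.
U = {1, 2, 3, 4}
R1 = {(1, 1), (2, 2), (3, 3), (4, 3), (4, 4)}
R2 = {(1, 1), (2, 2), (3, 3), (4, 4)}
X = {1, 2, 3}
print(lower_prime(U, [R1, R2], X))  # {1, 2, 3}
print(lower_supp(U, [R1, R2], X))   # {1, 2, 3}
```

Here R′(X) and its support domain R(X) happen to coincide; with coarser neighborhoods the two generally differ, which is exactly the gap that Proposition 5.2 below measures.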

By Definition 5.2, the null, negative, and positive domains of D relative to R divide the universe U into three pairwise disjoint parts; the corresponding domains of D relative to P divide the universe similarly. If R is a family of equivalence relations on U, then the lower approximation and positive domain of Definition 5.2 reduce to the corresponding concepts of Pawlak's approximation space, and the two forms of definition of each concept in Definition 5.2 are equivalent.

Remark 1. For any subset X ⊆ U, by the classical definitions of the lower approximations of X with respect to R and P in Pawlak's rough set theory [27], the inclusion apr_P(X) ⊆ apr_R(X) always holds. To keep this relationship invariant for rough sets based on general binary relations, we redefine P′(X) as above; for a similar reason, we redefine POS′_P(D) and POS_P(D) as above. Unless otherwise stated, whenever concepts such as lower approximations and positive and negative domains appear in the following sections, they are understood as defined in this section.

The following Propositions 5.1 and 5.2 can be derived from Definition 5.2.

Proposition 5.1. Suppose that (U, R, D) is a relation decision system and P ⊆ R. Then the following properties hold:

(1) R(X) ⊆ X for ∀X ⊆ U;
(2) R′(R(X)) = R′(X) for ∀X ⊆ U;
(3) R(R(X)) = R(X) for ∀X ⊆ U;
(4) POS′_R(D) = ∪_{X∈U/D} R′(R(X));
(5) POS_R(D) = ∪_{X∈U/D} R(R(X));
(6) P′(X) ⊆ R′(X) for ∀X ⊆ U;
(7) POS′_R(D) ⊇ POS′_P(D);
(8) POS_R(D) ⊇ POS_P(D).

Proof. Straightforward. □

In general, the positive domain and the support domain defined above have different meanings. However, if R is a family of equivalence relations, then R′(X) = R(X) for any subset X ⊆ U, which implies that (2) and (3), (4) and (5), and (7) and (8) are pairwise equivalent.

Proposition 5.2. Suppose that (U, R, D) is a relation decision system and P ⊆ R. Then the following statements hold:

(1) POS′_R(D) = POS′_P(D) ⟺ R′(R(X)) = P′(R(X)), ∀X ∈ U/D;
(2) POS_R(D) = POS_P(D) ⟺ R(R(X)) = P(R(X)), ∀X ∈ U/D;
(3) R′(R(X)) = P′(R(X)) ⟹ R(R(X)) = P(R(X)), ∀X ⊆ U;
(4) POS′_R(D) = POS′_P(D) ⟹ POS_R(D) = POS_P(D).

Proof. Straightforward. □

Remark 2. By (3) and (4) of Proposition 5.2, we obtain the relationship between the two forms of definition of lower approximation and the relationship between the positive domain and its support domain, respectively. Accordingly, we define relative reduction as in Definition 7.1 of Section 7; that is, our relative reduction keeps not only POS′_R(D) invariant but also its support domain POS_R(D) invariant. Of course, one may keep only POS′_R(D) invariant, regardless of how its support domain POS_R(D) changes; in that case one simply replaces R(X) with X in the definition of POS′_P(D). Since this reduction method is comparatively simple, we consider only the former case in this paper.

6. Attribute reduction in consistent relation decision systems

In this section, we study attribute reduction in a consistent relation decision system. The following discussion shows that consistent decision systems of Pawlak's rough set theory are special cases of consistent relation decision systems, and that the proposed reduction method for consistent relation decision systems generalizes the corresponding one in Pawlak's rough set theory [41]. Let us start by introducing the concept of a consistent relation decision system.

Definition 6.1. Let U be a universe, R = {R1, R2, . . . , Rn} a family of general binary relations on U, D a decision equivalence relation, U/D the decision partition of U, and (U, R, D) a relation decision system. If for ∀x ∈ U there exists B_j ∈ U/D such that (Int R)_s(x) ⊆ B_j, then (U, R, D) is called a consistent relation decision system, denoted U/Int R ≤ U/D; otherwise, (U, R, D) is called an inconsistent relation decision system. Let U_R = ∪{(Int R)_s(x) : x ∈ U}; U_R is called the decision object set of (U, R, D).

From Definition 6.1, a relation decision system (U, R, D) is consistent if and only if Neg_R(D) = ∅, and inconsistent if and only if Neg_R(D) ≠ ∅.
For a given consistent relation decision system, if (Int R)_s(x) = ∅, then (Int R)_s(x) ⊆ B_j holds trivially for ∀B_j ∈ U/D. If each R_i ∈ R is reflexive, then the family {(Int R)_s(x) : x ∈ U} forms a cover of U, i.e., U_R = U. If each R_i ∈ R is an equivalence relation, then the family {(Int R)_s(x) : x ∈ U} is a partition of U. Obviously, consistent decision systems of Pawlak's rough set theory are special cases of consistent relation decision systems.

Let d(x) be a decision function from U to the value set V_d, i.e., d: U → V_d. For ∀x ∈ U, we have d(x) = d([x]_D). For a consistent relation decision system, if (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that d((Int R)_s(x)) = d([z]_D) = j ∈ V_d; if (Int R)_s(x) = ∅, we adopt the convention d((Int R)_s(x)) = 0 ∉ V_d, i.e., d((Int R)_s(x)) ≠ d(y) ∈ V_d for ∀y ∈ U. Therefore, for ∀x, y ∈ U, if d((Int R)_s(x)) ≠ d((Int R)_s(y)), then (Int R)_s(x) ∩ (Int R)_s(y) = ∅; but if (Int R)_s(x) ∩ (Int R)_s(y) = ∅, then d((Int R)_s(x)) and d((Int R)_s(y)) may or may not be equal. If d((Int R)_s(x)) = d((Int R)_s(y)), then (Int R)_s(x) and (Int R)_s(y) may be disjoint, may overlap, or may contain one another.

From Definition 6.1, the following proposition is easily obtained.

Proposition 6.1. Let U/Int R ≤ U/D and y ∈ U. Then y ∉ U_R ⟺ y ∉ (Int R)_s(x) for ∀x ∈ U.

Definition 6.2. Let (U, R, D) be a consistent relation decision system and R_i ∈ R. R_i is called superfluous relative to D in R if R_i satisfies the following conditions:

(1) U_R = U_{R−{R_i}};
(2) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int{R − {R_i}})_s(x) = ∅; if (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int{R − {R_i}})_s(x) ⊆ [z]_D.

Otherwise, R_i is called indispensable relative to D in R. For any P ⊆ R, P is called a reduct of R relative to D if each element of P is indispensable in P, (Int R)_s(x) = ∅ implies (Int P)_s(x) = ∅ for ∀x ∈ U, and U_R = U_P.
The collection of all indispensable elements relative to D in R is called the core of R relative to D, denoted Core_D(R). From Definition 6.2, a relative reduct of a consistent relation decision system keeps both the decision object set U_R and the decision rules of each object in U invariant.


The following theorem shows that Definition 6.2 can be characterized equivalently by another two conditions.

Theorem 6.1. The two conditions in Definition 6.2 are equivalent to the following two conditions:

(1′) POS′_R(D) = POS′_{R−{R_i}}(D);
(2′) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int{R − {R_i}})_s(x) = ∅, i.e., Nul_R(D) = Nul_{R−{R_i}}(D).

Namely, if condition (1) in Definition 6.2 is replaced by (1′) and condition (2) by (2′), the resulting definition is equivalent to Definition 6.2.

Proof. Denote P = R − {R_i}. Since the system is consistent, by the definition of U_R it follows that for every (Int R)_s(x) ⊆ U_R with (Int R)_s(x) ≠ ∅ there exists X ∈ U/D such that (Int R)_s(x) ⊆ X. By the definition of R(X), (Int R)_s(x) ⊆ R(X) holds. This implies U_R ⊆ ∪_{X∈U/D} R(X). By the definitions of R(X) and U_R, we know R(X) ⊆ U_R, which implies U_R ⊇ ∪_{X∈U/D} R(X). Hence U_R = ∪_{X∈U/D} R(X). Similarly, by condition (2) of Definition 6.2, we also have U_P = ∪_{X∈U/D} P(X). Since U_R = U_P, we have ∪_{X∈U/D} R(X) = ∪_{X∈U/D} P(X). Similarly to the proof of (2) of Proposition 5.2, we have R(X) = P(X) for ∀X ∈ U/D. By condition (2) of Definition 6.2, ∅ ≠ (Int R)_s(x) ⊆ (Int P)_s(x) ⊆ X for ∀x ∈ R′(R(X)). By the definition of P(X), (Int P)_s(x) ⊆ P(X). Thus ∅ ≠ (Int R)_s(x) ⊆ (Int P)_s(x) ⊆ P(X) = R(X) ⊆ X, which implies x ∈ P′(R(X)). Hence R′(R(X)) ⊆ P′(R(X)). By (6) of Proposition 5.1, R′(R(X)) ⊇ P′(R(X)). Thus R′(R(X)) = P′(R(X)). Therefore POS′_R(D) = POS′_{R−{R_i}}(D); that is, U_R = U_{R−{R_i}} ⟹ POS′_R(D) = POS′_{R−{R_i}}(D).

Conversely, since POS′_R(D) = POS′_P(D), we have R′(R(X)) = P′(R(X)) for ∀X ∈ U/D. For ∀x ∈ R′(R(X)), by the definition of P′(R(X)), it follows that ∅ ≠ (Int R)_s(x) ⊆ (Int P)_s(x) ⊆ R(X) ⊆ X. By R′(R(X)) = R′(X), ∪_{x∈R′(X)} (Int R)_s(x) ⊆ ∪_{x∈R′(X)} (Int P)_s(x) ⊆ R(X) holds. Since R(X) = ∪_{x∈R′(X)} (Int R)_s(x), we have R(X) = ∪_{x∈R′(X)} (Int P)_s(x).
Since the system is consistent and (Int R)_s(x) = ∅ ⟺ (Int P)_s(x) = ∅, we have U_R = ∪_{X∈U/D} R(X) = ∪_{X∈U/D} ∪_{x∈R′(X)} (Int P)_s(x) = ∪{(Int P)_s(x) : (Int P)_s(x) ≠ ∅, x ∈ U} = U_P. Therefore U_R = U_{R−{R_i}} ⟺ POS′_R(D) = POS′_{R−{R_i}}(D). □

By Theorem 6.1, if a relation decision system is consistent and P ⊆ R is a reduct of R relative to D, then U_R = U_P ⟺ POS′_R(D) = POS′_P(D). By (4) of Proposition 5.2, U_R = U_P ⟹ POS_R(D) = POS_P(D) holds, but the converse is not necessarily true.

Definition 6.3. Let (U, R, D) be a consistent relation decision system and P ⊆ R. P is called an equivalence subset of R relative to D if P satisfies the following conditions:

(1) U_R = U_P;
(2) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int P)_s(x) = ∅; if (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int P)_s(x) ⊆ [z]_D.

Theorem 6.2. Let U/Int R ≤ U/D and R_i ∈ R. Then R_i is indispensable if and only if there is at least one pair x, y ∈ U such that: if x and y satisfy d((Int R)_s(x)) ≠ d(y), then y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x); if x and y satisfy d((Int R)_s(x)) = d(y), then y ∉ U_R ⟹ y ∈ U_{R−{R_i}}.

Proof. (⟹) Since R_i is indispensable, by Definition 6.2 we know that condition (1) or condition (2) in Definition 6.2 fails. Suppose condition (2) fails. Then, regardless of whether condition (1) holds, there exists x ∈ U such that (Int R)_s(x) = ∅ but (Int{R − {R_i}})_s(x) ≠ ∅; thus ∃y ∈ U such that y ∉ (Int R)_s(x) but y ∈ (Int{R − {R_i}})_s(x), and at the same time d((Int R)_s(x)) ≠ d(y). That is, there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) ≠ d(y) such that y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x). If (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int R)_s(x) ⊆ [z]_D but (Int{R − {R_i}})_s(x) ⊆ [z]_D fails. This implies that ∃y ∈ U such that y ∉ (Int R)_s(x) but y ∈ (Int{R − {R_i}})_s(x) and y ∉ [z]_D. Hence d((Int R)_s(x)) ≠ d(y). That is, there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) ≠ d(y) such that y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x).


Suppose condition (2) holds and condition (1) fails. Then U_R ≠ U_{R−{R_i}}, which implies that ∃y ∈ U such that y ∉ U_R but y ∈ U_{R−{R_i}}. Thus ∃x ∈ U such that y ∈ (Int{R − {R_i}})_s(x). Since condition (2) holds, we have (Int R)_s(x) ≠ ∅ (if (Int R)_s(x) = ∅, then by Definition 6.2 (Int{R − {R_i}})_s(x) = ∅, contradicting y ∈ (Int{R − {R_i}})_s(x)), and ∃z ∈ U such that (Int{R − {R_i}})_s(x) ⊆ [z]_D. Thus y ∈ [z]_D, which implies d((Int R)_s(x)) = d((Int{R − {R_i}})_s(x)) = d([z]_D) = d(y). That is, there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) = d(y) such that y ∉ U_R ⟹ y ∈ U_{R−{R_i}}.

(⟸) Suppose there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) ≠ d(y) such that y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x); then (Int R)_s(x) ≠ (Int{R − {R_i}})_s(x). If (Int R)_s(x) = ∅, then by Definition 6.2, R_i is indispensable in R. If (Int R)_s(x) ≠ ∅, then since U/Int R ≤ U/D, there exists z ∈ U such that (Int R)_s(x) ⊆ [z]_D. Assume that (Int{R − {R_i}})_s(x) ⊆ [z]_D; then y ∈ [z]_D, which implies d((Int R)_s(x)) = d([z]_D) = d(y), a contradiction. Thus (Int{R − {R_i}})_s(x) ⊆ [z]_D fails, and hence R_i is indispensable in R. Suppose there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) = d(y) such that y ∉ U_R ⟹ y ∈ U_{R−{R_i}}. Then clearly U_R ≠ U_{R−{R_i}}, so R_i is indispensable in R. □

Theorem 6.3 (Judgment theorem of attribute reduction). Let U/Int R ≤ U/D and P ⊆ R. Then P is an equivalence subset of R relative to D if and only if for ∀x, y ∈ U: if d((Int R)_s(x)) ≠ d(y), then y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x); if d((Int R)_s(x)) = d(y), then y ∉ U_R ⟹ y ∉ U_P.

Proof. (⟹) Since P is an equivalence subset of R relative to D, it follows that for ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int P)_s(x) = ∅. This implies that for ∀y ∈ U we have 0 = d((Int R)_s(x)) ≠ d(y) and y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x). If (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int P)_s(x) ⊆ [z]_D. Since P ⊆ R, we have (Int R)_s(x) ⊆ (Int P)_s(x) ⊆ [z]_D.
For ∀y ∈ U satisfying y ∉ [z]_D, we have d((Int R)_s(x)) ≠ d(y), and y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x). For ∀y ∈ U satisfying y ∈ [z]_D, we have d((Int R)_s(x)) = d(y); since U_R = U_P, it follows that y ∉ U_R ⟹ y ∉ U_P.

(⟸) Since U/Int R ≤ U/D, it follows that for any x ∈ U with (Int R)_s(x) = ∅, we have d((Int R)_s(x)) ≠ d(y) and y ∉ (Int R)_s(x) for ∀y ∈ U. Since y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x), we have (Int P)_s(x) = ∅. If (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int R)_s(x) ⊆ [z]_D. For ∀y ∈ U satisfying y ∉ [z]_D, namely d((Int R)_s(x)) ≠ d(y), we have y ∉ (Int R)_s(x); since y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x), we have (Int P)_s(x) ⊆ [z]_D. For ∀y ∈ U satisfying y ∈ [z]_D, namely d((Int R)_s(x)) = d(y), since y ∉ U_R ⟹ y ∉ U_P, we have U_R ⊇ U_P. Since U_R ⊆ U_P is obviously true, we have U_R = U_P. Altogether, the result holds. □

From Theorem 6.3, it can be seen that a reduct of R relative to D is exactly a minimal subset of R that keeps the decision object set U_R and the decision rules of each object in U invariant. By Theorems 4.2 and 6.3, the discernibility matrix of a consistent relation decision system can be defined as follows.

Definition 6.4. Let (U, R, D) be a consistent relation decision system and U = {x1, x2, . . . , xn}. By M(U, R, D) we denote the n × n matrix (c_ij), called the discernibility matrix of (U, R, D), defined for x_i, x_j ∈ U by:

(1) If d((Int R)_s(x_i)) ≠ d(x_j), then c_ij = {R ∈ R : x_j ∉ R_s(x_i)};
(2) If d((Int R)_s(x_i)) = d(x_j), then

    c_ij = R                               if x_j ∈ U_R,
    c_ij = {R ∈ R : x_j ∉ R_s(x_i)}        if x_j ∉ U_R.
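Definition 6.4 translates directly into code. The sketch below is our own construction, not the authors'; it returns each entry c_ij as a set of relation indices, and the sentinel None plays the role of the convention d((Int R)_s(x)) = 0 ∉ V_d for empty neighborhoods.

```python
def succ(R, x):
    """Successor neighborhood R_s(x)."""
    return {y for (u, y) in R if u == x}

def int_succ(relations, x):
    """(Int R)_s(x): intersection of the successor neighborhoods of x."""
    out = succ(relations[0], x)
    for R in relations[1:]:
        out &= succ(R, x)
    return out

def discernibility_matrix(U, relations, blocks):
    """Entries c_ij of Definition 6.4, as sets of relation indices."""
    U = sorted(U)
    all_idx = set(range(len(relations)))
    # decision object set U_R: union of all neighborhoods
    UR = set().union(*(int_succ(relations, x) for x in U))
    M = {}
    for xi in U:
        n = int_succ(relations, xi)
        # decision of the neighborhood; None encodes 0 not in V_d
        d_n = next((k for k, B in enumerate(blocks) if n and n <= B), None)
        for xj in U:
            d_j = next(k for k, B in enumerate(blocks) if xj in B)
            disc = {i for i, R in enumerate(relations) if xj not in succ(R, xi)}
            if d_n != d_j:
                M[(xi, xj)] = disc                       # case (1)
            else:
                M[(xi, xj)] = all_idx if xj in UR else disc  # case (2)
    return M

U = {1, 2}
R1 = {(1, 1), (2, 2)}
R2 = {(1, 1), (2, 1), (2, 2)}
M = discernibility_matrix(U, [R1, R2], [{1}, {2}])
print(M[(2, 1)])  # {0}: only R1 discerns x2 from x1
```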

Theorem 6.4. Let (U, R, D) be a consistent relation decision system. Then the following statements hold:

(1) If x_j ∉ U_R, then c_ij ≠ ∅ for ∀x_i ∈ U;
(2) Core_D(R) = {R ∈ R : c_ij = {R}, i, j ≤ n};
(3) Let P ⊆ R; then P ∩ c_ij ≠ ∅ for all c_ij ≠ ∅ if and only if P is an equivalence subset of R relative to D.


Proof. (1) Since x_j ∉ U_R, we have x_j ∉ (Int R)_s(x_i) for ∀x_i ∈ U, which implies that there is at least one R ∈ R such that x_j ∉ R_s(x_i). Hence R ∈ c_ij, i.e., c_ij ≠ ∅.

(2) (⟹) Suppose R ∈ Core_D(R); then R is indispensable in R. By Theorem 6.2, it follows that ∃x_i, x_j ∈ U such that if x_i, x_j satisfy d((Int R)_s(x_i)) ≠ d(x_j), then x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i). It is easily seen that R is the only relation in R satisfying x_j ∉ R_s(x_i); thus c_ij = {R} by Definition 6.4. If x_i, x_j satisfy d((Int R)_s(x_i)) = d(x_j), then by Theorem 6.2 we have x_j ∉ U_R ⟹ x_j ∈ U_{R−{R}}. This implies that there exists x_k ∈ U such that x_j ∉ (Int R)_s(x_k) ⟹ x_j ∈ (Int{R − {R}})_s(x_k). Since R is the only relation in R satisfying x_j ∉ R_s(x_k), by Definition 6.4 we have c_kj = {R}. Therefore Core_D(R) ⊆ {R ∈ R : c_ij = {R}, i, j ≤ n}.

(⟸) For x_i, x_j ∈ U, suppose c_ij = {R}. If d((Int R)_s(x_i)) ≠ d(x_j), then x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i) by Definition 6.4. By Theorem 6.2, R ∈ Core_D(R). If d((Int R)_s(x_i)) = d(x_j) and x_j ∉ U_R, then by Definition 6.4 we have x_j ∉ R_s(x_i). Thus x_j ∈ (Int{R − {R}})_s(x_i), namely x_j ∈ U_{R−{R}}. By Theorem 6.2, R ∈ Core_D(R). Therefore Core_D(R) ⊇ {R ∈ R : c_ij = {R}, i, j ≤ n}. Altogether Core_D(R) = {R ∈ R : c_ij = {R}, i, j ≤ n}.

(3) (⟸) If x_i, x_j ∈ U satisfy d((Int R)_s(x_i)) = d(x_j) and x_j ∈ U_R, then by Definition 6.4, c_ij = R; thus P ∩ c_ij ≠ ∅. We now prove the other cases. Assume that ∃i_0, j_0 ≤ n with c_{i_0 j_0} ≠ ∅ but P ∩ c_{i_0 j_0} = ∅. Then x_{j_0} ∈ R_s(x_{i_0}) for ∀R ∈ P, which implies x_{j_0} ∈ (Int P)_s(x_{i_0}). Since P is an equivalence subset of R relative to D, by Theorem 6.3, if d((Int R)_s(x_{i_0})) ≠ d(x_{j_0}), we have x_{j_0} ∈ (Int R)_s(x_{i_0}). Thus x_{j_0} ∈ R_s(x_{i_0}) for ∀R ∈ R. Hence c_{i_0 j_0} = ∅, contradicting the assumption. If d((Int R)_s(x_{i_0})) = d(x_{j_0}) and x_{j_0} ∉ U_R, then we have x_{j_0} ∉ U_R ⟹ x_{j_0} ∉ U_P. Thus x_{j_0} ∉ (Int P)_s(x_i) for ∀x_i ∈ U, in particular x_{j_0} ∉ (Int P)_s(x_{i_0}), contradicting x_{j_0} ∈ (Int P)_s(x_{i_0}).
Hence P ∩ c_ij ≠ ∅ for any c_ij ≠ ∅.

(⟹) Since P ∩ c_ij ≠ ∅ for any c_ij ≠ ∅, assume R ∈ P ∩ c_ij. If d((Int R)_s(x_i)) ≠ d(x_j), then x_j ∉ R_s(x_i) ⟹ x_j ∉ (Int R)_s(x_i) ⟹ x_j ∉ (Int P)_s(x_i). If d((Int R)_s(x_i)) = d(x_j) and x_j ∉ U_R, then x_j ∉ (Int R)_s(x) for ∀x ∈ U. Assume that x_j ∉ (Int P)_s(x) fails, i.e., ∃x_{i_1} ∈ U such that x_j ∈ (Int P)_s(x_{i_1}); then x_j ∈ R_s(x_{i_1}) for ∀R ∈ P, which implies R ∉ c_{i_1 j} for each R ∈ P. Thus P ∩ c_{i_1 j} = ∅. Since x_j ∉ U_R, by (1) we know c_{i_1 j} ≠ ∅, a contradiction. Hence x_j ∉ (Int P)_s(x) holds for ∀x ∈ U, which means x_j ∉ U_P. By Theorem 6.3, P is an equivalence subset of R relative to D. □

By Definition 6.2 and (3) of Theorem 6.4, we immediately obtain the following corollary.

Corollary 6.1. Let P ⊆ R. Then P is a relative reduct of R if and only if P is a minimal subset satisfying P ∩ c_ij ≠ ∅ for all c_ij ≠ ∅, i, j ≤ n.

Definition 6.5. Let R = {R1, R2, . . . , Rn} be a family of binary relations on U. A Boolean variable R_i (i ≤ n) is associated with each binary relation R_i ∈ R, and the function f(U, R, D) on (U, R, D) is defined as

f(U, R, D)(R1, R2, . . . , Rn) = ∧{∨(c_ij) : i, j ≤ n, c_ij ≠ ∅}.

Then f(U, R, D) is a Boolean function of (U, R, D), called the discernibility function (or discernibility formula) of (U, R, D), where ∨(c_ij) denotes the disjunction of the elements of c_ij.

By means of the discernibility function, the following theorem computes all the reducts of a consistent relation decision system.

Theorem 6.5. Let (U, R, D) be a consistent relation decision system, and let M(U, R, D) = (c_ij : i, j ≤ n) be its discernibility matrix. The discernibility formula f(U, R, D) is defined as f(U, R, D) = ∧_{i,j=1}^{n} (∨c_ij), (c_ij ≠ ∅). If f(U, R, D) = ∨_{k=1}^{l} (∧B_k) (B_k ⊆ R) is obtained from f(U, R, D) by applying the multiplication and absorption laws,


which satisfies that each element of B_k appears only once, then {B_k : k ≤ l} is the collection of all the reducts of (U, R, D), i.e., Red(R) = {B_k : k ≤ l}.

Proof. The proof is similar to that of Theorem 4.7. □
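Theorem 6.5 computes Red(R) by expanding the discernibility formula with the multiplication and absorption laws. Since Corollary 6.1 characterizes a reduct as a minimal subset hitting every non-empty entry, the same result can be obtained by enumerating minimal hitting sets. A sketch (our own, exponential in |R| and intended only for small examples), applied to the six distinct non-empty entries that appear in the discernibility function of Example 6.1 below:

```python
from itertools import combinations

def all_reducts(entries, n):
    """Minimal subsets of {0..n-1} intersecting every non-empty entry
    (Corollary 6.1); relations are referred to by index."""
    entries = [e for e in entries if e]
    reducts = []
    for size in range(1, n + 1):
        for cand in combinations(range(n), size):
            cset = set(cand)
            # skip supersets of an already-found (hence minimal) reduct
            if any(set(r) <= cset for r in reducts):
                continue
            if all(cset & e for e in entries):
                reducts.append(cand)
    return reducts

# Distinct non-empty entries of Example 6.1 (indices 0..3 stand for R1..R4).
entries = [{0, 1, 2, 3}, {1, 2, 3}, {2}, {0, 1, 3}, {0, 2, 3}, {0, 1, 2}]
print(all_reducts(entries, 4))  # [(0, 2), (1, 2), (2, 3)]
```

With Boolean variables in place of indices, the same entries simplify under absorption to R3 ∧ (R1 ∨ R2 ∨ R4), matching the hand computation of Example 6.1.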

The following example illustrates the ideas of this section.

Example 6.1. Let (U, R, D) be a consistent relation decision system, where (U, R) is the relation information system of Example 4.1 and D is a decision equivalence relation on U with

U/D = {{x1, x3, x7}, {x2, x5, x6}, {x4, x8, x9}},   U_R = {x1, x2, x3, x4, x5, x7, x8, x9}.

[The 9 × 9 discernibility matrix M(U, R, D) is omitted; its distinct non-empty entries, which can be read off from the discernibility function below, are R, {R2, R3, R4}, {R3}, {R1, R2, R4}, {R1, R3, R4}, and {R1, R2, R3}.]
and

f(U, R, D)(R1, R2, R3, R4) = ∧{∨(c_ij) : i, j ≤ 9, c_ij ≠ ∅}
= (R1 ∨ R2 ∨ R3 ∨ R4) ∧ (R2 ∨ R3 ∨ R4) ∧ R3 ∧ (R1 ∨ R2 ∨ R4) ∧ (R1 ∨ R3 ∨ R4) ∧ (R1 ∨ R2 ∨ R3)
= R3 ∧ (R1 ∨ R2 ∨ R4) = (R1 ∧ R3) ∨ (R2 ∧ R3) ∨ (R3 ∧ R4).

So Red(R) = {{R1, R3}, {R2, R3}, {R3, R4}} and Core(R) = {R3}.

7. Attribute reduction in relation decision systems

As mentioned in the previous section, for a given relation decision system (U, R, D), if Neg_R(D) = ∅, then (U, R, D) is a consistent relation decision system; otherwise, it is an inconsistent relation decision system. A consistent relation decision system is therefore just a special case of a relation decision system. This section investigates attribute reduction in a relation decision system from this general point of view. As in Pawlak's rough sets [27,41], a relative reduct of a relation decision system is defined as a minimal subset of the condition relation (attribute) set R that keeps the positive domain of the decision equivalence relation invariant. In the following discussion, we always suppose that U is a finite universe, R = {R1, R2, . . . , Rn} is a family of binary relations on U, D is a decision equivalence relation relative to R on U, and U/D = {[x]_D : x ∈ U} is the decision partition. For a given relation decision system, we always suppose POS′_R(D) ≠ ∅. By the results of Section 5 and Theorem 6.1, we first define the concept of a relative reduct of a relation decision system.

Definition 7.1. Let (U, R, D) be a relation decision system and R_i ∈ R. R_i is called superfluous relative to D in R if R_i satisfies the following conditions:

(1) POS′_R(D) = POS′_{R−{R_i}}(D);
(2) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int{R − {R_i}})_s(x) = ∅, i.e., Nul_R(D) = Nul_{R−{R_i}}(D).


Otherwise, R_i is called indispensable relative to D in R. For any P ⊆ R, P is called a reduct of R relative to D if each element of P is indispensable relative to D in P, (Int R)_s(x) = ∅ implies (Int P)_s(x) = ∅ for ∀x ∈ U, and POS′_R(D) = POS′_P(D). The collection of all indispensable elements relative to D in R is called the core of R relative to D, denoted Core_D(R). By Definition 7.1 and (4) of Proposition 5.2, it is easily seen that a reduct of R relative to D keeps both the positive domain POS′_R(D) and its support domain POS_R(D) invariant.

Definition 7.2. Let (U, R, D) be a relation decision system and P ⊆ R. P is called an equivalence subset of R relative to D if P satisfies the following conditions:

(1) POS′_R(D) = POS′_P(D);
(2) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int P)_s(x) = ∅, i.e., Nul_R(D) = Nul_P(D).

Theorem 7.1. Let (U, R, D) be a relation decision system and R_i ∈ R. Then R_i is indispensable in R relative to D if and only if there is at least one pair x, y ∈ U such that: if x ∈ Nul_R(D), then y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x); if x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then y ∉ POS_R(D) ⟹ y ∈ (Int{R − {R_i}})_s(x); if x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x).

Proof. (⟹) If R_i is indispensable, then by Definition 7.1 we know that condition (1) or condition (2) in Definition 7.1 fails. Suppose condition (2) fails. Then, regardless of whether condition (1) holds, there must exist x ∈ Nul_R(D) such that (Int R)_s(x) = ∅ but (Int{R − {R_i}})_s(x) ≠ ∅. Thus ∃y ∈ U such that y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x). Suppose condition (2) holds and condition (1) fails, i.e., for ∀x ∈ U, (Int R)_s(x) = ∅ implies (Int{R − {R_i}})_s(x) = ∅, but POS′_R(D) ≠ POS′_{R−{R_i}}(D). Thus ∃X_0 ∈ U/D such that P′(R(X_0)) ≠ R′(R(X_0)), which implies P′(R(X_0)) ⊊ R′(R(X_0)) (P = R − {R_i}). Hence there must exist one object x ∈ POS′_R(D) such that (Int R)_s(x) ⊆ R(X_0) but (Int{R − {R_i}})_s(x) ⊆ R(X_0) fails.
So there exists y ∈ U such that y ∉ (Int R)_s(x), but y ∈ (Int{R − {R_i}})_s(x) and y ∉ R(X_0). If d((Int R)_s(x)) = d(y), then y ∈ X_0. Since y ∉ R(X_0), we have y ∉ POS_R(D). Hence y ∉ POS_R(D) ⟹ y ∈ (Int{R − {R_i}})_s(x). If d((Int R)_s(x)) ≠ d(y), obviously y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x).

(⟸) If x ∈ Nul_R(D), then (Int R)_s(x) = ∅, so y ∉ (Int R)_s(x) for ∀y ∈ U. Since y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x), by Definition 7.1 it is easily seen that R_i is indispensable. If x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then ∃X_0 ∈ U/D such that (Int R)_s(x) ⊆ R(X_0) ⊆ X_0 and y ∈ X_0 = [y]_D. Since y ∉ POS_R(D), we have y ∉ R(X_0). Since y ∉ POS_R(D) ⟹ y ∈ (Int{R − {R_i}})_s(x), we know that (Int{R − {R_i}})_s(x) ⊆ R(X_0) fails. Thus x ∉ POS′_{R−{R_i}}(D), which implies POS′_R(D) ≠ POS′_{R−{R_i}}(D), i.e., R_i is indispensable. If x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then there exist X_0 and X_1 (X_0 ≠ X_1) such that (Int R)_s(x) ⊆ R(X_0) ⊆ X_0 and y ∈ X_1. Thus y ∉ (Int R)_s(x). Since y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x), we know that (Int{R − {R_i}})_s(x) ⊆ X_0 fails, which implies that (Int{R − {R_i}})_s(x) ⊆ R(X_0) fails. Thus x ∉ POS′_{R−{R_i}}(D). Hence POS′_R(D) ≠ POS′_{R−{R_i}}(D), i.e., R_i is indispensable. □

Theorem 7.2 (Judgment theorem of attribute reduction). Let (U, R, D) be a relation decision system and P ⊆ R. Then P is an equivalence subset of R relative to D if and only if for ∀x, y ∈ U: if x ∈ Nul_R(D), then y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x); if x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then y ∉ POS_R(D) ⟹ y ∉ (Int P)_s(x); if x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x).

Proof. (⟹) Since P is an equivalence subset of R relative to D, it follows that for ∀x ∈ U, if x ∈ Nul_R(D), i.e., (Int R)_s(x) = ∅, then (Int P)_s(x) = ∅. Thus for ∀y ∈ U, y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x).


If x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then ∃X_0 ∈ U/D such that x ∈ R′(R(X_0)) and y ∈ X_0. Since y ∉ POS_R(D), we have y ∉ R(X_0). Since P is an equivalence subset of R relative to D, POS′_R(D) = POS′_P(D) holds. By (1) of Proposition 5.2, we have R′(R(X)) = P′(R(X)) for ∀X ∈ U/D; in particular R′(R(X_0)) = P′(R(X_0)). Thus x ∈ P′(R(X_0)), which implies (Int P)_s(x) ⊆ R(X_0) ⊆ X_0. Hence y ∉ (Int P)_s(x), i.e., y ∉ POS_R(D) ⟹ y ∉ (Int P)_s(x). If x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then there exist X_0 and X_1 (X_0 ≠ X_1) such that x ∈ R′(R(X_0)) and y ∈ X_1, which implies that y ∉ (Int R)_s(x) and y ∉ X_0. Since P is an equivalence subset of R relative to D, POS′_R(D) = POS′_P(D). By (1) of Proposition 5.2, we have R′(R(X)) = P′(R(X)) for ∀X ∈ U/D; in particular R′(R(X_0)) = P′(R(X_0)). Thus x ∈ P′(R(X_0)), which implies (Int P)_s(x) ⊆ R(X_0) ⊆ X_0. Hence y ∉ (Int P)_s(x), i.e., y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x).

(⟸) (1) If x ∈ Nul_R(D), then (Int R)_s(x) = ∅, so y ∉ (Int R)_s(x) for ∀y ∈ U. Since y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x), we have (Int P)_s(x) = ∅. For x ∈ POS′_R(D), there are the following two cases:

(2) If x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then ∃X_0 ∈ U/D such that (Int R)_s(x) ⊆ R(X_0) ⊆ X_0 and y ∈ X_0 = [y]_D. Since y ∉ POS_R(D), we have y ∉ R(X_0) and y ∉ (Int R)_s(x). Thus y ∉ POS_R(D) ⟹ y ∉ (Int P)_s(x) is equivalent to y ∉ R(X_0) ⟹ y ∉ (Int P)_s(x), which implies (Int P)_s(x) ⊆ R(X_0). Hence x ∈ P′(R(X_0)), and by the definition of POS′_P(D), x ∈ POS′_P(D).

(3) If x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then ∃X_0 ∈ U/D such that (Int R)_s(x) ⊆ R(X_0) ⊆ X_0 and y ∉ X_0, which implies y ∉ (Int R)_s(x). Thus y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x) is equivalent to y ∉ X_0 ⟹ y ∉ (Int P)_s(x). Hence (Int P)_s(x) ⊆ X_0. Assume that (Int P)_s(x) ⊆ R(X_0) is not true; then there must exist y_0 ∈ X_0 such that y_0 ∉ R(X_0) and y_0 ∈ (Int P)_s(x). This means d((Int R)_s(x)) = d(y_0) and y_0 ∉ POS_R(D).
Thus we have x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y_0), while y_0 ∉ POS_R(D) ⟹ y_0 ∈ (Int P)_s(x), contradicting case (2). Hence (Int P)_s(x) ⊆ R(X_0), which implies x ∈ P′(R(X_0)); by the definition of POS′_P(D), we again have x ∈ POS′_P(D). By (2) and (3), we conclude that x ∈ POS′_P(D) for any x ∈ POS′_R(D), i.e., POS′_R(D) ⊆ POS′_P(D). By (7) of Proposition 5.1, POS′_R(D) ⊇ POS′_P(D). Therefore POS′_R(D) = POS′_P(D). Given the above three cases, for ∀x ∈ Neg_R(D) we have x ∉ POS′_R(D) ⟺ x ∉ POS′_P(D). Altogether POS′_R(D) = POS′_P(D), so the result holds. □

From Theorem 7.2, we know that P is an equivalence subset of R relative to D if and only if the null, negative, and positive domains of D relative to R and to P are correspondingly equal. By Theorems 4.2 and 7.2, the discernibility matrix of a relation decision system can be defined as follows.

Definition 7.3. Let (U, R, D) be a relation decision system and U = {x1, x2, . . . , xn}. By M(U, R, D) we denote the n × n matrix (c_ij), called the discernibility matrix of (U, R, D), defined for x_i, x_j ∈ U by:

(1) If x_i ∈ Nul_R(D), then c_ij = {R ∈ R : x_j ∉ R_s(x_i)};
(2) If x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) = d(x_j), then

    c_ij = R                               if x_j ∈ POS_R(D),
    c_ij = {R ∈ R : x_j ∉ R_s(x_i)}        if x_j ∉ POS_R(D);

(3) If x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) ≠ d(x_j), then c_ij = {R ∈ R : x_j ∉ R_s(x_i)};
(4) If x_i ∈ Neg_R(D), then c_ij = R.
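The four cases of Definition 7.3 amount to a case analysis on the domains Nul_R(D), POS′_R(D), and Neg_R(D). A sketch under the same toy representation used earlier (relations as sets of ordered pairs; all names and data are ours, not the authors'):

```python
def succ(R, x):
    """Successor neighborhood R_s(x)."""
    return {y for (u, y) in R if u == x}

def int_succ(relations, x):
    """(Int R)_s(x): intersection of the successor neighborhoods of x."""
    out = succ(relations[0], x)
    for R in relations[1:]:
        out &= succ(R, x)
    return out

def entry(xi, xj, U, relations, blocks):
    """c_ij of Definition 7.3, as a set of relation indices."""
    all_idx = set(range(len(relations)))
    disc = {i for i, R in enumerate(relations) if xj not in succ(R, xi)}
    n = int_succ(relations, xi)
    if not n:
        return disc                      # case (1): xi in Nul_R(D)
    blk = next((B for B in blocks if n <= B), None)
    if blk is None:
        return all_idx                   # case (4): xi in Neg_R(D)
    # support domain POS_R(D): union of neighborhoods inside some block
    pos = set()
    for x in U:
        m = int_succ(relations, x)
        if m and any(m <= B for B in blocks):
            pos |= m
    if xj in blk:                        # case (2): equal decisions
        return all_idx if xj in pos else disc
    return disc                          # case (3): different decisions

U = {1, 2, 3}
R1 = {(1, 1), (2, 2), (2, 3), (3, 3)}
R2 = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3)}
blocks = [{1, 2}, {3}]
print(entry(2, 1, U, [R1, R2], blocks))  # {0, 1}: x2 lies in Neg_R(D)
print(entry(1, 2, U, [R1, R2], blocks))  # {0}
```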

Theorem 7.3. Let (U, R, D) be a relation decision system. Then the following statements hold:

(1) Core_D(R) = {R ∈ R : c_ij = {R}, i, j ≤ n};
(2) Let P ⊆ R; then P is an equivalence subset of R relative to D if and only if P ∩ c_ij ≠ ∅ for all c_ij ≠ ∅.


Proof. (1) (⟹) Suppose R ∈ Core_D(R); then R is indispensable in R. By Theorem 7.1, it follows that there exist x_i, x_j ∈ U such that if x_i, x_j satisfy x_i ∈ Nul_R(D), then x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i). Obviously, only R satisfies x_j ∉ R_s(x_i); thus c_ij = {R} by Definition 7.3. If x_i, x_j satisfy x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) = d(x_j), then x_j ∉ POS_R(D) ⟹ x_j ∈ (Int{R − {R}})_s(x_i), which implies that x_j ∉ (Int R)_s(x_i) but x_j ∈ (Int{R − {R}})_s(x_i). Thus only R satisfies x_j ∉ R_s(x_i), and by Definition 7.3, c_ij = {R}. If x_i, x_j satisfy x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) ≠ d(x_j), then x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i), and similarly c_ij = {R}. Therefore Core_D(R) ⊆ {R ∈ R : c_ij = {R}, i, j ≤ n}.

(⟸) For x_i, x_j ∈ U, suppose c_ij = {R}. If x_i ∈ Nul_R(D), then by Definition 7.3, x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i). By Theorem 7.1, R ∈ Core_D(R). If x_i ∈ POS′_R(D), d((Int R)_s(x_i)) = d(x_j), and x_j ∉ POS_R(D), then by Definition 7.3, x_j ∉ (Int R)_s(x_i) but x_j ∈ (Int{R − {R}})_s(x_i). By Theorem 7.1, R ∈ Core_D(R). If x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) ≠ d(x_j), similarly R ∈ Core_D(R). Thus Core_D(R) ⊇ {R ∈ R : c_ij = {R}, i, j ≤ n}. Altogether Core_D(R) = {R ∈ R : c_ij = {R}, i, j ≤ n}.

(2) (⟸) If x_i ∈ POS′_R(D), d((Int R)_s(x_i)) = d(x_j), and x_j ∈ POS_R(D), or if x_i ∈ Neg_R(D), then c_ij = R by Definition 7.3; thus P ∩ c_ij ≠ ∅. We now prove the other cases. Assume that ∃i_0, j_0 ≤ n with c_{i_0 j_0} ≠ ∅ but P ∩ c_{i_0 j_0} = ∅. Then by Definition 7.3 we have x_{j_0} ∈ R_s(x_{i_0}) for ∀R ∈ P, which implies x_{j_0} ∈ (Int P)_s(x_{i_0}). Since P is an equivalence subset of R relative to D, by Theorem 7.2 it follows that if x_{i_0} ∈ Nul_R(D), then x_{j_0} ∈ (Int P)_s(x_{i_0}) ⟹ x_{j_0} ∈ (Int R)_s(x_{i_0}). Thus x_{j_0} ∈ R_s(x_{i_0}) for ∀R ∈ R. Hence c_{i_0 j_0} = ∅, contradicting the assumption. If x_{i_0} ∈ POS′_R(D), d((Int R)_s(x_{i_0})) = d(x_{j_0}), and x_{j_0} ∉ POS_R(D), then x_{j_0} ∉ POS_R(D) ⟹ x_{j_0} ∉ (Int P)_s(x_{i_0}).
Thus $R 2 P such that xj0 62 ðIntRÞs ðxi0 Þ, which implies R 2 ci0 j0 . Hence R 2 P \ ci0 j0 6¼ ;, which is a contradiction to the assumption. If xi0 2 POS 0R ðDÞ and dððInt RÞs ðxi0 ÞÞ 6¼ dðxj0 Þ, then xj0 2 ðInt PÞs ðxi0 Þ ) xj0 2 ðInt RÞs ðxi0 Þ. Thus xj0 2 Rs ðxi0 Þ for "R 2 R. Hence ci0 j0 ¼ ;, which is a contradiction to the assumption. Altogether P \ cij 6¼ ; for any cij 6¼ ;. ) Since P \ cij 6¼ ; for any cij 6¼ ;, we assume that R 2 P \ cij. If xi 2 NulR(D), then xj 62 Rs(xi) ) xj 62 Int Rs(xi) ) xj 62 Int Ps(xi). If xi 2 POS 0R ðDÞ, d((Int R)s(xi)) = d(xj), and xj 62 POSR(D), then xj 62 Rs(xi) ) xj 62 Int Rs(xi) ) xj 62 Int Ps(xi), i.e., xj 62 POSR(D) ) xj 62 (Int P)s(xi). If xi 2 POS 0R ðDÞ and dððInt RÞs ðxi ÞÞ 6¼ dðxj Þ, then xj 62 Rs(xi) D.

) xj 62 Int Rs(xi) ) xj 62 Int Ps(xi). By Theorem 7.2, we know that P is an equivalence subset of R relative to h By Definition 7.1 and (2) of Theorem 7.3 we immediately get the following corollary.

Corollary 7.1. Let P ⊆ R; then P is a relative reduct of R if and only if it is a minimal subset satisfying P ∩ cij ≠ ∅ for every cij ≠ ∅ (i, j ≤ n).

Definition 7.4. Let R = {R1, R2, . . ., Rn} be a family of binary relations on U. Assign to each binary relation Ri ∈ R (i ≤ n) a corresponding Boolean variable Ri, and define

f(U, R, D)(R1, R2, . . ., Rn) = ∧{∨(cij) : i, j ≤ n, cij ≠ ∅}.

Then f(U, R, D) is a Boolean function of (U, R, D), called a discernibility function or discernibility formula of (U, R, D), where ∨(cij) denotes the disjunction of the elements of cij.

Using the discernibility function, we have the following theorem to compute all the reducts of a relation decision system.

Theorem 7.4. Let (U, R, D) be a relation decision system and M(U, R, D) = (cij : i, j ≤ n) its discernibility matrix. The discernibility formula f(U, R, D) is defined as f(U, R, D) = ∧_{i,j=1}^{n} (∨ cij) (cij ≠ ∅). If f(U, R, D) = ∨_{k=1}^{l} (∧ Bk) (Bk ⊆ R) is obtained from f(U, R, D) by applying the multiplication and absorption laws, such that each element of Bk appears only once, then the set {Bk : k ≤ l} is the collection of all the reducts of (U, R, D), i.e., Red(R) = {Bk : k ≤ l}.
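The recipe of Theorem 7.4 (expand the conjunctive discernibility formula by the multiplication law, then prune with the absorption law) can be sketched directly in code. This is an illustrative brute-force sketch, not the paper's implementation; the entry list is the set of non-empty matrix entries that arise in Example 7.1 below.

```python
from itertools import product

def all_reducts(entries):
    """Expand the CNF discernibility function f = AND over OR(c_ij) into
    DNF via the multiplication law, then apply the absorption law: keep
    only the terms that contain no other term as a proper subset."""
    clauses = [frozenset(c) for c in entries if c]        # skip empty c_ij
    terms = {frozenset(pick) for pick in product(*clauses)}
    return {t for t in terms if not any(u < t for u in terms)}

# Non-empty entries of the discernibility matrix in Example 7.1:
entries = [{"R1", "R2", "R3"}, {"R2", "R4"}, {"R1", "R2"}, {"R1", "R3"},
           {"R1", "R3", "R4"}, {"R2"}, {"R1", "R2", "R4"}, {"R2", "R3", "R4"}]
print(sorted(sorted(r) for r in all_reducts(entries)))
# -> [['R1', 'R2'], ['R2', 'R3']], i.e. Red(R) = {{R1, R2}, {R2, R3}}
```

By Corollary 7.1, the surviving terms are exactly the minimal subsets hitting every non-empty entry; for large matrices a dedicated minimal-hitting-set algorithm would be preferable to this exponential expansion.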


Proof. The proof is similar to that of Theorem 4.7. □

The following example illustrates the idea of this section.

Example 7.1. In order to survey the influence of the four factors education, profession, diligence and temperament on life level, we recruit ten volunteers to participate in a test. Let U = {x1, x2, . . ., x10} be the set of ten volunteers. The values of education are {high, low}, those of profession are {good, poor}, those of diligence are {yes, no}, and those of temperament are {good, bad}. D denotes life level. According to the social life standard, the ten volunteers are partitioned as follows:

U/D = {{x1, x5, x6} (better), {x2, x4, x7} (good), {x3, x8} (general), {x9, x10} (bad)}.

After the volunteers have lived together and known each other for a while, we let them evaluate each other on each given attribute. The judgments are recorded according to the following rules: E(xi, xj) iff xj considers xi to be of high education; D(xi, xj) iff xj considers xi to be diligent; P(xi, xj) iff xj considers xi to be of good profession; and T(xi, xj) iff xj considers xi to be of good temperament. Assuming the evaluation of each volunteer carries the same weight, for each attribute we obtain a binary relation, which embodies a kind of uncertainty caused by different interpretations of the data.
Education:
R1 = {(x1, x3), (x2, x1), (x2, x5), (x2, x6), (x2, x7), (x4, x4), (x4, x5), (x4, x6), (x5, x2), (x5, x4), (x6, x6), (x6, x10), (x7, x2), (x7, x4), (x7, x8), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10), (x10, x10)};

Diligence:
R2 = {(x2, x1), (x2, x5), (x2, x6), (x3, x8), (x4, x5), (x4, x6), (x4, x7), (x5, x2), (x5, x4), (x5, x5), (x6, x10), (x7, x2), (x7, x4), (x7, x8), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10), (x10, x6)};

Profession:
R3 = {(x1, x3), (x2, x1), (x2, x2), (x2, x5), (x2, x7), (x4, x4), (x4, x5), (x4, x6), (x5, x2), (x5, x4), (x6, x6), (x6, x10), (x7, x2), (x7, x4), (x7, x8), (x8, x2), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10)};

Temperament:
R4 = {(x1, x2), (x2, x1), (x2, x2), (x2, x5), (x2, x8), (x3, x1), (x4, x5), (x4, x6), (x4, x7), (x5, x2), (x5, x4), (x6, x6), (x6, x10), (x7, x2), (x7, x4), (x7, x7), (x7, x8), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10)}.

Let R = {R1, R2, R3, R4}; then

Int R = {(x2, x1), (x2, x5), (x4, x5), (x4, x6), (x5, x2), (x5, x4), (x6, x10), (x7, x2), (x7, x4), (x7, x8), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10)};

(Int R)s(x1) = (Int R)s(x3) = (Int R)s(x10) = ∅; (Int R)s(x2) = {x1, x5}; (Int R)s(x4) = {x5, x6}; (Int R)s(x5) = {x2, x4}; (Int R)s(x6) = {x10}; (Int R)s(x7) = {x2, x4, x8}; (Int R)s(x8) = {x3, x8}; (Int R)s(x9) = {x7, x9, x10};

R(X1) = {x1, x5, x6}; R(X2) = {x2, x4}; R(X3) = {x3, x8}; R(X4) = {x10};
R′(X1) = {x2, x4}; R′(X2) = {x5}; R′(X3) = {x8}; R′(X4) = {x6}.

Therefore NulR(D) = {x1, x3, x10}, NegR(D) = {x7, x9}, POSR(D) = {x1, x2, x3, x4, x5, x6, x8, x10}, POS′R(D) = {x2, x4, x5, x6, x8}, and γR(D) = |POS′R(D)| / |U| = 0.5.
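The computation of Int R and the successor neighborhoods above is mechanical and easy to check with a few lines of code. The volunteers x1, . . ., x10 are encoded here as the integers 1–10 (an encoding chosen only for illustration):

```python
# The four relations of Example 7.1, as sets of ordered pairs (i, j) for (xi, xj).
R1 = {(1, 3), (2, 1), (2, 5), (2, 6), (2, 7), (4, 4), (4, 5), (4, 6), (5, 2),
      (5, 4), (6, 6), (6, 10), (7, 2), (7, 4), (7, 8), (8, 3), (8, 8),
      (9, 7), (9, 9), (9, 10), (10, 10)}
R2 = {(2, 1), (2, 5), (2, 6), (3, 8), (4, 5), (4, 6), (4, 7), (5, 2), (5, 4),
      (5, 5), (6, 10), (7, 2), (7, 4), (7, 8), (8, 3), (8, 8), (9, 7),
      (9, 9), (9, 10), (10, 6)}
R3 = {(1, 3), (2, 1), (2, 2), (2, 5), (2, 7), (4, 4), (4, 5), (4, 6), (5, 2),
      (5, 4), (6, 6), (6, 10), (7, 2), (7, 4), (7, 8), (8, 2), (8, 3),
      (8, 8), (9, 7), (9, 9), (9, 10)}
R4 = {(1, 2), (2, 1), (2, 2), (2, 5), (2, 8), (3, 1), (4, 5), (4, 6), (4, 7),
      (5, 2), (5, 4), (6, 6), (6, 10), (7, 2), (7, 4), (7, 7), (7, 8),
      (8, 3), (8, 8), (9, 7), (9, 9), (9, 10)}

IntR = R1 & R2 & R3 & R4          # Int R: intersection of the family

def succ(rel, x):
    """Successor set rel_s(x) = {y : (x, y) in rel}."""
    return {y for (u, y) in rel if u == x}

print(sorted(succ(IntR, 2)))      # [1, 5]      = (Int R)_s(x2)
print(sorted(succ(IntR, 9)))      # [7, 9, 10]  = (Int R)_s(x9)
print(sorted(succ(IntR, 1)))      # []          = (Int R)_s(x1) is empty
```

Running this reproduces the fifteen pairs of Int R and the successor neighborhoods listed above.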


The discernibility matrix of (U, R, D) is a 10 × 10 matrix whose entries cij are subsets of R. Besides the entries equal to the whole family R, its non-trivial entries are {R1, R2, R3}, {R2, R4}, {R1, R2}, {R1, R3}, {R1, R3, R4}, {R2}, {R1, R2, R4} and {R2, R3, R4},
and

f(U, R, D)(R1, R2, R3, R4) = ∧{∨(cij) : i, j ≤ 10, cij ≠ ∅}
= (R1 ∨ R2 ∨ R3) ∧ (R2 ∨ R4) ∧ (R1 ∨ R2) ∧ (R1 ∨ R3) ∧ (R1 ∨ R3 ∨ R4) ∧ R2 ∧ (R1 ∨ R2 ∨ R4) ∧ (R2 ∨ R3 ∨ R4)
= (R1 ∨ R3) ∧ R2
= (R1 ∧ R2) ∨ (R2 ∧ R3).

So Red(R) = {{R1, R2}, {R2, R3}} and CoreD(R) = {R2}. In view of the assessment results of the selected volunteers, life level is only partially dependent upon the four factors, i.e., the dependency degree γR(D) of life level upon these factors is 0.5. Of the four factors, temperament is dispensable, while diligence is indispensable and is the most critical contributor to life level.

8. Experimental analysis

Feature selection and attribute reduction are basic applications of rough sets. Pawlak's rough sets can only be used to select nominal features, because the model is built on equivalence relations and equivalence classes, which can be generated directly from nominal features. Rough sets based on general binary relations can be used to compute with more complex data, such as numerical data. To use rough sets based on general binary relations to reduce numerical data, one should first construct a technique to compute the relation between objects characterized by numerical features, and then compute the positive region of the decision to evaluate the quality of the selected features.

In the experiments, we associate a box neighborhood with each sample; the samples in the box neighborhood of sample xi are called the successor neighborhood of xi. Formally, the successor neighborhood of xi is defined as RB(xi) = {xj : ∀a ∈ B, Δa(xi, xj) ≤ ε}, where Δa is the distance function in the feature space of a, defined as Δa(xi, xj) = |a(xi) − a(xj)|, and ε ≥ 0 is a user-specified threshold. Obviously ∪_{i=1}^{n} RB(xi) = U and RB(xi) ≠ ∅. Then the significance of B with respect to D is defined as

γB(D) = |POS′B(D)| / |U|.

Accordingly, the conditional significance of a in B can be defined as

SIG(a, B, D) = |POS′B(D)| / |U| − |POS′B−{a}(D)| / |U|.
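The box neighborhood and the significance γB(D) can be sketched as follows. This is a simplified illustration, not the paper's exact construction of POS′B(D): here a sample is counted as positive when its neighborhood is contained in its own decision class, a common way to realize the positive region for neighborhood-based rough sets. The data and ε are hypothetical.

```python
import numpy as np

def neighborhood(X, i, B, eps):
    """Successor (box) neighborhood R_B(x_i): indices j with
    |a(x_i) - a(x_j)| <= eps for every attribute a in B."""
    return np.where((np.abs(X[:, B] - X[i, B]) <= eps).all(axis=1))[0]

def gamma(X, y, B, eps=0.1):
    """Significance of B w.r.t. D, with the positive region simplified to
    the set of samples whose neighborhood is decision-pure."""
    pos = sum(np.all(y[neighborhood(X, i, B, eps)] == y[i])
              for i in range(len(y)))
    return pos / len(y)

# Hypothetical data: two numeric features scaled to [0, 1], two classes.
X = np.array([[0.10, 0.90], [0.15, 0.85], [0.80, 0.20], [0.90, 0.10]])
y = np.array([0, 0, 1, 1])
print(gamma(X, y, B=[0, 1], eps=0.1))   # 1.0: the classes are separable
```

With a very large ε every neighborhood covers the whole universe and mixes both classes, so the significance drops to 0; the threshold thus controls the granularity at which consistency is judged.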

Based on these measures of significance, we can compute the significance and conditional significance of a single attribute and select the attributes with maximal significance and conditional significance one by one.

There are 13 numerical features in the wine data set downloaded from the UCI Machine Learning Repository [23]. We compute the significance γai(D) of these 13 attributes, as shown in Fig. 1a. We find that feature 13 gets the maximal significance among the set of features, so we select feature 13 into the reduct. In the second round, we calculate the conditional significance γa13∪ai(D) − γa13(D) of the remaining 12 features, and feature 10 yields the maximal value of significance. Finally, features 13, 10, 7, 11 and 1 are selected into the reduct. The significance γ{a13, a10, a7, a11, a1}(D) of these five features is 1. This means that the decision table is consistent at this granularity (here ε = 0.1). Fig. 1f shows the dependency γBi(D) of the decision attribute on the selected features.

Fig. 1. Significance and conditional significance of attributes.

We use a heuristic search strategy to find the reduct. We start with an empty set of attributes and, in each round, select the feature which maximizes the significance increment, until the significance does not increase or increases less than a specified threshold. As we select the feature with the maximal significance increment, features highly relevant to an already selected feature cannot be included, because little new information is introduced by such relevant features [50]. Note that this search strategy is only one of the candidate solutions; there are also other strategies for searching reducts [10,17,25,26].

As a whole, we gather six data sets from the UCI Machine Learning Repository. The data sets are described in Table 2, where N, F and C stand for the numbers of samples, features and classes, respectively.

Table 2
Data description

Data       N     F    C
Credit     690   15   2
Hepatitis  155   19   2
Heart      270   13   2
Iono       351   34   2
Wdbc       569   30   2
Wine       178   13   3
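The greedy forward strategy described above can be sketched independently of the particular significance measure. In the demo below, a simple leave-one-out 1-nearest-neighbour accuracy is used as a stand-in for γB(D) (it is not the paper's measure), and the data are hypothetical.

```python
import numpy as np

def forward_select(X, y, sig, min_gain=1e-3):
    """Greedy forward search: start from the empty subset and, in each
    round, add the attribute with the maximal significance increment;
    stop when the increment falls below min_gain."""
    selected, best = [], 0.0
    while len(selected) < X.shape[1]:
        cand = {a: sig(X, y, selected + [a])
                for a in range(X.shape[1]) if a not in selected}
        a_star = max(cand, key=cand.get)
        if cand[a_star] - best < min_gain:
            break
        selected.append(a_star)
        best = cand[a_star]
    return selected

def sig(X, y, B):
    """Stand-in significance: leave-one-out 1-NN accuracy on subset B."""
    D = np.abs(X[:, None, B] - X[None, :, B]).sum(axis=2)
    np.fill_diagonal(D, np.inf)        # a sample may not vote for itself
    return float(np.mean(y[D.argmin(axis=1)] == y))

X = np.array([[0.1, 0.5], [0.2, 0.5], [0.8, 0.5], [0.9, 0.5]])
y = np.array([0, 0, 1, 1])
print(forward_select(X, y, sig))       # [0]: the constant feature is skipped
```

Because the second feature is constant, it never increases the significance and is never added; the search terminates after one round, which mirrors the stopping rule used in the experiments.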


The features selected with rough sets based on general binary relations (RS-GBR), Pawlak's rough sets, consistency [10] and CFS [17] are presented in Tables 3–6, respectively. Since Pawlak's rough sets and consistency cannot deal with numerical features directly, we apply the MDL entropy discretization algorithm to discretize these features before reduction. Tables 7 and 8 show the average classification performance of the raw data and the reduced data based on 10-fold cross validation, where linear and RBF SVMs are used to validate the selected features. These algorithms are implemented with OSU_SVM3.00, downloaded from http://www.ece.osu.edu/maj/osu_svm/osu_svm3.00.zip. It is easy to see that the performance of the RS-GBR based reducts is in most cases comparable or improved, and that the method based on RS-GBR outperforms discretization + Pawlak's rough sets.
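As a modern stand-in for the OSU_SVM toolbox, the validation protocol (10-fold cross-validation of an SVM on a reduced feature subset) can be reproduced with scikit-learn; this is an assumption about tooling, not the paper's original code. The wine reduct below uses the features reported in the text, shifted to 0-based indices.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)       # the UCI wine data (13 features)
reduct = [12, 9, 6, 10, 0]              # features 13, 10, 7, 11, 1 (0-based)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X[:, reduct], y, cv=10)
print(f"{scores.mean():.2f} +/- {scores.std():.2f}")
```

Exact accuracies will differ from Tables 7 and 8, since fold splits, scaling and SVM parameters are not identical to the original setup.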

Table 3
Features selected with RS-GBR algorithms

Data       Features                                                   Number
Credit     11, 2, 6, 14, 3, 9                                         6
Hepatitis  2, 17, 1, 18, 15, 11, 9                                    7
Heart      10, 12, 3, 13, 1, 4, 5, 7, 8                               9
Iono       1, 5, 13, 34, 24, 25, 3, 26                                8
Wdbc       23, 22, 28, 12, 25, 9, 1, 27, 2, 26, 18, 11, 15, 29, 19    15
Wine       13, 10, 7, 6, 1                                            5

Table 4
Features selected with Pawlak's rough set algorithms

Data       Features                                    Number
Credit     4, 7, 9, 15, 1, 3, 11, 6, 14, 8, 2          11
Hepatitis  2, 18, 8, 10, 4, 5, 17, 19, 13, 15, 3, 12   12
Heart      –                                           –
Iono       5, 3, 6, 34, 17, 14, 22, 4                  8
Wdbc       24, 8, 22, 26, 13, 5, 14                    7
Wine       10, 13, 7, 2                                4

Table 5
Features selected with consistency

Data       Features                               Number
Credit     9, 4, 10, 15, 14, 2, 6, 8, 3, 1, 11    11
Hepatitis  1, 6, 17, 18                           4
Heart      1, 2, 3, 7, 8, 9, 10, 11, 12, 13       10
Iono       5, 6, 8, 13, 22, 27, 34                7
Wdbc       7, 13, 21, 22, 27, 28, 29              7
Wine       1, 3, 4, 7, 10                         5

Table 6
Features selected with CFS

Data       Features                                            Number
Credit     5, 6, 8, 9, 11, 14, 15                              7
Hepatitis  1, 2, 6, 11, 14, 17, 18                             7
Heart      3, 7, 8, 9, 10, 12, 13                              7
Iono       1, 3, 4, 5, 6, 7, 8, 14, 18, 21, 27, 28, 29, 34     14
Wdbc       2, 7, 8, 14, 19, 21, 23, 24, 25, 27, 28             11
Wine       1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13                 11


Table 7
Comparison of classification performances with linear SVM

Data       Raw data        RS-GBR          Pawlak's rough sets   CFS             Consistency
Credit     81.44 ± 7.18    85.48 ± 18.51   82.88 ± 14.34         80.12 ± 14.00   81.86 ± 14.37
Heart      83.33 ± 5.31    83.33 ± 6.59    –                     84.81 ± 5.91    84.44 ± 4.88
Hepatitis  86.17 ± 7.70    89.00 ± 6.30    85.00 ± 7.24          90.17 ± 6.59    88.00 ± 5.26
Iono       87.57 ± 6.45    87.25 ± 5.47    83.30 ± 5.97          86.38 ± 5.35    83.21 ± 6.37
Wdbc       97.73 ± 2.48    97.02 ± 2.03    95.09 ± 2.83          96.32 ± 1.92    95.61 ± 2.51
Wine       98.89 ± 2.34    98.33 ± 2.68    95.00 ± 4.10          98.89 ± 2.34    95.97 ± 4.83

Table 8
Comparison of classification performances with RBF SVM

Data       Raw data        RS-GBR          Pawlak's rough sets   CFS             Consistency
Credit     81.44 ± 7.18    85.63 ± 18.48   81.00 ± 16.25         85.05 ± 17.79   82.78 ± 15.56
Heart      81.11 ± 7.50    80.74 ± 4.88    –                     80.74 ± 6.72    80.37 ± 6.77
Hepatitis  83.50 ± 5.35    86.17 ± 6.85    84.17 ± 8.21          89.67 ± 5.54    90.33 ± 5.54
Iono       93.79 ± 5.07    91.48 ± 5.26    91.54 ± 5.53          95.19 ± 4.43    92.62 ± 3.74
Wdbc       98.08 ± 2.25    97.20 ± 2.05    95.61 ± 2.37          96.84 ± 1.80    96.49 ± 2.61
Wine       98.89 ± 2.34    97.15 ± 3.01    97.22 ± 2.93          98.89 ± 2.34    97.15 ± 3.99

Moreover, Pawlak's rough set based reduction finds nothing for the heart data: if each feature produces zero significance in greedy forward search, no feature can be obtained, because the search procedure stops immediately. The variable precision rough set model is a candidate solution to this problem. We can also find that, for the credit data, the features selected by RS-GBR are a subset of the features selected by consistency. Although features 1, 4, 8, 10 and 15 are deleted from the subset of features selected by consistency, the classification performance is further improved. This means there is redundant information in the set of features selected by consistency.

Fig. 2. Variation of significance as features are selected one by one (for the Heart, Hepatitis, Iono and Wdbc data sets).


Fig. 2 shows the variation of the significance of the selected features. The four curves share similar characteristics: the significance rises quickly in the first stages and then levels off. Furthermore, the maximal value of significance is 1 for all four data sets, which shows that the data sets used here are consistent at the granularity level of 0.1.

9. Conclusions

Attribute reduction in Pawlak's rough set theory is based on equivalence relations, but this condition does not always hold in practical problems, which limits the wide application of the theory. We therefore relax equivalence relations to general binary relations, propose methods to remove useless attributes from relation information systems, consistent relation decision systems and relation decision systems, and develop the theorems necessary for computing all the reducts. Experimental results show that the proposed methods have great power in attribute reduction and can be used to deal with more complex data sets. Our reduction methods can be used to select useful features and to eliminate redundant and irrelevant information. With the above discussion, the theory of attribute reduction with rough sets based on general binary relations has been established. It should be pointed out that the proposed reduction methods are natural generalizations of the reduction methods in Pawlak's rough set theory [41].

Acknowledgements

The authors are highly grateful to the referees and the Editor-in-Chief, Professor Witold Pedrycz, for their valuable comments and suggestions for improving the paper. The research is supported by the National Natural Science Foundation of China (Nos. 10571025 and 60703013).

References

[1] J. Bazan, A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery, Physica-Verlag, Heidelberg, 1998, pp. 321–365.
[2] M. Beynon, Reducts within the variable precision rough sets model: a further investigation, European Journal of Operational Research 134 (2001) 592–605.
[3] Z. Bonikowski, Algebraic structures of rough sets, in: W. Ziarko (Ed.), Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer-Verlag, London, 1994, pp. 243–247.
[4] Z. Bonikowski, E. Bryniarski, U. Wybraniec, Extensions and intentions in the rough set theory, Information Sciences 107 (1998) 149–167.
[5] E. Bryniarski, A calculus of rough sets of the first order, Bulletin of the Polish Academy of Sciences 16 (1989) 71–77.
[6] G. Cattaneo, Abstract approximate spaces for rough theories, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery 1: Methodology and Applications, Physica-Verlag, Heidelberg, 1998, pp. 59–98.
[7] D.-G. Chen, The part reductions in information systems, in: S. Tsumoto, R. Slowinski, H.J. Komorowski, J.W. Grzymala-Busse (Eds.), Rough Sets and Current Trends in Computing, LNAI 3066, Springer-Verlag, Berlin, 2004, pp. 477–482.
[8] D.-G. Chen, C.Z. Wang, Q.H. Hu, A new approach to attribute reduction of consistent and inconsistent covering decision systems with covering rough sets, Information Sciences 177 (17) (2007) 3500–3518.
[9] D.-G. Chen, W.-X. Zhang, D. Yeung, E.C.C. Tsang, Rough approximation on a complete completely distributive lattice with applications to generalized rough sets, Information Sciences 176 (2006) 1829–1848.
[10] M. Dash, H. Liu, Consistency-based search in feature selection, Artificial Intelligence 151 (2003) 155–176.
[11] S. Greco, B. Matarazzo, R. Slowinski, Rough set theory for multicriteria decision analysis, European Journal of Operational Research 129 (2001) 1–47.
[12] S. Greco, B. Matarazzo, R. Slowinski, Rough sets methodology for sorting problems in presence of multiple attributes and criteria, European Journal of Operational Research 38 (2002) 247–259.
[13] Q.H. Hu, D.R. Yu, Entropies of fuzzy indiscernibility relation and its operations, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 12 (5) (2004) 575–589.
[14] Q.H. Hu, D.R. Yu, Z.X. Xie, J.F. Liu, Fuzzy probabilistic approximation spaces and their information measures, IEEE Transactions on Fuzzy Systems 14 (2) (2006) 191–201.
[15] Q. Hu et al., Neighborhood classifiers, Expert Systems with Applications (2006), doi:10.1016/j.eswa.2006.10.043.
[16] Q.H. Hu, D.R. Yu, Z.X. Xie, Information-preserving hybrid data reduction based on fuzzy-rough techniques, Pattern Recognition Letters 27 (5) (2006) 414–423.


[17] M. Hall, Correlation-based feature selection for discrete and numeric class machine learning, in: Proceedings of the 17th ICML, CA, 2000, pp. 359–366.
[18] M. Kryszkiewicz, Rough set approach to incomplete information systems, Information Sciences 112 (1998) 39–49.
[19] M. Kryszkiewicz, Rules in incomplete information systems, Information Sciences 113 (1999) 271–292.
[20] M. Kryszkiewicz, Comparative study of alternative types of knowledge reduction in inconsistent systems, International Journal of Intelligent Systems 16 (2001) 105–120.
[21] J.-S. Mi, W.-Z. Wu, W.-X. Zhang, Approaches to knowledge reductions based on variable precision rough sets model, Information Sciences 159 (3–4) (2004) 255–272.
[22] J.N. Mordeson, Rough set theory applied to (fuzzy) ideal theory, Fuzzy Sets and Systems 121 (2001) 315–332.
[23] D.J. Newman, S. Hettich, C.L. Blake, C.J. Merz, UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Department of Information and Computer Science, Irvine, CA, 1998.
[24] S.H. Nguyen, Data regularity analysis and applications in data mining (PhD Thesis), in: L. Polkowski, T.Y. Lin, S. Tsumoto (Eds.), Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems, Studies in Fuzziness and Soft Computing, vol. 56, Physica-Verlag, Heidelberg, 2000, pp. 289–378.
[25] H.S. Nguyen, D. Slezak, Approximate reducts and association rules: correspondence and complexity results, in: N. Zhong, A. Skowron, S. Oshuga (Eds.), New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, LNAI 1711, Springer, Berlin, 1999, pp. 137–145.
[26] H.S. Nguyen, Approximate Boolean reasoning: foundation and application in data mining, in: J.F. Peters, A. Skowron (Eds.), Transactions on Rough Sets V, LNCS, vol. 4100, Springer-Verlag, Heidelberg, 2006, pp. 334–506.
[27] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (1982) 341–356.
[28] Z. Pawlak, Rough set theory and its applications in data analysis, International Journal of Cybernetics and Systems 29 (1998) 661–688.
[29] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Boston, 1991.
[30] Z. Pawlak, A. Skowron, Rudiments of rough sets, Information Sciences 177 (2007) 3–27.
[31] Z. Pawlak, A. Skowron, Rough sets: some extensions, Information Sciences 177 (2007) 28–40.
[32] Z. Pawlak, A. Skowron, Rough sets and Boolean reasoning, Information Sciences 177 (2007) 41–73.
[33] L. Polkowski, A. Skowron, J. Zytkow, Rough foundations for rough sets, in: Proceedings of the Third International Workshop on Rough Sets and Soft Computing (RSSC'94), San Jose State University, CA, 1994, pp. 142–149.
[34] L. Polkowski, A. Skowron, J. Zytkow, Rough foundations for rough sets, in: T.Y. Lin, A.M. Wildberger (Eds.), Soft Computing, Simulation Councils, Inc., San Diego, 1995, pp. 55–58.
[35] J.A. Pomykala, Approximation operations in approximation space, Bulletin of the Polish Academy of Sciences 35 (1987) 653–662.
[36] M. Quafafou, α-RST: a generalization of rough set theory, Information Sciences 124 (2000) 301–316.
[37] D. Slezak, Searching for dynamic reducts in inconsistent decision tables, in: Proceedings of IPMU'98, France, vol. 2, 1998, pp. 1362–1369.
[38] D. Slezak, Approximate reducts in decision tables, in: Proceedings of IPMU'96, Granada, Spain, vol. 3, 1996, pp. 1159–1164.
[39] J. Stefanowski, On rough set based approaches to induction of decision rules, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery, vol. 1, Physica-Verlag, Heidelberg, 1998, pp. 500–529.
[40] Q. Shen, R. Jensen, Selecting informative features with fuzzy rough sets and its application for complex systems monitoring, Pattern Recognition 37 (2004) 1351–1363.
[41] A. Skowron, C. Rauszer, The discernibility matrices and functions in information systems, in: R. Slowinski (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of Rough Sets Theory, Kluwer Academic Publishers, Boston, 1992, pp. 331–362.
[42] A. Skowron, J. Stepaniuk, Tolerance approximation spaces, Fundamenta Informaticae 27 (1996) 245–253.
[43] R. Slowinski, D. Vanderpooten, A generalized definition of rough approximations based on similarity, IEEE Transactions on Knowledge and Data Engineering 12 (2) (2000) 331–336.
[44] W.-Z. Wu, M. Zhang, H.-Z. Li, J.-S. Mi, Knowledge reduction in random information systems via Dempster–Shafer theory of evidence, Information Sciences 174 (2005) 143–164.
[45] U. Wybraniec-Skardowska, On a generalization of approximation space, Bulletin of the Polish Academy of Sciences, Mathematics 37 (1989) 51–61.
[46] Y.Y. Yao, Constructive and algebraic methods of the theory of rough sets, Information Sciences 109 (1998) 21–47.
[47] Y.Y. Yao, Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111 (1998) 239–259.
[48] D.S. Yeung, D.G. Chen, E.C.C. Tsang, J.W.T. Lee, X.Z. Wang, On the generalization of fuzzy rough sets, IEEE Transactions on Fuzzy Systems 13 (3) (2005) 343–361.
[49] D. Yu, Q.H. Hu, W. Bao, Combining rough set methodology and fuzzy clustering for knowledge discovery from quantitative data, Proceedings of the Chinese Society of Electrical Engineering 24 (6) (2004) 205–210.
[50] L. Yu, H. Liu, Efficient feature selection via analysis of relevance and redundancy, Journal of Machine Learning Research 5 (2004) 1205–1224.
[51] W. Zakowski, Approximations in the space (U, P), Demonstratio Mathematica 16 (1983) 761–769.
[52] W.-X. Zhang, W.-Z. Wu, J.-Y. Liang, D.-Y. Li, Theory and Methods of Rough Sets, Science Press, Beijing, 2001.
[53] W. Zhu, Topological approaches to covering rough sets, Information Sciences 177 (2007) 1892–1915.
[54] W. Zhu, F.-Y. Wang, Reduction and axiomization of covering generalized rough sets, Information Sciences 152 (2003) 217–230.
[55] W. Ziarko, Variable precision rough set model, Journal of Computer and System Sciences 46 (1993) 39–59.