

Information Sciences 178 (2008) 2237–2261 www.elsevier.com/locate/ins

A systematic study on attribute reduction with rough sets based on general binary relations

Changzhong Wang a,*, Congxin Wu a, Degang Chen b

a Department of Mathematics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, PR China
b Department of Mathematics and Physics, North China Electric Power University, Beijing 102206, PR China

Received 21 March 2006; received in revised form 9 January 2008; accepted 9 January 2008

Abstract

Attribute reduction is considered an important preprocessing step for pattern recognition, machine learning, and data mining. This paper provides a systematic study on attribute reduction with rough sets based on general binary relations. We define a relation information system, a consistent relation decision system, and a relation decision system, together with their attribute reductions. Furthermore, we present a judgment theorem and a discernibility matrix associated with attribute reduction in each type of system; based on the discernibility matrix, we can compute all the reducts. Finally, the experimental results with UCI data sets show that the proposed reduction methods are an effective technique for dealing with complex data sets. © 2008 Elsevier Inc. All rights reserved.

Keywords: Attribute reduction; Discernibility matrix; Rough sets based on general binary relations; Relation information systems; Relation decision systems

1. Introduction

The theory of rough sets, proposed by Pawlak [27], is an extension of set theory for the study of information systems characterized by insufficient and imperfect data. It has been successfully applied in such artificial intelligence fields as machine learning, pattern recognition, decision analysis, process control, knowledge discovery in databases, and expert systems. One application of rough set theory is to approximate an arbitrary subset of a universe by two definable subsets called the lower and upper approximations [6,25,43,46,52,55]. Another application is to reduce the number of attributes in databases: given a data set with discrete attribute values, we can find a subset of attributes that is the most informative and has the same discernibility capability as the original attributes. Attribute reduction has been studied from the viewpoint of independence of knowledge [28]. The notion of a reduct was proposed as a minimal subset of attributes that induces the same indiscernibility relation as the whole

* Corresponding author. E-mail addresses: [email protected] (C.Z. Wang), [email protected] (D.G. Chen).

0020-0255/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2008.01.007


set of attributes. Many types of attribute reduction have been proposed in recent years [1,2,20,21,25,29,36–39,55]; in particular, much attention has been paid to attribute reduction in inconsistent decision systems. For example, possible rules and possible reducts have been proposed as a means to deal with inconsistency in inconsistent decision tables [20,29]. A possible rule covers only objects belonging to the upper approximation of the decision class determined by the rule's consequent. Approximation rules [39] are also used as an alternative to possible rules. The notions of α-reduct and α-relative reduct for decision tables were introduced in [25]; the α-reduct allows the occurrence of additional inconsistency, controlled by means of a parameter. In [37], a new concept of attribute reduction that preserves the class membership distribution was presented. It was shown in [38] that attribute reduction preserving the membership distribution is equivalent to attribute reduction preserving the value of a generalized inference measure function. A generalized attribute reduction was also introduced in [38] that allows the value of the generalized inference measure function after the reduction to differ from the original one by a user-specified threshold. The notion of dynamic reducts was described in [1]. Dynamic reducts are just the subset of all reducts derived from both the original decision table and the majority of randomly chosen decision sub-tables. In [2,21,55], the β-reduct based on the variable precision rough set (VPRS) model was introduced; this type of reduct can be used to overcome the problem of noise in data. Kryszkiewicz [20] investigated and compared five notions of attribute reduction in inconsistent systems. In fact, only two of them, the possible reduct and the μ-decision reduct, are essential, because each of the others is equivalent to one of them. In addition, some other reduction methods based on Pawlak's rough sets were proposed in [7,44].
It should be noted that all the above-mentioned reductions are performed within the framework of Pawlak's rough set theory. In other words, these attribute reductions are all based on equivalence relations. However, a partition, or equivalence relation, as the indiscernibility relation in Pawlak's rough set theory, is too restrictive for many applications. For example, incomplete information systems [18,19] and real-valued information systems [15,16] cannot be handled with Pawlak's rough sets. Several generalizations have therefore been proposed to solve these problems. One approach is to relax the partition to a cover. In [3–5,22,33–35,42,51,53,54], the concept of a cover of a universe was proposed to construct the upper and lower approximations of an arbitrary set. In [51], Zakowski mainly studied the structures of covers, while in [22] Mordeson examined the relationship between upper and lower approximations and some axioms satisfied by Pawlak's rough sets. In [54], Zhu and Wang claimed that they had studied the reduction of covering generalized rough sets, but their reduction does not coincide with the original idea of reduction, namely to remove the useless attributes in an information system or a decision system; their reduction of covering generalized rough sets merely removes the "redundant" members of a cover and finds the "smallest" cover that induces the same covering lower and upper approximations. Another important approach is to relax the equivalence relations [9–20,40,43,46–48]. Kryszkiewicz defined similarity relations in incomplete information systems and proposed a type of attribute reduction that eliminates only the information which is not essential from the viewpoint of classification or decision making [18]. Chen et al. suggested that an equivalence relation should be replaced by a fuzzy similarity relation and that crisp rough sets could be generalized to fuzzy rough sets.
They presented new definitions of lower and upper approximations on a complete completely distributive lattice, and proposed a unified framework which can both improve crisp generalizations of the upper approximation and put the crisp and fuzzy generalizations of rough sets into the same framework [9]. Hu, Yu et al. extended Shannon's entropy to measure the information quantity in a family of fuzzy sets [13] and applied the proposed measure to calculate the uncertainty in a fuzzy approximation space [14] and to reduce mixed data [15,16], where numerical attributes induce fuzzy equivalence relations. Greco et al. introduced the rough approximation based on preference relations and proposed a rough set methodology to analyze multi-criteria choice and ranking decision problems [11]. Slowinski and Vanderpooten proposed the substitution of an equivalence relation with a general relation where only reflexivity is required [43]. In [46,47], Yao investigated rough sets based on general binary relations, but his studies mainly concentrate on the constructive and axiomatic approaches to approximation operators. In comparison with the study of attribute reduction using Pawlak's rough sets, not much effort has so far been directed to the study of attribute reduction based on general binary relations. Since attribute reduction is an important issue in rough set theory and has been applied to solve many problems, a thorough study of this topic is of both theoretical and practical importance. In this paper, we define a relation information system, a consistent relation decision system and a relation decision system, study attribute reduction based on these systems, and find that the methods proposed to deal with attribute reduction based on general binary relations are applicable in classical systems as well, i.e., these

methods are natural generalizations of the reduction methods in Pawlak's rough set theory [41]. Finally, the experimental results show that the proposed reduction methods are an effective technique for dealing with complex data sets.

The rest of the paper is organized as follows. Section 2 presents the fundamentals of Pawlak's rough sets. Section 3 reviews some basic notions of rough sets based on general binary relations. In Section 4, we discuss attribute reduction in a relation information system. In Section 5, the concept of a relation decision system is introduced. In Section 6, we study attribute reduction in a consistent relation decision system. In Section 7, we study attribute reduction in a relation decision system. In Section 8, we show some experiments on six public data sets. Section 9 presents conclusions.

2. Basic notions related to information systems and rough sets

The following basic concepts about Pawlak's rough sets can be found in [27,41]. An information system is a pair A = (U, A), where U = {x1, ..., xn} is a nonempty finite set of objects and A = {a1, a2, ..., am} is a nonempty finite set of attributes. With every subset of attributes B ⊆ A we associate a binary relation IND(B), called the B-indiscernibility relation, defined as IND(B) = {(x, y) ∈ U × U : a(x) = a(y), ∀a ∈ B}. IND(B) is obviously an equivalence relation and IND(B) = ∩{IND({a}) : a ∈ B}. By [x]_B we denote the equivalence class of IND(B) containing x. For any subset X ⊆ U, B̲X = {x ∈ U : [x]_B ⊆ X} and B̄X = {x ∈ U : [x]_B ∩ X ≠ ∅} are called the B-lower and B-upper approximations of X in A, respectively. By M(A) we denote an n × n matrix (c_ij), called the discernibility matrix of A, such that c_ij = {a ∈ A : a(xi) ≠ a(xj)} for i, j = 1, 2, ..., n. A discernibility function f(A) for an information system A = (U, A) is a Boolean function of m Boolean variables ā1, ..., ām, corresponding to the attributes a1, ..., am respectively, defined as f(A)(ā1, ...
, ām) = ∧{∨(c_ij) : 1 ≤ j < i ≤ n}, where ∨(c_ij) is the disjunction of all variables ā such that a ∈ c_ij. An attribute a ∈ B ⊆ A is superfluous in B if IND(B) = IND(B − {a}); otherwise a is indispensable in B. The collection of all indispensable attributes in A is called the core of A. We say that B ⊆ A is independent in A if every attribute in B is indispensable in B. B ⊆ A is called a reduct in A if B is independent and IND(B) = IND(A). The set of all reducts in A is denoted RED(A). Let g(A) be the reduced disjunctive form of f(A), obtained from f(A) by applying the multiplication and absorption laws; then there exist l and Xk ⊆ A for k = 1, ..., l such that g(A) = (∧X1) ∨ ... ∨ (∧Xl), where each element in Xk appears only once. We have RED(A) = {X1, ..., Xl}.

A decision system is a pair A* = (U, A ∪ {a*}), where a* is the decision attribute and A is the condition attribute set. We say a ∈ B ⊆ A is relatively dispensable in B if POS_B(a*) = POS_{B−{a}}(a*); otherwise a is said to be relatively indispensable in B, where POS_B(a*) is the union of the B-lower approximations of all the equivalence classes induced by a*, i.e., POS_B(a*) = ∪{B̲X : X ∈ U/a*}. If every attribute in B is relatively indispensable in B, we say that B ⊆ A is relatively independent in A*. B ⊆ A is called a relative reduct in A* if B is relatively independent in A* and POS_B(a*) = POS_A(a*). The collection of all relatively indispensable attributes in A is called the relative core of A*. Suppose M(A) = (c_ij). We define a matrix M(A*) = (c*_ij) in the following way: (1) c*_ij = c_ij − {a*}, if (a* ∈ c_ij and xi, xj ∈ POS_A(a*)) or pos(xi) ≠ pos(xj); (2) c*_ij = ∅, otherwise. Here pos : U → {0, 1} is defined as pos(x) = 1 if and only if x ∈ POS_A(a*). All the relative reducts can be computed from M(A*) in a way analogous to computing the reducts from M(A).

3. Rough sets based on general binary relations

This section mainly reviews some basic notions of rough sets based on general binary relations and some statements to be used in the following sections. In the following discussion, the universe of discourse U is always considered to be finite and nonempty.


Definition 3.1 ([45,47]). Let U be a universe of discourse and R ⊆ U × U a binary relation on U. The relation R is said to be serial if for every x ∈ U there exists y ∈ U such that (x, y) ∈ R; R is said to be reflexive if (x, x) ∈ R for all x ∈ U; R is said to be symmetric if for all x, y ∈ U, (x, y) ∈ R implies (y, x) ∈ R; R is said to be transitive if for all x, y, z ∈ U, (x, y) ∈ R and (y, z) ∈ R imply (x, z) ∈ R.

Definition 3.2 [42]. Let U be a universe of discourse and R ⊆ U × U a binary relation on U. Rs : U → P(U) is a set-valued function, where Rs(x) = {y ∈ U : (x, y) ∈ R}, x ∈ U. Rs(x) is referred to as the successor neighborhood of x with respect to R. Obviously, the relation R and its corresponding successor neighborhoods Rs(x) are uniquely determined by each other, namely, xRy ⟺ (x, y) ∈ R ⟺ y ∈ Rs(x). The pair (U, R) is referred to as a generalized approximation space. For any set X ⊆ U, a pair of lower and upper approximations of X is defined as

apr′_R X = {x : Rs(x) ⊆ X},  apr̄′_R X = {x : Rs(x) ∩ X ≠ ∅}.  (1)

However, another pair of lower and upper approximations of X can be defined as

apr_R X = ∪{Rs(x) : Rs(x) ⊆ X},  apr̄_R X = ∪{Rs(x) : Rs(x) ∩ X ≠ ∅}.  (2)
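As a concrete illustration of Definition 3.2 and formulas (1) and (2), the following Python sketch (our own, with illustrative names not taken from the paper) computes successor neighborhoods and both pairs of approximations for a small general relation:

```python
def successor(R, U):
    """Successor neighborhoods R_s(x) = {y : (x, y) in R} (Definition 3.2)."""
    return {x: {y for (a, y) in R if a == x} for x in U}

def apr_1(R, U, X):
    """Formula (1): element-based lower and upper approximations of X."""
    Rs = successor(R, U)
    return ({x for x in U if Rs[x] <= X},      # R_s(x) subset of X
            {x for x in U if Rs[x] & X})       # R_s(x) meets X

def apr_2(R, U, X):
    """Formula (2): granule-based lower and upper approximations of X."""
    Rs = successor(R, U)
    lower = [Rs[x] for x in U if Rs[x] <= X]
    upper = [Rs[x] for x in U if Rs[x] & X]
    return (set().union(*lower) if lower else set(),
            set().union(*upper) if upper else set())

# A small relation that is neither reflexive, symmetric nor transitive.
U = {1, 2, 3}
R = {(1, 2), (2, 2), (3, 1), (3, 2)}
X = {2}
```

Here apr_1 and apr_2 return different sets for this R, consistent with the remark below that formulas (1) and (2) coincide only when R is an equivalence relation.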

Formulas (1) and (2) are equivalent if and only if R is an equivalence relation. The following theorem can easily be derived from Definition 3.2.

Theorem 3.1 [47]. Let R and S be binary relations on U; then we have the following properties: (1) R ⊆ S ⟺ ∀x ∈ U, Rs(x) ⊆ Ss(x); (2) (R ∩ S)s(x) = Rs(x) ∩ Ss(x); (3) (R ∪ S)s(x) = Rs(x) ∪ Ss(x).

If R is an equivalence relation on U, then (U, R) is a Pawlak approximation space, and Rs(x) is the equivalence class containing x. Suppose R = {R1, R2, ..., Rn} is a family of binary relations on U. In the subsequent discussion, we denote Int R = R1 ∩ R2 ∩ ... ∩ Rn.

Corollary 3.1. (Int R)s(x) = (R1)s(x) ∩ (R2)s(x) ∩ ... ∩ (Rn)s(x).

Remark 1. In the PhD thesis [24], five types of combination are used for defining five classes of tolerance relations. In this paper, we introduce the intersection for combining neighborhoods.

4. Attribute reduction in relation information systems

An attribute in a data set may induce a general binary relation on the universe, rather than an equivalence relation, due to missing data. In the following discussion, we denote an attribute by a binary relation. As we know, the equivalence relation is a key and primitive notion in Pawlak's rough set theory. Let R be a family of equivalence relations. For any P ⊆ R, IND(P) is also an equivalence relation. Suppose (U, R) is an information system. P ⊆ R is a reduct of R if and only if P is a minimal subset of R satisfying IND(P) = IND(R). Skowron [41] has shown how to obtain a reduct of such an information system. If all equivalence relations are replaced by arbitrary binary relations, how can we get a minimal subset P ⊆ R which satisfies Int R = Int P? On the other hand, if we can deal with attribute reduction with rough sets based on general binary relations, the following problems, which Pawlak's rough sets have difficulty dealing with, can be easily solved.

(1) Attribute reduction of a single-structural data set. All relations induced by attributes have the same properties in this kind of data set. These relations have one or two kinds of properties (reflexivity, symmetry and transitivity). For example, similarity relations [15,18,43], preference relations [11] and tolerance relations [24,33,34,42] belong to this case.


(2) Attribute reduction of a hybrid-structural data set. Heterogeneous attribute variables, such as symbolic and real-valued ones [12], coexist in this kind of data set. Symbolic variables induce equivalence relations [27–32], while real-valued variables induce similarity, preference, or tolerance relations [11,12,15,18,33,34,42,43].

Example 4.1. Table 1 shows the play tennis data set with mixed categorical and numerical attributes, where U = {x1, x2, ..., x14}, A = {outlook, T, humidity, windy}, D = {play}; the condition attributes outlook and windy are categorical, T and humidity are numerical, and play is the decision. According to classical rough sets, attributes T and humidity should be discretized before rough set analysis is performed [26,40,49]. Here we analyze the data directly with rough sets based on general binary relations. Attributes outlook and windy are symbolic variables, while attributes T and humidity are real-valued variables. Outlook and windy induce equivalence relations; T and humidity induce similarity or preference relations according to different requirements.

It can be seen from these observations that a thorough study of attribute reduction based on general binary relations is of both theoretical and practical importance. In this section we first introduce the concept of a relation information system. Then, we propose some theorems to characterize attribute reduction in a relation information system. Finally, we present an approach to compute all the reducts of the system based on the discernibility matrix.

Definition 4.1. Let U be a universe and R = {R1, R2, ..., Rn} a family of general binary relations on U. Then (U, R) is called a relation information system; R is called a conditional relation (attribute) set.

For arbitrary x ∈ U, if R = {R1, R2, ..., Rn} is a family of reflexive binary relations on U, then Int R is also reflexive, and the family {(Int R)s(x) : x ∈ U} forms a cover of the universe, i.e., ∪{(Int R)s(x) : x ∈ U} = U.
If R is a family of equivalence relations, then (Int R)s(x) is the equivalence class containing x, and the family {(Int R)s(x) : x ∈ U} forms a partition of the universe. Therefore, relation information systems are extensions of the information systems of Pawlak's rough set theory. For such a generalized information system, we always suppose ∪{(Int R)s(x) : x ∈ U} ≠ ∅, i.e., there exists at least one object x ∈ U such that (Int R)s(x) ≠ ∅.

Definition 4.2. Let (U, R) be a relation information system and Ri ∈ R. Ri is called superfluous in R if Int R = Int(R − {Ri}); otherwise, Ri is called indispensable in R. For any subset P ⊆ R, P is called a reduct of R if each element in P is indispensable in P and Int R = Int P. The collection of all indispensable elements in R is called the core of R, denoted Core(R).

Table 1
Play tennis data with heterogeneous attributes

Day   Outlook    T    Humidity   Windy   Play
x1    Sunny      85   85         False   No
x2    Sunny      80   90         True    No
x3    Overcast   83   86         False   Yes
x4    Rainy      70   96         False   Yes
x5    Rainy      68   80         False   Yes
x6    Rainy      65   70         True    No
x7    Overcast   64   65         True    Yes
x8    Sunny      72   95         False   No
x9    Sunny      69   70         False   Yes
x10   Rainy      75   80         False   Yes
x11   Sunny      75   70         True    Yes
x12   Overcast   72   90         True    Yes
x13   Overcast   81   75         False   Yes
x14   Rainy      71   91         True    No
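The relations alluded to in Example 4.1 can be induced mechanically. The sketch below (our own; the tolerance threshold eps is an illustrative choice, not a value from the paper) derives an equivalence relation from the categorical attribute outlook and a tolerance relation from the numerical attribute T of Table 1:

```python
# Table 1 columns for x1 ... x14 (index 0 corresponds to x1).
outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
T = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

U = range(14)

def equivalence(values):
    """A categorical attribute induces an equivalence relation (Pawlak case)."""
    return {(x, y) for x in U for y in U if values[x] == values[y]}

def tolerance(values, eps):
    """A numerical attribute induces a tolerance relation |v(x) - v(y)| <= eps:
    reflexive and symmetric, but in general not transitive."""
    return {(x, y) for x in U for y in U if abs(values[x] - values[y]) <= eps}

R_outlook = equivalence(outlook)
R_T = tolerance(T, eps=2)   # eps = 2 is an assumed, illustrative threshold
```

With eps = 2, R_T relates x1 (T = 85) to x3 (83) and x3 to x13 (81), but not x1 to x13, which shows why such attributes generally induce non-transitive relations.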


Definitions 4.1 and 4.2 are natural extensions of the corresponding concepts in Pawlak's rough set theory, obtained by substituting general binary relations for equivalence relations. It can easily be seen from the two definitions that the purpose of reduction of R is to find a minimal subset of R that keeps the relation Int R invariant. In the following, we study attribute reduction in a relation information system.

Proposition 4.1. Int R = Int P ⟺ (Int R)s(x) = (Int P)s(x), ∀x ∈ U.

Proposition 4.1 presents an equivalent condition for judging whether two relations are equal.

Theorem 4.1. Let R = {R1, R2, ..., Rn} be a family of binary relations on U, Ri ∈ R, x ∈ U. Then (Int R)s(x) ≠ (Int(R − {Ri}))s(x) if and only if there exists y ∈ U such that y ∉ (Int R)s(x) but y ∈ (Int(R − {Ri}))s(x).

Proof. Straightforward. □

Theorem 4.1 implies that an indispensable relation can be characterized by the relationship between two objects in the universe, according to Proposition 4.1 and Definition 4.2. The following Theorems 4.2 and 4.3 show that the original relationship between any two objects is invariant if superfluous attributes (relations) are removed.

Theorem 4.2. Let R = {R1, R2, ..., Rn} be a family of binary relations on U. For ∀x, y ∈ U, y ∉ (Int R)s(x) if and only if there is at least one binary relation Ri ∈ R such that y ∉ (Ri)s(x).

Proof. Straightforward. □

Theorem 4.3 (Judgment theorem of attribute reduction). Let R = {R1, R2, ..., Rn} be a family of binary relations on U, P ⊆ R. Then Int R = Int P if and only if for ∀x, y ∈ U, y ∉ (Int R)s(x) implies y ∉ (Int P)s(x).

Proof. Straightforward. □

It is easily seen from Theorems 4.2 and 4.3 that attribute reduction of a relation information system is essentially equivalent to finding the minimal subsets of the conditional relation set R that keep invariant the successor neighborhood of an arbitrary object in U with respect to Int R, i.e., that keep invariant the original relationship between any two objects. Next, we define the discernibility matrix of a relation information system via Theorems 4.2 and 4.3 as follows.

Definition 4.3. Let (U, R) be a relation information system. Suppose U = {x1, x2, ..., xn}; we denote by M(U, R) an n × n matrix (c_ij), called the discernibility matrix of (U, R), defined as c_ij = {R ∈ R : xj ∉ Rs(xi)} for xi, xj ∈ U.

Unlike a discernibility matrix based on equivalence relations, in our method one has to compute the successor neighborhood Rs(xi) of every object xi ∈ U and examine whether xj ∈ Rs(xi) for every R ∈ R in order to construct c_ij. Hu [15] pointed out that successor neighborhoods can be obtained in linear time O(n), while the complexity of the discernibility matrix is O(n²). Therefore, the time complexity of the proposed method is O(N · n²), where n and N are the numbers of samples and attributes, respectively. The following theorem is used to study the properties of the discernibility matrix.

Theorem 4.4. Let M(U, R) = (c_ij) be the discernibility matrix of (U, R) and R = {R1, R2, ..., Rl}. Then the following statements hold:

(1) R ∈ c_ij ⟺ xj ∉ Rs(xi);
(2) c_ii = ∅ for i = 1, 2, ..., n ⟺ every R ∈ R is reflexive;
(3) every Rk (k = 1, ..., l) is symmetric if and only if c_ij = c_ji (i ≠ j; i, j = 1, 2, ..., n);
(4) every Rk (k = 1, ..., l) is transitive if and only if c_ij ⊆ c_it ∪ c_tj (i ≠ j; i, j, t = 1, 2, ..., n).
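Definition 4.3 translates directly into code. The following sketch (our own wording; relation and variable names are illustrative) builds the matrix from successor neighborhoods, and its diagonal behaviour matches Theorem 4.4(2):

```python
def successors(R, U):
    """Successor neighborhoods R_s(x) of a relation given as a set of pairs."""
    return {x: {y for (a, y) in R if a == x} for x in U}

def discernibility_matrix(relations, U):
    """c_ij = {R : x_j not in R_s(x_i)} (Definition 4.3).
    `relations` maps each relation's name to its set of pairs."""
    neigh = {name: successors(R, U) for name, R in relations.items()}
    return {(i, j): {name for name in relations if j not in neigh[name][i]}
            for i in U for j in U}

# A toy system with two reflexive, symmetric relations on U = {0, 1}.
U = {0, 1}
rels = {"R1": {(0, 0), (1, 1)},
        "R2": {(0, 0), (0, 1), (1, 0), (1, 1)}}
M = discernibility_matrix(rels, U)
```

Since both relations are reflexive, the diagonal entries are empty (Theorem 4.4(2)); since both are symmetric, M is symmetric (Theorem 4.4(3)).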


Proof. (1) Straightforward.

(2) (⇐) For every R ∈ R, since R is reflexive, (xi, xi) ∈ R for any xi ∈ U, which implies xi ∈ Rs(xi). Thus R ∉ c_ii, so c_ii = ∅. (⇒) If c_ii = ∅ for i = 1, 2, ..., n, then xi ∈ Rs(xi) for any R ∈ R. Hence every R is reflexive.

(3) (⇒) Suppose Rk ∈ c_ij (c_ij ≠ ∅); then xj ∉ (Rk)s(xi) ⇒ (xi, xj) ∉ Rk ⇒ (xj, xi) ∉ Rk ⇒ xi ∉ (Rk)s(xj) ⇒ Rk ∈ c_ji. Hence c_ij ⊆ c_ji; similarly c_ji ⊆ c_ij, so c_ij = c_ji. (⇐) Denote R₁ = {R ∈ R : ∀xi, xj ∈ U, (xi, xj) ∈ R} and R₂ = R − R₁. Each R ∈ R₁ is reflexive, symmetric and transitive. For every R ∈ R₂, there must be xi₀, xj₀ ∈ U (i₀ ≠ j₀) such that (xi₀, xj₀) ∉ R. Hence xj₀ ∉ Rs(xi₀), which implies R ∈ c_i₀j₀. Since c_i₀j₀ = c_j₀i₀, we have R ∈ c_j₀i₀. Thus xi₀ ∉ Rs(xj₀), which implies (xj₀, xi₀) ∉ R. So R is symmetric.

(4) (⇒) Suppose Rk (k = 1, 2, ..., l) is transitive, i.e., (xi, xt) ∈ Rk and (xt, xj) ∈ Rk imply (xi, xj) ∈ Rk. This is equivalent to: xt ∈ (Rk)s(xi) and xj ∈ (Rk)s(xt) imply xj ∈ (Rk)s(xi), which means that Rk ∉ c_it and Rk ∉ c_tj imply Rk ∉ c_ij. So if Rk ∈ c_ij, then Rk ∈ c_it or Rk ∈ c_tj, namely c_ij ⊆ c_it ∪ c_tj. (⇐) By (3) above, each R ∈ R₁ is symmetric and transitive. Suppose R ∈ R₂; there must exist xi₀, xj₀ ∈ U (i₀ ≠ j₀) such that (xi₀, xj₀) ∉ R, which implies xj₀ ∉ Rs(xi₀). Hence R ∈ c_i₀j₀. Since c_i₀j₀ ⊆ c_i₀t ∪ c_tj₀, either R ∈ c_i₀t or R ∈ c_tj₀ holds. Thus xj₀ ∉ Rs(xi₀) implies xt ∉ Rs(xi₀) or xj₀ ∉ Rs(xt); namely, (xi₀, xj₀) ∉ R implies (xi₀, xt) ∉ R or (xt, xj₀) ∉ R. So if (xi₀, xt) ∈ R and (xt, xj₀) ∈ R, then (xi₀, xj₀) ∈ R. Hence R is transitive. □

Theorem 4.4 illustrates that the special properties of the discernibility matrix of a relation information system are determined by special binary relations. If R is a family of equivalence relations, M(U, R) is the discernibility matrix of the corresponding information system in Pawlak's rough set theory [41]. In this sense, the reduction method proposed for a relation information system is a generalization of the corresponding reduction method in Pawlak's rough set theory [41]. The following theorem characterizes the core of a relation information system.

Theorem 4.5. Core(R) = {R ∈ R : c_ij = {R} for some i, j ≤ n}.

Proof. Suppose R ∈ Core(R); then Int(R − {R}) ≠ Int R. By Theorem 4.1, it follows that there exist xi, xj ∈ U such that xj ∉ (Int R)s(xi) but xj ∈ (Int(R − {R}))s(xi). Obviously, R is the only element of R satisfying xj ∉ Rs(xi). By Definition 4.3, c_ij = {R}. Hence Core(R) ⊆ {R ∈ R : c_ij = {R} for some i, j ≤ n}. Conversely, if c_ij = {R} for some xi, xj ∈ U, then R ∈ Core(R) by Theorem 4.1. Hence Core(R) ⊇ {R ∈ R : c_ij = {R} for some i, j ≤ n}. Therefore Core(R) = {R ∈ R : c_ij = {R} for some i, j ≤ n}. □

Theorem 4.6. Let P ⊆ R; then Int P = Int R if and only if P ∩ c_ij ≠ ∅ for any c_ij ≠ ∅.

Proof. (⇒) Assume that there exist i₀, j₀ ≤ n with c_i₀j₀ ≠ ∅ but P ∩ c_i₀j₀ = ∅. Then for every R ∈ P we have xj₀ ∈ Rs(xi₀), which implies xj₀ ∈ (Int P)s(xi₀). Since Int P = Int R, by Theorem 4.3 we have xj₀ ∈ (Int R)s(xi₀). Thus xj₀ ∈ Rs(xi₀) for every R ∈ R, which implies c_i₀j₀ = ∅. This contradicts the assumption. So P ∩ c_ij ≠ ∅ for any c_ij ≠ ∅. (⇐) If P ∩ c_ij ≠ ∅ for all xi, xj ∈ U with c_ij ≠ ∅, suppose R ∈ P ∩ c_ij; then xj ∉ Rs(xi), which implies xj ∉ (Int R)s(xi) and xj ∉ (Int P)s(xi). By Theorem 4.3, Int P = Int R holds. □

By Definition 4.2 and Theorem 4.6 we immediately get the following corollary.

Corollary 4.1. Let P ⊆ R; then P is a reduct of R if and only if P is a minimal subset satisfying P ∩ c_ij ≠ ∅ for every c_ij ≠ ∅ (i, j ≤ n).
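Corollary 4.1 characterizes a reduct as a minimal subset hitting every nonempty matrix entry. A brute-force sketch (our own, adequate for small attribute sets; the example matrix is illustrative, chosen to mirror the distinct entries arising later in Example 4.2):

```python
from itertools import combinations

def reducts_from_matrix(M, names):
    """All reducts per Corollary 4.1: the minimal subsets P of the relation
    family with P ∩ c_ij ≠ ∅ for every nonempty entry c_ij of matrix M."""
    cells = [c for c in M.values() if c]
    # Every subset of `names` that intersects all nonempty cells ...
    hitting = [set(P) for r in range(len(names) + 1)
               for P in combinations(names, r)
               if all(set(P) & c for c in cells)]
    # ... kept only if no proper subset also hits all cells (minimality).
    return [P for P in hitting if not any(Q < P for Q in hitting)]

# Illustrative nonempty entries of a discernibility matrix.
M = {(0, 1): {"R3"}, (0, 2): {"R2"}, (1, 2): {"R1", "R4"}}
reducts = reducts_from_matrix(M, ["R1", "R2", "R3", "R4"])
```

Every reduct must contain R2 and R3 (singleton cells, matching the core characterization of Theorem 4.5) plus at least one of R1, R4.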

2244

C.Z. Wang et al. / Information Sciences 178 (2008) 2237–2261

Definition 4.4. Let R = {R1, R2, ..., Rn} be a family of binary relations on U. A Boolean variable R̄i (i ≤ n) is defined for each binary relation Ri ∈ R, and the Boolean function f(U, R) of (U, R), called the discernibility function or discernibility formula of (U, R), is defined as f(U, R)(R̄1, R̄2, ..., R̄n) = ∧{∨(c_ij) : i, j ≤ n, c_ij ≠ ∅}, where ∨(c_ij) represents the disjunction of the elements in c_ij.

By means of the discernibility function, we have the following theorem to compute all the reducts of a relation information system.

Theorem 4.7. Let (U, R) be a relation information system and M(U, R) = (c_ij : i, j ≤ n) the discernibility matrix of (U, R). The discernibility formula f(U, R) is defined as

f(U, R) = ∧_{i,j=1,...,n} (∨c_ij), (c_ij ≠ ∅).

If f(U, R) = ∨_{k=1,...,l} (∧Bk) (Bk ⊆ R) is obtained from f(U, R) by applying the multiplication and absorption laws, so that every element in Bk appears only once, then the set {Bk : k ≤ l} is the collection of all reducts of R, i.e., Red(R) = {Bk : k ≤ l}.

Proof. For each k = 1, ..., l, we have ∧Bk ≤ ∨c_ij for every c_ij ≠ ∅. By the disjunction and conjunction laws, Bk ∩ c_ij ≠ ∅ for any c_ij ≠ ∅. Since f(U, R) = ∨_{k=1,...,l}(∧Bk), it follows that, for an arbitrary Bk, if we remove an element R from Bk and let B′k = Bk − {R}, then f(U, R) ≠ ∨_{r=1,...,k−1}(∧Br) ∨ (∧B′k) ∨ (∨_{r=k+1,...,l}(∧Br)) and f(U, R) < ∨_{r=1,...,k−1}(∧Br) ∨ (∧B′k) ∨ (∨_{r=k+1,...,l}(∧Br)). If we still had B′k ∩ c_ij ≠ ∅ for each c_ij ≠ ∅ (i, j ≤ n), then ∧B′k ≤ ∨c_ij, which would imply f(U, R) ≥ ∨_{r=1,...,k−1}(∧Br) ∨ (∧B′k) ∨ (∨_{r=k+1,...,l}(∧Br)), a contradiction. Hence there exists c_i₀j₀ ≠ ∅ such that B′k ∩ c_i₀j₀ = ∅, which implies that Bk is a reduct of (U, R). For any X ∈ Red(R), we have X ∩ c_ij ≠ ∅ for any c_ij ≠ ∅ (i, j ≤ n). Thus f(U, R) ∧ (∧X) = (∧(∨c_ij)) ∧ (∧X) = ∧X. If Bk − X ≠ ∅ for each k, we can find Rk ∈ Bk − X. Rewriting f(U, R) = (∨_{k=1,...,l} R̄k) ∧ U, we have ∧X ≤ ∨_{k=1,...,l} R̄k. So there must be some Rk₀ such that ∧X ≤ R̄k₀, which implies Rk₀ ∈ X. This is a contradiction. So Bk₀ ⊆ X for some k₀. Since both X and Bk₀ are reducts, we have X = Bk₀. Hence Red(R) = {B1, ..., Bl}. □

The following example illustrates the idea of this section.

Example 4.2. Let (U, R) be a relation information system, where U = {x1, x2, ..., x9}, R = {R1, R2, R3, R4}, and

R1 = {(x1,x4), (x2,x1), (x2,x3), (x2,x4), (x4,x1), (x4,x2), (x4,x5), (x4,x6), (x5,x2), (x5,x6), (x5,x5), (x6,x3), (x6,x7), (x8,x4), (x8,x8), (x8,x9)};
R2 = {(x2,x1), (x2,x3), (x2,x7), (x2,x4), (x3,x2), (x4,x2), (x4,x5), (x4,x6), (x5,x2), (x5,x4), (x5,x6), (x6,x3), (x6,x7), (x8,x4), (x8,x8), (x8,x9)};
R3 = {(x2,x1), (x2,x3), (x2,x7), (x2,x8), (x4,x2), (x4,x5), (x5,x2), (x5,x3), (x5,x5), (x6,x3), (x6,x7), (x7,x6), (x8,x4), (x8,x8), (x8,x9)};
R4 = {(x2,x1), (x2,x3), (x2,x4), (x4,x2), (x4,x5), (x4,x6), (x5,x2), (x5,x5), (x5,x6), (x5,x8), (x6,x3), (x6,x6), (x6,x7), (x8,x4), (x8,x8), (x8,x9)}.

Then Int R = {(x2,x1), (x2,x3), (x4,x2), (x4,x5), (x5,x2), (x6,x3), (x6,x7), (x8,x4), (x8,x8), (x8,x9)}, so that (Int R)s(x1) = (Int R)s(x3) = (Int R)s(x7) = (Int R)s(x9) = ∅, (Int R)s(x2) = {x1, x3}, (Int R)s(x4) = {x2, x5}, (Int R)s(x5) = {x2}, (Int R)s(x6) = {x3, x7}, (Int R)s(x8) = {x4, x8, x9}.
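Example 4.2 can be checked mechanically. The following sketch (our own; objects xi are encoded as integers i) computes Int R, the successor neighborhoods, and all reducts by testing every subfamily against Definition 4.2:

```python
from itertools import combinations

# The four relations of Example 4.2, with xi written as the integer i.
R1 = {(1, 4), (2, 1), (2, 3), (2, 4), (4, 1), (4, 2), (4, 5), (4, 6), (5, 2),
      (5, 6), (5, 5), (6, 3), (6, 7), (8, 4), (8, 8), (8, 9)}
R2 = {(2, 1), (2, 3), (2, 7), (2, 4), (3, 2), (4, 2), (4, 5), (4, 6), (5, 2),
      (5, 4), (5, 6), (6, 3), (6, 7), (8, 4), (8, 8), (8, 9)}
R3 = {(2, 1), (2, 3), (2, 7), (2, 8), (4, 2), (4, 5), (5, 2), (5, 3), (5, 5),
      (6, 3), (6, 7), (7, 6), (8, 4), (8, 8), (8, 9)}
R4 = {(2, 1), (2, 3), (2, 4), (4, 2), (4, 5), (4, 6), (5, 2), (5, 5), (5, 6),
      (5, 8), (6, 3), (6, 6), (6, 7), (8, 4), (8, 8), (8, 9)}
family = {"R1": R1, "R2": R2, "R3": R3, "R4": R4}

def intersect(names):
    """Int P for the subfamily named in `names`."""
    return set.intersection(*[family[n] for n in names])

int_R = intersect(family)                       # Int R
succ = {x: {y for (a, y) in int_R if a == x} for x in range(1, 10)}

# A reduct is a minimal subfamily P with Int P = Int R (Definition 4.2).
subsets = [frozenset(P) for r in range(1, 5) for P in combinations(family, r)]
keeping = [P for P in subsets if intersect(P) == int_R]
reducts = [P for P in keeping if not any(Q < P for Q in keeping)]
core = frozenset.intersection(*reducts)
```

Running this reproduces the results stated below: Red(R) = {{R1, R2, R3}, {R2, R3, R4}} and Core(R) = {R2, R3}.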


The discernibility matrix of (U, R) is the 9 × 9 matrix (c_ij) with entries c_ij = {R ∈ R : xj ∉ Rs(xi)}; its distinct nonempty entries are {R1, R2, R3, R4}, {R2, R3, R4}, {R3}, {R1, R4}, {R1, R2, R4}, {R1, R3, R4}, {R2} and {R1, R2, R3},
and f ðU ; RÞðR1 ; R2 ; R3 ; R4 Þ ¼ ^f_ðcij Þg : i; j 6 9; cij 6¼ ;g ¼ ðR1 _ R2 _ R3 _ R4 Þ ^ ðR2 _ R3 _ R4 Þ ^ R3 ^ ðR1 _ R4 Þ ^ ðR1 _ R2 _ R4 Þ ^ ðR1 _ R3 _ R4 Þ ^ R2 ^ ðR1 _ R2 _ R3 Þ ¼ R3 ^ ðR1 _ R4 Þ ^ R2 ¼ ðR1 ^ R2 ^ R3 Þ _ ðR2 ^ R3 ^ R4 Þ so Red(R) = {{R1, R2, R3},{R2, R3, R4}}, Core(R) = {R2, R3}. 5. A relation decision system and its properties Although different relation-based rough set models may have different definitions of approximations, the forms of definition of lower approximation are the same as formula (1) in Section 3. For example, similarity relation-based rough sets [18,43], preference relation-based rough sets [11] and tolerance relation-based rough sets [33,34,42] can be considered as belonging to this case. As to upper approximation, we omit discussion on it because it is not irrelevant to our work. As mentioned in Section 3, for a given generalized approximation space, there may be two forms of definition of lower approximation. It raises a problem: is there a solid relationship between the two definition forms of lower approximation? If there is, how do we construct the relationship? To obtain the relationship between the two forms of definition of lower approximation, the concept of a relation decision system is first introduced by generalizing a decision system in Pawlak’ rough set theory. Some related classical concepts, such as lower approximation operators, positive and negative domains in the generalized approximation space, are redefined. By these revisions of the classical concepts, the relationship between the two forms of definition of lower approximation and some useful results for a relation decision system are then derived. Let us first introduce the concept of a relation decision system. Definition 5.1 Let U be a universe of discourse, R = {R1, R2, . . . 
, Rn} a family of general binary relations on U, D a decision equivalence relation on U, and U/D the decision partition relative to R on U. Then (U, R, D) is called a relation decision system, and R is called a condition relation (attribute) set. Let d(x) be a decision function from U to the value set V_d, i.e., d: U → V_d, defined by d(x) = d([x]_D).

From Definition 5.1, a relation decision system is a system in which the family {(Int R)_s(x) : x ∈ U} need not form a partition or a cover of U, while D generates a partition U/D of U. Of course, if R is a family of reflexive and transitive binary relations on U, the family {(Int R)_s(x) : x ∈ U} forms a cover of U, and the relation decision system becomes a type of covering decision system [8]. If R is a family of equivalence relations on U, then R induces a partition of U, and the relation decision system becomes a decision system of Pawlak's rough set theory [41]. Therefore, relation decision systems are natural extensions of the decision systems of Pawlak's rough set theory.

2246

C.Z. Wang et al. / Information Sciences 178 (2008) 2237–2261

Definition 5.2. Let (U, R, D) be a relation decision system. For any subset X ⊆ U, we redefine the lower approximation of X in two ways:

R′(X) = {x ∈ U : (Int R)_s(x) ≠ ∅, (Int R)_s(x) ⊆ X},
R(X) = ∪{(Int R)_s(x) : (Int R)_s(x) ≠ ∅, (Int R)_s(x) ⊆ X}.

Correspondingly, the positive domains of D relative to R are defined as POS′_R(D) = ∪_{X∈U/D} R′(X) and POS_R(D) = ∪_{X∈U/D} R(X), respectively.

Let P ⊆ R. We redefine the lower approximations of X with respect to P as

P′(X) = {x ∈ R′(X) : (Int P)_s(x) ≠ ∅, (Int P)_s(x) ⊆ X},
P(X) = ∪{(Int P)_s(x) : (Int P)_s(x) ≠ ∅, (Int P)_s(x) ⊆ X}.

Correspondingly, the positive domains of D relative to P are defined as POS′_P(D) = ∪_{X∈U/D} P′(R(X)) and POS_P(D) = ∪_{X∈U/D} P(R(X)), respectively.

To distinguish these concepts, we say that R(X), P(X), and POS_P(D) are the support domains (or the successor neighborhoods) of R′(X), P′(X), and POS′_P(D), respectively. We define two further notions, the null and negative domains of D relative to R, denoted Nul_R(D) and Neg_R(D):

Nul_R(D) = {x ∈ U : (Int R)_s(x) = ∅},

Neg_R(D) = U − POS′_R(D) − Nul_R(D).
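The two lower-approximation forms of Definition 5.2 can be made concrete with a small sketch. The code below is not from the paper, which gives no algorithms; it models each binary relation as a Python set of ordered pairs, and the helper names (succ, int_succ, lower_prime, lower_supp) and the toy data are our own.

```python
# Hedged sketch of Definition 5.2: binary relations as sets of ordered pairs.

def succ(R, x):
    """Successor neighborhood R_s(x) = {y : (x, y) in R}."""
    return {y for (u, y) in R if u == x}

def int_succ(relations, x):
    """(Int R)_s(x): intersection of the successor neighborhoods of x."""
    out = succ(relations[0], x)
    for R in relations[1:]:
        out &= succ(R, x)
    return out

def lower_prime(U, relations, X):
    """R'(X) = {x in U : (Int R)_s(x) is nonempty and contained in X}."""
    return {x for x in U
            if int_succ(relations, x) and int_succ(relations, x) <= X}

def lower_supp(U, relations, X):
    """R(X): union of the nonempty neighborhoods contained in X."""
    out = set()
    for x in U:
        n = int_succ(relations, x)
        if n and n <= X:
            out |= n
    return out

# A toy universe with two general (non-equivalence) relations.
U = {1, 2, 3, 4}
R1 = {(1, 1), (2, 2), (3, 3), (4, 3), (4, 4)}
R2 = {(1, 1), (2, 2), (3, 3), (4, 4)}
X = {1, 2, 3}
print(lower_prime(U, [R1, R2], X))  # {1, 2, 3}
print(lower_supp(U, [R1, R2], X))   # {1, 2, 3}
```

Here R′(X) and its support domain R(X) happen to coincide; with coarser neighborhoods the two generally differ, which is exactly the gap that Proposition 5.2 below measures.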

By Definition 5.2, the null, negative, and positive domains of D relative to R divide the universe U into three pairwise disjoint parts; the corresponding domains of D relative to P divide the universe similarly. If R is a family of equivalence relations on U, then the lower approximation and positive domain of Definition 5.2 reduce to the corresponding concepts of Pawlak's approximation space, and the two forms of definition of each concept in Definition 5.2 are equivalent.

Remark 1. For any subset X ⊆ U, by the classical definitions of the lower approximations of X with respect to R and P in Pawlak's rough set theory [27], the inclusion apr_P(X) ⊆ apr_R(X) always holds. To keep this relationship invariant for rough sets based on general binary relations, we redefine P′(X) as above; for a similar reason, we redefine POS′_P(D) and POS_P(D) as above. Unless otherwise stated, whenever concepts such as lower approximations and positive and negative domains appear in the following sections, they are understood as defined in this section.

The following Propositions 5.1 and 5.2 can be derived from Definition 5.2.

Proposition 5.1. Suppose that (U, R, D) is a relation decision system and P ⊆ R. Then the following properties hold:

(1) R(X) ⊆ X for ∀X ⊆ U;
(2) R′(R(X)) = R′(X) for ∀X ⊆ U;
(3) R(R(X)) = R(X) for ∀X ⊆ U;
(4) POS′_R(D) = ∪_{X∈U/D} R′(R(X));
(5) POS_R(D) = ∪_{X∈U/D} R(R(X));
(6) P′(X) ⊆ R′(X) for ∀X ⊆ U;
(7) POS′_R(D) ⊇ POS′_P(D);
(8) POS_R(D) ⊇ POS_P(D).

Proof. Straightforward. □

In general, the positive domain and the support domain defined above have different meanings. However, if R is a family of equivalence relations, then R′(X) = R(X) for any subset X ⊆ U, which implies that (2) and (3), (4) and (5), and (7) and (8) are pairwise equivalent.

Proposition 5.2. Suppose that (U, R, D) is a relation decision system and P ⊆ R. Then the following statements hold:

(1) POS′_R(D) = POS′_P(D) ⟺ R′(R(X)) = P′(R(X)), ∀X ∈ U/D;
(2) POS_R(D) = POS_P(D) ⟺ R(R(X)) = P(R(X)), ∀X ∈ U/D;
(3) R′(R(X)) = P′(R(X)) ⟹ R(R(X)) = P(R(X)), ∀X ⊆ U;
(4) POS′_R(D) = POS′_P(D) ⟹ POS_R(D) = POS_P(D).

Proof. Straightforward. □

Remark 2. By (3) and (4) of Proposition 5.2, we obtain the relationship between the two forms of definition of lower approximation and the relationship between the positive domain and its support domain, respectively. Accordingly, we define relative reduction as in Definition 7.1 of Section 7; that is, our relative reduction keeps not only POS′_R(D) invariant but also its support domain POS_R(D) invariant. Of course, one may keep only POS′_R(D) invariant, regardless of how its support domain POS_R(D) changes; in that case one simply replaces R(X) with X in the definition of POS′_P(D). Since this reduction method is comparatively simple, we consider only the former case in this paper.

6. Attribute reduction in consistent relation decision systems

In this section, we study attribute reduction in a consistent relation decision system. The following discussion shows that consistent decision systems of Pawlak's rough set theory are special cases of consistent relation decision systems, and that the proposed reduction method for consistent relation decision systems generalizes the corresponding one in Pawlak's rough set theory [41]. Let us start by introducing the concept of a consistent relation decision system.

Definition 6.1. Let U be a universe, R = {R1, R2, . . . , Rn} a family of general binary relations on U, D a decision equivalence relation, U/D the decision partition of U, and (U, R, D) a relation decision system. If for ∀x ∈ U there exists B_j ∈ U/D such that (Int R)_s(x) ⊆ B_j, then (U, R, D) is called a consistent relation decision system, denoted U/Int R ≤ U/D; otherwise, (U, R, D) is called an inconsistent relation decision system. Let U_R = ∪{(Int R)_s(x) : x ∈ U}; U_R is called the decision object set of (U, R, D).

From Definition 6.1, a relation decision system (U, R, D) is consistent if and only if Neg_R(D) = ∅, and inconsistent if and only if Neg_R(D) ≠ ∅.
For a given consistent relation decision system, if (Int R)_s(x) = ∅, then (Int R)_s(x) ⊆ B_j holds trivially for ∀B_j ∈ U/D. If each R_i ∈ R is reflexive, then the family {(Int R)_s(x) : x ∈ U} forms a cover of U, i.e., U_R = U. If each R_i ∈ R is an equivalence relation, then the family {(Int R)_s(x) : x ∈ U} is a partition of U. Obviously, consistent decision systems of Pawlak's rough set theory are special cases of consistent relation decision systems.

Let d(x) be a decision function from U to the value set V_d, i.e., d: U → V_d. For ∀x ∈ U, we have d(x) = d([x]_D). For a consistent relation decision system, if (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that d((Int R)_s(x)) = d([z]_D) = j ∈ V_d; if (Int R)_s(x) = ∅, we adopt the convention d((Int R)_s(x)) = 0 ∉ V_d, i.e., d((Int R)_s(x)) ≠ d(y) ∈ V_d for ∀y ∈ U. Therefore, for ∀x, y ∈ U, if d((Int R)_s(x)) ≠ d((Int R)_s(y)), then (Int R)_s(x) ∩ (Int R)_s(y) = ∅; but if (Int R)_s(x) ∩ (Int R)_s(y) = ∅, then d((Int R)_s(x)) and d((Int R)_s(y)) may or may not be equal. If d((Int R)_s(x)) = d((Int R)_s(y)), then (Int R)_s(x) and (Int R)_s(y) may be disjoint, may overlap, or may contain one another.

From Definition 6.1, the following proposition is easily obtained.

Proposition 6.1. Let U/Int R ≤ U/D and y ∈ U. Then y ∉ U_R ⟺ y ∉ (Int R)_s(x) for ∀x ∈ U.

Definition 6.2. Let (U, R, D) be a consistent relation decision system and R_i ∈ R. R_i is called superfluous relative to D in R if R_i satisfies the following conditions:

(1) U_R = U_{R−{R_i}};
(2) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int{R − {R_i}})_s(x) = ∅; if (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int{R − {R_i}})_s(x) ⊆ [z]_D.

Otherwise, R_i is called indispensable relative to D in R. For any P ⊆ R, P is called a reduct of R relative to D if each element of P is indispensable in P, (Int R)_s(x) = ∅ implies (Int P)_s(x) = ∅ for ∀x ∈ U, and U_R = U_P.
The collection of all indispensable elements relative to D in R is called the core of R relative to D, denoted Core_D(R). From Definition 6.2, a relative reduct of a consistent relation decision system keeps both the decision object set U_R and the decision rules of each object in U invariant.


The following theorem shows that Definition 6.2 can be characterized equivalently by another two conditions.

Theorem 6.1. The two conditions in Definition 6.2 are equivalent to the following two conditions:

(1′) POS′_R(D) = POS′_{R−{R_i}}(D);
(2′) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int{R − {R_i}})_s(x) = ∅, i.e., Nul_R(D) = Nul_{R−{R_i}}(D).

Namely, if condition (1) in Definition 6.2 is replaced by (1′) and condition (2) by (2′), the resulting definition is equivalent to Definition 6.2.

Proof. Denote P = R − {R_i}. Since the system is consistent, by the definition of U_R it follows that for every (Int R)_s(x) ⊆ U_R with (Int R)_s(x) ≠ ∅ there exists X ∈ U/D such that (Int R)_s(x) ⊆ X. By the definition of R(X), (Int R)_s(x) ⊆ R(X) holds. This implies U_R ⊆ ∪_{X∈U/D} R(X). By the definitions of R(X) and U_R, we know R(X) ⊆ U_R, which implies U_R ⊇ ∪_{X∈U/D} R(X). Hence U_R = ∪_{X∈U/D} R(X). Similarly, by condition (2) of Definition 6.2, we also have U_P = ∪_{X∈U/D} P(X). Since U_R = U_P, we have ∪_{X∈U/D} R(X) = ∪_{X∈U/D} P(X). Similarly to the proof of (2) of Proposition 5.2, we have R(X) = P(X) for ∀X ∈ U/D. By condition (2) of Definition 6.2, ∅ ≠ (Int R)_s(x) ⊆ (Int P)_s(x) ⊆ X for ∀x ∈ R′(R(X)). By the definition of P(X), (Int P)_s(x) ⊆ P(X). Thus ∅ ≠ (Int R)_s(x) ⊆ (Int P)_s(x) ⊆ P(X) = R(X) ⊆ X, which implies x ∈ P′(R(X)). Hence R′(R(X)) ⊆ P′(R(X)). By (6) of Proposition 5.1, R′(R(X)) ⊇ P′(R(X)). Thus R′(R(X)) = P′(R(X)). Therefore POS′_R(D) = POS′_{R−{R_i}}(D); that is, U_R = U_{R−{R_i}} ⟹ POS′_R(D) = POS′_{R−{R_i}}(D).

Conversely, since POS′_R(D) = POS′_P(D), we have R′(R(X)) = P′(R(X)) for ∀X ∈ U/D. For ∀x ∈ R′(R(X)), by the definition of P′(R(X)), it follows that ∅ ≠ (Int R)_s(x) ⊆ (Int P)_s(x) ⊆ R(X) ⊆ X. By R′(R(X)) = R′(X), ∪_{x∈R′(X)} (Int R)_s(x) ⊆ ∪_{x∈R′(X)} (Int P)_s(x) ⊆ R(X) holds. Since R(X) = ∪_{x∈R′(X)} (Int R)_s(x), we have R(X) = ∪_{x∈R′(X)} (Int P)_s(x).
Since the system is consistent and (Int R)_s(x) = ∅ ⟺ (Int P)_s(x) = ∅, we have U_R = ∪_{X∈U/D} R(X) = ∪_{X∈U/D} ∪_{x∈R′(X)} (Int P)_s(x) = ∪{(Int P)_s(x) : (Int P)_s(x) ≠ ∅, x ∈ U} = U_P. Therefore U_R = U_{R−{R_i}} ⟺ POS′_R(D) = POS′_{R−{R_i}}(D). □

By Theorem 6.1, if a relation decision system is consistent and P ⊆ R is a reduct of R relative to D, then U_R = U_P ⟺ POS′_R(D) = POS′_P(D). By (4) of Proposition 5.2, U_R = U_P ⟹ POS_R(D) = POS_P(D) holds, but the converse is not necessarily true.

Definition 6.3. Let (U, R, D) be a consistent relation decision system and P ⊆ R. P is called an equivalence subset of R relative to D if P satisfies the following conditions:

(1) U_R = U_P;
(2) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int P)_s(x) = ∅; if (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int P)_s(x) ⊆ [z]_D.

Theorem 6.2. Let U/Int R ≤ U/D and R_i ∈ R. Then R_i is indispensable if and only if there is at least one pair x, y ∈ U such that: if x and y satisfy d((Int R)_s(x)) ≠ d(y), then y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x); if x and y satisfy d((Int R)_s(x)) = d(y), then y ∉ U_R ⟹ y ∈ U_{R−{R_i}}.

Proof. (⟹) Since R_i is indispensable, by Definition 6.2 we know that condition (1) or condition (2) in Definition 6.2 fails. Suppose condition (2) fails. Then, regardless of whether condition (1) holds, there exists x ∈ U such that (Int R)_s(x) = ∅ but (Int{R − {R_i}})_s(x) ≠ ∅; thus ∃y ∈ U such that y ∉ (Int R)_s(x) but y ∈ (Int{R − {R_i}})_s(x), and at the same time d((Int R)_s(x)) ≠ d(y). That is, there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) ≠ d(y) such that y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x). If (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int R)_s(x) ⊆ [z]_D but (Int{R − {R_i}})_s(x) ⊆ [z]_D fails. This implies that ∃y ∈ U such that y ∉ (Int R)_s(x) but y ∈ (Int{R − {R_i}})_s(x) and y ∉ [z]_D. Hence d((Int R)_s(x)) ≠ d(y). That is, there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) ≠ d(y) such that y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x).


Suppose condition (2) holds and condition (1) fails. Then U_R ≠ U_{R−{R_i}}, which implies that ∃y ∈ U such that y ∉ U_R but y ∈ U_{R−{R_i}}. Thus ∃x ∈ U such that y ∈ (Int{R − {R_i}})_s(x). Since condition (2) holds, we have (Int R)_s(x) ≠ ∅ (if (Int R)_s(x) = ∅, then by Definition 6.2 (Int{R − {R_i}})_s(x) = ∅, contradicting y ∈ (Int{R − {R_i}})_s(x)), and ∃z ∈ U such that (Int{R − {R_i}})_s(x) ⊆ [z]_D. Thus y ∈ [z]_D, which implies d((Int R)_s(x)) = d((Int{R − {R_i}})_s(x)) = d([z]_D) = d(y). That is, there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) = d(y) such that y ∉ U_R ⟹ y ∈ U_{R−{R_i}}.

(⟸) Suppose there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) ≠ d(y) such that y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x); then (Int R)_s(x) ≠ (Int{R − {R_i}})_s(x). If (Int R)_s(x) = ∅, then by Definition 6.2, R_i is indispensable in R. If (Int R)_s(x) ≠ ∅, then since U/Int R ≤ U/D, there exists z ∈ U such that (Int R)_s(x) ⊆ [z]_D. Assume that (Int{R − {R_i}})_s(x) ⊆ [z]_D; then y ∈ [z]_D, which implies d((Int R)_s(x)) = d([z]_D) = d(y), a contradiction. Thus (Int{R − {R_i}})_s(x) ⊆ [z]_D fails, and hence R_i is indispensable in R. Suppose there exists a pair x, y ∈ U satisfying d((Int R)_s(x)) = d(y) such that y ∉ U_R ⟹ y ∈ U_{R−{R_i}}. Then clearly U_R ≠ U_{R−{R_i}}, so R_i is indispensable in R. □

Theorem 6.3 (Judgment theorem of attribute reduction). Let U/Int R ≤ U/D and P ⊆ R. Then P is an equivalence subset of R relative to D if and only if for ∀x, y ∈ U: if d((Int R)_s(x)) ≠ d(y), then y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x); if d((Int R)_s(x)) = d(y), then y ∉ U_R ⟹ y ∉ U_P.

Proof. (⟹) Since P is an equivalence subset of R relative to D, it follows that for ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int P)_s(x) = ∅. This implies that for ∀y ∈ U we have 0 = d((Int R)_s(x)) ≠ d(y) and y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x). If (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int P)_s(x) ⊆ [z]_D. Since P ⊆ R, we have (Int R)_s(x) ⊆ (Int P)_s(x) ⊆ [z]_D.
For ∀y ∈ U satisfying y ∉ [z]_D, we have d((Int R)_s(x)) ≠ d(y), and y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x). For ∀y ∈ U satisfying y ∈ [z]_D, we have d((Int R)_s(x)) = d(y); since U_R = U_P, it follows that y ∉ U_R ⟹ y ∉ U_P.

(⟸) Since U/Int R ≤ U/D, it follows that for any x ∈ U with (Int R)_s(x) = ∅, we have d((Int R)_s(x)) ≠ d(y) and y ∉ (Int R)_s(x) for ∀y ∈ U. Since y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x), we have (Int P)_s(x) = ∅. If (Int R)_s(x) ≠ ∅, then ∃z ∈ U such that (Int R)_s(x) ⊆ [z]_D. For ∀y ∈ U satisfying y ∉ [z]_D, namely d((Int R)_s(x)) ≠ d(y), we have y ∉ (Int R)_s(x); since y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x), we have (Int P)_s(x) ⊆ [z]_D. For ∀y ∈ U satisfying y ∈ [z]_D, namely d((Int R)_s(x)) = d(y), since y ∉ U_R ⟹ y ∉ U_P, we have U_R ⊇ U_P. Since U_R ⊆ U_P is obviously true, we have U_R = U_P. Altogether, the result holds. □

From Theorem 6.3, it can be seen that a reduct of R relative to D is exactly a minimal subset of R that keeps the decision object set U_R and the decision rules of each object in U invariant. By Theorems 4.2 and 6.3, the discernibility matrix of a consistent relation decision system can be defined as follows.

Definition 6.4. Let (U, R, D) be a consistent relation decision system and U = {x1, x2, . . . , xn}. By M(U, R, D) we denote the n × n matrix (c_ij), called the discernibility matrix of (U, R, D), defined for x_i, x_j ∈ U by:

(1) If d((Int R)_s(x_i)) ≠ d(x_j), then c_ij = {R ∈ R : x_j ∉ R_s(x_i)};
(2) If d((Int R)_s(x_i)) = d(x_j), then

    c_ij = R                               if x_j ∈ U_R,
    c_ij = {R ∈ R : x_j ∉ R_s(x_i)}        if x_j ∉ U_R.
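Definition 6.4 translates directly into code. The sketch below is our own construction, not the authors'; it returns each entry c_ij as a set of relation indices, and the sentinel None plays the role of the convention d((Int R)_s(x)) = 0 ∉ V_d for empty neighborhoods.

```python
def succ(R, x):
    """Successor neighborhood R_s(x)."""
    return {y for (u, y) in R if u == x}

def int_succ(relations, x):
    """(Int R)_s(x): intersection of the successor neighborhoods of x."""
    out = succ(relations[0], x)
    for R in relations[1:]:
        out &= succ(R, x)
    return out

def discernibility_matrix(U, relations, blocks):
    """Entries c_ij of Definition 6.4, as sets of relation indices."""
    U = sorted(U)
    all_idx = set(range(len(relations)))
    # decision object set U_R: union of all neighborhoods
    UR = set().union(*(int_succ(relations, x) for x in U))
    M = {}
    for xi in U:
        n = int_succ(relations, xi)
        # decision of the neighborhood; None encodes 0 not in V_d
        d_n = next((k for k, B in enumerate(blocks) if n and n <= B), None)
        for xj in U:
            d_j = next(k for k, B in enumerate(blocks) if xj in B)
            disc = {i for i, R in enumerate(relations) if xj not in succ(R, xi)}
            if d_n != d_j:
                M[(xi, xj)] = disc                       # case (1)
            else:
                M[(xi, xj)] = all_idx if xj in UR else disc  # case (2)
    return M

U = {1, 2}
R1 = {(1, 1), (2, 2)}
R2 = {(1, 1), (2, 1), (2, 2)}
M = discernibility_matrix(U, [R1, R2], [{1}, {2}])
print(M[(2, 1)])  # {0}: only R1 discerns x2 from x1
```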

Theorem 6.4. Let (U, R, D) be a consistent relation decision system. Then the following statements hold:

(1) If x_j ∉ U_R, then c_ij ≠ ∅ for ∀x_i ∈ U;
(2) Core_D(R) = {R ∈ R : c_ij = {R}, i, j ≤ n};
(3) Let P ⊆ R; then P ∩ c_ij ≠ ∅ for all c_ij ≠ ∅ if and only if P is an equivalence subset of R relative to D.


Proof. (1) Since x_j ∉ U_R, we have x_j ∉ (Int R)_s(x_i) for ∀x_i ∈ U, which implies that there is at least one R ∈ R such that x_j ∉ R_s(x_i). Hence R ∈ c_ij, i.e., c_ij ≠ ∅.

(2) (⟹) Suppose R ∈ Core_D(R); then R is indispensable in R. By Theorem 6.2, it follows that ∃x_i, x_j ∈ U such that if x_i, x_j satisfy d((Int R)_s(x_i)) ≠ d(x_j), then x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i). It is easily seen that R is the only relation in R satisfying x_j ∉ R_s(x_i); thus c_ij = {R} by Definition 6.4. If x_i, x_j satisfy d((Int R)_s(x_i)) = d(x_j), then by Theorem 6.2 we have x_j ∉ U_R ⟹ x_j ∈ U_{R−{R}}. This implies that there exists x_k ∈ U such that x_j ∉ (Int R)_s(x_k) ⟹ x_j ∈ (Int{R − {R}})_s(x_k). Since R is the only relation in R satisfying x_j ∉ R_s(x_k), by Definition 6.4 we have c_kj = {R}. Therefore Core_D(R) ⊆ {R ∈ R : c_ij = {R}, i, j ≤ n}.

(⟸) For x_i, x_j ∈ U, suppose c_ij = {R}. If d((Int R)_s(x_i)) ≠ d(x_j), then x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i) by Definition 6.4. By Theorem 6.2, R ∈ Core_D(R). If d((Int R)_s(x_i)) = d(x_j) and x_j ∉ U_R, then by Definition 6.4 we have x_j ∉ R_s(x_i). Thus x_j ∈ (Int{R − {R}})_s(x_i), namely x_j ∈ U_{R−{R}}. By Theorem 6.2, R ∈ Core_D(R). Therefore Core_D(R) ⊇ {R ∈ R : c_ij = {R}, i, j ≤ n}. Altogether Core_D(R) = {R ∈ R : c_ij = {R}, i, j ≤ n}.

(3) (⟸) If x_i, x_j ∈ U satisfy d((Int R)_s(x_i)) = d(x_j) and x_j ∈ U_R, then by Definition 6.4, c_ij = R; thus P ∩ c_ij ≠ ∅. We now prove the other cases. Assume that ∃i_0, j_0 ≤ n with c_{i_0 j_0} ≠ ∅ but P ∩ c_{i_0 j_0} = ∅. Then x_{j_0} ∈ R_s(x_{i_0}) for ∀R ∈ P, which implies x_{j_0} ∈ (Int P)_s(x_{i_0}). Since P is an equivalence subset of R relative to D, by Theorem 6.3, if d((Int R)_s(x_{i_0})) ≠ d(x_{j_0}), we have x_{j_0} ∈ (Int R)_s(x_{i_0}). Thus x_{j_0} ∈ R_s(x_{i_0}) for ∀R ∈ R. Hence c_{i_0 j_0} = ∅, contradicting the assumption. If d((Int R)_s(x_{i_0})) = d(x_{j_0}) and x_{j_0} ∉ U_R, then we have x_{j_0} ∉ U_R ⟹ x_{j_0} ∉ U_P. Thus x_{j_0} ∉ (Int P)_s(x_i) for ∀x_i ∈ U, in particular x_{j_0} ∉ (Int P)_s(x_{i_0}), contradicting x_{j_0} ∈ (Int P)_s(x_{i_0}).
Hence P ∩ c_ij ≠ ∅ for any c_ij ≠ ∅.

(⟹) Since P ∩ c_ij ≠ ∅ for any c_ij ≠ ∅, assume R ∈ P ∩ c_ij. If d((Int R)_s(x_i)) ≠ d(x_j), then x_j ∉ R_s(x_i) ⟹ x_j ∉ (Int R)_s(x_i) ⟹ x_j ∉ (Int P)_s(x_i). If d((Int R)_s(x_i)) = d(x_j) and x_j ∉ U_R, then x_j ∉ (Int R)_s(x) for ∀x ∈ U. Assume that x_j ∉ (Int P)_s(x) fails, i.e., ∃x_{i_1} ∈ U such that x_j ∈ (Int P)_s(x_{i_1}); then x_j ∈ R_s(x_{i_1}) for ∀R ∈ P, which implies R ∉ c_{i_1 j} for each R ∈ P. Thus P ∩ c_{i_1 j} = ∅. Since x_j ∉ U_R, by (1) we know c_{i_1 j} ≠ ∅, a contradiction. Hence x_j ∉ (Int P)_s(x) holds for ∀x ∈ U, which means x_j ∉ U_P. By Theorem 6.3, P is an equivalence subset of R relative to D. □

By Definition 6.2 and (3) of Theorem 6.4, we immediately obtain the following corollary.

Corollary 6.1. Let P ⊆ R. Then P is a relative reduct of R if and only if P is a minimal subset satisfying P ∩ c_ij ≠ ∅ for all c_ij ≠ ∅, i, j ≤ n.

Definition 6.5. Let R = {R1, R2, . . . , Rn} be a family of binary relations on U. A Boolean variable R_i (i ≤ n) is associated with each binary relation R_i ∈ R, and the function f(U, R, D) on (U, R, D) is defined as

f(U, R, D)(R1, R2, . . . , Rn) = ∧{∨(c_ij) : i, j ≤ n, c_ij ≠ ∅}.

Then f(U, R, D) is a Boolean function of (U, R, D), called the discernibility function (or discernibility formula) of (U, R, D), where ∨(c_ij) denotes the disjunction of the elements of c_ij.

By means of the discernibility function, the following theorem computes all the reducts of a consistent relation decision system.

Theorem 6.5. Let (U, R, D) be a consistent relation decision system, and let M(U, R, D) = (c_ij : i, j ≤ n) be its discernibility matrix. The discernibility formula f(U, R, D) is defined as f(U, R, D) = ∧_{i,j=1}^{n} (∨c_ij), (c_ij ≠ ∅). If f(U, R, D) = ∨_{k=1}^{l} (∧B_k) (B_k ⊆ R) is obtained from f(U, R, D) by applying the multiplication and absorption laws,


which satisfies that each element of B_k appears only once, then {B_k : k ≤ l} is the collection of all the reducts of (U, R, D), i.e., Red(R) = {B_k : k ≤ l}.

Proof. The proof is similar to that of Theorem 4.7. □
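Theorem 6.5 computes Red(R) by expanding the discernibility formula with the multiplication and absorption laws. Since Corollary 6.1 characterizes a reduct as a minimal subset hitting every non-empty entry, the same result can be obtained by enumerating minimal hitting sets. A sketch (our own, exponential in |R| and intended only for small examples), applied to the six distinct non-empty entries that appear in the discernibility function of Example 6.1 below:

```python
from itertools import combinations

def all_reducts(entries, n):
    """Minimal subsets of {0..n-1} intersecting every non-empty entry
    (Corollary 6.1); relations are referred to by index."""
    entries = [e for e in entries if e]
    reducts = []
    for size in range(1, n + 1):
        for cand in combinations(range(n), size):
            cset = set(cand)
            # skip supersets of an already-found (hence minimal) reduct
            if any(set(r) <= cset for r in reducts):
                continue
            if all(cset & e for e in entries):
                reducts.append(cand)
    return reducts

# Distinct non-empty entries of Example 6.1 (indices 0..3 stand for R1..R4).
entries = [{0, 1, 2, 3}, {1, 2, 3}, {2}, {0, 1, 3}, {0, 2, 3}, {0, 1, 2}]
print(all_reducts(entries, 4))  # [(0, 2), (1, 2), (2, 3)]
```

With Boolean variables in place of indices, the same entries simplify under absorption to R3 ∧ (R1 ∨ R2 ∨ R4), matching the hand computation of Example 6.1.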

The following example illustrates the ideas of this section.

Example 6.1. Let (U, R, D) be a consistent relation decision system, where (U, R) is the relation information system of Example 4.1 and D is a decision equivalence relation on U with

U/D = {{x1, x3, x7}, {x2, x5, x6}, {x4, x8, x9}},   U_R = {x1, x2, x3, x4, x5, x7, x8, x9}.

[The 9 × 9 discernibility matrix M(U, R, D) is omitted; its distinct non-empty entries, which can be read off from the discernibility function below, are R, {R2, R3, R4}, {R3}, {R1, R2, R4}, {R1, R3, R4}, and {R1, R2, R3}.]
and

f(U, R, D)(R1, R2, R3, R4) = ∧{∨(c_ij) : i, j ≤ 9, c_ij ≠ ∅}
= (R1 ∨ R2 ∨ R3 ∨ R4) ∧ (R2 ∨ R3 ∨ R4) ∧ R3 ∧ (R1 ∨ R2 ∨ R4) ∧ (R1 ∨ R3 ∨ R4) ∧ (R1 ∨ R2 ∨ R3)
= R3 ∧ (R1 ∨ R2 ∨ R4) = (R1 ∧ R3) ∨ (R2 ∧ R3) ∨ (R3 ∧ R4).

So Red(R) = {{R1, R3}, {R2, R3}, {R3, R4}} and Core(R) = {R3}.

7. Attribute reduction in relation decision systems

As mentioned in the previous section, for a given relation decision system (U, R, D), if Neg_R(D) = ∅, then (U, R, D) is a consistent relation decision system; otherwise, it is an inconsistent relation decision system. A consistent relation decision system is therefore just a special case of a relation decision system. This section investigates attribute reduction in a relation decision system from this general point of view. As in Pawlak's rough sets [27,41], a relative reduct of a relation decision system is defined as a minimal subset of the condition relation (attribute) set R that keeps the positive domain of the decision equivalence relation invariant. In the following discussion, we always suppose that U is a finite universe, R = {R1, R2, . . . , Rn} is a family of binary relations on U, D is a decision equivalence relation relative to R on U, and U/D = {[x]_D : x ∈ U} is the decision partition. For a given relation decision system, we always suppose POS′_R(D) ≠ ∅. By the results of Section 5 and Theorem 6.1, we first define the concept of a relative reduct of a relation decision system.

Definition 7.1. Let (U, R, D) be a relation decision system and R_i ∈ R. R_i is called superfluous relative to D in R if R_i satisfies the following conditions:

(1) POS′_R(D) = POS′_{R−{R_i}}(D);
(2) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int{R − {R_i}})_s(x) = ∅, i.e., Nul_R(D) = Nul_{R−{R_i}}(D).


Otherwise, R_i is called indispensable relative to D in R. For any P ⊆ R, P is called a reduct of R relative to D if each element of P is indispensable relative to D in P, (Int R)_s(x) = ∅ implies (Int P)_s(x) = ∅ for ∀x ∈ U, and POS′_R(D) = POS′_P(D). The collection of all indispensable elements relative to D in R is called the core of R relative to D, denoted Core_D(R). By Definition 7.1 and (4) of Proposition 5.2, it is easily seen that a reduct of R relative to D keeps both the positive domain POS′_R(D) and its support domain POS_R(D) invariant.

Definition 7.2. Let (U, R, D) be a relation decision system and P ⊆ R. P is called an equivalence subset of R relative to D if P satisfies the following conditions:

(1) POS′_R(D) = POS′_P(D);
(2) For ∀x ∈ U, if (Int R)_s(x) = ∅, then (Int P)_s(x) = ∅, i.e., Nul_R(D) = Nul_P(D).

Theorem 7.1. Let (U, R, D) be a relation decision system and R_i ∈ R. Then R_i is indispensable in R relative to D if and only if there is at least one pair x, y ∈ U such that: if x ∈ Nul_R(D), then y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x); if x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then y ∉ POS_R(D) ⟹ y ∈ (Int{R − {R_i}})_s(x); if x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x).

Proof. (⟹) If R_i is indispensable, then by Definition 7.1 we know that condition (1) or condition (2) in Definition 7.1 fails. Suppose condition (2) fails. Then, regardless of whether condition (1) holds, there must exist x ∈ Nul_R(D) such that (Int R)_s(x) = ∅ but (Int{R − {R_i}})_s(x) ≠ ∅. Thus ∃y ∈ U such that y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x). Suppose condition (2) holds and condition (1) fails, i.e., for ∀x ∈ U, (Int R)_s(x) = ∅ implies (Int{R − {R_i}})_s(x) = ∅, but POS′_R(D) ≠ POS′_{R−{R_i}}(D). Thus ∃X_0 ∈ U/D such that P′(R(X_0)) ≠ R′(R(X_0)), which implies P′(R(X_0)) ⊊ R′(R(X_0)) (P = R − {R_i}). Hence there must exist one object x ∈ POS′_R(D) such that (Int R)_s(x) ⊆ R(X_0) but (Int{R − {R_i}})_s(x) ⊆ R(X_0) fails.
So there exists y ∈ U such that y ∉ (Int R)_s(x), but y ∈ (Int{R − {R_i}})_s(x) and y ∉ R(X_0). If d((Int R)_s(x)) = d(y), then y ∈ X_0. Since y ∉ R(X_0), we have y ∉ POS_R(D). Hence y ∉ POS_R(D) ⟹ y ∈ (Int{R − {R_i}})_s(x). If d((Int R)_s(x)) ≠ d(y), obviously y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x).

(⟸) If x ∈ Nul_R(D), then (Int R)_s(x) = ∅, so y ∉ (Int R)_s(x) for ∀y ∈ U. Since y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x), by Definition 7.1 it is easily seen that R_i is indispensable. If x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then ∃X_0 ∈ U/D such that (Int R)_s(x) ⊆ R(X_0) ⊆ X_0 and y ∈ X_0 = [y]_D. Since y ∉ POS_R(D), we have y ∉ R(X_0). Since y ∉ POS_R(D) ⟹ y ∈ (Int{R − {R_i}})_s(x), we know that (Int{R − {R_i}})_s(x) ⊆ R(X_0) fails. Thus x ∉ POS′_{R−{R_i}}(D), which implies POS′_R(D) ≠ POS′_{R−{R_i}}(D), i.e., R_i is indispensable. If x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then there exist X_0 and X_1 (X_0 ≠ X_1) such that (Int R)_s(x) ⊆ R(X_0) ⊆ X_0 and y ∈ X_1. Thus y ∉ (Int R)_s(x). Since y ∉ (Int R)_s(x) ⟹ y ∈ (Int{R − {R_i}})_s(x), we know that (Int{R − {R_i}})_s(x) ⊆ X_0 fails, which implies that (Int{R − {R_i}})_s(x) ⊆ R(X_0) fails. Thus x ∉ POS′_{R−{R_i}}(D). Hence POS′_R(D) ≠ POS′_{R−{R_i}}(D), i.e., R_i is indispensable. □

Theorem 7.2 (Judgment theorem of attribute reduction). Let (U, R, D) be a relation decision system and P ⊆ R. Then P is an equivalence subset of R relative to D if and only if for ∀x, y ∈ U: if x ∈ Nul_R(D), then y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x); if x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then y ∉ POS_R(D) ⟹ y ∉ (Int P)_s(x); if x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x).

Proof. (⟹) Since P is an equivalence subset of R relative to D, it follows that for ∀x ∈ U, if x ∈ Nul_R(D), i.e., (Int R)_s(x) = ∅, then (Int P)_s(x) = ∅. Thus for ∀y ∈ U, y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x).


If x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then ∃X_0 ∈ U/D such that x ∈ R′(R(X_0)) and y ∈ X_0. Since y ∉ POS_R(D), we have y ∉ R(X_0). Since P is an equivalence subset of R relative to D, POS′_R(D) = POS′_P(D) holds. By (1) of Proposition 5.2, we have R′(R(X)) = P′(R(X)) for ∀X ∈ U/D; in particular R′(R(X_0)) = P′(R(X_0)). Thus x ∈ P′(R(X_0)), which implies (Int P)_s(x) ⊆ R(X_0) ⊆ X_0. Hence y ∉ (Int P)_s(x), i.e., y ∉ POS_R(D) ⟹ y ∉ (Int P)_s(x). If x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then there exist X_0 and X_1 (X_0 ≠ X_1) such that x ∈ R′(R(X_0)) and y ∈ X_1, which implies that y ∉ (Int R)_s(x) and y ∉ X_0. Since P is an equivalence subset of R relative to D, POS′_R(D) = POS′_P(D). By (1) of Proposition 5.2, we have R′(R(X)) = P′(R(X)) for ∀X ∈ U/D; in particular R′(R(X_0)) = P′(R(X_0)). Thus x ∈ P′(R(X_0)), which implies (Int P)_s(x) ⊆ R(X_0) ⊆ X_0. Hence y ∉ (Int P)_s(x), i.e., y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x).

(⟸) (1) If x ∈ Nul_R(D), then (Int R)_s(x) = ∅, so y ∉ (Int R)_s(x) for ∀y ∈ U. Since y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x), we have (Int P)_s(x) = ∅. For x ∈ POS′_R(D), there are the following two cases:

(2) If x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y), then ∃X_0 ∈ U/D such that (Int R)_s(x) ⊆ R(X_0) ⊆ X_0 and y ∈ X_0 = [y]_D. Since y ∉ POS_R(D), we have y ∉ R(X_0) and y ∉ (Int R)_s(x). Thus y ∉ POS_R(D) ⟹ y ∉ (Int P)_s(x) is equivalent to y ∉ R(X_0) ⟹ y ∉ (Int P)_s(x), which implies (Int P)_s(x) ⊆ R(X_0). Hence x ∈ P′(R(X_0)), and by the definition of POS′_P(D), x ∈ POS′_P(D).

(3) If x ∈ POS′_R(D) and d((Int R)_s(x)) ≠ d(y), then ∃X_0 ∈ U/D such that (Int R)_s(x) ⊆ R(X_0) ⊆ X_0 and y ∉ X_0, which implies y ∉ (Int R)_s(x). Thus y ∉ (Int R)_s(x) ⟹ y ∉ (Int P)_s(x) is equivalent to y ∉ X_0 ⟹ y ∉ (Int P)_s(x). Hence (Int P)_s(x) ⊆ X_0. Assume that (Int P)_s(x) ⊆ R(X_0) is not true; then there must exist y_0 ∈ X_0 such that y_0 ∉ R(X_0) and y_0 ∈ (Int P)_s(x). This means d((Int R)_s(x)) = d(y_0) and y_0 ∉ POS_R(D).
Thus we have x ∈ POS′_R(D) and d((Int R)_s(x)) = d(y_0), while y_0 ∉ POS_R(D) ⟹ y_0 ∈ (Int P)_s(x), contradicting case (2). Hence (Int P)_s(x) ⊆ R(X_0), which implies x ∈ P′(R(X_0)); by the definition of POS′_P(D), we again have x ∈ POS′_P(D). By (2) and (3), we conclude that x ∈ POS′_P(D) for any x ∈ POS′_R(D), i.e., POS′_R(D) ⊆ POS′_P(D). By (7) of Proposition 5.1, POS′_R(D) ⊇ POS′_P(D). Therefore POS′_R(D) = POS′_P(D). Given the above three cases, for ∀x ∈ Neg_R(D) we have x ∉ POS′_R(D) ⟺ x ∉ POS′_P(D). Altogether POS′_R(D) = POS′_P(D), so the result holds. □

From Theorem 7.2, we know that P is an equivalence subset of R relative to D if and only if the null, negative, and positive domains of D relative to R and to P are correspondingly equal. By Theorems 4.2 and 7.2, the discernibility matrix of a relation decision system can be defined as follows.

Definition 7.3. Let (U, R, D) be a relation decision system and U = {x1, x2, . . . , xn}. By M(U, R, D) we denote the n × n matrix (c_ij), called the discernibility matrix of (U, R, D), defined for x_i, x_j ∈ U by:

(1) If x_i ∈ Nul_R(D), then c_ij = {R ∈ R : x_j ∉ R_s(x_i)};
(2) If x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) = d(x_j), then

    c_ij = R                               if x_j ∈ POS_R(D),
    c_ij = {R ∈ R : x_j ∉ R_s(x_i)}        if x_j ∉ POS_R(D);

(3) If x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) ≠ d(x_j), then c_ij = {R ∈ R : x_j ∉ R_s(x_i)};
(4) If x_i ∈ Neg_R(D), then c_ij = R.
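The four cases of Definition 7.3 amount to a case analysis on the domains Nul_R(D), POS′_R(D), and Neg_R(D). A sketch under the same toy representation used earlier (relations as sets of ordered pairs; all names and data are ours, not the authors'):

```python
def succ(R, x):
    """Successor neighborhood R_s(x)."""
    return {y for (u, y) in R if u == x}

def int_succ(relations, x):
    """(Int R)_s(x): intersection of the successor neighborhoods of x."""
    out = succ(relations[0], x)
    for R in relations[1:]:
        out &= succ(R, x)
    return out

def entry(xi, xj, U, relations, blocks):
    """c_ij of Definition 7.3, as a set of relation indices."""
    all_idx = set(range(len(relations)))
    disc = {i for i, R in enumerate(relations) if xj not in succ(R, xi)}
    n = int_succ(relations, xi)
    if not n:
        return disc                      # case (1): xi in Nul_R(D)
    blk = next((B for B in blocks if n <= B), None)
    if blk is None:
        return all_idx                   # case (4): xi in Neg_R(D)
    # support domain POS_R(D): union of neighborhoods inside some block
    pos = set()
    for x in U:
        m = int_succ(relations, x)
        if m and any(m <= B for B in blocks):
            pos |= m
    if xj in blk:                        # case (2): equal decisions
        return all_idx if xj in pos else disc
    return disc                          # case (3): different decisions

U = {1, 2, 3}
R1 = {(1, 1), (2, 2), (2, 3), (3, 3)}
R2 = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3)}
blocks = [{1, 2}, {3}]
print(entry(2, 1, U, [R1, R2], blocks))  # {0, 1}: x2 lies in Neg_R(D)
print(entry(1, 2, U, [R1, R2], blocks))  # {0}
```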

Theorem 7.3. Let (U, R, D) be a relation decision system. Then the following statements hold:

(1) Core_D(R) = {R ∈ R : c_ij = {R}, i, j ≤ n};
(2) Let P ⊆ R; then P is an equivalence subset of R relative to D if and only if P ∩ c_ij ≠ ∅ for all c_ij ≠ ∅.


Proof. (1) (⟹) Suppose R ∈ Core_D(R); then R is indispensable in R. By Theorem 7.1, it follows that there exist x_i, x_j ∈ U such that if x_i, x_j satisfy x_i ∈ Nul_R(D), then x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i). Obviously, only R satisfies x_j ∉ R_s(x_i); thus c_ij = {R} by Definition 7.3. If x_i, x_j satisfy x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) = d(x_j), then x_j ∉ POS_R(D) ⟹ x_j ∈ (Int{R − {R}})_s(x_i), which implies that x_j ∉ (Int R)_s(x_i) but x_j ∈ (Int{R − {R}})_s(x_i). Thus only R satisfies x_j ∉ R_s(x_i), and by Definition 7.3, c_ij = {R}. If x_i, x_j satisfy x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) ≠ d(x_j), then x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i), and similarly c_ij = {R}. Therefore Core_D(R) ⊆ {R ∈ R : c_ij = {R}, i, j ≤ n}.

(⟸) For x_i, x_j ∈ U, suppose c_ij = {R}. If x_i ∈ Nul_R(D), then by Definition 7.3, x_j ∉ (Int R)_s(x_i) ⟹ x_j ∈ (Int{R − {R}})_s(x_i). By Theorem 7.1, R ∈ Core_D(R). If x_i ∈ POS′_R(D), d((Int R)_s(x_i)) = d(x_j), and x_j ∉ POS_R(D), then by Definition 7.3, x_j ∉ (Int R)_s(x_i) but x_j ∈ (Int{R − {R}})_s(x_i). By Theorem 7.1, R ∈ Core_D(R). If x_i ∈ POS′_R(D) and d((Int R)_s(x_i)) ≠ d(x_j), similarly R ∈ Core_D(R). Thus Core_D(R) ⊇ {R ∈ R : c_ij = {R}, i, j ≤ n}. Altogether Core_D(R) = {R ∈ R : c_ij = {R}, i, j ≤ n}.

(2) (⟸) If x_i ∈ POS′_R(D), d((Int R)_s(x_i)) = d(x_j), and x_j ∈ POS_R(D), or if x_i ∈ Neg_R(D), then c_ij = R by Definition 7.3; thus P ∩ c_ij ≠ ∅. We now prove the other cases. Assume that ∃i_0, j_0 ≤ n with c_{i_0 j_0} ≠ ∅ but P ∩ c_{i_0 j_0} = ∅. Then by Definition 7.3 we have x_{j_0} ∈ R_s(x_{i_0}) for ∀R ∈ P, which implies x_{j_0} ∈ (Int P)_s(x_{i_0}). Since P is an equivalence subset of R relative to D, by Theorem 7.2 it follows that if x_{i_0} ∈ Nul_R(D), then x_{j_0} ∈ (Int P)_s(x_{i_0}) ⟹ x_{j_0} ∈ (Int R)_s(x_{i_0}). Thus x_{j_0} ∈ R_s(x_{i_0}) for ∀R ∈ R. Hence c_{i_0 j_0} = ∅, contradicting the assumption. If x_{i_0} ∈ POS′_R(D), d((Int R)_s(x_{i_0})) = d(x_{j_0}), and x_{j_0} ∉ POS_R(D), then x_{j_0} ∉ POS_R(D) ⟹ x_{j_0} ∉ (Int P)_s(x_{i_0}).
Thus $R 2 P such that xj0 62 ðIntRÞs ðxi0 Þ, which implies R 2 ci0 j0 . Hence R 2 P \ ci0 j0 6¼ ;, which is a contradiction to the assumption. If xi0 2 POS 0R ðDÞ and dððInt RÞs ðxi0 ÞÞ 6¼ dðxj0 Þ, then xj0 2 ðInt PÞs ðxi0 Þ ) xj0 2 ðInt RÞs ðxi0 Þ. Thus xj0 2 Rs ðxi0 Þ for "R 2 R. Hence ci0 j0 ¼ ;, which is a contradiction to the assumption. Altogether P \ cij 6¼ ; for any cij 6¼ ;. ) Since P \ cij 6¼ ; for any cij 6¼ ;, we assume that R 2 P \ cij. If xi 2 NulR(D), then xj 62 Rs(xi) ) xj 62 Int Rs(xi) ) xj 62 Int Ps(xi). If xi 2 POS 0R ðDÞ, d((Int R)s(xi)) = d(xj), and xj 62 POSR(D), then xj 62 Rs(xi) ) xj 62 Int Rs(xi) ) xj 62 Int Ps(xi), i.e., xj 62 POSR(D) ) xj 62 (Int P)s(xi). If xi 2 POS 0R ðDÞ and dððInt RÞs ðxi ÞÞ 6¼ dðxj Þ, then xj 62 Rs(xi) D.

) xj 62 Int Rs(xi) ) xj 62 Int Ps(xi). By Theorem 7.2, we know that P is an equivalence subset of R relative to h By Definition 7.1 and (2) of Theorem 7.3 we immediately get the following corollary.

Corollary 7.1. Let P ⊆ R; then P is a relative reduct of R if and only if it is a minimal subset satisfying P ∩ cij ≠ ∅ for every cij ≠ ∅ (i, j ≤ n).

Definition 7.4. Let R = {R1, R2, . . ., Rn} be a family of binary relations on U. Assign to each binary relation Ri ∈ R (i ≤ n) a corresponding Boolean variable Ri, and define

f(U, R, D)(R1, R2, . . ., Rn) = ∧{∨(cij) : i, j ≤ n, cij ≠ ∅}.

Then f(U, R, D) is a Boolean function of (U, R, D), called a discernibility function or discernibility formula of (U, R, D), where ∨(cij) denotes the disjunction of the elements of cij.

Using the discernibility function, we have the following theorem to compute all the reducts of a relation decision system.

Theorem 7.4. Let (U, R, D) be a relation decision system and M(U, R, D) = (cij : i, j ≤ n) its discernibility matrix. The discernibility formula f(U, R, D) is defined as f(U, R, D) = ∧_{i,j=1}^{n} (∨ cij) (cij ≠ ∅). If f(U, R, D) = ∨_{k=1}^{l} (∧ Bk) (Bk ⊆ R) is obtained from f(U, R, D) by applying the multiplication and absorption laws, such that each element of Bk appears only once, then the set {Bk : k ≤ l} is the collection of all the reducts of (U, R, D), i.e., Red(R) = {Bk : k ≤ l}.
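The recipe of Theorem 7.4 (expand the conjunctive discernibility formula by the multiplication law, then prune with the absorption law) can be sketched directly in code. This is an illustrative brute-force sketch, not the paper's implementation; the entry list is the set of non-empty matrix entries that arise in Example 7.1 below.

```python
from itertools import product

def all_reducts(entries):
    """Expand the CNF discernibility function f = AND over OR(c_ij) into
    DNF via the multiplication law, then apply the absorption law: keep
    only the terms that contain no other term as a proper subset."""
    clauses = [frozenset(c) for c in entries if c]        # skip empty c_ij
    terms = {frozenset(pick) for pick in product(*clauses)}
    return {t for t in terms if not any(u < t for u in terms)}

# Non-empty entries of the discernibility matrix in Example 7.1:
entries = [{"R1", "R2", "R3"}, {"R2", "R4"}, {"R1", "R2"}, {"R1", "R3"},
           {"R1", "R3", "R4"}, {"R2"}, {"R1", "R2", "R4"}, {"R2", "R3", "R4"}]
print(sorted(sorted(r) for r in all_reducts(entries)))
# -> [['R1', 'R2'], ['R2', 'R3']], i.e. Red(R) = {{R1, R2}, {R2, R3}}
```

By Corollary 7.1, the surviving terms are exactly the minimal subsets hitting every non-empty entry; for large matrices a dedicated minimal-hitting-set algorithm would be preferable to this exponential expansion.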


Proof. The proof is similar to that of Theorem 4.7. □

The following example illustrates the idea of this section.

Example 7.1. In order to survey the influence of the four factors education, profession, diligence and temperament on life level, we recruit ten volunteers to participate in a test. Let U = {x1, x2, . . ., x10} be the set of ten volunteers. The values of education are {high, low}, those of profession are {good, poor}, those of diligence are {yes, no}, and those of temperament are {good, bad}. D denotes life level. According to the social life standard, the ten volunteers are partitioned as follows:

U/D = {{x1, x5, x6} (better), {x2, x4, x7} (good), {x3, x8} (general), {x9, x10} (bad)}.

After the volunteers have lived together and known each other for a while, we let them evaluate each other on each given attribute. The judgments are recorded according to the following rules: E(xi, xj) iff xj considers xi to be of high education; D(xi, xj) iff xj considers xi to be diligent; P(xi, xj) iff xj considers xi to be of good profession; and T(xi, xj) iff xj considers xi to be of good temperament. Assuming the evaluation of each volunteer carries the same weight, for each attribute we obtain a binary relation, which embodies a kind of uncertainty caused by different interpretations of the data.
Education:
R1 = {(x1, x3), (x2, x1), (x2, x5), (x2, x6), (x2, x7), (x4, x4), (x4, x5), (x4, x6), (x5, x2), (x5, x4), (x6, x6), (x6, x10), (x7, x2), (x7, x4), (x7, x8), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10), (x10, x10)};

Diligence:
R2 = {(x2, x1), (x2, x5), (x2, x6), (x3, x8), (x4, x5), (x4, x6), (x4, x7), (x5, x2), (x5, x4), (x5, x5), (x6, x10), (x7, x2), (x7, x4), (x7, x8), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10), (x10, x6)};

Profession:
R3 = {(x1, x3), (x2, x1), (x2, x2), (x2, x5), (x2, x7), (x4, x4), (x4, x5), (x4, x6), (x5, x2), (x5, x4), (x6, x6), (x6, x10), (x7, x2), (x7, x4), (x7, x8), (x8, x2), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10)};

Temperament:
R4 = {(x1, x2), (x2, x1), (x2, x2), (x2, x5), (x2, x8), (x3, x1), (x4, x5), (x4, x6), (x4, x7), (x5, x2), (x5, x4), (x6, x6), (x6, x10), (x7, x2), (x7, x4), (x7, x7), (x7, x8), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10)}.

Let R = {R1, R2, R3, R4}; then

Int R = {(x2, x1), (x2, x5), (x4, x5), (x4, x6), (x5, x2), (x5, x4), (x6, x10), (x7, x2), (x7, x4), (x7, x8), (x8, x3), (x8, x8), (x9, x7), (x9, x9), (x9, x10)};

(Int R)s(x1) = (Int R)s(x3) = (Int R)s(x10) = ∅; (Int R)s(x2) = {x1, x5}; (Int R)s(x4) = {x5, x6}; (Int R)s(x5) = {x2, x4}; (Int R)s(x6) = {x10}; (Int R)s(x7) = {x2, x4, x8}; (Int R)s(x8) = {x3, x8}; (Int R)s(x9) = {x7, x9, x10};

R(X1) = {x1, x5, x6}; R(X2) = {x2, x4}; R(X3) = {x3, x8}; R(X4) = {x10};
R′(X1) = {x2, x4}; R′(X2) = {x5}; R′(X3) = {x8}; R′(X4) = {x6}.

Therefore NulR(D) = {x1, x3, x10}, NegR(D) = {x7, x9}, POSR(D) = {x1, x2, x3, x4, x5, x6, x8, x10}, POS′R(D) = {x2, x4, x5, x6, x8}, and γR(D) = |POS′R(D)| / |U| = 0.5.
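The computation of Int R and the successor neighborhoods above is mechanical and easy to check with a few lines of code. The volunteers x1, . . ., x10 are encoded here as the integers 1–10 (an encoding chosen only for illustration):

```python
# The four relations of Example 7.1, as sets of ordered pairs (i, j) for (xi, xj).
R1 = {(1, 3), (2, 1), (2, 5), (2, 6), (2, 7), (4, 4), (4, 5), (4, 6), (5, 2),
      (5, 4), (6, 6), (6, 10), (7, 2), (7, 4), (7, 8), (8, 3), (8, 8),
      (9, 7), (9, 9), (9, 10), (10, 10)}
R2 = {(2, 1), (2, 5), (2, 6), (3, 8), (4, 5), (4, 6), (4, 7), (5, 2), (5, 4),
      (5, 5), (6, 10), (7, 2), (7, 4), (7, 8), (8, 3), (8, 8), (9, 7),
      (9, 9), (9, 10), (10, 6)}
R3 = {(1, 3), (2, 1), (2, 2), (2, 5), (2, 7), (4, 4), (4, 5), (4, 6), (5, 2),
      (5, 4), (6, 6), (6, 10), (7, 2), (7, 4), (7, 8), (8, 2), (8, 3),
      (8, 8), (9, 7), (9, 9), (9, 10)}
R4 = {(1, 2), (2, 1), (2, 2), (2, 5), (2, 8), (3, 1), (4, 5), (4, 6), (4, 7),
      (5, 2), (5, 4), (6, 6), (6, 10), (7, 2), (7, 4), (7, 7), (7, 8),
      (8, 3), (8, 8), (9, 7), (9, 9), (9, 10)}

IntR = R1 & R2 & R3 & R4          # Int R: intersection of the family

def succ(rel, x):
    """Successor set rel_s(x) = {y : (x, y) in rel}."""
    return {y for (u, y) in rel if u == x}

print(sorted(succ(IntR, 2)))      # [1, 5]      = (Int R)_s(x2)
print(sorted(succ(IntR, 9)))      # [7, 9, 10]  = (Int R)_s(x9)
print(sorted(succ(IntR, 1)))      # []          = (Int R)_s(x1) is empty
```

Running this reproduces the fifteen pairs of Int R and the successor neighborhoods listed above.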


The discernibility matrix of (U, R, D) is a 10 × 10 matrix whose entries cij are subsets of R. Besides the entries equal to the whole family R, its non-trivial entries are {R1, R2, R3}, {R2, R4}, {R1, R2}, {R1, R3}, {R1, R3, R4}, {R2}, {R1, R2, R4} and {R2, R3, R4},
and

f(U, R, D)(R1, R2, R3, R4) = ∧{∨(cij) : i, j ≤ 10, cij ≠ ∅}
= (R1 ∨ R2 ∨ R3) ∧ (R2 ∨ R4) ∧ (R1 ∨ R2) ∧ (R1 ∨ R3) ∧ (R1 ∨ R3 ∨ R4) ∧ R2 ∧ (R1 ∨ R2 ∨ R4) ∧ (R2 ∨ R3 ∨ R4)
= (R1 ∨ R3) ∧ R2
= (R1 ∧ R2) ∨ (R2 ∧ R3).

So Red(R) = {{R1, R2}, {R2, R3}} and CoreD(R) = {R2}. In view of the assessment results of the selected volunteers, life level is only partially dependent upon the four factors, i.e., the dependency degree γR(D) of life level upon these factors is 0.5. Of the four factors, temperament is dispensable, while diligence is indispensable and is the most critical contributor to life level.

8. Experimental analysis

Feature selection and attribute reduction are basic applications of rough sets. Pawlak's rough sets can only be used to select nominal features, because the model is built on equivalence relations and equivalence classes, which can be generated directly from nominal features. Rough sets based on general binary relations can be used to compute with more complex data, such as numerical data. To use rough sets based on general binary relations to reduce numerical data, one should first construct a technique to compute the relation between objects characterized by numerical features, and then compute the positive region of the decision to evaluate the quality of the selected features.

In the experiments, we associate a box neighborhood with each sample; the samples in the box neighborhood of sample xi are called the successor neighborhood of xi. Formally, the successor neighborhood of xi is defined as RB(xi) = {xj : ∀a ∈ B, Δa(xi, xj) ≤ ε}, where Δa is the distance function in the feature space of a, defined as Δa(xi, xj) = |a(xi) − a(xj)|, and ε ≥ 0 is a user-specified threshold. Obviously ∪_{i=1}^{n} RB(xi) = U and RB(xi) ≠ ∅. Then the significance of B with respect to D is defined as

γB(D) = |POS′B(D)| / |U|.

Accordingly, the conditional significance of a in B can be defined as

SIG(a, B, D) = |POS′B(D)| / |U| − |POS′B−{a}(D)| / |U|.
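The box neighborhood and the significance γB(D) can be sketched as follows. This is a simplified illustration, not the paper's exact construction of POS′B(D): here a sample is counted as positive when its neighborhood is contained in its own decision class, a common way to realize the positive region for neighborhood-based rough sets. The data and ε are hypothetical.

```python
import numpy as np

def neighborhood(X, i, B, eps):
    """Successor (box) neighborhood R_B(x_i): indices j with
    |a(x_i) - a(x_j)| <= eps for every attribute a in B."""
    return np.where((np.abs(X[:, B] - X[i, B]) <= eps).all(axis=1))[0]

def gamma(X, y, B, eps=0.1):
    """Significance of B w.r.t. D, with the positive region simplified to
    the set of samples whose neighborhood is decision-pure."""
    pos = sum(np.all(y[neighborhood(X, i, B, eps)] == y[i])
              for i in range(len(y)))
    return pos / len(y)

# Hypothetical data: two numeric features scaled to [0, 1], two classes.
X = np.array([[0.10, 0.90], [0.15, 0.85], [0.80, 0.20], [0.90, 0.10]])
y = np.array([0, 0, 1, 1])
print(gamma(X, y, B=[0, 1], eps=0.1))   # 1.0: the classes are separable
```

With a very large ε every neighborhood covers the whole universe and mixes both classes, so the significance drops to 0; the threshold thus controls the granularity at which consistency is judged.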

Based on these measures of significance, we can compute the significance and conditional significance of a single attribute and select the attributes with maximal significance and conditional significance one by one.

There are 13 numerical features in the wine data set downloaded from the UCI Machine Learning Repository [23]. We compute the significance γai(D) of these 13 attributes, as shown in Fig. 1a. We find that feature 13 gets the maximal significance among the set of features, so we select feature 13 into the reduct. In the second round, we calculate the conditional significance γa13∪ai(D) − γa13(D) of the remaining 12 features, and feature 10 yields the maximal value of significance. Finally, features 13, 10, 7, 11 and 1 are selected into the reduct. The significance γ{a13, a10, a7, a11, a1}(D) of these five features is 1. This means that the decision table is consistent at this granularity (here ε = 0.1). Fig. 1f shows the dependency γBi(D) of the decision attribute on the selected features.

Fig. 1. Significance and conditional significance of attributes.

We use a heuristic search strategy to find the reduct. We start with an empty set of attributes and, in each round, select the feature which maximizes the significance increment, until the significance does not increase or increases less than a specified threshold. As we select the feature with the maximal significance increment, features highly relevant to an already selected feature cannot be included, because little new information is introduced by such relevant features [50]. Note that this search strategy is only one of the candidate solutions; there are also other strategies for searching reducts [10,17,25,26].

As a whole, we gather six data sets from the UCI Machine Learning Repository. The data sets are described in Table 2, where N, F and C stand for the numbers of samples, features and classes, respectively.

Table 2
Data description

Data       N     F    C
Credit     690   15   2
Hepatitis  155   19   2
Heart      270   13   2
Iono       351   34   2
Wdbc       569   30   2
Wine       178   13   3
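The greedy forward strategy described above can be sketched independently of the particular significance measure. In the demo below, a simple leave-one-out 1-nearest-neighbour accuracy is used as a stand-in for γB(D) (it is not the paper's measure), and the data are hypothetical.

```python
import numpy as np

def forward_select(X, y, sig, min_gain=1e-3):
    """Greedy forward search: start from the empty subset and, in each
    round, add the attribute with the maximal significance increment;
    stop when the increment falls below min_gain."""
    selected, best = [], 0.0
    while len(selected) < X.shape[1]:
        cand = {a: sig(X, y, selected + [a])
                for a in range(X.shape[1]) if a not in selected}
        a_star = max(cand, key=cand.get)
        if cand[a_star] - best < min_gain:
            break
        selected.append(a_star)
        best = cand[a_star]
    return selected

def sig(X, y, B):
    """Stand-in significance: leave-one-out 1-NN accuracy on subset B."""
    D = np.abs(X[:, None, B] - X[None, :, B]).sum(axis=2)
    np.fill_diagonal(D, np.inf)        # a sample may not vote for itself
    return float(np.mean(y[D.argmin(axis=1)] == y))

X = np.array([[0.1, 0.5], [0.2, 0.5], [0.8, 0.5], [0.9, 0.5]])
y = np.array([0, 0, 1, 1])
print(forward_select(X, y, sig))       # [0]: the constant feature is skipped
```

Because the second feature is constant, it never increases the significance and is never added; the search terminates after one round, which mirrors the stopping rule used in the experiments.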


The features selected with rough sets based on general binary relations (RS-GBR), Pawlak's rough sets, consistency [10] and CFS [17] are presented in Tables 3–6, respectively. Since Pawlak's rough sets and consistency cannot deal with numerical features directly, we apply the MDL entropy discretization algorithm to discretize these features before reduction. Tables 7 and 8 show the average classification performance of the raw data and the reduced data based on 10-fold cross validation, where linear and RBF SVMs are used to validate the selected features. These algorithms are implemented with OSU_SVM3.00, downloaded from http://www.ece.osu.edu/maj/osu_svm/osu_svm3.00.zip. It is easy to see that the performance of the RS-GBR based reducts is in most cases comparable or improved, and that the method based on RS-GBR outperforms discretization + Pawlak's rough sets.
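As a modern stand-in for the OSU_SVM toolbox, the validation protocol (10-fold cross-validation of an SVM on a reduced feature subset) can be reproduced with scikit-learn; this is an assumption about tooling, not the paper's original code. The wine reduct below uses the features reported in the text, shifted to 0-based indices.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)       # the UCI wine data (13 features)
reduct = [12, 9, 6, 10, 0]              # features 13, 10, 7, 11, 1 (0-based)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X[:, reduct], y, cv=10)
print(f"{scores.mean():.2f} +/- {scores.std():.2f}")
```

Exact accuracies will differ from Tables 7 and 8, since fold splits, scaling and SVM parameters are not identical to the original setup.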

Table 3
Features selected with RS-GBR algorithms

Data       Features                                                   Number
Credit     11, 2, 6, 14, 3, 9                                         6
Hepatitis  2, 17, 1, 18, 15, 11, 9                                    7
Heart      10, 12, 3, 13, 1, 4, 5, 7, 8                               9
Iono       1, 5, 13, 34, 24, 25, 3, 26                                8
Wdbc       23, 22, 28, 12, 25, 9, 1, 27, 2, 26, 18, 11, 15, 29, 19    15
Wine       13, 10, 7, 6, 1                                            5

Table 4
Features selected with Pawlak's rough set algorithms

Data       Features                                    Number
Credit     4, 7, 9, 15, 1, 3, 11, 6, 14, 8, 2          11
Hepatitis  2, 18, 8, 10, 4, 5, 17, 19, 13, 15, 3, 12   12
Heart      –                                           –
Iono       5, 3, 6, 34, 17, 14, 22, 4                  8
Wdbc       24, 8, 22, 26, 13, 5, 14                    7
Wine       10, 13, 7, 2                                4

Table 5
Features selected with consistency

Data       Features                               Number
Credit     9, 4, 10, 15, 14, 2, 6, 8, 3, 1, 11    11
Hepatitis  1, 6, 17, 18                           4
Heart      1, 2, 3, 7, 8, 9, 10, 11, 12, 13       10
Iono       5, 6, 8, 13, 22, 27, 34                7
Wdbc       7, 13, 21, 22, 27, 28, 29              7
Wine       1, 3, 4, 7, 10                         5

Table 6
Features selected with CFS

Data       Features                                            Number
Credit     5, 6, 8, 9, 11, 14, 15                              7
Hepatitis  1, 2, 6, 11, 14, 17, 18                             7
Heart      3, 7, 8, 9, 10, 12, 13                              7
Iono       1, 3, 4, 5, 6, 7, 8, 14, 18, 21, 27, 28, 29, 34     14
Wdbc       2, 7, 8, 14, 19, 21, 23, 24, 25, 27, 28             11
Wine       1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13                 11


Table 7
Comparison of classification performances with linear SVM

Data       Raw data        RS-GBR          Pawlak's rough sets   CFS             Consistency
Credit     81.44 ± 7.18    85.48 ± 18.51   82.88 ± 14.34         80.12 ± 14.00   81.86 ± 14.37
Heart      83.33 ± 5.31    83.33 ± 6.59    –                     84.81 ± 5.91    84.44 ± 4.88
Hepatitis  86.17 ± 7.70    89.00 ± 6.30    85.00 ± 7.24          90.17 ± 6.59    88.00 ± 5.26
Iono       87.57 ± 6.45    87.25 ± 5.47    83.30 ± 5.97          86.38 ± 5.35    83.21 ± 6.37
Wdbc       97.73 ± 2.48    97.02 ± 2.03    95.09 ± 2.83          96.32 ± 1.92    95.61 ± 2.51
Wine       98.89 ± 2.34    98.33 ± 2.68    95.00 ± 4.10          98.89 ± 2.34    95.97 ± 4.83

Table 8
Comparison of classification performances with RBF SVM

Data       Raw data        RS-GBR          Pawlak's rough sets   CFS             Consistency
Credit     81.44 ± 7.18    85.63 ± 18.48   81.00 ± 16.25         85.05 ± 17.79   82.78 ± 15.56
Heart      81.11 ± 7.50    80.74 ± 4.88    –                     80.74 ± 6.72    80.37 ± 6.77
Hepatitis  83.50 ± 5.35    86.17 ± 6.85    84.17 ± 8.21          89.67 ± 5.54    90.33 ± 5.54
Iono       93.79 ± 5.07    91.48 ± 5.26    91.54 ± 5.53          95.19 ± 4.43    92.62 ± 3.74
Wdbc       98.08 ± 2.25    97.20 ± 2.05    95.61 ± 2.37          96.84 ± 1.80    96.49 ± 2.61
Wine       98.89 ± 2.34    97.15 ± 3.01    97.22 ± 2.93          98.89 ± 2.34    97.15 ± 3.99

Moreover, Pawlak's rough set based reduction finds nothing for the heart data: if each feature produces zero significance in greedy forward search, no feature can be obtained, because the search procedure stops immediately. The variable precision rough set model is a candidate solution to this problem. We can also find that, for the credit data, the features selected by RS-GBR are a subset of the features selected by consistency. Although features 1, 4, 8, 10 and 15 are deleted from the subset of features selected by consistency, the classification performance is further improved. This means there is redundant information in the set of features selected by consistency.

Fig. 2. Variation of significance as features are selected one by one (for the Heart, Hepatitis, Iono and Wdbc data sets).


Fig. 2 shows the variation of the significance of the selected features. The four curves share similar characteristics: the significance rises quickly in the first stages and then levels off. Furthermore, the maximal value of significance is 1 for all four data sets, which shows that the data sets used here are consistent at the granularity level of 0.1.

9. Conclusions

Attribute reduction in Pawlak's rough set theory is based on equivalence relations, but this condition does not always hold in practical problems, which limits the wide application of the theory. We therefore relax equivalence relations to general binary relations, propose methods to remove useless attributes from relation information systems, consistent relation decision systems and relation decision systems, and develop the theorems necessary for computing all the reducts. Experimental results show that the proposed methods have great power in attribute reduction and can be used to deal with more complex data sets. Our reduction methods can be used to select useful features and to eliminate redundant and irrelevant information. With the above discussion, the theory of attribute reduction with rough sets based on general binary relations has been established. It should be pointed out that the proposed reduction methods are natural generalizations of the reduction methods in Pawlak's rough set theory [41].

Acknowledgements

The authors are highly grateful to the referees and the Editor-in-Chief, Professor Witold Pedrycz, for their valuable comments and suggestions for improving the paper. The research is supported by the National Natural Science Foundation of China (Nos. 10571025 and 60703013).

References

[1] J. Bazan, A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery, Physica-Verlag, Heidelberg, 1998, pp. 321–365.
[2] M. Beynon, Reducts within the variable precision rough sets model: a further investigation, European Journal of Operational Research 134 (2001) 592–605.
[3] Z. Bonikowski, Algebraic structures of rough sets, in: W. Ziarko (Ed.), Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer-Verlag, London, 1994, pp. 243–247.
[4] Z. Bonikowski, E. Bryniarski, U. Wybraniec, Extensions and intentions in the rough set theory, Information Sciences 107 (1998) 149–167.
[5] E. Bryniarski, A calculus of rough sets of the first order, Bulletin of the Polish Academy of Sciences 16 (1989) 71–77.
[6] G. Cattaneo, Abstract approximate spaces for rough theories, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery 1: Methodology and Applications, Physica-Verlag, Heidelberg, 1998, pp. 59–98.
[7] D.-G. Chen, The part reductions in information systems, in: S. Tsumoto, R. Slowinski, H.J. Komorowski, J.W. Grzymala-Busse (Eds.), Rough Sets and Current Trends in Computing, LNAI 3066, Springer-Verlag, Berlin, 2004, pp. 477–482.
[8] D.-G. Chen, C.Z. Wang, Q.H. Hu, A new approach to attribute reduction of consistent and inconsistent covering decision systems with covering rough sets, Information Sciences 177 (17) (2007) 3500–3518.
[9] D.-G. Chen, W.-X. Zhang, D. Yeung, E.C.C. Tsang, Rough approximation on a complete completely distributive lattice with applications to generalized rough sets, Information Sciences 176 (2006) 1829–1848.
[10] M. Dash, H. Liu, Consistency-based search in feature selection, Artificial Intelligence 151 (2003) 155–176.
[11] S. Greco, B. Matarazzo, R. Slowinski, Rough set theory for multicriteria decision analysis, European Journal of Operational Research 129 (2001) 1–47.
[12] S. Greco, B. Matarazzo, R. Slowinski, Rough sets methodology for sorting problems in presence of multiple attributes and criteria, European Journal of Operational Research 38 (2002) 247–259.
[13] Q.H. Hu, D.R. Yu, Entropies of fuzzy indiscernibility relation and its operations, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 12 (5) (2004) 575–589.
[14] Q.H. Hu, D.R. Yu, Z.X. Xie, J.F. Liu, Fuzzy probabilistic approximation spaces and their information measures, IEEE Transactions on Fuzzy Systems 14 (2) (2006) 191–201.
[15] Q. Hu et al., Neighborhood classifiers, Expert Systems with Applications (2006), doi:10.1016/j.eswa.2006.10.043.
[16] Q.H. Hu, D.R. Yu, Z.X. Xie, Information-preserving hybrid data reduction based on fuzzy-rough techniques, Pattern Recognition Letters 27 (5) (2006) 414–423.


[17] M. Hall, Correlation-based feature selection for discrete and numeric class machine learning, in: Proceedings of the 17th ICML, CA, 2000, pp. 359–366.
[18] M. Kryszkiewicz, Rough set approach to incomplete information systems, Information Sciences 112 (1998) 39–49.
[19] M. Kryszkiewicz, Rules in incomplete information systems, Information Sciences 113 (1999) 271–292.
[20] M. Kryszkiewicz, Comparative study of alternative types of knowledge reduction in inconsistent systems, International Journal of Intelligent Systems 16 (2001) 105–120.
[21] J.-S. Mi, W.-Z. Wu, W.-X. Zhang, Approaches to knowledge reductions based on variable precision rough sets model, Information Sciences 159 (3–4) (2004) 255–272.
[22] J.N. Mordeson, Rough set theory applied to (fuzzy) ideal theory, Fuzzy Sets and Systems 121 (2001) 315–332.
[23] D.J. Newman, S. Hettich, C.L. Blake, C.J. Merz, UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Department of Information and Computer Science, Irvine, CA, 1998.
[24] S.H. Nguyen, Data regularity analysis and applications in data mining (PhD Thesis), in: L. Polkowski, T.Y. Lin, S. Tsumoto (Eds.), Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems, Studies in Fuzziness and Soft Computing, vol. 56, Physica-Verlag, Heidelberg, 2000, pp. 289–378.
[25] H.S. Nguyen, D. Slezak, Approximate reducts and association rules: correspondence and complexity results, in: N. Zhong, A. Skowron, S. Oshuga (Eds.), New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, LNAI 1711, Springer, Berlin, 1999, pp. 137–145.
[26] H.S. Nguyen, Approximate Boolean reasoning: foundation and application in data mining, in: J.F. Peters, A. Skowron (Eds.), Transactions on Rough Sets V, LNCS, vol. 4100, Springer-Verlag, Heidelberg, 2006, pp. 334–506.
[27] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (1982) 341–356.
[28] Z. Pawlak, Rough set theory and its applications in data analysis, International Journal of Cybernetics and Systems 29 (1998) 661–688.
[29] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Boston, 1991.
[30] Z. Pawlak, A. Skowron, Rudiments of rough sets, Information Sciences 177 (2007) 3–27.
[31] Z. Pawlak, A. Skowron, Rough sets: some extensions, Information Sciences 177 (2007) 28–40.
[32] Z. Pawlak, A. Skowron, Rough sets and Boolean reasoning, Information Sciences 177 (2007) 41–73.
[33] L. Polkowski, A. Skowron, J. Zytkow, Rough foundations for rough sets, in: Proceedings of the Third International Workshop on Rough Sets and Soft Computing (RSSC'94), San Jose State University, CA, 1994, pp. 142–149.
[34] L. Polkowski, A. Skowron, J. Zytkow, Rough foundations for rough sets, in: T.Y. Lin, A.M. Wildberger (Eds.), Soft Computing, Simulation Councils, Inc., San Diego, 1995, pp. 55–58.
[35] J.A. Pomykala, Approximation operations in approximation space, Bulletin of the Polish Academy of Sciences 35 (1987) 653–662.
[36] M. Quafafou, α-RST: a generalization of rough set theory, Information Sciences 124 (2000) 301–316.
[37] D. Slezak, Searching for dynamic reducts in inconsistent decision tables, in: Proceedings of IPMU'98, France, vol. 2, 1998, pp. 1362–1369.
[38] D. Slezak, Approximate reducts in decision tables, in: Proceedings of IPMU'96, Granada, Spain, vol. 3, 1996, pp. 1159–1164.
[39] J. Stefanowski, On rough set based approaches to induction of decision rules, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery, vol. 1, Physica-Verlag, Heidelberg, 1998, pp. 500–529.
[40] Q. Shen, R. Jensen, Selecting informative features with fuzzy rough sets and its application for complex systems monitoring, Pattern Recognition 37 (2004) 1351–1363.
[41] A. Skowron, C. Rauszer, The discernibility matrices and functions in information systems, in: R. Slowinski (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of Rough Sets Theory, Kluwer Academic Publishers, Boston, 1992, pp. 331–362.
[42] A. Skowron, J. Stepaniuk, Tolerance approximation spaces, Fundamenta Informaticae 27 (1996) 245–253.
[43] R. Slowinski, D. Vanderpooten, A generalized definition of rough approximations based on similarity, IEEE Transactions on Knowledge and Data Engineering 12 (2) (2000) 331–336.
[44] W.-Z. Wu, M. Zhang, H.-Z. Li, J.-S. Mi, Knowledge reduction in random information systems via Dempster–Shafer theory of evidence, Information Sciences 174 (2005) 143–164.
[45] U. Wybraniec-Skardowska, On a generalization of approximation space, Bulletin of the Polish Academy of Sciences, Mathematics 37 (1989) 51–61.
[46] Y.Y. Yao, Constructive and algebraic methods of the theory of rough sets, Information Sciences 109 (1998) 21–47.
[47] Y.Y. Yao, Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111 (1998) 239–259.
[48] D.S. Yeung, D.G. Chen, E.C.C. Tsang, J.W.T. Lee, X.Z. Wang, On the generalization of fuzzy rough sets, IEEE Transactions on Fuzzy Systems 13 (3) (2005) 343–361.
[49] D. Yu, Q.H. Hu, W. Bao, Combining rough set methodology and fuzzy clustering for knowledge discovery from quantitative data, Proceedings of the Chinese Society of Electrical Engineering 24 (6) (2004) 205–210.
[50] L. Yu, H. Liu, Efficient feature selection via analysis of relevance and redundancy, Journal of Machine Learning Research 5 (2004) 1205–1224.
[51] W. Zakowski, Approximations in the space (U, P), Demonstratio Mathematica 16 (1983) 761–769.
[52] W.-X. Zhang, W.-Z. Wu, J.-Y. Liang, D.-Y. Li, Theory and Methods of Rough Sets, Science Press, Beijing, 2001.
[53] W. Zhu, Topological approaches to covering rough sets, Information Sciences 177 (2007) 1892–1915.
[54] W. Zhu, F.-Y. Wang, Reduction and axiomization of covering generalized rough sets, Information Sciences 152 (2003) 217–230.
[55] W. Ziarko, Variable precision rough set model, Journal of Computer and System Sciences 46 (1993) 39–59.