Ann Oper Res (2011) 185: 105–138 DOI 10.1007/s10479-009-0542-3
Global investing risk: a case study of knowledge assessment via rough sets

Salvatore Greco · Benedetto Matarazzo · Roman Slowinski · Stelios Zanakis
Published online: 6 May 2009 © Springer Science+Business Media, LLC 2009
Abstract This paper presents an application of knowledge discovery via rough sets to a real life case study of global investing risk in 52 countries using 27 indicator variables. The aim is explanation of the classification of the countries according to financial risks assessed by Wall Street Journal international experts and knowledge discovery from data via decision rule mining, rather than prediction; i.e. to capture the explicit or implicit knowledge or policy of international financial experts, rather than to predict the actual classifications. Suggestions are made about the most significant attributes for each risk class and country, as well as the minimal set of decision rules needed. Our results compared favorably with those from discriminant analysis and several variations of preference disaggregation MCDA procedures. The same approach could be adapted to other problems with missing data in data mining, knowledge extraction, and different multi-criteria decision problems, like sorting, choice and ranking.

Keywords Knowledge discovery · Investing risk assessment · Rough sets · Decision rule mining · Multi-criteria classification · Artificial intelligence · Financial engineering · Missing values
S. Greco · B. Matarazzo
University of Catania, Catania, Italy

S. Greco
e-mail: [email protected]

B. Matarazzo
e-mail: [email protected]

R. Slowinski
Institute of Computing Science, Poznan University of Technology, Poznan, Poland
e-mail: [email protected]

S. Zanakis
Florida International University, Miami, FL 33199, USA
e-mail: [email protected]
1 Introduction

Many real decision problems include qualitative and quantitative data in the form of an information table (decision matrix) concerning specific objects (actions, states, competitors, etc.) described by a finite set of attributes (features, characteristics, criteria, variables, etc.). When the dependent variable of interest is categorical, the objective of explanatory and/or predictive modeling is either to match a prior classification of objects described by multiple attributes or, if the class membership (holistic information) of each object is not known, to group the objects into clusters by similarity. The former situation (the classification problem) is of interest in this paper, as such a model represents knowledge about experts' classifications and can be used for explanation and/or prescription. The most frequently used models have the form of a function (discriminant function, logistic regression equation) or a binary relation (similarity or outranking relation). Their explanatory capacity is, however, weak in comparison with artificial intelligence models employing a set of logical statements of the type "if. . ., then. . .", known as decision rules. The decision rule model inferred from the information table is easy to interpret: it is expressed in terms of the relevant attributes and makes explicit all local trade-offs that can be lost in a functional or relational model. Moreover, psychologists acknowledge that people make decisions by searching for rules that provide good justification of their choices.

Construction of a classification model from holistic information faces, however, the difficult problem of possible inconsistencies. An inconsistency occurs when two objects having the same or similar description (indiscernible) in terms of attributes have been assigned to different classes. Some methods, e.g. statistical ones and those based on artificial neural nets, tend to treat these inconsistencies as error or noise. We advocate another interpretation of the inconsistencies: they are caused by hesitation, by instability of the value system of the expert, and by the lack of some discriminating attributes. Instead of deleting the inconsistent examples or amalgamating them in a comprehensive model of classification, we prefer an approach that distinguishes between the certain and possible parts of the input information and yields a model composed of the corresponding parts: exact and approximate.

Rough set theory provides an appropriate tool to meet the above goals. In essence, it identifies dependencies among attributes and reduces the dataset by removing the weakest attributes from the information table, leading to the sequential building of rules of the form "if an object meets some conditions on a subset of attributes, then assign the object to a particular class". It also has an additional advantage over statistical methods (e.g. discriminant analysis and multinomial logistic regression): in the revised form employed in this paper, it can deal with missing values, often encountered in real-life incomplete data sets, in methodologically appropriate ways.

Rough sets have been applied in a variety of real applications, including: financial engineering, such as credit card assessment, country risk evaluation, credit risk assessment, corporate acquisitions, and business failure prediction (Slowinski and Zopounidis 1995; Greco et al. 1998, 2002; Zopounidis and Doumpos 2002); customer relationship management (Tseng and Huang 2007; Greco et al. 2007); medicine, for diagnosis from clinical databases (Tsumoto 1998) and emergency room initial assessment (Michalowski et al. 2003; Wilk et al. 2005); environmental studies, for water pollution evaluation (Rossi et al. 1999) and forest ecosystem taxonomy (Flinkman et al. 2000); and water management (Barbagallo et al. 2006), among others. The use of rough sets in multi-criteria decision support has been explored extensively by Greco et al. (1999a, 1999b, 2000a, 2001, 2005) and Slowinski et al. (2005). Relationships between the rough set approach and other multiple criteria aggregation procedures have been investigated from an axiomatic point of view in Greco et al. (2004), Slowinski et al. (2002).
A Google Scholar search on rough set theory and applications produced over 5,000 journal articles and books (367 related to investment).

In this paper, we apply our extended rough set approach to a real data set from The Wall Street Journal, concerning the classification of 52 countries by expert international investors into five market risk classes, according to 27 attributes. These data are not complete: 9.5% of the observations are missing. Hence, the application of the rough set approach, extended to the case of missing values, is particularly appropriate. Applied to this incomplete data set, the rough set approach performs two crucial tasks: it models the rating policy of international experts, and it explains their implicit or explicit policy in comprehensible decision rules. The strongest/minimal sets of rules for classifying correctly each country are identified, as well as the most significant attributes for each class of risk. Our approach could be adapted to other problems with missing data, such as data mining, knowledge extraction, and different multi-criteria decision problems, like sorting, choice and ranking.

The key idea of rough sets is the approximation of knowledge expressed by decision attributes using knowledge expressed by condition attributes. The rough set approach makes it possible to analyze real data sets since it can:
a) use quantitative as well as qualitative data;
b) use incomplete data;
c) handle inconsistencies in human judgments, by distinguishing between certain and possible parts of the information input, thus obtaining a model composed of certain and approximate decision rules;
d) identify and eliminate redundant attributes (variables), keeping the most significant ones for each risk class;
e) show the decision rules matching the maximum number of countries in each class of risk; and, most importantly,
f) summarize the implicit or explicit rating policy of the international investment expert managers commissioned by the WSJ using a minimal number of comprehensible and easy-to-interpret decision rules.

This paper is organized in the following way. In Sect. 2, we provide a non-technical synopsis of rough sets and their extension for handling missing values. Section 3 describes the real case dataset and the discretization of the continuous attributes. Section 4 is devoted to the application of the extended rough set approach to this real case of financial risk assessment for 52 countries commissioned by The Wall Street Journal. Section 5 contains our conclusions. In Appendix A, we present in greater detail the rough set methodology and its extension to the case of missing values, using some didactic examples.
2 Rough set methodology overview

In this section, the basic concepts of rough set theory are introduced in non-technical terms, while Appendix A provides methodological details for datasets without and with missing values, illustrated by a few examples to facilitate their understanding.

The concept of a rough set, introduced by Pawlak (1982, 1991), Pawlak et al. (1995), is very useful in dealing with granularity of information in data analysis. The rough set philosophy is based on the assumption that with every object of the universe there is associated a certain amount of information (data, knowledge), expressed by means of some attributes used for object description. In our case study, the objects are countries evaluated from the viewpoint of financial risk for investors; the information on countries' financial markets may be given in terms of economic indicators, depth and liquidity, performance and value, regulation and efficiency, and so on. The rows of the information table are labeled by objects, whereas columns are labeled by attributes, and the entries of the table are attribute values, called descriptors. Formally, by an information table we understand the 4-tuple S = ⟨U, Q, V, f⟩, where U is a finite set of objects, called the universe, Q is a finite set of attributes, V = ⋃q∈Q Vq, where Vq is the set of values assumed by attribute q (also called the domain of attribute q), and f : U × Q → V is a function such that f(x, q) ∈ Vq ∪ {∗} for every q ∈ Q and x ∈ U, called an information function (Pawlak 1982, 1991; Pawlak et al. 1995).
The symbol "∗" indicates that the value of some attribute for a given object is unknown (missing). The set of all attributes Q is, in general, divided into a set C of condition attributes and a set D of decision attributes.

Objects (countries) having the same description are indiscernible with respect to the available information. The indiscernibility relation thus generated constitutes the mathematical basis of rough set theory. The use of the indiscernibility relation results in information granulation, because it induces a partition of the universe into blocks of indiscernible objects, called elementary sets, which can be used as "bricks" to build knowledge about a real or abstract world, like risk for investors in the above example. Any subset X of the universe (e.g. countries with the same level of risk) may be characterized in terms of these bricks (as a union of elementary sets) either precisely or only approximately. In the latter case, the subset X may be characterized by two ordinary sets, called the lower and upper approximations. A rough set is defined by means of these two approximations, which coincide in the case of an ordinary set. The lower approximation of X is composed of all the elementary sets included in X (whose elements, therefore, certainly belong to X), while the upper approximation of X consists of all the elementary sets which have a nonempty intersection with X (whose elements, therefore, may belong to X). The difference between the upper and lower approximations constitutes the boundary region of the rough set, whose elements cannot be characterized with certainty as belonging or not belonging to X using the available information. The information about objects from the boundary region is, therefore, inconsistent or ambiguous. Clearly, in ordinary sets the boundary region is empty. The cardinality of the boundary region states, moreover, to what extent it is possible to express X in exact terms on the basis of the available information.

Some important characteristics of the rough set approach make it a particularly interesting tool in a number of problems and concrete applications, as demonstrated by Pawlak (1982, 1991), Pawlak et al. (1995). With respect to the input information, it is possible to deal with both quantitative and qualitative data, and inconsistencies need not be removed prior to the analysis. The output model is a set of certain and approximate decision rules. The syntax of certain rules is: "if object x has value vi1 on attribute qi1 and value vi2 on attribute qi2 and . . . value vip on attribute qip, then x belongs to class Clt", while the syntax of approximate rules is: "if object x has value vi1 on attribute qi1 and value vi2 on attribute qi2 and . . . value vip on attribute qip, then x belongs to class Cls or Clt". For example, "if Size is compact, then the car is poor" is a certain rule, while "if Max-Speed is high, then the car is good or excellent" is an approximate rule. Within the rough set approach, knowledge is understood as a partition of the set of objects into blocks of objects indiscernible with respect to the considered attributes. The key idea of rough sets is the approximation of knowledge expressed by the holistic classification (decision attributes) using knowledge expressed by the condition attributes.
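To make these notions concrete, the following minimal Python sketch (our own illustration; the toy car table echoes the example above and is not part of the case study) computes the elementary sets induced by a set of condition attributes and the lower and upper approximations of a decision class.

```python
from collections import defaultdict

# Toy information table in the spirit of the car example above (invented data).
U = {
    "a": {"Size": "compact", "MaxSpeed": "low",  "Class": "poor"},
    "b": {"Size": "compact", "MaxSpeed": "low",  "Class": "good"},
    "c": {"Size": "large",   "MaxSpeed": "high", "Class": "good"},
    "d": {"Size": "large",   "MaxSpeed": "high", "Class": "good"},
}

def elementary_sets(table, attributes):
    """Partition the universe into blocks of objects indiscernible
    on the given condition attributes."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attributes)].add(obj)
    return list(blocks.values())

def approximations(table, attributes, target):
    """Lower and upper approximation of the object set `target`."""
    lower, upper = set(), set()
    for block in elementary_sets(table, attributes):
        if block <= target:   # block entirely inside the target set
            lower |= block
        if block & target:    # block intersects the target set
            upper |= block
    return lower, upper

good = {o for o, row in U.items() if row["Class"] == "good"}
lower, upper = approximations(U, ["Size", "MaxSpeed"], good)
print(sorted(lower), sorted(upper))
# ['c', 'd'] ['a', 'b', 'c', 'd']: objects a and b form the boundary region,
# since they are indiscernible but assigned to different classes.
```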
With reference to the output information, it is also possible to acquire a posteriori information regarding the relevance of particular attributes or subsets of attributes to the quality of approximation considered in the problem at hand. Moreover, the final result using the most relevant attributes, in the form of "if. . ., then. . ." decision rules, is easy to interpret. The rough set approach answers several questions related to the approximation:
(a) is the information contained in the table consistent?
(b) which are the non-redundant subsets of condition attributes ensuring the same quality of approximation as the whole set of condition attributes?
(c) which are the condition attributes that cannot be eliminated without decreasing the quality of approximation?
(d) which minimal "if . . ., then . . ." decision rules can be induced from the approximations?
In many practical applications, however, the information about objects is often incomplete because some data are missing. For example, in the case of country risk evaluation, the rating of Standard & Poor's or Moody's may not be available for some countries.
To deal with missing values, a modification of the rough set approach for the analysis of an incomplete information table was proposed by Greco et al. (2000b). This extended approach is based on a specific definition of the indiscernibility relation taking missing values into account. The indiscernibility relation between a pair of objects is considered as a directional statement, where a subject object is compared to a referent object; it requires that the referent object has no missing values. When there are no missing values, the indiscernibility relation is reflexive, symmetric and transitive. When some values are missing in the object description, this relation is transitive but neither reflexive nor symmetric. The rules induced from the rough approximations defined according to the new indiscernibility relation verify some suitable properties: they are either certain or approximate, depending on whether they are consistent with respect to indiscernibility or not; and they are robust, in the sense that each rule is supported by at least one object with no missing value on the condition attributes represented in the rule.

The revised approach also restores key concepts of rough set theory (accuracy and quality of approximation, reduct and core) for data with missing values. The accuracy is defined as the ratio of the cardinality of the lower approximation to the cardinality of the upper approximation. The quality is defined as the ratio of the cardinality of the lower approximation to the cardinality of the universe U. For a subset P ⊆ C of condition attributes, it expresses the ratio of all P-correctly classified objects (i.e. classified correctly when taking into account information provided by attributes from P) to the total number of objects in the information table. A minimal subset of condition attributes giving the same quality of approximation as the entire set of condition attributes is called a reduct. An information table can have more than one reduct. The intersection of all reducts is called the core. The extended rough set approach preserves all good characteristics of the classical approach, and it boils down to the classical approach when there are no missing values.

Decision rules are logical statements (consequence relations) of the type "if . . ., then. . .", where the antecedent (condition part) is a conjunction of elementary conditions concerning particular condition attributes, and the consequent (decision part) is a disjunction of possible assignments to particular classes of a partition of the universe U induced by the decision attributes. If the consequent is univocal, the rule is certain; otherwise it is approximate or uncertain. The objects belonging to the lower approximation can be considered as a basis for the induction of certain decision rules; the objects belonging to the boundary can be considered as a basis for the induction of approximate decision rules. A decision rule is minimal if there is no other rule with an antecedent at least as weak and a consequent at least as strong; moreover, a set of decision rules is minimal when it is complete and non-redundant, i.e. exclusion of any rule from this set makes it incomplete. The number of objects supporting a decision rule is called its strength: the larger the number of objects supporting a decision rule, the stronger the rule.
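A compact sketch of the quality measure, and of one plausible reading of the directional indiscernibility relation described above (our own simplification; the treatment of "∗" in the full approach of Greco et al. 2000b is richer than shown here):

```python
from collections import defaultdict

MISSING = "*"

def elementary_sets(table, attributes):
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attributes)].add(obj)
    return list(blocks.values())

def quality(table, attributes, decision="Class"):
    """Classical gamma_P: fraction of objects whose elementary set is pure,
    i.e. P-correctly classified objects over all objects."""
    ok = sum(len(b) for b in elementary_sets(table, attributes)
             if len({table[o][decision] for o in b}) == 1)
    return ok / len(table)

def similar_to(subject, referent, attributes, table):
    """Directional comparison: the referent must be fully known on the
    attributes; a missing value on the subject side matches anything."""
    for a in attributes:
        rv, sv = table[referent][a], table[subject][a]
        if rv == MISSING:
            return False      # a referent with missing values is not allowed
        if sv != MISSING and sv != rv:
            return False
    return True

# Invented three-object table with one missing value:
T = {
    "x": {"Rating": "A",     "Turnover": "high", "Class": "1"},
    "y": {"Rating": MISSING, "Turnover": "high", "Class": "1"},
    "z": {"Rating": "B",     "Turnover": "high", "Class": "2"},
}
print(quality(T, ["Turnover"]))                         # 0.0 (one impure block)
print(similar_to("y", "x", ["Rating", "Turnover"], T))  # True
print(similar_to("x", "y", ["Rating", "Turnover"], T))  # False (not symmetric)
```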
Typically, one of three induction strategies can be adopted to obtain a set of decision rules (Stefanowski and Vanderpooten 2001):
• generation of a minimal description, i.e. a minimal set of rules,
• generation of an exhaustive description, i.e. all possible minimal rules for a given information table,
• generation of a characteristic description, i.e. a set of minimal rules covering relatively many objects each, but not necessarily all objects from U.
3 Rating the financial risk of countries—a case study

We applied the rough set methodology to a global investing report published in The Wall Street Journal (June 26, 1997), listing the rating of investing risk in 52 countries of the world. Experienced international investors placed all countries in five categories (from safest, 1, to riskiest, 5). The countries are further described using 27 attributes (explained later) representing the main characteristics of the financial markets of these countries. Missing value analysis revealed that 9.5% of the data are missing completely at random, with only four countries showing a significant missing pattern for one attribute (3-yr compound annual total return in local currency); this is immaterial, since that attribute did not appear in any of the decision rules presented in this section. Hence, the application of the rough set approach, extended to the case of missing values, is particularly appropriate.

The aims of the rough set analysis are the following:
• check if the information contained in the information table is consistent,
• calculate the reducts of condition attributes ensuring the same quality of approximation of the rating as the whole set of condition attributes, i.e. check if some attributes are superfluous and could be eliminated from the analysis without loss of information,
• in case of multiple reducts, calculate the core indicating the indispensable attributes,
• represent the rating policy of international investors in terms of minimal decision rules induced from rough approximations of the decision classes corresponding to particular levels of risk,
• induce the set of all possible decision rules from the information table reduced to the best reduct, i.e. represent all possible relationships between a country's description in terms of the relevant attributes and its level of risk,
• induce a minimal set of minimal decision rules that cover all the countries from the information table, i.e. summarize the rating policy of international investors using a minimal number of decision rules,
• discover the strongest relationships between a country's description and each level of risk, i.e. show the decision rules matching the description of the maximum number of countries in each class of risk,
• identify the most significant attributes characterizing each class of risk.

Let us present the case and the results of the rough set approach handling the missing values. The investors rated the 52 countries into 5 classes of risk, as follows:
Level 1—most similar to U.S.: U.S., Australia, Canada, Denmark, France, Germany, Ireland, Netherlands, New Zealand, Sweden, Switzerland, U.K.;
Level 2—other developed: Austria, Belgium, Finland, Hong Kong, Italy, Japan, Norway, Singapore, Spain;
Level 3—mature emerging markets: Argentina, Brazil, Chile, Greece, Korea, Malaysia, Mexico, Philippines, Portugal, South Africa, Thailand;
Level 4—newly emerging markets: China, Colombia, Czech Republic, Hungary, India, Indonesia, Israel, Poland, Sri Lanka, Taiwan, Venezuela;
Level 5—the frontier: Egypt, Jordan, Morocco, Nigeria, Pakistan, Peru, Russia, Turkey, Zimbabwe.
Within the rough set approach, the 52 countries are considered as objects, and the 5 levels of risk correspond to the decision attribute making a partition of the set of objects into 5 decision classes. The countries were described using the following 27 indices (Wall Street Journal, June 26, 1997), considered as condition attributes within the rough set approach:
A1) 3-yr compound annual total return [%] (with respect to U.S. dollar),
A2) 3-yr compound annual total return [%] (with respect to local currencies),
A3) price/earning ratio,
A4) forward price/earnings ratio,
A5) historic earnings growth [%],
A6) projected earnings [%],
A7) dividend yield [%],
A8) GNP per capita [U.S. $],
A9) real GDP growth rate [%],
A10) projected GDP growth [%],
A11) projected inflation rate [%],
A12) short term interest rate [%],
A13) market capitalization [millions of U.S. $],
A14) turnover [%],
A15) total listed companies,
A16) total public ADRs or ordinary shares available in U.S.,
A17) country funds available in U.S. [yes/no],
A18) settlement efficiency,
A19) safekeeping efficiency,
A20) operational costs,
A21) withholding tax on dividends for U.S.-based investors [%],
A22) settlement lag [days],
A23) year in which the stock exchange was established,
A24) Standard and Poor's long-term foreign-currency credit rating,
A25) Moody's long-term foreign-currency credit rating,
A26) volatility,
A27) correlation with U.S.
For the continuous-valued attributes we performed a pre-processing step of discretization, based on the entropy measure (Fayyad and Irani 1992). The resulting divisions of the attribute domains into sub-intervals are given below:
Attribute A1: [−21.67; −10.29[, [−10.29; 6.74[, [6.74; 12.285[, [12.285; 27.37[, [27.37; 56.05];
Attribute A2: [−13.58; 9.02[, [9.02; 13.37[, [13.37; 20.22[, [20.22; 21.445[, [21.445; 28.94];
Attribute A3: [8.6; 13.45[, [13.45; 16.75[, [16.75; 18.25[, [18.25; 22.2[, [22.2; 54.4];
Attribute A4: [5.3; 7.75[, [7.75; 10.8[, [10.8; 13.95[, [13.95; 16.65[, [16.65; 30.6];
Attribute A5: [−5.6; −2.5[, [−2.5; 14.4[, [14.4; 22.25[, [22.25; 44.4[, [44.4; 110.7];
Attribute A6: [4.2; 7.8[, [7.8; 11.4[, [11.4; 12.5[, [12.5; 16.1[, [16.1; 24.3];
Attribute A7: [0.9; 2.2[, [2.2; 2.85[, [2.85; 3.25[, [3.25; 3.75[, [3.75; 6.8];
Attribute A8: [260; 3090[, [3090; 12865[, [12865; 13960[, [13960; 18870[, [18870; 40630];
Attribute A9: [−9.8; −0.75[, [−0.75; 2.35[, [2.35; 2.95[, [2.95; 5.65[, [5.65; 12.8];
Attribute A10: [0.7; 2.9[, [2.9; 3.8[, [3.8; 5.3[, [5.3; 6.55[, [6.55; 9.5];
Attribute A11: [1; 3.15[, [3.15; 7[, [7; 7.2[, [7.2; 8.95[, [8.95; 75];
Attribute A12: [0.62; 5.225[, [5.225; 7.525[, [7.525; 19.575[, [19.575; 47.375[, [47.375; 71.6];
Attribute A13: [1848; 21127.5[, [21127.5; 27340[, [27340; 37759[, [37759; 309583.5[, [309583.5; 8484433];
Attribute A14: [2.6; 22.7[, [22.7; 32.4[, [32.4; 47[, [47; 134.1[, [134.1; 328.6];
Attribute A15: [45; 102[, [102; 143[, [143; 668[, [668; 959[, [959; 8800];
Attribute A16: [0; 7[, [7; 9[, [9; 107[, [107; 154[, [154; 261];
Attribute A18: [−2; 68[, [68; 78[, [78; 89[, [89; 97[, [97; 99];
Attribute A19: [44; 79[, [79; 87[, [87; 89[, [89; 97[, [97; 98];
Attribute A20: [19; 43.5[, [43.5; 52.5[, [52.5; 56.5[, [56.5; 83[, [83; 176];
Attribute A21: [0; 11[, [11; 12.75[, [12.75; 15.75[, [15.75; 18.75[, [18.75; 35];
Attribute A22: [1; 2[, [2; 6[, [6; 11[, [11; 23[, [23; 30];
Attribute A23: [1611; 1841[, [1841; 1858[, [1858; 1874[, [1874; 1938[, [1938; 1993];
Attribute A26: [7.33; 17.575[, [17.575; 23.95[, [23.95; 28.77[, [28.77; 32.1[, [32.1; 75.38];
Attribute A27: [−0.05999994; 0.02[, [0.02; 0.135[, [0.135; 0.165[, [0.165; 0.28[, [0.28; 1].

The domains of the three remaining attributes, A17, A24 and A25, are qualitative. The values in these domains were clustered in the following way:
Attribute A17 (country funds available in U.S.): the values (yes, no) were considered as two distinct values;
Attributes A24 (Standard and Poor's long-term foreign-currency credit rating) and A25 (Moody's long-term foreign-currency credit rating): since they represent qualitative-ordinal and non-quantitative evaluations, the domain of these attributes was divided into clusters of similar evaluations:
– with respect to A24 the clusters are: 1) {AAA}, 2) {AA, AA+, AA-}, 3) {A, A+, A-}, 4) {BBB, BBB+, BBB-}, 5) {BB, BB+, BB-}, 6) {B, B+, B-};
– with respect to A25 the clusters are: 1) {AAA, Aaa}, 2) {Aa1, Aa2, Aa3}, 3) {A1, A3, A+, A-}, 4) {Baa1, Baa3}, 5) {Ba1, Ba2, Ba3}, 6) {B1, B2}.
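The entropy-based discretization of Fayyad and Irani (1992) chooses cut points that minimize the class entropy of the induced sub-intervals. A minimal sketch of a single binary split (our own simplification; the full method recurses on each sub-interval and uses a minimum-description-length stopping criterion, omitted here), with invented inflation figures:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return the boundary (midpoint between adjacent distinct values)
    minimizing the weighted class entropy of the two sub-intervals."""
    pairs = sorted(zip(values, labels))
    best_w, best_c = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v < cut]
        right = [l for v, l in pairs if v >= cut]
        w = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if w < best_w:
            best_w, best_c = w, cut
    return best_c

# Invented projected inflation rates paired with risk classes:
rates = [1.0, 1.9, 3.0, 7.1, 8.9, 75.0]
classes = ["1", "1", "2", "3", "4", "5"]
print(best_cut(rates, classes))  # 5.05: separates the three low-inflation countries
```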
4 Case study results via the rough set methodology

The following results were obtained using the rough set approach handling missing values.

The first result of the rough set analysis is the discovery that the information table is consistent, in the sense that the quality of the approximation is equal to one. This means that the cumulative lower and upper approximations of the classes of risk are equal. The second discovery is that the core of attributes is empty, which means that no attribute is indispensable for the approximation. The empty core is the result of the empty intersection of the reducts of attributes that ensure a quality of approximation equal to one.

To select the best reduct, composed of the most discriminating attributes, the following procedure was used. The single attribute characterised by the highest quality of sorting was augmented by one of the remaining attributes, and the resulting pair that gave the highest quality of sorting was chosen. Then the chosen pair was augmented by one of the remaining attributes, and the resulting triple that gave the highest quality of sorting was selected, and so on, until the quality reached one. Finally, it was verified that none of the attributes included in the subset of attributes that attained quality equal to one is superfluous (this would be the case if, after removing the attribute, the remaining subset still had quality equal to one).
Since there was no superfluous attribute, the subset of attributes that attained quality equal to one was suggested as the best reduct. The steps of this procedure are shown in Table 1.

Table 1 Procedure for selection of the best reduct in the case study (best subset and quality of sorting γP(Y) at each step of the forward selection)

Step 1 (single attributes): best is {A16}, γP(Y) = .09
Step 2 (pairs containing A16): best is {A1, A16}, γP(Y) = .33
Step 3 (triples containing A1, A16): best is {A1, A16, A25}, γP(Y) = .90
Step 4 (quadruples containing A1, A16, A25): best is {A1, A14, A16, A25}, γP(Y) = 1

The best reduct obtained is the quadruple {A1, A14, A16, A25}, where:
– A1 means 3-yr compound annual total return with respect to U.S. dollar,
– A14 means turnover,
– A16 means total public ADRs or ordinary shares available in U.S.,
– A25 means Moody's long-term foreign-currency credit rating.
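The forward-selection procedure described above lends itself to a greedy sketch (our own schematic rendering: the quality function is the classical γP on complete data, and the four-country table is invented, so the toy run stops after a single attribute rather than reproducing the case-study reduct):

```python
from collections import defaultdict

def quality(table, attributes, decision="Class"):
    """gamma_P: share of objects whose block on `attributes` is pure."""
    if not attributes:
        return 0.0
    blocks = defaultdict(list)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attributes)].append(row[decision])
    ok = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return ok / len(table)

def greedy_reduct(table, candidates):
    """Add the attribute raising gamma most until gamma = 1, then drop
    any attribute whose removal still leaves gamma = 1 (superfluous)."""
    chosen, pool = [], list(candidates)
    while quality(table, chosen) < 1.0 and pool:
        best = max(pool, key=lambda a: quality(table, chosen + [a]))
        chosen.append(best)
        pool.remove(best)
    for a in list(chosen):
        rest = [b for b in chosen if b != a]
        if quality(table, rest) == 1.0:
            chosen = rest
    return chosen

# Placeholder table (attribute codes borrowed from the case study, values invented):
T = {
    "Sweden": {"A1": 5, "A14": 4, "A16": 3, "Class": 1},
    "Turkey": {"A1": 5, "A14": 5, "A16": 1, "Class": 5},
    "Japan":  {"A1": 2, "A14": 2, "A16": 4, "Class": 2},
    "Mexico": {"A1": 1, "A14": 3, "A16": 3, "Class": 3},
}
print(greedy_reduct(T, ["A1", "A14", "A16"]))  # ['A14'] suffices on this toy data
```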
The order of selection of attributes to the best reduct suggests the order of their importance: A16, A1, A25, A14. The composition of the reduct is not surprising: the attributes from the reduct seem to represent well the points of view that an American investor is likely to have with respect to foreign markets.

The third finding is that the set of all certain and minimal decision rules induced from the information table is reduced to the attributes contained in the best reduct {A1, A14, A16, A25} explained above. We use the following convention in the syntax of the rules:
– f(x, Ai) = k means the elementary condition: "with respect to attribute Ai, country x has a score belonging to the k-th sub-interval (k-th cluster for attribute Ai)",
– x → j means that country x is assigned to the j-th class of risk,
– within parentheses are the countries supporting the decision rule; more precisely, in italics are countries making the basis for the decision rule, i.e. satisfying all the conditions in the antecedent of the rule (without missing values), while in regular style are countries supporting the rule without being a basis for it, because they have a missing value on an attribute present in the antecedent of the rule; for example, in rule 36, "if f(x, A1) = 1 and f(x, A25) = 4, then x → 4", Czech Republic and India are bases for this rule, because both of them have value 1 on attribute A1 and value 4 on attribute A25, while Sri Lanka only supports the rule, because it has value 1 on attribute A1 but a missing value on attribute A25.
We found the following 68 rules using only attributes from the best reduct {A1, A14, A16, A25}:
1) if f(x, A1) = 5 and f(x, A14) = 4, then x → 1 (Sweden)
2) if f(x, A1) = 3 and f(x, A14) = 2, then x → 2 (Singapore)
3) if f(x, A1) = 3 and f(x, A14) = 1, then x → 3 (Argentina, Chile, South Africa)
4) if f(x, A1) = 3 and f(x, A14) = 4, then x → 3 (Malaysia)
5) if f(x, A1) = 1 and f(x, A14) = 3, then x → 3 (Mexico, Thailand)
6) if f(x, A1) = 3 and f(x, A14) = 3, then x → 3 (Philippines, Portugal)
7) if f(x, A1) = 4 and f(x, A14) = 5, then x → 4 (China, Taiwan)
8) if f(x, A1) = 1 and f(x, A14) = 1, then x → 4 (Sri Lanka)
9) if f(x, A1) = 5 and f(x, A14) = 1, then x → 5 (Egypt, Morocco, Nigeria, Russia, Zimbabwe)
10) if f(x, A1) = 5 and f(x, A14) = 5, then x → 5 (Turkey)
11) if f(x, A16) = 5, then x → 1 (U.S., Australia, Canada, U.K.)
12) if f(x, A16) = 2, then x → 1 (Denmark, New Zealand)
13) if f(x, A1) = 2 and f(x, A16) = 4, then x → 2 (Japan)
14) if f(x, A1) = 3 and f(x, A16) = 4, then x → 2 (Singapore)
15) if f(x, A1) = 3 and f(x, A16) = 3, then x → 3 (Argentina, Chile, Malaysia, Philippines, Portugal)
16) if f(x, A1) = 3 and f(x, A16) = 1, then x → 3 (South Africa)
17) if f(x, A1) = 5 and f(x, A16) = 1, then x → 5 (Egypt, Morocco, Nigeria, Zimbabwe)
18) if f(x, A14) = 2 and f(x, A16) = 1, then x → 2 (Japan)
19) if f(x, A14) = 3 and f(x, A16) = 4, then x → 2 (Hong Kong, Japan)
20) if f(x, A14) = 2 and f(x, A16) = 4, then x → 2 (Singapore)
21) if f(x, A14) = 3 and f(x, A16) = 1, then x → 3 (Greece)
22) if f(x, A1) = 2 and f(x, A14) = 4 and f(x, A16) = 3, then x → 2 (Austria)
23) if f(x, A1) = 4 and f(x, A14) = 3 and f(x, A16) = 3, then x → 2 (Finland)
24) if f(x, A1) = 2 and f(x, A14) = 1 and f(x, A16) = 3, then x → 4 (Colombia, Israel, Venezuela)
25) if f(x, A1) = 2 and f(x, A14) = 4 and f(x, A16) = 1, then x → 4 (Poland)
26) if f(x, A1) = 2 and f(x, A14) = 1 and f(x, A16) = 1, then x → 5 (Jordan)
27) if f(x, A1) = 4 and f(x, A25) = 1, then x → 1 (U.S., France, Germany, Netherlands, Switzerland, U.K.)
28) if f(x, A1) = 2 and f(x, A25) = 1, then x → 2 (Austria, Japan)
29) if f(x, A1) = 2 and f(x, A25) = 2, then x → 2 (Italy)
30) if f(x, A1) = 3 and f(x, A25) = 6, then x → 3 (Argentina)
31) if f(x, A1) = 3 and f(x, A25) = 4, then x → 3 (Chile, South Africa)
32) if f(x, A1) = 4 and f(x, A25) = 4, then x → 3 (Greece)
33) if f(x, A1) = 3 and f(x, A25) = 3, then x → 3 (Malaysia)
34) if f(x, A1) = 3 and f(x, A25) = 5, then x → 3 (Philippines)
35) if f(x, A1) = 2 and f(x, A25) = 4, then x → 4 (Colombia, Hungary, Indonesia, Poland)
36) if f(x, A1) = 1 and f(x, A25) = 4, then x → 4 (Czech Republic, India, Sri Lanka)
37) if f(x, A1) = 2 and f(x, A25) = 3, then x → 4 (Israel)
38) if f(x, A1) = 5 and f(x, A25) = 5, then x → 5 (Egypt, Morocco, Nigeria, Russia, Zimbabwe)
39) if f(x, A1) = 5 and f(x, A25) = 6, then x → 5 (Morocco, Nigeria, Turkey, Zimbabwe)
40) if f(x, A1) = 4 and f(x, A25) = 3, then x → 3 (Korea, Malaysia)
41) if f(x, A14) = 4 and f(x, A25) = 5, then x → 3 (Mexico, Philippines)
42) if f(x, A14) = 5 and f(x, A25) = 3, then x → 4 (China)
43) if f(x, A14) = 4 and f(x, A25) = 4, then x → 4 (Czech Republic, India, Poland)
44) if f(x, A14) = 5 and f(x, A25) = 2, then x → 4 (Taiwan)
45) if f(x, A14) = 2 and f(x, A25) = 6, then x → 5 (Peru)
46) if f(x, A14) = 5 and f(x, A25) = 6, then x → 5 (Turkey)
47) if f(x, A1) = 4 and f(x, A14) = 3 and f(x, A25) = 2, then x → 2 (Finland)
48) if f(x, A1) = 4 and f(x, A14) = 3 and f(x, A25) = 3, then x → 2 (Hong Kong)
49) if f(x, A1) = 4 and f(x, A14) = 4 and f(x, A25) = 6, then x → 3 (Brazil)
50) if f(x, A1) = 1 and f(x, A14) = 4 and f(x, A25) = 6, then x → 5 (Pakistan)
51) if f(x, A16) = 4 and f(x, A25) = 3, then x → 2 (Hong Kong)
52) if f(x, A16) = 4 and f(x, A25) = 2, then x → 2 (Singapore)
53) if f(x, A1) = 5 and f(x, A16) = 3 and f(x, A25) = 2, then x → 1 (Sweden)
54) if f(x, A1) = 4 and f(x, A16) = 1 and f(x, A25) = 2, then x → 2 (Belgium, Spain)
55) if f(x, A1) = 1 and f(x, A16) = 3 and f(x, A25) = 3, then x → 3 (Korea, Thailand)
56) if f(x, A1) = 1 and f(x, A16) = 3 and f(x, A25) = 5, then x → 3 (Mexico)
57) if f(x, A1) = 4 and f(x, A16) = 3 and f(x, A25) = 3, then x → 4 (China)
58) if f(x, A1) = 2 and f(x, A16) = 3 and f(x, A25) = 5, then x → 4 (Venezuela)
59) if f(x, A1) = 2 and f(x, A16) = 1 and f(x, A25) = 5, then x → 5 (Jordan)
60) if f(x, A14) = 2 and f(x, A16) = 3 and f(x, A25) = 2, then x → 1 (Ireland)
61) if f(x, A14) = 4 and f(x, A16) = 1 and f(x, A25) = 2, then x → 2 (Spain)
62) if f(x, A14) = 1 and f(x, A16) = 3 and f(x, A25) = 6, then x → 3 (Argentina)
63) if f(x, A14) = 4 and f(x, A16) = 3 and f(x, A25) = 6, then x → 3 (Brazil)
64) if f(x, A14) = 3 and f(x, A16) = 3 and f(x, A25) = 3, then x → 3 (Thailand)
65) if f(x, A14) = 3 and f(x, A16) = 3 and f(x, A25) = 4, then x → 4 (Hungary, Indonesia)
66) if f(x, A14) = 1 and f(x, A16) = 3 and f(x, A25) = 3, then x → 4 (Israel)
67) if f(x, A14) = 4 and f(x, A16) = 1 and f(x, A25) = 6, then x → 5 (Pakistan)
68) if f(x, A1) = 4 and f(x, A14) = 4 and f(x, A16) = 3 and f(x, A25) = 2, then x → 2 (Sweden)

The fourth result is the identification of a minimal set of certain and minimal decision rules that cover all the countries. From rules 1)–68) several minimal sets of decision rules can be obtained; one of them is composed of the following 27 rules: 1), 3), 5), 7), 9), 11), 12), 14), 15), 19), 21), 23), 24), 26), 27), 28), 29), 35), 36), 39), 45), 49), 50), 54), 55), 60), 68).
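Extracting such a covering subset from rules 1)–68) is an instance of the set-cover problem. A greedy sketch (our own illustration, with the supports of five of the rules listed above; greedy cover is a standard approximation and is not guaranteed to find the smallest cover):

```python
def greedy_cover(rule_support, universe):
    """Repeatedly pick the rule covering the most still-uncovered countries."""
    uncovered, chosen = set(universe), []
    while uncovered:
        rule = max(rule_support, key=lambda r: len(rule_support[r] & uncovered))
        gain = rule_support[rule] & uncovered
        if not gain:
            raise ValueError("some countries are covered by no rule")
        chosen.append(rule)
        uncovered -= gain
    return chosen

# Supports of rules 9), 10), 11), 12) and 39) as listed above:
rules = {
    "9":  {"Egypt", "Morocco", "Nigeria", "Russia", "Zimbabwe"},
    "10": {"Turkey"},
    "11": {"U.S.", "Australia", "Canada", "U.K."},
    "12": {"Denmark", "New Zealand"},
    "39": {"Morocco", "Nigeria", "Turkey", "Zimbabwe"},
}
countries = set().union(*rules.values())
print(greedy_cover(rules, countries))  # e.g. ['9', '11', '12', '10']
```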
Furthermore, using the LERS induction algorithm by Grzymala-Busse (1992), properly modified to take into account missing values, the following, more synthetic, minimal set of certain and minimal decision rules has been induced from the non-reduced information table containing the whole set of condition attributes:
1a) if f(x, A13) = 5 and f(x, A26) = 1, then x → 1 (U.S., Australia, Canada, France, Germany, Netherlands, Switzerland, U.K.)
2a) if f(x, A14) = 4 and f(x, A19) = 4 and f(x, A26) = 1, then x → 1 (U.S., Australia, Canada, Denmark, France, Germany, Netherlands, Sweden, Switzerland)
3a) if f(x, A6) = 4 and f(x, A11) = 1 and f(x, A21) = 3 and f(x, A22) = 2 and f(x, A26) = 1, then x → 1 (U.S., Denmark, France, New Zealand, Sweden, Switzerland)
4a) if f(x, A11) = 1 and f(x, A20) = 1 and f(x, A22) = 2, then x → 1 (U.S., Australia, Canada, France, Germany, Ireland, Netherlands)
5a) if f(x, A6) = 3 and f(x, A11) = 1 and f(x, A21) = 3 and f(x, A22) = 2, then x → 2 (Belgium, Finland, Japan, Norway, Spain)
6a) if f(x, A13) = 4 and f(x, A20) = 4 and f(x, A22) = 2 and f(x, A25) = 2, then x → 2 (Finland, Italy, Norway, Singapore, Spain)
7a) if f(x, A14) = 3 and f(x, A18) = 4, then x → 2 (Hong Kong, Italy, Japan)
8a) if f(x, A1) = 2 and f(x, A21) = 3, then x → 2 (Austria, Italy, Japan)
9a) if f(x, A4) = 3 and f(x, A15) = 3 and f(x, A17) = "yes" and f(x, A20) = 4, then x → 3 (Argentina, Brazil, Chile, Mexico, South Africa, Thailand)
10a) if f(x, A8) = 2 and f(x, A15) = 3 and f(x, A22) = 2, then x → 3 (Argentina, Brazil, Chile, Greece, Malaysia, Mexico, Portugal)
11a) if f(x, A4) = 3 and f(x, A15) = 3 and f(x, A17) = "yes" and f(x, A18) = 3, then x → 3 (Argentina, Chile, Mexico, Philippines, Thailand)
12a) if f(x, A4) = 3 and f(x, A13) = 4 and f(x, A17) = "yes" and f(x, A20) = 4, then x → 3 (Argentina, Brazil, Chile, Korea, Mexico, South Africa, Thailand)
13a) if f(x, A1) = 2 and f(x, A16) = 3 and f(x, A18) = 1, then x → 4 (Colombia, Hungary, Indonesia, Israel, Venezuela)
14a) if f(x, A8) = 2 and f(x, A23) = 5, then x → 4 (Czech Republic, Hungary, Taiwan)
15a) if f(x, A9) = 4 and f(x, A14) = 1, then x → 4 (Colombia, Sri Lanka)
16a) if f(x, A1) = 2 and f(x, A11) = 5, then x → 4 (Colombia, Hungary, Poland, Venezuela)
17a) if f(x, A5) = 4 and f(x, A16) = 3 and f(x, A25) = 4, then x → 4 (Colombia, Hungary, India)
18a) if f(x, A16) = 3 and f(x, A19) = 1 and f(x, A24) = 4, then x → 4 (China, Colombia, Hungary)
19a) if f(x, A8) = 1 and f(x, A27) = 2, then x → 5 (Egypt, Jordan, Morocco, Pakistan, Peru, Russia, Turkey, Zimbabwe)
20a) if f(x, A14) = 1 and f(x, A16) = 1 and f(x, A22) = 2, then x → 5 (Egypt, Jordan, Morocco, Nigeria)

The above set of 20 rules is the fifth result. The larger the set of countries supporting a rule, the greater its strength; strong rules represent the most striking schemes of the rating policy of expert international investors. Each of the above decision rules admits a simple verbal interpretation, as shown for the strongest rule of each class of risk:
1a) if market capitalization is between 309,583.5 and 8,484,433 millions of U.S. $ and volatility is between 7.33 and 17.575, then the country's risk is of level 1 (supported by U.S., Australia, Canada, France, Germany, Netherlands, Switzerland, U.K.);
5a) if projected earnings are between 11.4% and 12.5% and projected inflation rate is between 1% and 3.15% and withholding tax on dividends for U.S.-based investors is between 12.75% and 15.75% and settlement lag is between 2 and 5 days, then the country's risk is of level 2 (supported by Belgium, Finland, Japan, Norway, Spain);
9a) if forward price/earnings ratio is between 10.8 and 13.95 and total number of listed companies is between 143 and 667 and country funds are available in U.S. and operational costs are between 56.5 and 83, then the country's risk is of level 3 (supported by Argentina, Brazil, Chile, Mexico, South Africa, Thailand);
13a) if 3-yr compound annual total return (with respect to U.S. dollar) is between −10.29% and 6.74% and total number of public ADRs or ordinary shares available in U.S. is between 9 and 106 and settlement efficiency is between −2 and 67.5, then the country's risk is of level 4 (supported by Colombia, Hungary, Indonesia, Israel, Venezuela);
19a) if GNP per capita is between 260 U.S. $ and 3090 U.S. $ and correlation with U.S. is between 0.02 and 0.135, then the country's risk is of level 5 (supported by Egypt, Jordan, Morocco, Pakistan, Peru, Russia, Turkey, Zimbabwe).
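The basis/supporting distinction used in the rule listings can be checked mechanically: a country is a basis of a rule when every condition matches a known value, and merely supports it when a missing value is involved. A sketch using rule 36 (the country records are abbreviated to the two relevant attributes, with "∗" standing for the missing Moody's rating):

```python
MISSING = "*"

def match(record, conditions):
    """Return 'basis', 'supports', or None for a record against rule conditions."""
    saw_missing = False
    for attr, value in conditions.items():
        v = record.get(attr, MISSING)
        if v == MISSING:
            saw_missing = True
        elif v != value:
            return None
    return "supports" if saw_missing else "basis"

rule36 = {"A1": 1, "A25": 4}      # if f(x, A1) = 1 and f(x, A25) = 4, then x -> 4
czech = {"A1": 1, "A25": 4}
sri_lanka = {"A1": 1, "A25": MISSING}
print(match(czech, rule36))      # basis
print(match(sri_lanka, rule36))  # supports
```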
It is worth stressing that all decision rules induced from the incomplete information table using the extended rough set approach (Greco et al. 2000b) are robust, in the sense that each decision rule is supported by at least one country matching exactly (with non-missing values) all elementary conditions of the rule. It can be observed that the above minimal set of decision rules is composed of 20 rules (1a to 20a) and uses 22 different attributes, but only 60 elementary conditions, i.e. only 4.7% of the descriptors from the original information table (the table contains 52 × 27 = 1404 entries, of which about 9.5% are missing, leaving roughly 1270 descriptors; 60/1270 ≈ 4.7%). Furthermore, five attributes, in the presence of the others, were not used by any rule: most surprisingly the price/earning ratio (A3), but also dividend yield (A7), 3-yr compound annual total return in local currency (A2), short term interest rate (A12) and projected GDP growth (A10).
Table 2 Minimal number of attributes necessary to classify a country

2 attributes:
  Class 1: U.S., Australia, Canada, France, Germany, Netherlands, Switzerland, U.K.
  Class 2: Austria, Hong Kong, Italy, Japan
  Class 4: Czech Republic, Colombia, Hungary, Poland, Sri Lanka, Taiwan, Venezuela
  Class 5: Egypt, Jordan, Morocco, Pakistan, Peru, Russia, Turkey, Zimbabwe
3 attributes:
  Class 1: Denmark, Ireland, Sweden
  Class 3: Argentina, Brazil, Chile, Greece, Malaysia, Mexico, Portugal
  Class 4: China, India, Indonesia, Israel
  Class 5: Nigeria
4 attributes:
  Class 2: Belgium, Finland, Norway, Singapore, Spain
  Class 3: Korea, Philippines, South Africa, Thailand
5 attributes:
  Class 1: New Zealand
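Table 2 can be derived mechanically from the minimal rule set: for each country, take the smallest number of conditions among the rules it supports. A sketch restricted to the two class-5 rules (19a and 20a, supports as listed above):

```python
# (number of conditions, supporting countries) for rules 19a) and 20a):
rules5 = [
    (2, {"Egypt", "Jordan", "Morocco", "Pakistan", "Peru",
         "Russia", "Turkey", "Zimbabwe"}),             # rule 19a)
    (3, {"Egypt", "Jordan", "Morocco", "Nigeria"}),    # rule 20a)
]

min_conditions = {}
for n, support in rules5:
    for country in support:
        min_conditions[country] = min(min_conditions.get(country, n), n)

print(dict(sorted(min_conditions.items())))
# Nigeria needs 3 attributes, all other class-5 countries only 2 (Table 2).
```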
Table 3 Number of countries using each attribute (strength)

Attribute    A1  A4  A5  A6  A8  A9 A11 A13 A14 A15 A16 A17 A18 A19 A20 A21 A22 A23 A24 A25 A26 A27  Total
Class 1       0   0   0   6   0   0  11   8   9   0   0   0   0   9   7   6  11   0   0   0  11   0     78
Class 2       3   0   0   5   0   0   5   5   3   0   0   0   3   0   5   7   5   0   0   5   0   0     46
Class 3       0   8   0   0   7   0   0   7   0  10   0   8   5   0   7   0   7   0   0   0   0   0     59
Class 4       6   0   3   0   3   2   4   0   2   0   7   0   5   3   0   0   0   3   3   3   0   0     44
Class 5       0   0   0   0   8   0   0   0   4   0   4   0   0   0   0   0   4   0   0   0   0   8     28
All classes   9   8   3  11  18   2  20  20  18  10  11   8  13  12  19  13  27   3   3   8  11   8    255

Note: the most significant attributes (row maxima) are, for Class 1: A11, A22, A26; Class 2: A21; Class 3: A15; Class 4: A16; Class 5: A8, A27; all classes: A22.
We also observed that the first three lower-risk classes are fully explained by 4 rules each; the fourth class needed 6 rules; and the last, riskiest class only 2 rules.

Let us now examine the attributes employed by the decision rules. Table 2 summarizes the minimum number of different attributes required to classify each country into the corresponding risk class using the above minimal set of rules. In general, the smaller the number of attributes required to classify a country, the easier its classification. In fact, looking for rules classifying a country using a minimal set of conditions is concordant with the famous Occam's razor principle, stating that the explanation of any phenomenon should make as few assumptions as possible ("entities must not be multiplied beyond necessity"). An interesting question is whether riskier countries need fewer attributes in the rough set rules. Observe that the safe and developed countries (classes 1 and 2) span the entire range of attributes used. In particular, New Zealand is the only country requiring a rule with five conditions. This is due to the similarity between New Zealand (class 1) and some countries from class 2, such as Finland, in some important respects, which requires a greater number of conditions to distinguish these countries; this point is explained further later in this section. It is interesting to observe that classification of the riskiest countries (class 5) does not require as many attributes: remark that rule 19a), classifying all the riskiest countries but Nigeria, is based on only two conditions (GNP per capita is between 260 U.S. $ and 3090 U.S. $, and correlation with U.S. is between 0.02 and 0.135), which are quite convincing.
Table 4 Number of times each attribute was fired (frequency)

Attribute    A1  A4  A5  A6  A8  A9 A11 A13 A14 A15 A16 A17 A18 A19 A20 A21 A22 A23 A24 A25 A26 A27  Total
Class 1       0   0   0   6   0   0  13   8   9   0   0   0   0   9   7   6  13   0   0   0  23   0     94
Class 2       3   0   0   5   0   0   5   5   3   0   0   0   3   0   5   8   5   0   0   5   0   0     47
Class 3       0  18   0   0   7   0   0   7   0  18   0  18   5   0  13   0   7   0   0   0   0   0     93
Class 4       9   0   3   0   3   2   4   0   2   0  11   0   5   3   0   0   0   3   3   3   0   0     51
Class 5       0   0   0   0   8   0   0   0   4   0   4   0   0   0   0   0   4   0   0   0   0   8     28
All classes  12  18   3  11  18   2  22  20  18  18  15  18  13  12  25  14  29   3   3   8  23   8    313

Note: the most frequently used attributes (row maxima) are, for Class 1: A26; Class 2: A21; Class 3: A4, A15, A17; Class 4: A16; Class 5: A8, A27; all classes: A22.
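Both tables can be reproduced mechanically from the minimal rule set: strength counts the distinct countries per attribute, while frequency counts countries once per rule in which the attribute fires. A sketch restricted to the class-5 rules 19a) and 20a) (attribute sets and supports as listed above):

```python
from collections import defaultdict

# (attributes in the antecedent, supporting countries):
rules5 = [
    ({"A8", "A27"}, {"Egypt", "Jordan", "Morocco", "Pakistan",
                     "Peru", "Russia", "Turkey", "Zimbabwe"}),          # rule 19a)
    ({"A14", "A16", "A22"}, {"Egypt", "Jordan", "Morocco", "Nigeria"}), # rule 20a)
]

strength = defaultdict(set)   # Table 3: distinct countries per attribute
frequency = defaultdict(int)  # Table 4: supports summed over firing rules

for attrs, support in rules5:
    for a in attrs:
        strength[a] |= support
        frequency[a] += len(support)

print({a: len(c) for a, c in sorted(strength.items())})
# {'A14': 4, 'A16': 4, 'A22': 4, 'A27': 8, 'A8': 8} -> the Class 5 row of Table 3
print(dict(sorted(frequency.items())))
# identical here, since no class-5 attribute appears in more than one rule
```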
Table 3 shows the total number of countries in each risk class, and overall, using each attribute, thus revealing the attribute's strength. Given that some attributes are repeated in different rules, Table 4 indicates the frequency, i.e. the number of times each attribute was used (fired) within each class. The row maxima noted in these two tables identify the most significant or most often used attributes within each class and overall. Naturally, the most important attributes vary by class, as indicated in Tables 3 and 4.

Observing the minimal set of rules, we can see that some countries are bases of, or support, several rules, for instance:
• U.S. and France are bases of all four rules describing class 1,
• Italy and Japan are bases of three of the four rules describing class 2,
• Argentina and Mexico are bases of all four rules describing class 3,
• Hungary is the basis of three of the six rules describing class 4 (and, moreover, it supports two other rules describing class 4),
• Egypt, Jordan and Morocco are bases of one rule and support one other rule describing class 5.
These countries can be interpreted as typical representatives of the corresponding classes. We can also see that some countries are bases of, or support, just one rule:
• U.K., New Zealand and Ireland for class 1,
• Belgium, Singapore, Hong Kong and Austria for class 2,
• Greece, Malaysia, Philippines and Korea for class 3,
• Indonesia, Israel, Czech Republic, Taiwan, Sri Lanka and Poland for class 4,
• China, Pakistan, Peru, Russia, Turkey, Zimbabwe and Nigeria for class 5.
These countries can be interpreted as belonging to those classes for specific reasons, explained by the rules they support. Observe also that some rules having a small number of conditions are supported by a large number of countries, for example:
• rule 1a) has two conditions and is supported by eight countries belonging to class 1,
• rule 2a) has three conditions and is supported by nine countries belonging to class 1,
• rule 10a) has three conditions and is supported by seven countries belonging to class 3,
• rule 19a) has two conditions and is supported by eight countries belonging to class 5.
These rules can be interpreted as clear-cut rules, i.e. they give few clear conditions for membership in a given class. Observe that we have such clear-cut rules for the extreme classes, i.e. class 1 and class 5, and for the intermediate class 3.
Table 5 Average classification accuracies in a 10-fold cross-validation experiment using the rough set approach

                Predicted class                                        Average
Actual class    1         2         3         4         5             accuracy
1               91.67%     8.33%     0%        0%        0%           91.67%
2               22.22%    22.22%    55.56%     0%        0%           22.22%
3                0%        9.09%    72.73%    18.18%     0%           72.73%
4                0%        0%       27.27%    54.55%    18.18%        54.55%
5                0%        0%       22.22%    22.22%    55.56%        55.56%
Overall                                                               61.54%
Table 6 Average classification accuracies in a 10-fold cross-validation experiment using discriminant analysis (Doumpos et al. 2001)

                Predicted class                                        Average
Actual class    1         2         3         4         5             accuracy
1               44.2%     38.3%     15.0%      2.5%      0%           44.2%
2               36.7%     20.0%     23.3%     20.0%      0%           20.0%
3                0%        8.3%     33.3%     41.7%     16.7%         33.3%
4               16.7%      8.3%     41.7%     25.0%      8.3%         25.0%
5                0%        0%        0%       35.7%     64.3%         64.3%
Overall                                                               37.4%
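Both validation tables come from the same protocol; the following schematic sketch shows its shape (our own illustration: a stand-in majority-class learner replaces the rule induction and rule matching steps, which are not reproduced here, while the class sizes match the 52-country sample):

```python
import random
from collections import defaultdict

def k_fold_cv(objects, labels, train, predict, k=10, seed=0):
    """Overall and per-class accuracy under k-fold cross-validation."""
    idx = list(range(len(objects)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    hit, tot = defaultdict(int), defaultdict(int)
    for fold in folds:
        train_idx = [i for i in idx if i not in fold]
        model = train([objects[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        for i in fold:
            tot[labels[i]] += 1
            hit[labels[i]] += int(predict(model, objects[i]) == labels[i])
    overall = sum(hit.values()) / sum(tot.values())
    return overall, {c: hit[c] / tot[c] for c in sorted(tot)}

# Stand-in learner: always predict the majority class of the training part.
train = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model

X = list(range(52))                            # 52 "countries" (dummy features)
y = [1]*12 + [2]*9 + [3]*11 + [4]*11 + [5]*9   # class sizes as in the case study
print(k_fold_cv(X, y, train, predict))
```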
We have no such rules for classes 2 and 4—the rules for these classes are supported by a relatively small number of countries. This can be interpreted in the sense that the experts have a relatively clear general idea about the assignment of countries belonging to classes 1, 3 and 5, using very general rules with few conditions covering a large set of countries, while they have more fragmented ideas about assigning countries to classes 2 and 4, for which quite specific rules covering few countries are induced.

We also performed a 10-fold cross-validation experiment, obtaining the results shown in Table 5. The overall accuracy of classification is 61.54%. This relatively low overall accuracy is due to the small sample size of only 52 countries imposed by the WSJ global investment risk experts. Consequently, the obtained decision rules are good for interpretation of patterns hidden in the available data, but not so good for prediction of unknown classifications.

Doumpos et al. (2001) performed on the same data (with missing values estimated) a similar validation experiment with discriminant analysis and five Multiple Criteria Decision Analysis (MCDA) procedures based on mathematical programming classification models via piece-wise approximation of free-form nonlinear utility functions for preference disaggregation. As seen by comparing Tables 5 and 6, the results obtained with discriminant analysis are clearly worse (overall accuracy of 37.4%) than the above rough set validation results. Since discriminant analysis can be considered as a reference point in classification procedures, this binary comparison provides a reasonable argument in favor of the rough set methodology.
The validation accuracies of the five MCDA procedures were 56.9%, 57.7%, 58.6%, 66.1% and 72.8%. The latter (a multi-group hierarchical discrimination optimization) was the only 100% accurate complete model for the entire dataset in Doumpos et al. (2001), like the rough sets here, and thus produced the best results. Hence, the rough set performance was very competitive in this case study. Moreover, Becerra-Fernandez et al. (2002) repeated this experiment with decision trees and several neural network variations. They obtained good overall classification accuracy with some neural net models, but poor validation accuracies with the "if. . ., then. . ." decision tree rules.

It should be noted, however, that the utility function embedded in the MCDA procedures (Doumpos et al. 2001), and the trained neural networks (Becerra-Fernandez et al. 2002), are classification models with a limited capacity of explanation: they are "black boxes" to the non-expert decision makers. In contrast, the rough set methodology, which results in a set of decision rules as a classification model, explains the classification decision in natural language, using easily understood "if . . ., then. . ." terms. This permits users, even those inexperienced in statistics or data analysis, to get a clear understanding of the classification and to interpret the final result consciously.

For example, one could ask why New Zealand was assigned to class 1 and Finland to class 2, while one could expect the opposite assignment based on the small size of the New Zealand market on the one hand, and the importance of international companies (e.g. Nokia) located in Finland, on the other hand. In contrast to some other methodologies, which give no answer to such questions, the decision rules induced by rough sets can explain this assignment in the following way: New Zealand supports rule 3a), assigning countries to class 1, while Finland supports rule 5a) (apart from rule 6a)), assigning countries to class 2. The conditions in the antecedents of rules 3a) and 5a) are very similar:
• the conditions "projected inflation rate is between 1% and 3.15%, and withholding tax on dividends for U.S.-based investors is between 12.75% and 15.75%, and settlement lag is between 2 days and 5 days" are the same for both rules,
• rule 3a) includes the condition "projected earnings are between 12.5% and 16.1%", while rule 5a) includes the condition "projected earnings are between 11.4% and 12.5%",
• rule 3a) includes the condition "volatility is between 7.33 and 17.575", while no condition on volatility is present in rule 5a).
Therefore, comparison of rules 3a) and 5a) suggests that the better classification of New Zealand than that of Finland is due to projected earnings and volatility. The complete comparison of New Zealand and Finland on the attributes present in decision rules 3a) and 5a) is the following:
• projected earnings: New Zealand 13.2; Finland 11.7;
• projected inflation rate: New Zealand 1.9; Finland 1.3;
• withholding tax on dividends for U.S.-based investors [%]: New Zealand 15; Finland 15;
• settlement lag [days]: New Zealand 5; Finland 3;
• volatility: New Zealand 14.93; Finland 31.9.
The above confirms the reasons for the assignment of New Zealand to a better risk class than Finland, as suggested by rules 3a) and 5a): indeed, New Zealand and Finland have comparable conditions with respect to projected inflation rate, withholding tax on dividends for U.S.-based investors and settlement lag, but New Zealand has larger projected earnings and smaller volatility than Finland. It is worth noting that such a detailed comparison of crucial variables would not be possible using methodologies different from the rough set approach.
Therefore, this example shows the advantage of a "glass box" methodology, like the rough set approach, in comparison to a "black box" methodology, like discriminant analysis, MCDA, or neural networks. This feature can make acceptance of model results, and their implementation, more likely by decision makers, who "would not accept a solution that they do not understand" (Zanakis et al. 1980).
5 Conclusions

The rough set approach to the analysis of multiattribute data sets, extended for missing values (a common encounter in practice), maintains all good characteristics of the classical rough set approach, and it boils down to the classical approach when there are no missing values. This approach avoids the disadvantages of statistical classification procedures (discriminant analysis and logistic regression), namely the restrictive assumptions of equal covariance matrices and multivariate normal distributions, which are very often violated in economics and finance (Zanakis and Walter 1994). Furthermore, multivariate statistical procedures applied to country risk analysis have been criticised for serious shortcomings related to the quality and availability of data (compounded by missing observations), the definition of the dependent variable, and difficulties in forecasting debt servicing (Saini and Bates 1984). The country risk ratings often provided by Euromoney and Institutional Investor magazines are based on arbitrary or secretly weighted indicator averages.

The rough set approach has several advantages in comparison with other methods of data analysis because:
1) it analyzes only facts hidden in data, without requiring additional information like probability, grade of membership or importance of attributes,
2) it deals naturally with qualitative (categorical) attributes, without transforming them into binary attributes,
3) it can discover inconsistencies in the data and take them into account in final conclusions, while other methods tend to eliminate inconsistencies prior to the analysis,
4) it supplies useful elements of knowledge contained in the data set, like reducts and the core of attributes,
5) it prepares the ground for the induction of decision rules expressing knowledge contained in the data set in a natural language of "if. . ., then. . ." statements, easily understood by non-experts and decision makers.

The extended rough set approach enables the analysis of data sets with missing values. Missing values are very often found in practice, and very few methods can deal satisfactorily with this problem; most methods require external imputation of missing values before using a dataset, as was done for the MCDA procedures of Doumpos et al. (2001). The way of handling missing values in the rough set approach seems faithful with respect to the available data, because the induced decision rules are robust in the sense of being founded on objects existing in the data set and not on hypothetical objects. Furthermore, this approach classified correctly all countries, as compared to the usual 70–90% in various statistical studies of country financial risk. However, this "fitness" result should be interpreted with caution and not generalized, because of the small sample size of 52 countries classified in the WSJ article (which of course we cannot augment). Let us also remark that such a small sample size would not make meaningful any generalization of results from statistical models either. That is why we consider this real-case rough set study explanatory rather than predictive; it focuses on identifying important factors affecting multi-country investment risks, and thus on extracting the knowledge/expertise of international investment managers in making such global assessments.
The usefulness of the rough set approach has been demonstrated on a real data set concerning the classification of country financial risk by international experts of The Wall Street Journal. We found the information considered by the Journal experts to be consistent, in the sense that the measures of quality and accuracy of the approximation are both equal to one. In this case study we obtained 100% accuracy in classifying all countries as stated by the WSJ experts, which may further support the notion of consistency. However, not all of the 27 attributes describing the countries are necessary to explain the classification into the 5 levels of risk considered by the Journal. We found the best reduced subset of attributes, called a “reduct”, which is enough to explain the classification with the same quality as the whole set of 27 attributes, i.e. without inconsistency. It is composed of 4 attributes, in order of importance: A16 (total public ADRs or ordinary shares available in the U.S.), A1 (3-yr compound annual total return with respect to the U.S. dollar), A25 (Moody’s long-term foreign-currency credit rating), and A14 (turnover). Furthermore, none of the 27 attributes is indispensable for the explanation (the core is empty).

The decision rules induced from rough approximations of the classes of risk give a concise and clear explanation of the classification policy adopted by the experienced international investors consulted by The Wall Street Journal. Such decision rules could be employed to classify other countries not included in The Wall Street Journal list. We want to stress, however, that our main interest in this case study was to represent the explicit or implicit knowledge or policy of financial experts in terms of rules, and not to construct a classifier for countries. The latter would require a much larger sample, taking into account the number of classes and the number of possible values of each attribute. We had no choice in the size of the existing sample; only 52 countries partitioned into 5 classes were examined by the WSJ. Thus, our study should be viewed as explanatory and not predictive. Our main objective was knowledge discovery via decision rule mining.

On the basis of the synthetic minimal set of decision rules induced using the LERS induction algorithm, the following remarks regarding the importance of attributes are interesting. For the least risky financial markets (group 1), the most significant attributes are mainly lowest volatility (A26), lowest projected inflation rate (A11), and quick settlement (A22). Not too high a correlation with the U.S. (A27) and smallest GNP per capita (A8) appear to be significant determinants of the riskiest markets (group 5). The most significant attribute for the entire set of countries is the settlement lag (A22)—an unexpected finding—as documented in both Tables 3 and 4. Five attributes were not useful, since in the presence of the others they were not used by any decision rule: A2 (3-yr compound annual total return in local currency), A3 (price/earnings ratio), A7 (dividend yield), A10 (projected GDP growth), and A12 (short-term interest rate).

Two questions of practical interest can be answered from the rough set results in this case study: Is there a relationship between the number of missing values and the country’s risk assigned by the experts, and does the number of missing values of an attribute affect its usefulness?
Usefulness can be measured by an attribute’s strength (the number of countries using the attribute in the minimal set of rules—presented in Table 3) and by the number of times the attribute appeared in the decision rules (the number of times it was fired in the minimal set of rules—presented in Table 4). The following significant correlations at α = 0.05 were observed. Countries with riskier investment ratings tend to have more missing data, r = 0.68 (as expected for less developed nations), and more extreme values (questionable outliers that reduce accuracies), r = 0.38, but fewer attributes employed by their rough set decision rules (as suggested by Table 4), r = −0.45. Furthermore, missing values have a marginally negative impact on attribute strength (the number of countries affected by each attribute, r = −0.27) and on the frequency with which an attribute participated in decision rules (r = −0.23). This negative correlation is concordant
with the intuition that attributes with fewer missing values would be used more frequently in decision rules, leading to better results.

Rough set methods can play an important role in managerial decision making. Very often experts give their evaluations on the basis of intuition built from long experience, but have some difficulty explaining the reasons for these evaluations. In such cases, decision rules can discover this knowledge, provide useful insights to the experts, and supply explanations of the expert evaluations, thus providing solid arguments to support them. The decision maker can then make a critical review of the expert evaluations or, if the classification of a country is not properly explained by strong decision rules, seek other experts or consider a reclassification of that country. Moreover, even if the induced decision rules are explanatory rather than predictive, they can be used to support new evaluations by experts. Let us suppose that an expert would like to revise his judgment of a country x belonging presently to class 2. Suppose also that he/she observes that this country has now improved and satisfies the conditions of rule 1a) (market capitalization is between 309,583.5 and 8,484,433 millions of U.S. $ and volatility is between 7.33 and 17.575). Using this information, he/she can change the evaluation of this country from class 2 to class 1. Moreover, he/she can make suggestions for a future evaluation of the economy of this country based on comparison with other countries supporting rule 1a) (U.S., Australia, Canada, France, Germany, Netherlands, Switzerland, U.K.). Observe that, even if such argumentation of a new classification and comparison with similar countries seems very natural in the expert’s evaluation, this is possible when using rough set decision rules but not with other methodologies like discriminant analysis, neural networks and MCDA procedures.

The predictive character of decision rules seems not so useful, however, in managerial evaluation, where decisions need to be explained with sound arguments. A set of decision rules can help the expert refute a suggested classification, arguing that the rule conditions are not sufficient to change the evaluation due to a new state of affairs. Alternatively, the expert can observe that the countries supporting the rule cannot be considered as references, due to characteristics heterogeneous with respect to the country in question. In general, decision rules induced by rough set methodology permit a greater involvement of the expert, who is not obliged to accept the result of the analysis without understanding the reasons for the suggested decisions (as is the case with some other methodologies), and hence a greater likelihood of implementation.

In conclusion, the extended rough set approach applied to the incomplete data set of The Wall Street Journal performs two crucial tasks: it models the rating policy of international experts and explains their implicit or explicit policy in comprehensible decision rules. We believe that the proposed solution to the problem of analyzing incomplete data sets is relevant to many other methodologies for data analysis, such as machine learning, incomplete census surveys, data mining and knowledge discovery. It could also be extended to other multi-criteria decision problems (like choice, ranking and sorting), which are also of great importance, but substantially different from the classification problem considered in this paper.
Acknowledgement We would like to thank the three anonymous reviewers and the editor for their insightful and constructive comments. The third author wishes to acknowledge financial support from the Polish Ministry of Science and Higher Education.
Appendix A: Examples illustrating basic concepts of rough set methodology

The illustrative examples presented in this section will serve to explain the main concepts of rough set theory, with a specific emphasis on the case of missing values. We recall first
the basic concepts of the rough set approach proposed by Pawlak (1982, 1991). Then, we present the rough set approach in the case of missing values, developed by Greco et al. (2000b).

A.1 Rough set approach without missing values

Let us consider an illustrative example concerning classification of cars. The information table of the car example is presented in Table 7a. The table describes a set U composed of six cars, using four condition attributes (price, mileage, size, max-speed) and one decision attribute (decision). The decision attribute makes a partition of the set of six cars into three decision classes (poor, good, excellent). Our aim is to explain the assignment of cars to the three decision classes on the basis of the description given by the four condition attributes.

The basic element of rough set analysis is the indiscernibility relation with respect to a nonempty subset of condition attributes P ⊆ C, denoted by I_P. Two cars (in general, two objects) are indiscernible if they have the same description with respect to all the considered condition attributes. For example, in Table 7a, considering the set of attributes C = {price, mileage, size, max-speed}, car2 is indiscernible with car6, denoted by car2 I_C car6, and car4 is indiscernible with car5, denoted by car4 I_C car5. Of course, each car (in general, each object) is also indiscernible with itself, i.e. car_i I_C car_i, i = 1, . . . , 6. Observe that the binary relation I_P is reflexive, symmetric and transitive, and therefore it is an equivalence relation. Observe also that the equivalence relation I_P changes if P ⊆ C changes. For example, if P = {price, mileage, size}, then car1 is indiscernible with car4 and car5, i.e. car1 I_P car4 and car1 I_P car5. Notice that with respect to I_P, car2 continues to be indiscernible with car6 and car4 continues to be indiscernible with car5. In general, given two nonempty sets of attributes P, R ⊆ C, if P ⊆ R, all pairs of cars (in general, objects) indiscernible with respect to R are indiscernible also with respect to P.

Given a set of attributes P ⊆ C and a car (in general, an object) in the information table, its equivalence class is the set of all the cars indiscernible with it. For example, given P = {price, mileage, size}, the equivalence class of car1 is the following: [car1]_P = {car1, car4, car5}.

Table 7a Information table of the illustrative car example

Car     Price    Mileage  Size     Max-Speed  Decision
car1    high     low      full     low        good
car2    low      high     full     low        good
car3    medium   medium   compact  low        poor
car4    high     low      full     high       good
car5    high     low      full     high       excellent
car6    low      high     full     low        good
The equivalence classes, also called elementary sets, are information granules, the basic building blocks of rough approximation. Considering the set of attributes C = {price, mileage, size, max-speed}, we get the following equivalence classes:

[car1]_C = {car1},
[car2]_C = [car6]_C = {car2, car6},
[car3]_C = {car3},
[car4]_C = [car5]_C = {car4, car5}.
Given a set of attributes P ⊆ C, the set of all equivalence classes induced by the indiscernibility relation I_P, denoted by U|I_P, is called the quotient set and forms a partition of the considered set of cars (in general, of the universe U). This means that the sets belonging to the quotient set are pairwise disjoint and their union gives back the entire set of cars (in general, the universe U). For example, considering the set of attributes C, the quotient set is U|I_C = {{car1}, {car2, car6}, {car3}, {car4, car5}}, while considering the set of attributes P = {price, mileage, size}, the quotient set is U|I_P = {{car1, car4, car5}, {car2, car6}, {car3}}.

Given a set of attributes P ⊆ C, using the elementary sets (equivalence classes) induced by the indiscernibility relation I_P, rough approximations of each decision class can be defined. The lower approximation of class X, denoted by $\underline{I}_P(X)$, is the set of all the cars (in general, objects) whose equivalence classes are contained in the decision class, i.e.

$\underline{I}_P(X) = \{x \in U : [x]_P \subseteq X\}$,

while the upper approximation, denoted by $\overline{I}_P(X)$, is the set of all the cars (in general, objects) whose equivalence classes intersect with the decision class, i.e.

$\overline{I}_P(X) = \{x \in U : [x]_P \cap X \neq \emptyset\}$.

For example, given the set of attributes C = {price, mileage, size, max-speed} and the decision class good, the lower approximation is $\underline{I}_C(good) = \{car1, car2, car6\}$, while the upper approximation is $\overline{I}_C(good) = \{car1, car2, car4, car5, car6\}$. In simple words, the lower approximation contains cars that “certainly belong” to the decision class good because, in the information table, all the cars having the same description belong to this decision class. Analogously, the upper approximation contains cars that “possibly belong” to the decision class good because, in the information table, there is at least one car having the same description and belonging to this decision class.

Observe that car4 belongs to the decision class good, but does not belong to its lower approximation $\underline{I}_C(good)$. The reason is that car4 is indiscernible with car5, which does not belong to decision class good; thus, the equivalence class of car4 is not contained in the decision class good. This means that, according to the available information, even if car4 belongs to the class of good cars, it is not possible to say that a car having the same description as car4 certainly belongs to the class of good cars. Observe also that car5, which does not belong to decision class good, belongs to its upper approximation. The reason is that car5 is indiscernible with car4, which belongs to decision class good; thus, the equivalence class of car5 intersects with the decision class good. This means that, according to the available information, even if car5 does not belong to the class of good cars, one could say that a car having the same description as car5 possibly belongs to the class of good cars.

Given a set of attributes P ⊆ C and a decision class X, the set difference between its upper and its lower approximation, $B_P(X) = \overline{I}_P(X) - \underline{I}_P(X)$, is called the boundary. The cars (in general, objects) belonging to the boundary are called ambiguous, because their equivalence classes intersect with more than one decision class. This means that, according to the available information, cars from the boundary, having the same description, could belong to different decision classes. Given the set of attributes C = {price, mileage, size, max-speed} and decision class good, the boundary is $B_C(good) = \{car4, car5\}$.

Of course, in general, lower approximations, upper approximations and boundaries change if the considered set of attributes changes. For example, considering P = {price, mileage, size},

$\underline{I}_P(good) = \{car2, car6\}$,   $\overline{I}_P(good) = \{car1, car2, car4, car5, car6\}$,   $B_P(good) = \{car1, car4, car5\}$.

Note that car1, which is in the lower approximation of decision class good when considering the set of attributes C, is no longer in the lower approximation when considering the set of attributes P. In fact, the attribute max-speed makes it possible to discern car1 from car5, which belongs to decision class excellent. Thus, when attribute max-speed is removed from the set of attributes C, car1 becomes indiscernible with car5 and leaves the lower approximation. In more general terms, using less information, i.e. removing some attributes from the considered set, the objects are assigned to the decision classes less precisely, so the lower approximation can get smaller, while the upper approximation and the boundary can get larger.

For the sake of completeness observe that, taking into account the set of attributes C = {price, mileage, size, max-speed}, the lower approximation, the upper approximation and the boundary of classes poor and excellent are

$\underline{I}_C(poor) = \{car3\}$,   $\overline{I}_C(poor) = \{car3\}$,   $B_C(poor) = \emptyset$,
$\underline{I}_C(excellent) = \emptyset$,   $\overline{I}_C(excellent) = \{car4, car5\}$,   $B_C(excellent) = \{car4, car5\}$,

while, taking into account the set of attributes P = {price, mileage, size},

$\underline{I}_P(poor) = \{car3\}$,   $\overline{I}_P(poor) = \{car3\}$,   $B_P(poor) = \emptyset$,
$\underline{I}_P(excellent) = \emptyset$,   $\overline{I}_P(excellent) = \{car1, car4, car5\}$,   $B_P(excellent) = \{car1, car4, car5\}$.
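To make these definitions concrete, the following short Python sketch (our illustration, not part of the original analysis) computes equivalence classes and rough approximations for Table 7a; the function and variable names are our own choices.

```python
# Illustrative sketch: indiscernibility, equivalence classes and rough
# approximations for Table 7a; max_speed stands for the attribute max-speed.

CARS = {
    "car1": {"price": "high",   "mileage": "low",    "size": "full",    "max_speed": "low"},
    "car2": {"price": "low",    "mileage": "high",   "size": "full",    "max_speed": "low"},
    "car3": {"price": "medium", "mileage": "medium", "size": "compact", "max_speed": "low"},
    "car4": {"price": "high",   "mileage": "low",    "size": "full",    "max_speed": "high"},
    "car5": {"price": "high",   "mileage": "low",    "size": "full",    "max_speed": "high"},
    "car6": {"price": "low",    "mileage": "high",   "size": "full",    "max_speed": "low"},
}
DECISION = {"car1": "good", "car2": "good", "car3": "poor",
            "car4": "good", "car5": "excellent", "car6": "good"}

def equivalence_class(x, P):
    """All objects indiscernible with x on every attribute in P."""
    return {y for y in CARS if all(CARS[y][a] == CARS[x][a] for a in P)}

def members(decision_class):
    """Objects assigned to a given decision class."""
    return {x for x, d in DECISION.items() if d == decision_class}

def lower_approximation(decision_class, P):
    """Objects whose equivalence class is contained in the class."""
    return {x for x in CARS if equivalence_class(x, P) <= members(decision_class)}

def upper_approximation(decision_class, P):
    """Objects whose equivalence class intersects the class."""
    return {x for x in CARS if equivalence_class(x, P) & members(decision_class)}

C = ["price", "mileage", "size", "max_speed"]
print(lower_approximation("good", C))  # {'car1', 'car2', 'car6'}
print(upper_approximation("good", C))  # adds the ambiguous pair car4, car5
print(upper_approximation("good", C) - lower_approximation("good", C))  # boundary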
Since the main aim of rough set theory applied to data analysis is to explain the content of the decision classes using the description of objects by condition attributes, it is useful to define an index quantifying the goodness of this explanation. This index is the quality of approximation, specifying the percentage of cars (in general, objects) that belong to a lower approximation. More formally, given a set of attributes P ⊆ C, if D_1, . . . , D_m are the decision classes, the quality of approximation γ_P is expressed as

$\gamma_P = \frac{\sum_{i=1}^{m} \mathrm{card}(\underline{I}_P(D_i))}{\mathrm{card}(U)}$.

Therefore, the quality of approximation relative to the set of attributes C = {price, mileage, size, max-speed} is γ_C = 4/6, while the quality of approximation relative to the set of attributes P = {price, mileage, size} is γ_P = 3/6.

One of the main aims of rough approximation in data analysis is the reduction of superfluous information. Let us consider Table 7b, obtained from Table 7a by removing the attribute Price. Taking into account the set of attributes R = {mileage, size, max-speed}, the lower approximation, the upper approximation and the boundary of the three decision classes are

$\underline{I}_R(good) = \{car1, car2, car6\}$,   $\overline{I}_R(good) = \{car1, car2, car4, car5, car6\}$,   $B_R(good) = \{car4, car5\}$,
$\underline{I}_R(poor) = \{car3\}$,   $\overline{I}_R(poor) = \{car3\}$,   $B_R(poor) = \emptyset$,
$\underline{I}_R(excellent) = \emptyset$,   $\overline{I}_R(excellent) = \{car4, car5\}$,   $B_R(excellent) = \{car4, car5\}$.
Thus, the set of attributes R = {mileage, size, max-speed} gives the same lower approximations as the entire set of attributes C, so the attribute Price can be removed without deteriorating the quality of approximation. This means that, with respect to the assignment of cars to decision classes, the set of attributes R = {mileage, size, max-speed} maintains the same information as the whole set of attributes C. In other terms, set R gives the same quality of approximation as the set of attributes C, i.e. γ_R = γ_C = 4/6.

Table 7b Information table of the car example without attribute Price

Car     Mileage  Size     Max-Speed  Decision
car1    low      full     low        good
car2    high     full     low        good
car3    medium   compact  low        poor
car4    low      full     high       good
car5    low      full     high       excellent
car6    high     full     low        good
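Continuing the sketch above (and reusing its CARS, DECISION, lower_approximation and C definitions), the quality of approximation can be computed directly from the lower approximations; the check below confirms that γ_R = γ_C = 4/6.

```python
# Sketch continued: the quality of approximation gamma_P.

def quality_of_approximation(P):
    """Fraction of objects covered by the lower approximation of some class."""
    covered = set()
    for decision_class in set(DECISION.values()):
        covered |= lower_approximation(decision_class, P)
    return len(covered) / len(CARS)

R = ["mileage", "size", "max_speed"]
print(quality_of_approximation(C))  # 4/6: car4 and car5 stay ambiguous
print(quality_of_approximation(R))  # also 4/6, so Price is superfluous
```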
Table 7c Information table of the car example without attributes Price and Size

Car     Mileage  Max-Speed  Decision
car1    low      low        good
car2    high     low        good
car3    medium   low        poor
car4    low      high       good
car5    low      high       excellent
car6    high     low        good
Now, let us consider Table 7c, obtained from Table 7b by removing the attribute Size (or, equivalently, by removing the attributes Price and Size from Table 7a). Taking into account the set of attributes T = {mileage, max-speed}, the lower approximation, the upper approximation and the boundary of the three decision classes are

$\underline{I}_T(good) = \{car1, car2, car6\}$,   $\overline{I}_T(good) = \{car1, car2, car4, car5, car6\}$,   $B_T(good) = \{car4, car5\}$,
$\underline{I}_T(poor) = \{car3\}$,   $\overline{I}_T(poor) = \{car3\}$,   $B_T(poor) = \emptyset$,
$\underline{I}_T(excellent) = \emptyset$,   $\overline{I}_T(excellent) = \{car4, car5\}$,   $B_T(excellent) = \{car4, car5\}$.
Thus, the set of attributes T = {mileage, max-speed} gives the same lower approximations as the whole set of attributes C, so the attributes Price and Size can be removed without deteriorating the quality of approximation. This means that, with respect to the assignment of cars to decision classes, the set of attributes T = {mileage, max-speed} maintains the same information as the entire set of attributes C. In other terms, set T gives the same quality of approximation as the set of attributes C, i.e. γ_T = γ_C = 4/6.

Let us now look at the lower approximation, the upper approximation and the boundary obtained by taking into account the attribute Mileage only:

$\underline{I}_{\{mileage\}}(good) = \{car2, car6\}$,   $\overline{I}_{\{mileage\}}(good) = \{car1, car2, car4, car5, car6\}$,   $B_{\{mileage\}}(good) = \{car1, car4, car5\}$,
$\underline{I}_{\{mileage\}}(poor) = \{car3\}$,   $\overline{I}_{\{mileage\}}(poor) = \{car3\}$,   $B_{\{mileage\}}(poor) = \emptyset$,
$\underline{I}_{\{mileage\}}(excellent) = \emptyset$,   $\overline{I}_{\{mileage\}}(excellent) = \{car1, car4, car5\}$,   $B_{\{mileage\}}(excellent) = \{car1, car4, car5\}$,

or the attribute Max-Speed only:

$\underline{I}_{\{max-speed\}}(good) = \emptyset$,   $\underline{I}_{\{max-speed\}}(poor) = \emptyset$,   $\underline{I}_{\{max-speed\}}(excellent) = \emptyset$,
$\overline{I}_{\{max-speed\}}(good) = \{car1, car2, car3, car4, car5, car6\}$,   $\overline{I}_{\{max-speed\}}(poor) = \{car1, car2, car3, car6\}$,
$\overline{I}_{\{max-speed\}}(excellent) = \{car4, car5\}$,
$B_{\{max-speed\}}(good) = \{car1, car2, car3, car4, car5, car6\}$,   $B_{\{max-speed\}}(poor) = \{car1, car2, car3, car6\}$,   $B_{\{max-speed\}}(excellent) = \{car4, car5\}$.

Observe that {mileage} does not give the same lower approximations as the entire set of attributes, so the attributes Price, Size and Max-Speed cannot all be removed without deteriorating the quality of approximation. This means that, with respect to the assignment of cars to decision classes, {mileage} does not maintain the same information as the whole set of attributes C. In other terms, {mileage} does not give the same quality of approximation as the set of attributes C, i.e. γ_{mileage} = 3/6 < 4/6 = γ_C. Analogously, {max-speed} does not give the same lower approximations as the entire set of attributes, so the attributes Price, Size and Mileage cannot all be removed without deteriorating the quality of approximation. This means that, with respect to the assignment of cars to decision classes, {max-speed} does not maintain the same information as the whole set of attributes C. In other terms, {max-speed} does not give the same quality of approximation as the set of attributes C, i.e. γ_{max-speed} = 0 < 4/6 = γ_C.

Therefore, the set of attributes T = {mileage, max-speed} is a minimal (with respect to inclusion) set maintaining the same quality of approximation as the whole set of attributes C and, therefore, it is a reduct of C, denoted by Red_1 = {mileage, max-speed}. In Table 7a, there are two other reducts, namely Red_2 = {price, max-speed} and Red_3 = {size, max-speed}. Observe that each of these three reducts contains the attribute Max-Speed. This means that this attribute cannot be removed from set C without deteriorating the quality of the approximation and, therefore, it is an indispensable attribute. The set of the indispensable attributes is called the core.
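Since this example has only four attributes, the reducts can be found by brute-force enumeration. The sketch below (again our illustration, reusing the helpers defined in the earlier fragments) searches all attribute subsets preserving the quality of approximation and keeps the minimal ones; it recovers Red_1, Red_2 and Red_3, whose intersection gives the core {max-speed}.

```python
# Sketch continued: brute-force enumeration of reducts (feasible here,
# since there are only 2^4 - 1 nonempty attribute subsets).

from itertools import combinations

def all_reducts():
    """Minimal attribute subsets preserving the quality of approximation."""
    target = quality_of_approximation(C)
    candidates = [set(P) for r in range(1, len(C) + 1)
                  for P in combinations(C, r)
                  if quality_of_approximation(list(P)) == target]
    # keep only the subsets that are minimal with respect to inclusion
    return [P for P in candidates if not any(Q < P for Q in candidates)]

reducts = all_reducts()
print(reducts)                      # three two-attribute reducts, each
                                    # containing max_speed
print(set.intersection(*reducts))   # the core: {'max_speed'}
```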
Given a reduct, say Red_2 = {price, max-speed}, a set of decision rules can be induced from the corresponding rough approximations. The idea is that the description of cars (in general, objects) belonging to the lower approximations can serve as a basis for certain rules, while the description of cars belonging to the boundaries can serve as a basis for approximate rules. Considering the reduct Red_2 = {price, max-speed}, the following minimal decision rules can be induced (within parentheses are the cars supporting the corresponding rule):

Rule 1) if Price is low, then car is good (car2, car6),
Rule 2) if Price is medium, then car is poor (car3),
Rule 3) if Price is high and Max-Speed is low, then car is good (car1),
Rule 4) if Max-Speed is high, then car is good or excellent (car4, car5), with no possibility to distinguish between the two classes.
Note that Rules 1, 2 and 3 are certain, because they assign cars to a given decision class without any ambiguity, while Rule 4 is approximate, because it assigns cars to two decision classes with some ambiguity. Using all the attributes from set C, the following minimal decision rules can be induced apart from the above four:

Rule 5) if Mileage is high, then car is good (car2, car6),
Rule 6) if Mileage is medium, then car is poor (car3),
Rule 7) if Size is compact, then car is poor (car3),
Rule 8) if Mileage is low and Max-Speed is low, then car is good (car1),
Rule 9) if Size is full and Max-Speed is low, then car is good (car1, car2, car6).
Such decision rules express the information contained in the information table in a natural language, using simple “if . . ., then. . .” statements, without recourse to complex models and formulas understandable by experts only. By contrast, decision rules constitute a transparent model in which the relation between the original information and the final recommendation is clearly shown. Moreover, for each decision rule it is possible to indicate the cars (in general, objects) supporting it. For example, the user learns that Rule 1, “if Price is low, then car is good”, is supported by car2 and car6. Thus, the user can look at the information table and check the complete description of car2 and car6, which reveals that Rule 1 is true in the context of the other cars present in the table. In this way, the origin of the obtained results can be clearly traced. Remark that in the rough set approach, data relative to objects are not amalgamated in the course of constructing the final result, as is the case in linear regression, neural nets, discriminant analysis and many other competitive methodologies. These features characterize the rough set approach as a glass-box methodology, while other competitive methodologies have to be classified as black-box methodologies due to the opaque nature of their final results.

A.2 Rough set approach with missing values

Let us consider the information table presented in Table 8. Table 8 differs from Table 7a in that some values in the information table are missing. The question is: how to use the rough set approach in this situation? Several answers have been given to this question in the literature. We present the methodology proposed by Greco et al. (2000b). With respect to competitive methodologies, this approach maintains the noninvasive character of rough set theory.

Table 8 Information table of the illustrative car example with missing values

Car     Price   Mileage  Size     Max-Speed  Decision
car1    high    low      full     low        good
car2    low     ∗        full     low        good
car3    ∗       ∗        compact  low        poor
car4    high    ∗        full     high       good
car5    ∗       ∗        full     high       excellent
car6    low     high     full     ∗          good

The main aim of this approach is to induce so-called robust decision rules. A decision rule is robust when its antecedent corresponds to the description of at least one car (in general, object) in the information table. For example, consider the following rule r induced from Table 8: r ≡ “if Size is full and Max-Speed is low, then car is good”. This rule is robust because there are car1 and car2, whose Size is full and Max-Speed is low. The rule r also covers car6, whose Size is full but whose Max-Speed is unknown. If the Max-Speed of car6 were low, then car6 would match rule r perfectly; if the Max-Speed of car6 had another value, this would not contradict rule r either.

To induce robust decision rules, one has to define properly the indiscernibility relation taking missing values into account. The driving idea of the presented approach is that the comparison of two objects, y and x, is directional: object y is compared to reference object x and, moreover, the reference object should have no missing values on the attributes considered. Thus, given a set of attributes P ⊆ C, car y is indiscernible with car x if x has no missing values with respect to the set of attributes P, and for each attribute from P, either y has the same description as x or y has a missing value. For example, consider the set of attributes P = {price, size}. With respect to P, car1 can be a reference object because it has no missing values on attributes from P. Thus, any other car can be compared to car1 with respect to the set of attributes P, such that

• car4 is indiscernible with car1 with respect to the set of attributes P, denoted as car4 I_P car1, because car4 has the same description on the attributes from P, i.e. for both of them Price is high and Size is full;
• car5 is indiscernible with car1 with respect to the set of attributes P, denoted as car5 I_P car1, because car5 has the same description on the attribute Size, while its value on the attribute Price is missing.

Observe that the indiscernibility relation I_P (P ⊆ C) thus defined is not reflexive, because x I_P x does not hold if object x has some missing values on attributes from P, and it is not symmetric because, if y has some missing values on attributes from P while x has none, y I_P x may hold but x I_P y cannot. Relation I_P (P ⊆ C) continues to be transitive; however, without reflexivity and symmetry it is no longer an equivalence relation.

Given a set of attributes P ⊆ C, the elementary set of an object x having no missing values on attributes from P is the set of all the objects y indiscernible with it by the indiscernibility relation I_P. For example, for P = {price, size}, the elementary set of car1 is the following: [car1]_P = {car1, car4, car5}. Observe that car4 has the same elementary set as car1, i.e. [car4]_P = {car1, car4, car5}. The other elementary sets relative to P = {price, size} are [car2]_P = [car6]_P = {car2, car5, car6}. car3 and car5 do not have elementary sets with respect to the set of attributes P = {price, size}, because they have missing values on these attributes.

Given a set of attributes P ⊆ C, using the elementary sets induced by the indiscernibility relation I_P, rough approximations of the decision classes can be defined. The lower approximation of class X, denoted by $\underline{I}_P(X)$, is the set of all the cars (in general, objects) whose elementary sets are contained in the decision class, i.e. $\underline{I}_P(X) = \{x \in U : [x]_P \subseteq X\}$, while the upper approximation, denoted by $\overline{I}_P(X)$, is the set of all the cars (in general, objects) whose elementary sets intersect with the decision class X, i.e. $\overline{I}_P(X) = \{x \in U : [x]_P \cap X \neq \emptyset\}$.
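The directional relation is easy to express in code. Below is a sketch of ours (not from the paper) of I_P on Table 8, with "*" standing for a missing value; note that the reference object x must be complete on the attributes in P.

```python
# Illustrative sketch of the directional indiscernibility relation I_P
# on Table 8 ('*' marks a missing value).

CARS_M = {
    "car1": {"price": "high", "mileage": "low",  "size": "full",    "max_speed": "low"},
    "car2": {"price": "low",  "mileage": "*",    "size": "full",    "max_speed": "low"},
    "car3": {"price": "*",    "mileage": "*",    "size": "compact", "max_speed": "low"},
    "car4": {"price": "high", "mileage": "*",    "size": "full",    "max_speed": "high"},
    "car5": {"price": "*",    "mileage": "*",    "size": "full",    "max_speed": "high"},
    "car6": {"price": "low",  "mileage": "high", "size": "full",    "max_speed": "*"},
}

def indiscernible(y, x, P):
    """y I_P x: x has no missing values on P; y matches x or is missing."""
    if any(CARS_M[x][a] == "*" for a in P):
        return False  # x cannot serve as a reference object
    return all(CARS_M[y][a] in (CARS_M[x][a], "*") for a in P)

P = ["price", "size"]
print({y for y in CARS_M if indiscernible(y, "car1", P)})
# {'car1', 'car4', 'car5'}: the elementary set of car1
```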
For example, given the set of attributes P = {price, size} and decision class good, the lower approximation is $\underline{I}_P(good) = \emptyset$, while the upper approximation is $\overline{I}_P(good) = \{car1, car2, car4, car6\}$, such that the boundary is $B_P(good) = \{car1, car2, car4, car6\}$. For the sake of completeness, observe that the lower approximation, the upper approximation and the boundary of decision classes excellent and poor with respect to the set of attributes P = {price, size} are the following:

$\underline{I}_P(excellent) = \emptyset$,   $\overline{I}_P(excellent) = \{car1, car2, car4, car6\}$,   $B_P(excellent) = \{car1, car2, car4, car6\}$,
$\underline{I}_P(poor) = \emptyset$,   $\overline{I}_P(poor) = \emptyset$,   $B_P(poor) = \emptyset$.
Let us now consider the lower approximation, the upper approximation and the boundary of all three decision classes with respect to the sets of attributes R = {size, max-speed} and T = {price, size, max-speed}:

$\underline{I}_R(good) = \{car1, car2\}$,   $\overline{I}_R(good) = \{car1, car2, car4, car5\}$,   $B_R(good) = \{car4, car5\}$,
$\underline{I}_R(excellent) = \emptyset$,   $\overline{I}_R(excellent) = \{car4, car5\}$,   $B_R(excellent) = \{car4, car5\}$,
$\underline{I}_R(poor) = \{car3\}$,   $\overline{I}_R(poor) = \{car3\}$,   $B_R(poor) = \emptyset$,

$\underline{I}_T(good) = \{car1, car2\}$,   $\overline{I}_T(good) = \{car1, car2, car4\}$,   $B_T(good) = \{car4\}$,
$\underline{I}_T(excellent) = \emptyset$,   $\overline{I}_T(excellent) = \{car4\}$,   $B_T(excellent) = \{car4\}$,
$\underline{I}_T(poor) = \emptyset$,   $\overline{I}_T(poor) = \emptyset$,   $B_T(poor) = \emptyset$.
Note that car3 belongs to the lower approximation of decision class poor with respect to the set of attributes R = {size, max-speed}, but it does not belong to the lower approximation of the same decision class with respect to the set of attributes T = {price, size, max-speed}. Since R ⊂ T, this is surprising because, as explained for rough approximations in the case of no missing values, in passing from a set of attributes (T) to its proper subset (R), the objects can only be assigned to the decision classes less precisely, so the lower approximation is reduced (or, more precisely, does not grow). Since the main aim of rough set analysis is a non-invasive reduction of information, this property of the lower approximation should also be maintained in the case of missing values. Thus, in order to restore the property that reduction of the set of attributes does not increase lower approximations in the case of missing values, it is necessary to introduce another definition of rough approximations adapted to this case.
This is achieved by considering a different indiscernibility relation I*_P. Given a set of attributes P ⊆ C and a referent object x whose values on the attributes from P are not all missing, an object y is indiscernible with x by the indiscernibility relation I*_P, i.e. y I*_P x, if for each attribute from P, either y has the same description as x, or x has a missing value, or y has a missing value. Notice that y I_P x implies y I*_P x, while the opposite is not true. For example, let us consider again the set of attributes P = {price, size} and car1. Observe that all cars car_i, i = 1, . . . , 6, can now be reference objects, because each of them has at least one non-missing value on the attributes from P (e.g. each car has a value on attribute Size). When comparing all cars to car1 with respect to the set of attributes P, we get car4 I*_P car1 and car5 I*_P car1, because car4 I_P car1 and car5 I_P car1. Also car1 I*_P car4, because car4 I_P car1. Moreover, observe that car1 I*_P car5, even if it is not true that car1 I_P car5. Note that, in general, I*_P is symmetric, but neither reflexive nor transitive.

Analogously to the previous definition, given a set of attributes P ⊆ C, the elementary set of an object x having at least one non-missing value on the attributes from P is the set of all the objects y indiscernible with it by the indiscernibility relation I*_P. For example, for P = {price, size}, the elementary set of car5 is the following:

[car5]*_P = {car1, car2, car4, car5, car6}.

The other elementary sets relative to P = {price, size} are:

[car1]*_P = [car4]*_P = {car1, car4, car5},
[car2]*_P = [car6]*_P = {car2, car5, car6},
[car3]*_P = {car3}.
Remark that, considering the set of attributes P = {price, mileage}, car3 and car5 do not have their own elementary sets, because all of their values on these attributes are missing. Given a set of attributes P ⊆ C, using the elementary sets induced by the indiscernibility relation I*_P, rough approximations of the three decision classes can be defined in the usual way, i.e. the lower approximation of class X, denoted by $\underline{I}^*_P(X)$, is the set of all the cars (in general, objects) whose elementary sets are contained in the decision class, i.e.

$\underline{I}^*_P(X) = \{x \in U : [x]^*_P \subseteq X\}$,

while the upper approximation, denoted by $\overline{I}^*_P(X)$, is the set of all the cars (in general, objects) whose elementary sets intersect with the decision class, i.e.

$\overline{I}^*_P(X) = \{x \in U : [x]^*_P \cap X \neq \emptyset\}$.

Using this approach, and considering the entire set of attributes C, we obtain the following lower approximations, upper approximations and boundaries of the decision classes:

$\underline{I}^*_C(excellent) = \emptyset$,   $\overline{I}^*_C(excellent) = \{car4, car5, car6\}$,   $B^*_C(excellent) = \{car4, car5, car6\}$,
$\underline{I}^*_C(good) = \{car1, car2\}$,   $\overline{I}^*_C(good) = \{car1, car2, car4, car5, car6\}$,   $B^*_C(good) = \{car4, car5, car6\}$,
$\underline{I}^*_C(poor) = \{car3\}$,   $\overline{I}^*_C(poor) = \{car3\}$,   $B^*_C(poor) = \emptyset$.
Greco et al. (2000b) proved that, for any P ⊆ C,

$\underline{I}^*_P(X) = \bigcup_{R \subseteq P} \underline{I}_R(X)$,
and, consequently, for any P1 ⊆ P2 ⊆ C, the inclusion property holds:

$\underline{I}^*_{P1}(X) \subseteq \underline{I}^*_{P2}(X)$.

Indeed, with respect to the sets of attributes R = {size, max-speed} and T = {price, size, max-speed}, we have $\underline{I}^*_R(poor) = \{car3\}$ and $\underline{I}^*_T(poor) = \{car3\}$, such that the inclusion property is satisfied.

The inclusion property permits the calculation of reducts and core for information tables with missing values. In Sect. 2, we defined the reduct and the core using the concept of the quality of approximation. An equivalent definition of the reduct and the core involves lower approximations. More precisely, reducts are minimal (with respect to inclusion) sets of attributes P ⊆ C which give the same lower approximations as the whole set of attributes C for all the decision classes D_1, . . . , D_m, i.e. $\underline{I}^*_P(D_j) = \underline{I}^*_C(D_j)$, j = 1, . . . , m; and the core is the set of all indispensable attributes, i.e. the set of all the attributes which cannot be removed without eliminating some car (in general, object) from a lower approximation. In Table 8, there is only one reduct, which is also the core: Red* = Core* = {size, max-speed}.

Due to the inclusion property, the quality of approximation also stays meaningful. Formally, given a set of attributes P ⊆ C, if D_1, . . . , D_m are the decision classes, the quality of approximation γ_P is expressed as

$\gamma_P = \frac{\sum_{i=1}^{m} \mathrm{card}(\underline{I}^*_P(D_i))}{\mathrm{card}(U)}$.

For Table 8, the quality of approximation relative to C, as well as to P = {size, max-speed}, is γ_C = γ_P = 3/6.

The following minimal certain rules can be induced from the lower approximations of the decision classes (within parentheses are the cars supporting the corresponding rule):

Rule 1) if Size is compact, then car is poor (car3),
Rule 2) if Size is full and Max-Speed is low, then car is good (car1, car2).

From the boundaries of the decision classes, the following minimal approximate rule can be induced:

Rule 3) if Max-Speed is high, then car is good or excellent (car4, car5, car6).
All the above rules are robust because their antecedents are supported by at least one car having no missing values on attributes present in the corresponding antecedent.
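To close the example, the property proved by Greco et al. (2000b) can also be checked by brute force on Table 8. The sketch below (ours, reusing CARS_M, indiscernible and indiscernible_star from the earlier fragments) verifies that the I*-lower approximation under a set of attributes P equals the union of the I-lower approximations over all nonempty subsets of P.

```python
# Sketch continued: brute-force check of the union property on Table 8.

from itertools import combinations

DECISION_M = {"car1": "good", "car2": "good", "car3": "poor",
              "car4": "good", "car5": "excellent", "car6": "good"}

def lower(decision_class, P, relation):
    """Objects whose nonempty elementary set (under relation) lies in the class."""
    members = {x for x, d in DECISION_M.items() if d == decision_class}
    result = set()
    for x in CARS_M:
        elementary = {y for y in CARS_M if relation(y, x, P)}
        if elementary and elementary <= members:
            result.add(x)
    return result

P = ["size", "max_speed"]
union_of_lowers = set()
for r in range(1, len(P) + 1):
    for R in combinations(P, r):
        union_of_lowers |= lower("good", list(R), indiscernible)

print(lower("good", P, indiscernible_star) == union_of_lowers)  # True
```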
References

Barbagallo, S., Consoli, S., Pappalardo, N., Greco, S., & Zimbone, S. (2006). Discovering reservoir operating rules by a rough set approach. Water Resources Management, 20, 19–36.
Becerra-Fernandez, I., Zanakis, S., & Walczak, S. (2002). Knowledge discovery techniques for predicting country investment risk. Computers & Industrial Engineering, 43, 787–800.
Doumpos, M., Zanakis, S., & Zopounidis, C. (2001). Multicriteria preference disaggregation for classification problems with an application to global investing risk. Decision Sciences, 32, 1–52.
Fayyad, U. M., & Irani, K. B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8, 87–102.
Flinkman, M., Michalowski, W., Nilsson, S., Slowinski, R., Susmaga, R., & Wilk, Sz. (2000). Use of rough sets analysis to classify Siberian forest ecosystem according to net primary production of phytomass. INFOR, 38, 145–161.
Greco, S., Matarazzo, B., & Slowinski, R. (1998). A new rough set approach to evaluation of bankruptcy risk. In C. Zopounidis (Ed.), Operational tools in the management of financial risks (pp. 121–136). Dordrecht: Kluwer Academic Publishers.
Greco, S., Matarazzo, B., & Slowinski, R. (1999a). Rough approximation of a preference relation by dominance relations. European Journal of Operational Research, 117, 63–83.
Greco, S., Matarazzo, B., & Slowinski, R. (1999b). The use of rough sets and fuzzy sets in MCDM. In T. Gal, T. Stewart, & T. Hanne (Eds.), Advances in multiple-criteria decision making (pp. 14.1–14.59). Boston: Kluwer Academic Publishers. Chap. 14.
Greco, S., Matarazzo, B., & Slowinski, R. (2000a). Extension of the rough set approach to multicriteria decision support. INFOR, 38, 161–196.
Greco, S., Matarazzo, B., & Slowinski, R. (2000b). Dealing with missing data in rough set analysis of multi-attribute and multi-criteria decision problems. In S. H. Zanakis, G. Doukidis, & C. Zopounidis (Eds.), Decision making: Recent developments and worldwide applications (pp. 295–316). Dordrecht: Kluwer Academic Publishers.
Greco, S., Matarazzo, B., & Slowinski, R. (2001). Rough sets theory for multicriteria decision analysis. European Journal of Operational Research, 129, 1–47.
Greco, S., Matarazzo, B., & Slowinski, R. (2002). Rough approximation by dominance relations. International Journal of Intelligent Systems, 17(2), 153–171.
Greco, S., Matarazzo, B., & Slowinski, R. (2004). Axiomatic characterization of a general utility function and its particular cases in terms of conjoint measurement and rough-set decision rules. European Journal of Operational Research, 158, 271–292.
Greco, S., Matarazzo, B., & Slowinski, R. (2005). Decision rule approach. In J. Figueira, S. Greco, & M. Ehrgott (Eds.), Multiple criteria decision analysis: State of the art surveys (pp. 507–562). New York: Springer. Chap. 13.
Greco, S., Matarazzo, B., & Slowinski, R. (2007). Customer satisfaction analysis based on rough set approach. Zeitschrift für Betriebswirtschaft, 3, 325–339.
Grzymala-Busse, J. W. (1992). LERS—a system for learning from examples based on rough sets. In R. Slowinski (Ed.), Intelligent decision support. Handbook of applications and advances of the rough sets theory (pp. 3–18). Dordrecht: Kluwer Academic Publishers.
Michalowski, W., Rubin, S., Slowinski, R., & Wilk, Sz. (2003). Mobile clinical support system for pediatric emergencies. Journal of Decision Support Systems, 36, 161–176.
Pawlak, Z. (1982). Rough sets. International Journal of Information & Computer Sciences, 11, 341–356.
Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. Dordrecht: Kluwer.
Pawlak, Z., Grzymala-Busse, J. W., Slowinski, R., & Ziarko, W. (1995). Rough sets. Communications of the ACM, 38, 89–95.
Rossi, L., Slowinski, R., & Susmaga, R. (1999). Rough set approach to evaluation of stormwater pollution. International Journal of Environment and Pollution, 12, 232–250.
Saini, K. G., & Bates, P. S. (1984). A survey of the quantitative approaches to country risk analysis. Journal of Banking and Finance, 8, 341–356.
Slowinski, R., & Zopounidis, C. (1995). Application of the rough set approach to evaluation of bankruptcy risk. International Journal of Intelligent Systems in Accounting, Finance and Management, 4(1), 127–141.
Slowinski, R., Greco, S., & Matarazzo, B. (2002). Axiomatization of utility, outranking and decision-rule preference models for multiple-criteria classification problems under partial inconsistency with the dominance principle. Control and Cybernetics, 31, 1005–1035.
Slowinski, R., Greco, S., & Matarazzo, B. (2005). Rough set based decision support. In E. K. Burke & G. Kendall (Eds.), Search methodologies: Introductory tutorials in optimization and decision support techniques (pp. 475–527). New York: Springer. Chap. 16.
Stefanowski, J., & Vanderpooten, D. (2001). Induction of decision rules in classification and discovery-oriented perspectives. International Journal of Intelligent Systems, 16(1), 13–28.
Tseng, T. L., & Huang, C. C. (2007). Rough set-based approach to feature selection in customer relationship management. Omega, 35(4), 365–383.
Tsumoto, S. (1998). Automated extraction of medical expert system rules from clinical databases based on rough set theory. Information Sciences, 112, 67–84.
Wilk, S., Slowinski, R., Michalowski, W., & Greco, S. (2005). Supporting triage of children with abdominal pain in the emergency room. European Journal of Operational Research, 160, 696–709.
Zanakis, S., & Walter, G. (1994). Discriminant characteristics of U.S. banks acquired with or without federal assistance. European Journal of Operational Research, 77, 440–465.
Zanakis, S. H., Austin, L., Nowading, D., & Silver, E. (1980). From teaching to implementing inventory management: Problems of translation. Interfaces, 10(6), 103–110.
Zopounidis, C., & Doumpos, M. (2002). Multi-group discrimination using multi-criteria analysis: Illustrations from the field of finance. European Journal of Operational Research, 139(2), 371–389.