A Comparative Study of Attribute Weighting Heuristics for Effort Estimation by Analogy

Jingzhou Li
Guenther Ruhe
Software Engineering Decision Support Laboratory, University of Calgary, 2500 University Dr. NW, Calgary, AB, Canada T2N 1N4
[email protected] [email protected]
ABSTRACT Five heuristics for attribute weighting in analogy-based effort estimation are evaluated in this paper. The baseline heuristic involves using all attributes with equal weights. We propose four additional heuristics that use rough set analysis for attribute weighting. These five heuristics are evaluated over five data sets related to software projects. Three of the data sets are publicly available, hence allowing comparison with other methods. The results indicate that three of the rough set analysis based heuristics perform better than the equal weights heuristic. This evaluation is based on an integrated measure of accuracy.
Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management – cost estimation.

General Terms
Management, Measurement, Experimentation.

Keywords
Software effort estimation, estimation by analogy, attribute selection and weighting, rough set analysis, non-quantitative attributes, missing values.
1. BACKGROUND
Estimation by analogy (EBA) [18, 22] is a promising technique for software effort estimation. It predicts the effort for a new project based on a historical data set that is described by a set of attributes. In EBA, the historical data set is searched for projects similar to the project under estimation using various similarity measures. The effort for the project to be estimated is then predicted using effort information from the most similar projects (the closest analogies) in the historical data. Different EBA methods have proposed different strategies for defining similarity measures and for suggesting the set of closest analogies. AQUA, as proposed in [15], is such an EBA method; it predicts effort for objects at the requirement, feature, and project levels using data sets that may also contain non-quantitative and missing values. AQUA differs from other methods in its flexibility in handling missing values, non-quantitative attributes, and similarity measures. It makes suggestions for choosing the set of closest analogies for analogy adaptation through a learning process in order to derive the effort estimates.

The general assumption behind EBA methods is that the attributes in the historical data set used for effort estimation are correlated with the effort involved in implementing the objects. However, each attribute may have a different degree of relevance to that effort. Studies have shown that the performance of EBA methods can be improved if this degree of relevance is taken into account. There are two ways to reflect the degree of relevance of each attribute to the effort: attribute selection and attribute weighting.

Attribute selection, also known as feature selection, has been discussed in the area of EBA as well as in machine learning. Kadoda et al. showed in [9] that a carefully selected subset of the attributes can enhance the performance of an EBA method. Kirsopp et al. discussed feature selection for EBA in [11] and showed how better performance was obtained using feature selection. In machine learning, feature selection has been regarded as an effective way to improve the performance of learning algorithms; one of the well-known methods is WRAPPER [12]. Chen et al. applied WRAPPER-based feature selection to COCOMO models in [2] and obtained better performance of COCOMO over 15 data sets.
In some distance-based EBA methods [21, 16], a weighted Euclidean distance was employed to measure the similarity between two objects and thereby reflect the relevance of each attribute to the effort; quantitative methods such as regression analysis can be used to determine the weight of each attribute for quantitative data sets, as done in [14]. We note that the feature selection methods above are only applicable to quantitative data sets and do not consider attribute weighting. In AQUA, all attributes were used with equal weights, in order to explore the main idea of AQUA and to assess its prediction accuracy without the impact of attribute weighting.
Rough set analysis (RSA) [20] is a proven technique for classifying ordinal data. In [16], we proposed an attribute selection and weighting method that uses RSA with the support of RSA tools such as ROSE2 [7]. AQUA is extended to AQUA+ by implementing the RSA-based attribute selection and weighting method as part of a pre-analysis phase. After the pre-analysis phase, a learning phase and the actual estimation phase are performed. Learning here refers to a systematic analysis with varying parameters, carried out to make suggestions on which and how many analogies are most favorable for a given object under estimation, in order to calculate the effort estimate for the object.

Feature selection using rough sets was discussed in [25, 26], where heuristics were provided to generate the optimal subset of attributes according to certain fitness functions. However, there are no tools that support these types of heuristics, and issues such as attribute weighting and the handling of missing values were not discussed in either of these methods.

Four heuristics for attribute weighting using RSA were proposed in [16], based on the core attributes, reducts, and decision rules that RSA generates when applied to historical data sets for effort estimation. Even though only the reducts-based heuristic was evaluated in AQUA+, better prediction accuracy was obtained over five data sets using the weights from the reducts-based heuristic than using equal weights in AQUA.

In this paper, the four heuristics for attribute weighting using RSA, plus the equal weights option, are evaluated using AQUA+ over five data sets. The following questions are investigated:

(1) Which of the four RSA-based heuristics provide better performance than the equal weights option as applied in AQUA?
(2) Which heuristic is better than another, and for what types of data sets?
(3) How do the results of AQUA+ compare to the results of other EBA methods?
(4) What knowledge is needed, and what effort is required, to use the proposed attribute weighting method?
A brief introduction to AQUA and AQUA+ is given in Section 2. Key concepts of rough sets relevant to this paper and the four RSA-based heuristics for attribute weighting are presented in Section 3. In the comparative study in Section 4, the five data sets and the evaluation criteria for prediction accuracy are described first; then the results of AQUA+ from applying the five heuristics to each data set are presented. Section 5 is devoted to further analysis and discussion of the results. Conclusions and future work are given in Section 6.

2. INTRODUCTION TO AQUA AND AQUA+
In AQUA, the historical data set DB is defined as a triple DB = ⟨R, P, V⟩. Therein, R = {r1, r2, …, rn} is the set of objects, and P = A ∪ {Effort}, where A = {a1, a2, …, am} is the set of attributes describing the objects and Effort is a specific attribute characterizing the effort for implementing the respective object; Effort(ri) represents the effort to develop object ri. V = {aj(rk)} is the domain of attribute values of all objects in R, where aj(rk) represents the value of attribute aj∈P for object rk∈R. The set S = {s1, s2, …, st} denotes the given objects to be estimated; S shares the same attributes A with R.

The problem of effort estimation by analogy can then be stated as follows: for each sg∈S, the effort of sg, Effort(sg), is to be estimated based on the values of Effort from a set of objects ri∈R that are most similar to sg. These most similar objects are also called the analogies of sg; they are retrieved from R using certain similarity measures. Estimation using effort information from the analogies is known as analogy adaptation.

In AQUA, the weights of attributes are involved in calculating the global similarity between two objects; for that, a weighted local similarity with equal weights for all attributes was used. AQUA is extended to AQUA+ by incorporating attribute weighting methods as a pre-analysis phase, where the weights of attributes are determined either using different heuristics based on rough set analysis or using all attributes with equal weights. Therefore, we have three phases in AQUA+:

• Phase0: Attribute weighting
• Phase1: Learning over the historical data set
• Phase2: Predicting

During the Learning phase, Leave-One-Out Cross-Validation (or Jack-knife validation) [6] is performed to determine the prediction accuracy distribution of data set DB, denoted by AccuDistr(DB). This is done by varying the threshold for the number of analogies (N) and the threshold for the similarity (T) of the analogies taken for adaptation. The results of AccuDistr(DB) are then used in the Predicting phase. Sample segments of AccuDistr(DB) for the five data sets are given in Section 4.4. For details about the learning and predicting processes, please refer to [15, 16].
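To make the learning phase concrete, the sketch below shows a minimal Jack-knife loop over a grid of (N, T) values in Python. It is our illustration, not the AQUA+ implementation: the similarity measure is an arbitrary function supplied by the caller, analogy adaptation is simplified to the mean effort of the retained analogies, only the MMRE of each (N, T) point is recorded, and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class HistoricalObject:
    attributes: Dict[str, object]  # may hold non-quantitative or missing values
    effort: float

def estimate(target: HistoricalObject,
             candidates: List[HistoricalObject],
             similarity: Callable[[HistoricalObject, HistoricalObject], float],
             n_max: int, t_min: float) -> Optional[float]:
    """Keep candidates whose similarity to `target` is at least t_min,
    take at most n_max of them (the closest analogies), and adapt by
    averaging their efforts. Returns None if no analogy qualifies."""
    scored = [(similarity(target, c), c) for c in candidates]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    analogies = [c for s, c in scored if s >= t_min][:n_max]
    if not analogies:
        return None
    return sum(a.effort for a in analogies) / len(analogies)

def learning_phase(db: List[HistoricalObject],
                   similarity: Callable[[HistoricalObject, HistoricalObject], float],
                   n_grid: List[int],
                   t_grid: List[float]) -> Dict[Tuple[int, float], float]:
    """Leave-one-out (Jack-knife) learning: for every (N, T) pair, estimate
    each object from the remaining ones and record the MMRE of that point.
    The resulting map plays the role of AccuDistr(DB)."""
    accu_distr = {}
    for n_max in n_grid:
        for t_min in t_grid:
            mres = []
            for i, obj in enumerate(db):
                rest = db[:i] + db[i + 1:]
                est = estimate(obj, rest, similarity, n_max, t_min)
                if est is not None:
                    mres.append(abs(est - obj.effort) / obj.effort)
            if mres:
                accu_distr[(n_max, t_min)] = sum(mres) / len(mres)
    return accu_distr
```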
3. FIVE HEURISTICS FOR ATTRIBUTE WEIGHTING
Assume that m is the number of condition attributes in A of data set DB. For all i = 1, …, m, let wi represent the weight of attribute ai∈A. The weights determined using the five weighting heuristics are denoted by wi[0], wi[1], wi[2], wi[3], and wi[4], respectively. In what follows, we describe the five heuristics in detail.

3.1 All attributes with equal weights (H0)
All attributes are given equal weights:

w_i[0] = 1/m    (1)

This option is called the EqualW heuristic and is the default option in AQUA. It therefore serves as the baseline heuristic for this comparative study.

3.2 Weighting heuristics based on rough sets
Rough set analysis uses the notion of equivalence classes to construct approximations of a given set. The lower approximation of the set is the union of all equivalence classes that are subsets of the set; the upper approximation is the union of all equivalence classes that have a non-empty intersection with the set. Rough set analysis can also be used for classification. Generally, a decision table (or information system) is defined as a 3-tuple in rough sets theory:

DT = ⟨U, E, V_E⟩,

where U and E are finite and non-empty sets; U is the universe of objects and E the set of attributes or features. Each attribute e∈E is associated with a set V_e ⊆ V_E of its values. E may be partitioned into two subsets: condition attributes C and decision attributes D, with C∩D = Ø. The concepts relevant to this paper are briefly described below. For details about rough sets theory, please refer to [20]; for applications of rough sets in software engineering, see [13, 21].

Reducts—Discovering dependencies between attributes enables the reduction of the set of attributes. A minimal subset B⊆C that ensures the same quality of classification as the full set of condition attributes C is a reduct of DT with respect to the decision attributes. A decision table DT may have more than one reduct; the set of all reducts of DT for the given set of condition attributes C is denoted by Reduct(DT).

Core attributes—The intersection of all reducts is called the core of DT, denoted by Core(DT). The core is the most important subset of attributes in the decision table: it cannot be eliminated without decreasing the quality of approximation.

Decision rules—Knowledge representation in the form of production rules describes which conclusions can be drawn from which assumptions. The production rules generated from the lower and upper approximations of a decision table are called decision rules in RSA; they represent the knowledge of the information system for classifying objects with respect to the most significant condition attributes. The set of decision rules for DT is denoted by Rules(DT).

The concept of the decision table DT in rough sets exactly matches the definition of the historical data set DB in AQUA as discussed in Section 2: the set of attributes A in DB corresponds to the set of condition attributes C in DT, and Effort in DB is the only decision attribute in D. The domains of the attribute values in DB and DT are V and V_E, respectively. In this paper, DB will frequently be used in place of DT when referring to concepts of RSA.

From the above concepts of rough sets, we know that the reducts, core attributes, and decision rules reflect the dependency of the decision attribute (Effort) on the condition attributes (A) of a decision table (data set DB). The more frequently a condition attribute appears in a core, reduct, or decision rule, the more dependent the decision attribute is on it. Therefore, we proposed in [16] four heuristics for attribute weighting based on the core attributes, reducts, and decision rules generated from a given data set DB for EBA, based on the following hypothesis: the dependency of the decision attribute on the condition attributes determined by RSA for classification is also valid for EBA. The four heuristics are defined below.

(H1) Reducts based heuristic
According to the above hypothesis, the more frequently a condition attribute appears in the reducts Reduct(DB), the more Effort depends on that attribute, and the heavier the weight that should be assigned to it. Therefore, the number of occurrences of attribute a_i in Reduct(DB), denoted by N_i, is used to calculate the weight of each attribute:

w_i[1] = N_i / \sum_{j=1}^{m} N_j    (2)

This heuristic is called the Reducts heuristic.

(H2) Core attributes based heuristic
The set of core attributes Core(DB) is used with equal weights, provided a set of core attributes can be generated for DB. Let m_c be the number of core attributes in Core(DB); then

w_i[2] = \begin{cases} 1/m_c & a_i \in Core(DB) \\ 0 & a_i \notin Core(DB) \end{cases}    (3)

This heuristic is called the Core heuristic.
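Assuming the reducts and the core have been obtained from an RSA tool such as ROSE2 and are available as plain sets of attribute names (the tool's export format is not shown here), heuristics H1 and H2 reduce to a few lines. This is a sketch of ours, not code from AQUA+:

```python
from collections import Counter
from typing import Dict, List, Set

def reducts_weights(attrs: List[str], reducts: List[Set[str]]) -> Dict[str, float]:
    """H1 (equation (2)): weight attribute a_i by its number of
    occurrences N_i in the reducts, normalized to sum to 1."""
    n = Counter(a for reduct in reducts for a in reduct)
    total = sum(n[a] for a in attrs)  # assumes at least one non-empty reduct
    return {a: n[a] / total for a in attrs}

def core_weights(attrs: List[str], core: Set[str]) -> Dict[str, float]:
    """H2 (equation (3)): equal weights 1/m_c for the m_c core
    attributes, zero for all others; assumes the core is non-empty."""
    m_c = len(core)
    return {a: 1.0 / m_c if a in core else 0.0 for a in attrs}

# Example: two reducts {a1, a2} and {a1, a3} give a1 twice the weight of a2, a3.
print(reducts_weights(["a1", "a2", "a3"], [{"a1", "a2"}, {"a1", "a3"}]))
# {'a1': 0.5, 'a2': 0.25, 'a3': 0.25}
```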
(H3) Decision rule based heuristic using the number of occurrences
Similarly, the more frequently a condition attribute is used in the decision rules Rules(DB), the more significant it is in reflecting the dependency of the decision attribute on this condition attribute. Therefore, the number of occurrences of attribute a_i in Rules(DB), denoted by L_i, can be used to calculate the weight of each attribute:

w_i[3] = L_i / \sum_{j=1}^{m} L_j    (4)

In [21], L_i was used to rank the importance of attributes (cost drivers) of COCOMO. This heuristic is called the Rules# heuristic.

(H4) Decision rule based heuristic using relative strength
Extending heuristic H3, we consider for each decision rule r_j∈Rules(DB), j = 1, …, k, its relative strength h_j, which measures the quality of the rule: h_j is the ratio of the number of objects supporting the rule to the total number of objects in data set DB. Let q_ij be the number of occurrences of a_i in rule r_j; the strength-weighted total number of occurrences of a_i in all rules, denoted by Q_i, is then

Q_i = \sum_{j=1}^{k} h_j \cdot q_{ij}

Based on that, the weight of attribute a_i is calculated as

w_i[4] = Q_i / \sum_{j=1}^{m} Q_j    (5)

This heuristic is called the RulesRS heuristic.

All five weighting heuristics can provide attribute weights in Phase0 of AQUA+; hence each of them can be applied individually. The question is which heuristic to choose for a given data set. One option is to test all five heuristics and then choose the best one for the given data set. This research, however, studies the general performance of the five heuristics over different types of data sets, so that the one or two heuristics with the best overall performance can be recommended for later use.
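Under the same assumption that the decision rules are available outside the RSA tool, here as a list of (condition attributes, relative strength) pairs, the two rule-based heuristics can be sketched as follows (again our illustration, with hypothetical names):

```python
from collections import Counter
from typing import Dict, List, Tuple

def rules_count_weights(attrs: List[str],
                        rules: List[List[str]]) -> Dict[str, float]:
    """H3 (equation (4)): weight attribute a_i by its number of
    occurrences L_i in the decision rules, normalized to sum to 1."""
    counts = Counter(a for rule in rules for a in rule)
    total = sum(counts[a] for a in attrs)
    return {a: counts[a] / total for a in attrs}

def rules_strength_weights(attrs: List[str],
                           rules: List[Tuple[List[str], float]]) -> Dict[str, float]:
    """H4 (equation (5)): weight each occurrence q_ij by the relative
    strength h_j of rule r_j (supporting objects / |DB|), i.e.
    Q_i = sum_j h_j * q_ij, then normalize the Q_i to sum to 1."""
    q = dict.fromkeys(attrs, 0.0)
    for condition_attrs, h_j in rules:
        for a, q_ij in Counter(condition_attrs).items():
            if a in q:
                q[a] += h_j * q_ij
    total = sum(q.values())
    return {a: q[a] / total for a in attrs}
```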
4. COMPARATIVE STUDY
From the comparative study in [15] we know that, with all attributes given equal weights, AQUA achieved better accuracy than several existing EBA methods over two of three public data sets. Therefore, we do not compare AQUA+ with other EBA methods directly in this study; we only compare the results of AQUA+ under the different attribute weighting heuristics.
The RSA tool used in this study for attribute weighting is ROSE2 [7].

4.1 Data sets
Five data sets were used for this comparative study: the university student projects USP05-FT and USP05-RQ, ISBSG04-2, Mends03, and Kem87. Table 1 summarizes these data sets, where

• "#Objects" is the number of objects in the data set;
• "#Attributes" is the number of condition attributes;
• "%Missing values" is the percentage of missing values;
• "%Non-quantitative attributes" is the percentage of non-quantitative attributes.

Table 1. Summary of the data sets for the comparative study
Name       #Objects  #Attributes  %Missing values  %Non-quantitative attributes  Source
USP05-RQ   121       14           2.54             71                            Li et al., 2005 [15]
USP05-FT   76        14           6.8              71                            Li et al., 2005 [15]
ISBSG04-2  158       24           27.24            63                            ISBSG, 2004 [8]
Kem87      15        5            0                40                            Kemerer, 1987 [10]
Mends03    34        6            0                0                             Mendes et al., 2003 [17]

In this study, a subset of ISBSG04, called ISBSG04-2, is used for ease of computation; it contains the objects whose efforts are in the range of 10,000 to 20,000. The decision attribute is Effort in all data sets. Although there are two qualitative attributes in Kem87, they were not used by previous EBA methods.

4.2 Evaluation criteria
Different criteria for evaluating the quality of estimation have been introduced in the literature. Among the many existing evaluation criteria, we apply the most frequently used ones. In addition, we define some criteria to measure the results of the multiple cross-validations in the Learning phase. For a given data set DB as defined in Section 2, the cross-validations are applied to R and produce AccuDistr(DB). We give the following definitions of evaluation criteria.

Definition 0. PAC(N, T)—Point-wise Accuracy
For a given pair of values (N*, T*), the corresponding prediction accuracy Accuracy* of AQUA+ can be obtained from a single run of Jack-knife cross-validation. (N*, T*, Accuracy*) is called a Point-wise Accuracy (PAC) in the three-dimensional space (N, T, Accuracy). □

The Accuracy of a PAC is a vector of criteria: MMRE, Pred, Strength, MPS, and MPSW, which are defined below. A series of PAC is obtained by varying N and T in the space (N, T, Accuracy) for a given data set DB over multiple runs of Jack-knife cross-validation. This series of PAC constitutes AccuDistr(DB).

Definition 1. MRE(rk)—Magnitude of Relative Error [4]

MRE(r_k) = |\widehat{Effort}(r_k) - Effort(r_k)| / Effort(r_k)    (6)

for a given object rk∈R under estimation, where Effort(rk) is the actual effort and \widehat{Effort}(rk) is the predicted effort of object rk. □

Definition 2. MMRE(N, T)—Mean Magnitude of Relative Error [4]

MMRE(N, T) = (1/n_r) \sum_{r_k \in R} MRE(r_k)    (7)

for a given pair of values (N, T), where n_r is the number of objects in R estimated in a single run of Jack-knife cross-validation. □

Definition 3. Pred(α, N, T)—prediction at level α [4]

Pred(α, N, T) = τ / λ    (8)

where λ is the total number of objects estimated in a single run of Jack-knife cross-validation with a given pair of values (N, T), and τ is the number of those objects with MRE less than or equal to α. □

α = 0.25 is normally used for evaluation in the literature and is used in this paper; Pred(0.25) is written when N and T are given or not considered. Pred(0.25) thus gives the proportion of estimates whose absolute error is less than or equal to 25% of the actual effort in a single run of Jack-knife cross-validation.

Definition 4. Strength(N, T)
Support(N, T) is the number of objects in R that can be estimated with a given pair of values (N, T); it is the number of objects used in the calculation of the corresponding MMRE. Strength(N, T) is defined as the ratio of Support(N, T) to the total number of objects in R. □

Definition 5. MPS(N, T)
MPS is an aggregated criterion which compares a PAC(N, T) against acceptability thresholds for MMRE (β1), Pred(0.25) (β2), and Strength (β3):

MPS = \begin{cases} \text{"Yes"} & MMRE \le \beta_1 \text{ and } Pred(0.25) \ge \beta_2 \text{ and } Strength \ge \beta_3 \\ \text{"No"} & \text{otherwise} \end{cases}    (9)

The acceptable thresholds for MMRE and Pred(0.25) were proposed in [4]; the threshold for Strength was proposed in [15]. They are β1 = 0.25, β2 = 0.75, and β3 = 0.3. These thresholds can be set to different values in different contexts of effort estimation. □

MPS tells us whether the accuracy of a PAC is acceptable according to the given thresholds. However, it is still difficult to compare multiple PAC to determine which one is better. Therefore, we define a single criterion, called MPSW, to measure the accuracy of a PAC.

Definition 6. MPSW(N, T)

MPSW(N, T) = η1 (1 - MMRE(N, T)) + η2 Pred(0.25, N, T) + η3 Strength(N, T)    (10)

which is the weighted average of MMRE, Pred(0.25), and Strength of a PAC for a given pair of values (N, T). □

MMRE is typically the most important and most frequently used criterion for measuring prediction accuracy. Therefore, we give a stronger weight η1 = 0.4 to MMRE, with η2 = 0.3 for Pred(0.25) and η3 = 0.3 for Strength.

To compare the overall prediction accuracy of AccuDistr(DB) across multiple data sets, the average MPSW of all PAC in AccuDistr(DB), called MPSV, is used.

Definition 7. MPSV(DB)

MPSV(DB) = (1/p) \sum_{i=1}^{p} MPSW_i(N, T)    (11)

where MPSW_i(N, T) is the MPSW of the ith PAC and p is the number of PAC in AccuDistr(DB). □

The greater the MPSW and MPSV, the better the prediction accuracy of a PAC and of AccuDistr(DB), respectively.

Definition 8. MPSY(DB)
MPSY(DB) is the number of PAC(N, T) with MPS(N, T) = "Yes" for the given ranges and step sizes of N and T in AccuDistr(DB). □

MPSY is a further indicator of the performance of EBA methods: the greater the number, the more applicable an EBA method is to a data set.

Among these criteria, MPSV and MPSY measure the overall performance of an EBA method through AccuDistr(DB), while MMRE, Pred, Strength, MPS, and MPSW constitute the Accuracy vector and measure individual PAC. MPSW is a composite criterion combining MMRE, Pred, and Strength; we nevertheless keep MMRE, Pred, and Strength to facilitate future comparison with methods that report only MMRE and Pred.

We assume that all PAC(N, T) in AccuDistr(DB) are ordered by MPS, MPSW, and (1-MMRE), in descending order, for the Learning (Phase1) and Predicting (Phase2) phases discussed in Section 2.

To illustrate the results of AQUA+ from applying the five attribute weighting heuristics to the five data sets, we list, for each heuristic over each data set, only the Accuracy vector of the first two PAC and of the first PAC at full Strength, together with the MPSV and MPSY of AccuDistr(DB). Because of space limitations, we cannot list the whole AccuDistr(DB) in the paper.
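The point-wise criteria of Definitions 1 to 6 are direct to compute once the actual and predicted efforts of a single cross-validation run are available. A minimal sketch of ours (`pairs` is a list of (actual, predicted) tuples; the threshold and weight defaults follow the values stated above):

```python
def mre(actual: float, predicted: float) -> float:
    """Magnitude of Relative Error, equation (6)."""
    return abs(predicted - actual) / actual

def mmre(pairs) -> float:
    """Mean MRE over the objects estimated at a given (N, T), equation (7)."""
    return sum(mre(a, p) for a, p in pairs) / len(pairs)

def pred(pairs, alpha: float = 0.25) -> float:
    """Pred(alpha), equation (8): share of estimates with MRE <= alpha."""
    return sum(1 for a, p in pairs if mre(a, p) <= alpha) / len(pairs)

def mps(mmre_v: float, pred_v: float, strength_v: float,
        betas=(0.25, 0.75, 0.3)) -> bool:
    """Acceptability of a PAC against the thresholds, equation (9)."""
    return mmre_v <= betas[0] and pred_v >= betas[1] and strength_v >= betas[2]

def mpsw(mmre_v: float, pred_v: float, strength_v: float,
         etas=(0.4, 0.3, 0.3)) -> float:
    """Composite accuracy of a PAC, equation (10)."""
    return etas[0] * (1 - mmre_v) + etas[1] * pred_v + etas[2] * strength_v
```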
4.3 Evaluation process
To compare the results of the different heuristics over the different data sets, the two criteria for overall measurement, MPSV and MPSY, are used together with the MMRE of the first PAC at full Strength, denoted by MMRE-FS. MMRE-FS is used because it corresponds to the prediction accuracy that former EBA methods normally reported; hence it can be regarded as a baseline.

In this evaluation process, the MPSY values of the five heuristics over a given data set are normalized so that the values of all three criteria are in [0, 1].

For a given data set DB, let M[k, i] represent the value of the kth criterion for the ith weighting heuristic, where k = 1, 2, 3 indexes the three criteria (normalized MPSY, MPSV, and 1-MMRE-FS) and i = 0, …, 4 indexes the weighting heuristics H0, H1, H2, H3, and H4, respectively. We define the aggregated accuracy of the ith heuristic for a given data set as follows.

Definition 9. AccuH[i]

AccuH[i] = \sum_{k=1}^{3} \sum_{j=0}^{4} (M[k, i] - M[k, j])    (12)

Here we use the pair-wise differences between the values of each criterion across the heuristics to define the aggregated accuracy of a heuristic over a given data set. This aggregated accuracy reflects the fact that the greater the values of the three criteria, the better the overall accuracy of a heuristic; thus the greater the value of AccuH[i], the better the overall performance AQUA+ achieves with that heuristic on the given data set, compared to the other heuristics. The best-fit heuristic for a given data set is thus determined by

\max_{i=0..4} (AccuH[i])    (13)
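Since \sum_j (M[k, i] - M[k, j]) = n·M[k, i] - \sum_j M[k, j] over n heuristics, AccuH[i] can be computed directly from the criteria matrix. A short sketch of ours, using the USP05-RQ values of Table 7 (Section 4.4.1) as test data:

```python
from typing import List

def accu_h(M: List[List[float]]) -> List[float]:
    """Aggregated accuracy per heuristic, equation (12):
    AccuH[i] = sum_k sum_j (M[k][i] - M[k][j]), where k runs over the
    three criteria (normalized MPSY, MPSV, 1-MMRE-FS) and j over the
    heuristics applicable to the data set."""
    n = len(M[0])
    return [sum(n * row[i] - sum(row) for row in M) for i in range(n)]

# USP05-RQ (Table 7): rows are normalized MPSY, MPSV, 1-MMRE-FS.
M = [[0.00, 0.00,  0.00, 0.00, 0.00],
     [0.19, 0.16,  0.14, 0.16, 0.11],
     [0.22, 0.29, -0.08, 0.31, 0.33]]
scores = accu_h(M)                     # ≈ [0.22, 0.42, -1.53, 0.52, 0.37]
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 3, i.e. H3 (Rules#), the best-fit heuristic by equation (13)
```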
4.4 Evaluation over five data sets
This section presents the results of AQUA+ from applying the five heuristics to each data set, so that the best-fit heuristic for each data set can be determined.

4.4.1 USP05-RQ
From RSA, we obtained three core attributes for this data set. The first two PAC, and the first PAC at full Strength, of AccuDistr(USP05-RQ) for the five heuristics are listed in Tables 2 to 6. Table 7 summarizes the three criteria MPSY, MPSV, and (1-MMRE-FS), and the aggregated accuracy of each heuristic.

Table 2. USP05-RQ with weights from EqualW using wi[0]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0.92  2  0.53  0.51        0.69      0.55    MPSV = 0.19
0.86  1  0.62  0.47        0.83      0.54    MPSY = 0
0     1  0.78  0.46        1.00      0.53

Table 3. USP05-RQ with weights from Reducts using wi[1]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0.92  2  0.54  0.49        0.76      0.56    MPSV = 0.16
0.90  6  0.45  0.50        0.63      0.56    MPSY = 0
0     2  0.71  0.46        1.00      0.55

Table 4. USP05-RQ with weights from Core using wi[2]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0     1  1.08  0.31        1.00      0.36    MPSV = 0.14
0.76  1  1.11  0.32        0.98      0.35    MPSY = 0
0.02  1  1.10  0.29        0.99      0.35

Table 5. USP05-RQ with weights from Rules# using wi[3]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0.90  1  0.49  0.49        0.81      0.59    MPSV = 0.16
0.92  1  0.49  0.50        0.79      0.59    MPSY = 0
0     2  0.69  0.50        1.00      0.57

Table 6. USP05-RQ with weights from RulesRS using wi[4]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0.92  1  0.48  0.51        0.77      0.59    MPSV = 0.11
0.24  1  0.64  0.48        0.99      0.59    MPSY = 0
0.12  1  0.67  0.47        1.00      0.57

Table 7. Comparison of weighting heuristics over USP05-RQ
                   H0        H1         H2      H3        H4
                   (EqualW)  (Reducts)  (Core)  (Rules#)  (RulesRS)
MPSY (normalized)  0         0          0       0         0
MPSV               0.19      0.16       0.14    0.16      0.11
1-MMRE-FS          0.22      0.29       -0.08   0.31      0.33
AccuH[i]           0.22      0.42       -1.53   0.52      0.37

According to equation (13), we know from Table 7 that Rules# (AccuH[3] = 0.52) is the best-fit heuristic for this data set.

For brevity, the following subsections present only the summary of the three criteria and the aggregated accuracy AccuH[i]; the sample tables of AccuDistr(DB) are given in Appendix A.

4.4.2 USP05-FT
There are no core attributes for this data set. Table 8 summarizes the three criteria and the aggregated accuracy AccuH[i] of each heuristic. The first two PAC and the first PAC at full Strength of the four applicable heuristics are listed in Appendix A, Tables 8a to 8d.

Table 8. Comparison of weighting heuristics over USP05-FT
                   H0        H1         H2      H3        H4
                   (EqualW)  (Reducts)  (Core)  (Rules#)  (RulesRS)
MPSY (normalized)  0.13      0.25       -       0.36      0.25
MPSV               0.36      0.34       -       0.35      0.34
1-MMRE-FS          0.46      0.56       -       0.59      0.59
AccuH[i]           -0.79     0.03       -       0.62      0.15

According to equation (13), we know from Table 8 that Rules# (AccuH[3] = 0.62) is the best-fit heuristic for this data set as well.

4.4.3 ISBSG04-2
There are two core attributes for this data set. Table 9 summarizes the aggregated accuracy of each heuristic. The first two PAC and the first PAC at full Strength of the five heuristics are listed in Appendix A, Tables 9a to 9e.

Table 9. Comparison of weighting heuristics over ISBSG04-2
                   H0        H1         H2      H3        H4
                   (EqualW)  (Reducts)  (Core)  (Rules#)  (RulesRS)
MPSY (normalized)  0.15      0.48       0.02    0.16      0.18
MPSV               0.72      0.73       0.76    0.74      0.73
1-MMRE-FS          0.74      0.73       0.29    0.74      0.74
AccuH[i]           0.16      1.81       -2.62   0.30      0.35

Reducts (AccuH[1] = 1.81) is the best-fit heuristic for this data set.

4.4.4 Mends03
There are no core attributes for this data set. Table 10 summarizes the aggregated accuracy of each heuristic. The first two PAC and the first PAC at full Strength of the four applicable heuristics are presented in Appendix A, Tables 10a to 10d.

Table 10. Comparison of weighting heuristics over Mends03
                   H0        H1         H2      H3        H4
                   (EqualW)  (Reducts)  (Core)  (Rules#)  (RulesRS)
MPSY (normalized)  0.21      0.25       -       0.27      0.27
MPSV               0.86      0.87       -       0.84      0.84
1-MMRE-FS          0.88      0.89       -       0.85      0.85
AccuH[i]           -0.09     0.15       -       -0.05     -0.05

Again, Reducts (AccuH[1] = 0.15) is the best-fit heuristic for this data set.

4.4.5 Kem87
There are two core attributes for this data set. Table 11 summarizes the aggregated accuracy of each heuristic. The first two PAC and the first PAC at full Strength of the heuristics are presented in Appendix A, Tables 11a to 11d. Because the reducts and the core contain the same attributes, we obtained the same results for Reducts and Core.

Table 11. Comparison of weighting heuristics over Kem87
                   H0        H1         H2      H3        H4
                   (EqualW)  (Reducts)  (Core)  (Rules#)  (RulesRS)
MPSY (normalized)  0.02      0.44       0.44    0.05      0.05
MPSV               0.72      0.73       0.73    0.74      0.73
1-MMRE-FS          0.47      0.51       0.51    0.42      0.43
AccuH[i]           -0.48     1.42       1.42    -0.47     -0.47

Reducts (AccuH[1] = 1.42) and Core (AccuH[2] = 1.42) are the best-fit heuristics for this data set.

5. DISCUSSION
5.1 Observations from the comparative study
The overall accuracy of all heuristics, in terms of AccuH[i], over the five data sets is summarized in Table 12.

Table 12. Results of weighting heuristics over five data sets
           H0        H1         H2      H3        H4
           (EqualW)  (Reducts)  (Core)  (Rules#)  (RulesRS)
USP05-RQ   0.22      0.42       -1.53   0.52      0.37
USP05-FT   -0.79     0.03       -       0.62      0.15
ISBSG04-2  0.16      1.81       -2.62   0.30      0.35
Mends03    -0.09     0.15       -       -0.05     -0.05
Kem87      -0.48     1.42       1.42    -0.47     -0.47

From Table 12, we obtain the following observations:

(1) The values of AccuH[i] in columns H1, H3, and H4 are greater than the values in column H0 for every data set. These three weighting heuristics therefore produced better performance than the EqualW heuristic H0, even though AccuH[3] and AccuH[4] are very close to AccuH[0] over Mends03 and Kem87. We can thus say that AQUA+, with the support of the proposed RSA-based attribute weighting method in Phase0, produces better performance than using all attributes with equal weights.
(2) Considering AccuH[i] for every heuristic over all five data sets, we rank the five heuristics in the following order: H1 ≻ H3 ≻ H4 ≻ H0 ≻ H2, where "A ≻ B" means that heuristic A is better than heuristic B in terms of AccuH[i].

(3) The best AccuH[i] over the five data sets are produced by heuristics H1 and H3 (three by H1 and two by H3). H1 appears to be the best attribute weighting heuristic for AQUA+. Therefore, H1 and H3 are suggested for use in Phase0 of AQUA+.

(4) The core based heuristic H2 is not always applicable, because only three of the data sets produced core attributes. H2 produced the best AccuH[i] on one data set, but the worst on two others. Therefore, H2 is not recommended.

(5) H1 works especially well for data sets with more missing values: H1 achieved the best AccuH[i] over ISBSG04-2, which has over 27% missing values, compared to less than 10% or no missing values in the other data sets.

(6) The rule based heuristic H3 works especially well for data sets with more non-quantitative attributes: both data sets on which H3 worked best, USP05-FT and USP05-RQ, have 71% non-quantitative attributes.
5.2 Prerequisites for applying the weighting method
The knowledge needed for a practitioner to apply the proposed attribute weighting method through AQUA+ includes the fundamentals of EBA, discretization methods for continuous attributes, and basic concepts of rough sets. Knowledge about the usage of RSA tools, such as ROSE2, is required as well. The effort to apply the proposed attribute weighting method through AQUA+ is concentrated on two manual tasks: (i) determining the types of the attributes, in accordance with the types defined in AQUA+, during the initial preparation of the data set; and (ii) dealing with missing values [1, 19, 24] and with the discretization [3, 5] of continuous attributes in order to use RSA. The other tasks can be automated by tools.
5.3 Relation to the estimation results of other EBA methods
As studied in [15], AQUA using H0 performed better than six other EBA methods over two out of three data sets, Mends03 and Kem87, as shown in Table 13. From the current study, we know that AQUA+ using H1 and H3 outperformed AQUA using H0 over these two data sets as well. By transitivity, we can conclude that AQUA+ using H1 and H3 performs even better than the six EBA methods over the two data sets.

Table 13. Former EBA methods and data sets
Methods            Data sets
Estor [18]         Kem87
ANGEL [22]         Kem87
GRACE [23]         Kem87
CBR-MX-Mean [17]   Mends03
CBR-WE-CA [17]     Mends03
CBR-UE-CA [17]     Mends03

Although the proposed weighting heuristics were evaluated only with AQUA+, they are applicable to any EBA method that uses attribute weights. The results of this comparative study are useful for other EBA methods as well, but the heuristics should be re-evaluated for EBA methods that use a distance-based similarity measure (e.g., Euclidean distance) rather than the local-global similarity measure, in order to determine which heuristic is better in that setting, since the attribute weights are used in different ways in the two situations.

6. CONCLUSIONS AND FUTURE WORK
In this comparative study, we evaluated five attribute weighting heuristics over five data sets. Based on the observations of the results, we reached the following conclusions:

(1) With the support of the proposed RSA-based attribute weighting method, better performance of AQUA+ can be obtained, especially when using the heuristics Reducts (H1) and Rules# (H3), compared to EqualW (H0).

(2) The heuristics Reducts (H1) and Rules# (H3) produced the best accuracy in terms of AccuH[i] over all the data sets. Therefore, these two heuristics are recommended for use in AQUA+, with H1 as the first option. The heuristic Core (H2) is not recommended, since it is not always applicable. In addition, Reducts (H1) works well for data sets with more missing values, while Rules# (H3) works well for data sets with more non-quantitative attributes.

(3) By transitivity, we conclude that AQUA+ using Reducts (H1) and Rules# (H3) performs better than the six former EBA methods over two data sets: Mends03 and Kem87.

(4) Compared to the option of using all attributes with equal weights, there is an overhead for applying the weighting methods in AQUA+, especially for the discretization of a data set and the determination of the best-fit heuristic. However, it seems worthwhile to invest this effort to achieve the much better performance visible in Table 12.

As we evaluated the attribute weighting heuristics using AQUA+ over only five data sets, the above conclusions are tentative. Another limitation of the study is the lack of a profound analysis of the statistical significance of the differences in AccuH[i], and in the other accuracy criteria, when comparing the heuristics.

In our future work, we will evaluate the attribute weighting heuristics using more data sets. Further empirical studies are also planned to investigate the statistical significance of the differences between the accuracy criteria, in order to make the results of this kind of comparative study more convincing.

7. ACKNOWLEDGEMENT
The authors would like to thank the Alberta Informatics Circle of Research Excellence (iCORE) for its financial support of this research. Thanks are also given to Jim McElroy for his comments to improve the readability of the paper.
8. REFERENCES
[1] Cartwright, M., Shepperd, M., and Song, Q. Dealing with Missing Software Project Data. Proc. of the 9th International Symposium on Software Metrics, September 2003, 154-165.
[2] Chen, Z., Boehm, B., Menzies, T., and Port, D. Finding the Right Data for Software Cost Modeling. IEEE Software, 22 (2005), 38-46.
[3] Chmielewski, M. R., and Grzymala-Busse, J. W. Global Discretization of Continuous Attributes as Preprocessing for Machine Learning. Third International Workshop on Rough Sets and Soft Computing, 1994, 294-301.
[4] Conte, S. D., Dunsmore, H., and Shen, V. Y. Software Engineering Metrics and Models. Benjamin-Cummings Publishing Co. Inc., 1986.
[5] Dougherty, J., Kohavi, R., and Sahami, M. Supervised and Unsupervised Discretization of Continuous Features. Proc. of the International Conference on Machine Learning, 1995, 194-202.
[6] Efron, B., and Gong, G. A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation. The American Statistician, 37 (1983), 36-48.
[7] IDSS, ROSE2, Institute of Computing Science, Poznań University of Technology, http://idss.cs.put.poznan.pl/site/rose.html, January 2006.
[8] ISBSG, International Software Benchmark and Standards Group, Data R8, www.isbsg.org, January 2006.
[9] Kadoda, G., Michelle, C., Chen, L., and Shepperd, M. Experiences Using Case-Based Reasoning to Predict Software Project Effort. Proc. of EASE'2000, the Fourth International Conference on Empirical Assessment and Evaluation in Software Engineering, January 2000.
[10] Kemerer, C. F. An Empirical Validation of Software Cost Estimation Models. Communications of the ACM, 30 (1987), 436-445.
[11] Kirsopp, C., and Shepperd, M. Case and Feature Subset Selection in Case-Based Software Project Effort Prediction. Proc. of the 22nd SGAI International Conference on Knowledge-Based Systems and Applied Artificial Intelligence, December 2002.
[12] Kohavi, R., and John, G. H. Wrappers for Feature Subset Selection. Artificial Intelligence, 97 (1997), 273-324.
[13] Laplante, P. A., and Neil, C. J. Modeling Uncertainty in Software Engineering Using Rough Sets. Innovations in Systems and Software Engineering, 1 (2005), 71-78.
[14] Leung, H. K. N. Estimating Maintenance Effort by Analogy. Empirical Software Engineering, 7 (2002), 157-175.
[15] Li, J., Ruhe, G., Al-Emran, A., and Richter, M. M. A Flexible Method for Effort Estimation by Analogy. Empirical Software Engineering, (2006), DOI: 10.1007/s10664-006-7552-4.
[16] Li, J., and Ruhe, G. Attribute Selection and Weighting Using Rough Sets for Effort Estimation by Analogy—Initial Results. Technical Report SEDS-TR-05102, Software Engineering Decision Support Laboratory, University of Calgary, Canada, October 2005.
[17] Mendes, E., Watson, I., Triggs, C., Mosley, N., and Counsell, S. A Comparative Study of Cost Estimation Models for Web Hypermedia Applications. Empirical Software Engineering, 8 (2003), 163-196.
[18] Mukhopadhyay, T., Vicinanza, S., and Prietula, M. J. Examining the Feasibility of a Case-Based Reasoning Model for Software Effort Estimation. MIS Quarterly, 16, 2 (1992), 155-171.
[19] Myrtveit, I., Stensrud, E., and Olsson, U. H. Analyzing Data Sets with Missing Data: An Empirical Evaluation of Imputation Methods and Likelihood-Based Methods. IEEE Transactions on Software Engineering, 27, 11 (2001), 999-1013.
[20] Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, 1991.
[21] Ruhe, G. Rough Sets Based Data Analysis in Goal Oriented Software Measurement. Proc. of the Third International Symposium on Software Metrics, March 1996, 10-19.
[22] Shepperd, M., and Schofield, C. Estimating Software Project Effort Using Analogies. IEEE Transactions on Software Engineering, 23 (1997), 736-743.
[23] Song, Q., Shepperd, M., and Mair, C. Using Grey Relational Analysis to Predict Software Effort with Small Data Sets. METRICS'05: Proc. of the 11th IEEE International Software Metrics Symposium, Como, Italy, 2005, 35-45.
[24] Strike, K., et al. Software Cost Estimation with Incomplete Data. IEEE Transactions on Software Engineering, 27, 10 (October 2001), 890-908.
[25] Zhang, M., and Yao, J. A Rough Sets Based Approach to Feature Selection. Proc. of the 23rd International Conference of NAFIPS, June 2004, 434-439.
[26] Zhong, N., and Dong, J. Using Rough Sets with Heuristics for Feature Selection. Journal of Intelligent Information Systems, 16 (2001), 199-214.
9. APPENDIX A

Table 8a. USP05-FT with weights from EqualW using wi[0]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0.86  1  0.17  0.86        0.75      0.81    MPSV = 0.36
0.80  1  0.20  0.82        0.82      0.81    MPSY = 23
0     1  0.54  0.74        1.00      0.71

Table 8b. USP05-FT with weights from Reducts using wi[1]
T     N   MMRE  Pred(0.25)  Strength  MPSW
0.92  17  0.09  0.89        0.36      0.74    MPSV = 0.34
0.92  19  0.09  0.89        0.36      0.74    MPSY = 45
0     1   0.44  0.70        1.00      0.73

Table 8c. USP05-FT with weights from Rules# using wi[3]
T     N   MMRE  Pred(0.25)  Strength  MPSW
0.92  20  0.14  0.84        0.41      0.72    MPSV = 0.35
0.92  21  0.15  0.84        0.41      0.71    MPSY = 64
0     1   0.41  0.70        1.00      0.75

Table 8d. USP05-FT with weights from RulesRS using wi[4]
T     N   MMRE  Pred(0.25)  Strength  MPSW
0.92  20  0.15  0.81        0.42      0.71    MPSV = 0.34
0.92  21  0.17  0.81        0.42      0.70    MPSY = 45
0     1   0.41  0.70        1.00      0.75

Table 9a. ISBSG04-2 with weights from EqualW using wi[0]
T     N   MMRE  Pred(0.25)  Strength  MPSW
0.50  12  0.20  0.75        0.46      0.68    MPSV = 0.72
0.52  12  0.21  0.76        0.42      0.67    MPSY = 22
0     9   0.26  0.68        1.00      0.80

Table 9b. ISBSG04-2 with weights from Core using wi[2]
T     N   MMRE  Pred(0.25)  Strength  MPSW
0.88  68  0.23  0.75        0.54      0.70    MPSV = 0.76
0.90  53  0.23  0.76        0.55      0.70    MPSY = 3
0     1   0.71  0.33        1.00      0.48

Table 9c. ISBSG04-2 with weights from Reducts using wi[1]
T    N   MMRE  Pred(0.25)  Strength  MPSW
0.4  28  0.20  0.75        0.61      0.73    MPSV = 0.73
0.4  29  0.20  0.75        0.61      0.73    MPSY = 69
0    17  0.27  0.70        1.00      0.80

Table 9d. ISBSG04-2 with weights from Rules# using wi[3]
T     N   MMRE  Pred(0.25)  Strength  MPSW
0.86  73  0.22  0.77        0.58      0.72    MPSV = 0.74
0.88  75  0.21  0.76        0.56      0.71    MPSY = 23
0     5   0.26  0.72        1.00      0.81

Table 9e. ISBSG04-2 with weights from RulesRS using wi[4]
T     N   MMRE  Pred(0.25)  Strength  MPSW
0.48  30  0.18  0.76        0.44      0.69    MPSV = 0.73
0.48  31  0.19  0.75        0.43      0.68    MPSY = 26
0.06  11  0.26  0.69        1.00      0.80

Table 10a. Mends03 with weights from EqualW using wi[0]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0     2  0.12  0.94        1.00      0.93    MPSV = 0.86
0.02  2  0.12  0.94        1.00      0.93    MPSY = 878
0.04  2  0.12  0.94        1.00      0.93

Table 10b. Mends03 with weights from Reducts using wi[1]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0     2  0.11  0.94        1.00      0.94    MPSV = 0.87
0.02  2  0.11  0.94        1.00      0.94    MPSY = 1039
0.04  2  0.11  0.94        1.00      0.94

Table 10c. Mends03 with weights from Rules# using wi[3]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0     3  0.15  0.85        1.00      0.90    MPSV = 0.84
0.02  3  0.15  0.85        1.00      0.90    MPSY = 1146
0.04  3  0.15  0.85        1.00      0.90

Table 10d. Mends03 with weights from RulesRS using wi[4]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0     9  0.15  0.88        1.00      0.90    MPSV = 0.84
0.02  9  0.15  0.88        1.00      0.90    MPSY = 1146
0.04  9  0.15  0.88        1.00      0.90

Table 11a. Kem87 with weights from EqualW using wi[0]
T     N   MMRE  Pred(0.25)  Strength  MPSW
0.86  1   0.19  0.80        0.33      0.66    MPSV = 0.43
0.5   12  0.09  1.00        0.27      0.74    MPSY = 1
0     1   0.53  0.54        1.00      0.65

Table 11b. Kem87 with weights from Reducts using wi[1]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0.78  1  0.19  0.83        0.80      0.81    MPSV = 0.49
0.8   1  0.19  0.83        0.80      0.81    MPSY = 18
0     1  0.49  0.67        1.00      0.70

Table 11c. Kem87 with weights from Rules# using wi[3]
T     N  MMRE  Pred(0.25)  Strength  MPSW
0.86  1  0.14  0.86        0.47      0.74    MPSV = 0.40
0.88  1  0.19  0.80        0.33      0.66    MPSY = 2
0     1  0.58  0.47        1.00      0.61

Table 11d. Kem87 with weights from RulesRS using wi[4]
T  N  MMRE  Pred(0.25)  Strength  MPSW
-  1  0.19  0.80        0.33      0.66    MPSV = 0.42
-  1  0.19  0.80        0.33      0.66    MPSY = 2
-  1  0.57  0.47        1.00      0.61