Predicting object-oriented software maintainability using multivariate adaptive regression splines
Yuming Zhou, Hareton Leung
The Journal of Systems and Software, 2006
2007.10.10 Chanhee Yi
Contents
- Introduction
- Background
- Related work
- Multivariate Adaptive Regression Splines (MARS) model
- Model evaluation
- Case study
- Conclusion
- Discussion
2007-10-10
Software Engineering Lab, KAIST
Introduction
- Motivation
  - Predicting software maintainability is very important
    - It aids both developers and managers in various ways
  - Lack of a technique with high prediction accuracy
    - Power of MARS for building prediction models
      - Adapts to an unknown functional form
      - Demonstrated in many applications
    - Little work with MARS in software engineering
- Research goal
  - Investigate the applicability of MARS
    - To object-oriented software maintainability prediction
    - With object-oriented software metrics
Related work
- Issues in prediction with OO metrics
  - OO metrics as predictors of maintenance effort
    - Chidamber and Kemerer [TSE'94]: WMC, DIT, RFC, NOC, and LCOM
    - Li and Henry [JSS'93]: MPC, DAC, NOM, SIZE2
    -> Uses 10 metrics (= the 9 above + SIZE1)
  - Ways to measure the maintainability of a S/W system
    - Jeffery et al. [IST'00]: time required to make changes
    - Emam et al. [JSS'01]: time to understand, develop, and implement modifications
    -> Number of changes to lines in the code during a maintenance period
    -> Represented by the metric "CHANGE"
MARS model(1/4)
- Representation
  - Weighted combination of basis functions
  - Basis function
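A minimal sketch of the representation above, assuming the usual MARS hinge form for a basis function, max(0, x - t) or max(0, t - x) for a knot t. The function names, knots, and coefficients are made up for illustration and are not taken from the paper. Only additive terms are shown; MARS basis functions can also be products of hinges.

```python
def hinge(x, knot, direction=1):
    """Hinge basis: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Weighted combination of basis functions.

    terms is a list of (coefficient, variable_index, knot, direction) tuples,
    a hypothetical encoding chosen only for this sketch.
    """
    return intercept + sum(c * hinge(x[j], t, d) for c, j, t, d in terms)


# Illustrative model: y = 10 + 2*max(0, x0 - 5) - 1*max(0, 3 - x1)
terms = [(2.0, 0, 5.0, 1), (-1.0, 1, 3.0, -1)]
y = mars_predict([7.0, 1.0], intercept=10.0, terms=terms)
print(y)
```

Each term contributes only on one side of its knot, which is how the model can adapt to a relationship whose shape is not known in advance.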
MARS model(2/4)
- Construction
  - Forward stepwise selection process
    - Starts with the constant basis function
    - Adds the basis function that gives the best (lowest) GCV (generalized cross-validation) value
  - Backward pruning process
    - Eliminates the basis functions whose removal gives the lowest GCV
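The GCV score used in both passes can be sketched as follows. This assumes the standard form, mean squared error divided by (1 - C/N)^2, where C is the effective number of parameters of the model. How C is derived from the number of basis functions and knots varies between implementations, so it is passed in directly here as a hypothetical argument.

```python
def gcv(y_true, y_pred, effective_params):
    """Generalized cross-validation score: MSE / (1 - C/N)^2.

    effective_params (C) is supplied by the caller; its exact definition
    is an assumption left open in this sketch.
    """
    n = len(y_true)
    if effective_params >= n:
        raise ValueError("effective_params must be smaller than the sample size")
    mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n
    return mse / (1.0 - effective_params / n) ** 2


# Same fit quality, but the larger model is penalised more heavily.
small_model = gcv([1, 2, 3, 4], [1, 2, 3, 5], effective_params=1)
large_model = gcv([1, 2, 3, 4], [1, 2, 3, 5], effective_params=2)
print(small_model, large_model)
```

Because the denominator shrinks as C grows, a basis function is worth keeping only if it reduces the error enough to offset the penalty.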
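MARS model(3/4)
- Construction(cont'd)
  [ Figure: data table with rows Data1, Data2, Data3, ..., Data n-1, ... and columns Metric1, Metric2, ..., Metric10, CHANGE (= y); candidate knots a, b on Metric1 and r, s on Metric2 ]
  Candidate models and their GCV scores:
    GCV(1,1): y = c0 + c1(a - x1)
    GCV(1,2): y = c0 + c1'(b - x1)
    GCV(2,1): y = c0 + c1''(Metric1 - x1) + c2(r - x2)
    GCV(2,2): y = c0 + c1''(Metric1 - x1) + c2'(s - x2)
  Other equations shown in the figure:
    y = c0 + c1'''(Metric1 - x1)
    y = c0 + c1''(Metric1 - x1) + c2'''(Metric2 - x1)
  Legend: n = total number of classes in a data set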
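One forward step of the kind pictured above can be sketched as: try each candidate knot, fit the coefficients by least squares, and keep the knot with the lowest GCV. The sketch assumes the terms c1(a - x1) stand for hinges max(0, a - x1). The data, the candidate knots, and the fixed effective parameter count are invented for illustration.

```python
def fit_line(h, y):
    """Least-squares fit of y = c0 + c1*h for a single basis column h."""
    n = len(h)
    mean_h, mean_y = sum(h) / n, sum(y) / n
    sxx = sum((a - mean_h) ** 2 for a in h)
    if sxx == 0:
        return mean_y, 0.0
    c1 = sum((a - mean_h) * (b - mean_y) for a, b in zip(h, y)) / sxx
    return mean_y - c1 * mean_h, c1


def best_knot(x, y, candidate_knots, effective_params=2.0):
    """Return (gcv, knot, c0, c1) for the candidate knot with the lowest GCV."""
    n = len(x)
    best = None
    for knot in candidate_knots:
        h = [max(0.0, knot - xi) for xi in x]  # hinge max(0, knot - x)
        c0, c1 = fit_line(h, y)
        mse = sum((yi - (c0 + c1 * hi)) ** 2 for yi, hi in zip(y, h)) / n
        score = mse / (1.0 - effective_params / n) ** 2
        if best is None or score < best[0]:
            best = (score, knot, c0, c1)
    return best


# Made-up data that follows y = 3 + 2*max(0, 4 - x) exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [9.0, 7.0, 5.0, 3.0, 3.0, 3.0]
score, knot, c0, c1 = best_knot(x, y, candidate_knots=[2.0, 4.0, 6.0])
print(knot, c0, c1)
```

A full forward pass would repeat this search over every metric and every remaining knot, adding one term per step.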
MARS model(4/4)
- Example of construction
  [ Distribution of data ]
Model evaluation(1/2)
- Prediction accuracy
  - Absolute Residual Error (ARE)
    - SumARE: measures the total residuals over the data
    - MedARE: measures the central tendency of the ARE distribution
    - SDARE: measures the dispersion of the ARE distribution
  - Magnitude of Relative Error (MRE)
    - MaxMRE: measures the maximum relative discrepancy
    - MMRE: de facto standard accuracy measure
  - Pred(0.25) and Pred(0.30)
    - Indicate whether a model is acceptable
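The measures above can be sketched in a few lines, assuming the usual definitions: ARE = |actual - predicted|, MRE = ARE / actual, and Pred(q) = the fraction of observations with MRE <= q. The function name and the sample values are illustrative. Using the sample standard deviation for SDARE is an assumption, and the sketch requires every actual value to be non-zero.

```python
import statistics


def accuracy_measures(actual, predicted, levels=(0.25, 0.30)):
    """Summarise ARE- and MRE-based accuracy measures for one model."""
    are = [abs(a - p) for a, p in zip(actual, predicted)]
    mre = [e / abs(a) for e, a in zip(are, actual)]  # assumes actual != 0
    result = {
        "SumARE": sum(are),
        "MedARE": statistics.median(are),
        "SDARE": statistics.stdev(are),  # sample standard deviation (assumption)
        "MaxMRE": max(mre),
        "MMRE": statistics.mean(mre),
    }
    for q in levels:
        result[f"Pred({q:.2f})"] = sum(m <= q for m in mre) / len(mre)
    return result


# Made-up CHANGE values for three classes.
measures = accuracy_measures(actual=[10, 20, 40], predicted=[12, 18, 40])
for name, value in measures.items():
    print(name, round(value, 4))
```

Lower values are better for the ARE and MRE measures, while higher values are better for Pred(q).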
Model evaluation(2/2)
- Cross validation
  - Estimates the predictive power of a model
  - Uses leave-one-out (LOO) in this paper
    - Good enough to get an estimate as accurate as possible
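Leave-one-out cross validation can be sketched generically: each class is held out once, the model is fitted on the remaining classes, and the held-out class is predicted. The `fit` and `predict` callables are hypothetical placeholders, and the mean predictor below stands in for MARS only to keep the example self-contained.

```python
def leave_one_out(xs, ys, fit, predict):
    """Return the absolute residual of each observation when it is held out."""
    residuals = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        residuals.append(abs(ys[i] - predict(model, xs[i])))
    return residuals


def fit_mean(train_x, train_y):
    """Toy model: always predict the mean of the training targets."""
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


# Made-up data: three classes, one metric each.
loo_residuals = leave_one_out([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0], fit_mean, predict_mean)
print(loo_residuals)
```

The resulting residuals are the ARE values from which the accuracy measures are then computed.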
- Wilcoxon signed-rank test as a significance test
  - Free from assumptions about the distribution of the data
  < Null and alternative hypotheses >

  ARE_MARS   ARE_X   Difference   Rank
  0.9        0.8      0.1         1
  1.2        0.9      0.3         2.5
  0.3        0.7     -0.4         4
  1.4        1.1      0.3         2.5

  Z+ = 1 + 2.5 + 2.5 = 6, Z- = 4, Z = min(Z+, Z-) = 4
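The rank sums in the example can be reproduced with a short sketch. It ranks the absolute differences, gives tied values their average rank, and drops zero differences, which is the common convention and an assumption here. Differences are rounded before comparison so that floating-point noise does not break the tie between the two 0.3 values. The function name is illustrative.

```python
def signed_rank_sums(a, b, ndigits=10):
    """Return (Z+, Z-, min(Z+, Z-)) for paired samples a and b."""
    diffs = [round(x - y, ndigits) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every difference tied with the one at position i.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return positive, negative, min(positive, negative)


are_mars = [0.9, 1.2, 0.3, 1.4]
are_x = [0.8, 0.9, 0.7, 1.1]
print(signed_rank_sums(are_mars, are_x))  # (6.0, 4.0, 4.0)
```

Turning the statistic into a p-value needs the reference distribution of the test, which this sketch leaves out.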
Case study(1/7)
- Object-Oriented metrics
  - CHANGE (number of lines changed in the class): an insertion or a deletion counts as 1; a change of contents counts as 2
  - DIT (Depth of Inheritance Tree): length of the longest path from the root to a class
  - DAC (Data Abstraction Coupling): number of abstract data types defined in a class
  - LCOM (Lack of COhesion in Methods): number of pairs of methods using no attribute in common
  - MPC (Message-Passing Coupling): number of send statements defined in a class
  - NOC (Number Of Children): number of classes that directly inherit from a class
  - NOM (Number Of Methods): number of methods implemented within a class
  - RFC (Response For a Class): number of methods executed by a received message
  - WMC (Weighted Methods per Class): McCabe's cyclomatic complexity of a class
  - SIZE1 (lines of code): number of semicolons in a class
  - SIZE2 (number of properties): number of attributes and local methods in a class
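Three of the definitions above are simple enough to sketch directly. The class hierarchy and the edit counts are invented for illustration, the sketch assumes single inheritance, and it assumes a root class has DIT 0; none of these details come from the paper.

```python
def change_count(insertions, deletions, content_changes):
    """CHANGE: insertions and deletions count 1 each, content changes count 2."""
    return insertions + deletions + 2 * content_changes


def dit(parent_of, cls):
    """DIT: number of inheritance steps from cls up to the root (single inheritance)."""
    depth = 0
    while parent_of[cls] is not None:
        cls = parent_of[cls]
        depth += 1
    return depth


def noc(parent_of, cls):
    """NOC: number of classes that directly inherit from cls."""
    return sum(1 for parent in parent_of.values() if parent == cls)


# Hypothetical hierarchy: A is the root, B and C extend A, D extends B.
parent_of = {"A": None, "B": "A", "C": "A", "D": "B"}
print(change_count(3, 1, 2), dit(parent_of, "D"), noc(parent_of, "A"))
```

With multiple inheritance, DIT would instead take the longest of the paths to the root.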
Case study(2/7)
- Data set description
  - User Interface Management System (UIMS) data set
    [ 39 classes from the UIMS ]
  - QUality Evaluation System (QUES) data set
    [ 71 classes from the QUES ]
Case study(3/7)
- Data set description(cont'd)
  - Correlation between the metrics of the two data sets
    - Pearson's correlation coefficient
      - Indicates the strength of a relationship between two variables
    - Results of analysis
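Pearson's correlation coefficient is the covariance of two variables divided by the product of their standard deviations. A minimal sketch follows; the metric values are made up and are not from the UIMS or QUES data sets.

```python
import math


def pearson(xs, ys):
    """Pearson's r: ranges from -1 to 1, with 0 meaning no linear relationship."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values of two metrics over four classes.
nom = [4, 8, 15, 23]
size2 = [6, 11, 20, 31]
r = pearson(nom, size2)
print(round(r, 3))
```

A value near 1 or -1 for a pair of metrics suggests that they carry largely redundant information as predictors.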
Case study(4/7)
- Results from UIMS data set
  [ Variable importance ]
  [ Residual boxplots ]
Case study(5/7)
- Results from UIMS data set(cont'd)
  - Results about the prediction accuracy
  - Results of Wilcoxon signed-rank test
Case study(6/7)
- Results from QUES data set
  [ Variable importance ]
  [ Residual boxplots ]
Case study(7/7)
- Results from QUES data set(cont'd)
  - Results about the prediction accuracy
  - Results of Wilcoxon signed-rank test
Conclusion
- Contribution
  - Proposes another prediction model
    - MARS is very suitable for modeling complex relationships
  - Shows that MARS can be useful for maintainability prediction
    - Various empirical results were presented
- Future work
  - Replicate this study across programming languages
    - This allows the authors to further investigate the capability of MARS
  - Develop more accurate prediction models
    - The authors will combine MARS with other prediction techniques
Discussion
- Characteristic
  - Supports the persuasive power of this approach well
    - By referencing many papers
      - Clearly defines the scope of this paper
    - By comparing the results with other methods
- Limitation
  - Data were collected only from systems implemented in Ada
  - On the UIMS data set, MARS did not outperform the other models by much