Predicting object-oriented software maintainability using multivariate adaptive regression splines
Yuming Zhou, Hareton Leung
The Journal of Systems and Software, 2006

2007. 10.10 Chanhee Yi

Contents
  - Introduction
  - Background
  - Related work
  - Multivariate Adaptive Regression Splines (MARS) model
  - Model evaluation
  - Case study
  - Conclusion
  - Discussion

2007-10-10

Software Engineering Lab, KAIST


Introduction

Motivation
  - Predicting software maintainability is very important
      - It aids both developers and managers in various ways
  - Lack of a technique with high prediction accuracy
      - Power of MARS for building prediction models
          - Adapts to an unknown functional form
          - Demonstrated in many applications
      - Little work with MARS in software engineering

Research goal
  - Investigate the applicability of MARS
      - To object-oriented software maintainability prediction
      - With object-oriented software metrics


Related work

Issues in prediction with OO metrics
  - OO metrics as predictors of maintenance effort
      - Chidamber and Kemerer [TSE’94]: WMC, DIT, RFC, NOC, and LCOM
      - Li and Henry [JSS’93]: MPC, DAC, NOM, SIZE2
      → This paper uses 10 metrics (the 9 above plus SIZE1)
  - Ways to measure the maintainability of a S/W system
      - Jeffery et al. [IST’00]: time required to make changes
      - Emam et al. [JSS’01]: time to understand, develop, and implement modifications
      → This paper: number of changes to lines in the code during a maintenance period
      → Represented by the metric “CHANGE”


MARS model (1/4)

Representation
  - Weighted combination of basis functions
  - Basis function
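The representation above can be sketched in a few lines. This is a minimal illustration, assuming the usual MARS hinge (truncated linear) basis functions max(0, x - t) and max(0, t - x); the helper names, knots, and coefficients are made up for the example and are not taken from the paper.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis function: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Weighted combination of basis functions.

    `terms` is a list of (coefficient, feature_index, knot, direction) tuples;
    each tuple describes one single-variable hinge basis function.
    """
    total = intercept  # coefficient of the constant basis function
    for coef, j, knot, direction in terms:
        total += coef * hinge(x[j], knot, direction)
    return total


# Hypothetical model: y = 2.0 + 0.5 * max(0, x0 - 10) - 1.5 * max(0, 4 - x1)
example_terms = [(0.5, 0, 10.0, +1), (-1.5, 1, 4.0, -1)]
prediction = mars_predict([14.0, 1.0], 2.0, example_terms)
print(prediction)
```

Full MARS models may also multiply hinge functions to capture interactions; the sketch keeps to single-variable terms for brevity.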


MARS model (2/4)

Construction
  - Forward stepwise selection process
      - Starts with the constant basis function
      - Repeatedly adds the basis function that gives the best GCV (generalized cross-validation) value
  - Backward pruning process
      - Eliminates the basis functions that contribute least to the model, as judged by GCV
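One forward step can be sketched as follows, assuming the common form of GCV (mean squared residual divided by (1 - C/n)^2, lower is better). The effective parameter count C is passed in as `effective_params` because the slides do not state the penalty used in the paper; the function names and the toy data are illustrative only.

```python
def gcv(y, y_hat, effective_params):
    """Generalized cross-validation score: MSE / (1 - C/n)^2, with C given by the caller."""
    n = len(y)
    mse = sum((a - p) ** 2 for a, p in zip(y, y_hat)) / n
    return mse / (1.0 - effective_params / n) ** 2


def fit_one_hinge(x, y, knot):
    """Least-squares fit of y = c0 + c1 * max(0, knot - x)."""
    b = [max(0.0, knot - xi) for xi in x]
    n = len(x)
    b_mean, y_mean = sum(b) / n, sum(y) / n
    sxx = sum((bi - b_mean) ** 2 for bi in b)
    sxy = sum((bi - b_mean) * (yi - y_mean) for bi, yi in zip(b, y))
    c1 = sxy / sxx if sxx > 0 else 0.0
    c0 = y_mean - c1 * b_mean
    return c0, c1, [c0 + c1 * bi for bi in b]


def best_first_knot(x, y, candidate_knots, effective_params=3.0):
    """One forward step: try each candidate knot and keep the lowest-GCV model."""
    scored = []
    for knot in candidate_knots:
        c0, c1, y_hat = fit_one_hinge(x, y, knot)
        scored.append((gcv(y, y_hat, effective_params), knot, c0, c1))
    return min(scored)


# Toy data (made up): y falls linearly until x = 5 and is flat afterwards.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [max(0.0, 5.0 - v) for v in xs]
score, knot, c0, c1 = best_first_knot(xs, ys, candidate_knots=[3.0, 5.0, 7.0])
print(knot, score)
```

Backward pruning would reuse the same `gcv` score, refitting the model with one basis function removed at a time.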


MARS model (3/4)

Construction (cont’d)

[ Figure: construction over the data table. Rows: Data1, Data2, Data3, ..., Data n-1. Columns: Metric1, Metric2, ..., Metric10, CHANGE (= y). Candidate knots a, b on Metric1 and r, s on Metric2. ]

  y = c0 + c1(a - x1)                                  GCV(1,1)
  y = c0 + c1’(b - x1)                                 GCV(1,2)
  y = c0 + c1’’’(Metric1 - x1)
  y = c0 + c1’’(Metric1 - x1) + c2(r - x2)             GCV(2,1)
  y = c0 + c1’’(Metric1 - x1) + c2’(s - x2)            GCV(2,2)
  y = c0 + c1’’(Metric1 - x1) + c2’’’(Metric2 - x2)

  x: value of a metric
  n: total number of classes in a data set


MARS model (4/4)

Example of construction

[ Distribution of data ]


Model evaluation (1/2)

Prediction accuracy
  - Absolute Residual Error (ARE)
      Type      Significance
      SumARE    Measures the total residuals over the data
      MedARE    Measures the central tendency of the ARE distribution
      SDARE     Measures the dispersion of the ARE distribution
  - Magnitude of Relative Error (MRE)
      Type      Significance
      MaxMRE    Measures the maximum relative discrepancy
      MMRE      De facto standard accuracy measure
  - Pred(0.25) and Pred(0.30)
      - Identify whether a model is acceptable
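The measures above can be computed as in the following sketch. It assumes the usual definitions: ARE = |actual - predicted|, MRE = ARE / actual, MMRE = mean of MRE, and Pred(q) = share of observations with MRE at most q. The function name and the sample values are made up, and the use of the sample standard deviation for SDARE is an assumption.

```python
import statistics


def accuracy_measures(actual, predicted):
    """Compute the ARE- and MRE-based measures for paired actual/predicted values."""
    are = [abs(a - p) for a, p in zip(actual, predicted)]
    # MRE divides by the actual value, so actual values are assumed non-zero.
    mre = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]

    def pred(q):
        # Share of observations whose MRE is at most q.
        return sum(1 for m in mre if m <= q) / len(mre)

    return {
        "SumARE": sum(are),
        "MedARE": statistics.median(are),
        "SDARE": statistics.stdev(are),  # sample standard deviation (assumption)
        "MaxMRE": max(mre),
        "MMRE": statistics.mean(mre),
        "Pred(0.25)": pred(0.25),
        "Pred(0.30)": pred(0.30),
    }


# Made-up CHANGE values for four classes, purely for illustration.
measures = accuracy_measures(actual=[10, 20, 40, 50], predicted=[12, 15, 40, 80])
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```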


Model evaluation (2/2)

Cross validation
  - Estimates the predictive power of a model
  - This paper uses leave-one-out (LOO)
      - Good enough to get an estimate that is as accurate as possible
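Leave-one-out cross validation can be sketched generically as below. The `fit` and `predict` callables are placeholders supplied by the caller; the mean predictor used in the demonstration is a stand-in chosen for brevity, not MARS.

```python
def leave_one_out_predictions(xs, ys, fit, predict):
    """For each observation, fit on all the others and predict the held-out one."""
    held_out = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        held_out.append(predict(model, xs[i]))
    return held_out


# Stand-in model (an assumption, not MARS): predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


loo = leave_one_out_predictions([1, 2, 3, 4], [10.0, 20.0, 30.0, 40.0], fit_mean, predict_mean)
print(loo)
```

The held-out predictions can then be compared with the actual values to obtain the ARE and MRE measures.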

Wilcoxon signed-rank test as a significance test
  - Free from assumptions about the distribution of the data

  [ Two null and alternative hypotheses ]

  ARE_MARS   ARE_X   Difference   Rank
  0.9        0.8      0.1         1
  1.2        0.9      0.3         2.5
  0.3        0.7     -0.4         4
  1.4        1.1      0.3         2.5

  Z+ = 1 + 2.5 + 2.5 = 6,   Z- = 4,   Z = min(Z+, Z-) = 4
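The rank sums in the example can be reproduced with a short sketch. It assumes the standard procedure: rank the absolute differences, give tied values their average rank, and sum the ranks separately for positive and negative differences. The function name is illustrative.

```python
def signed_rank_sums(first, second):
    """Return (Z+, Z-, Z) for paired samples, using average ranks for tied |differences|."""
    # Round to suppress floating-point noise in the toy data; zero differences are dropped.
    diffs = [round(a - b, 10) for a, b in zip(first, second)]
    diffs = [d for d in diffs if d != 0]
    ordered = sorted(diffs, key=abs)

    ranks = {}  # maps |difference| to its (average) rank
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i+1 .. j share the same |difference|; give them the mean rank.
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j

    z_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    z_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return z_plus, z_minus, min(z_plus, z_minus)


are_mars = [0.9, 1.2, 0.3, 1.4]
are_x = [0.8, 0.9, 0.7, 1.1]
print(signed_rank_sums(are_mars, are_x))
```

Turning Z into a p-value requires the signed-rank distribution; a statistics library routine such as `scipy.stats.wilcoxon` can be applied to the same paired samples for that step.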


Case study (1/7)

Object-Oriented metrics

  CHANGE (Number of lines changed in the class)
      An insertion or a deletion counts as 1; a change of contents counts as 2
  DIT (Depth of Inheritance Tree)
      Length of the longest path from the root to a class
  DAC (Data Abstraction Coupling)
      Number of abstract data types defined in a class
  LCOM (Lack of COhesion in Methods)
      Number of pairs of methods using no attribute in common
  MPC (Message-Passing Coupling)
      Number of send statements defined in a class
  NOC (Number Of Children)
      Number of classes that directly inherit from a class
  NOM (Number Of Methods)
      Number of methods implemented within a class
  RFC (Response For a Class)
      Number of methods executed by a received message
  WMC (Weighted Methods per Class)
      McCabe’s cyclomatic complexity of a class
  SIZE1 (Lines of code)
      Number of semicolons in a class
  SIZE2 (Number of properties)
      Number of attributes and local methods in a class
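Two of the definitions above are simple enough to sketch directly. The code follows the wording on this slide (LCOM as a count of method pairs sharing no attribute, CHANGE with weights 1 and 2); the function names, the input shapes, and the sample class are assumptions made for illustration.

```python
from itertools import combinations


def lcom(method_attributes):
    """Count method pairs that share no attribute.

    `method_attributes` maps a method name to the set of attributes it uses.
    """
    return sum(
        1
        for (_, a), (_, b) in combinations(method_attributes.items(), 2)
        if not (a & b)
    )


def change(insertions, deletions, content_changes):
    """CHANGE count: insertions and deletions weigh 1, changed lines weigh 2."""
    return insertions + deletions + 2 * content_changes


# Hypothetical class with three methods and two attributes.
uses = {"open": {"handle"}, "close": {"handle"}, "log": {"buffer"}}
print(lcom(uses))        # pairs (open, log) and (close, log) share nothing
print(change(3, 1, 2))   # 3 + 1 + 2 * 2
```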


Case study (2/7)

Data set description
  - User Interface Management System (UIMS) data set
      [ 39 classes from the UIMS ]
  - QUality Evaluation System (QUES) data set
      [ 71 classes from the QUES ]


Case study (3/7)

Data set description (cont’d)
  - Correlation between the metrics of the two data sets
      - Pearson’s correlation coefficient
          - Indicates the strength of the relationship between two variables
      - Results of analysis
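Pearson's correlation coefficient can be computed as in this sketch, which uses the textbook formula (covariance divided by the product of the standard deviations). The metric values are made up and do not come from UIMS or QUES.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up metric values for five classes (not from UIMS or QUES).
nom = [3, 5, 8, 12, 20]
change_counts = [10, 14, 25, 31, 60]
r = pearson(nom, change_counts)
print(round(r, 3))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.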


Case study (4/7)

Results from UIMS data set
  [ Variable importance ]
  [ Residual boxplots ]


Case study (5/7)

Results from UIMS data set (cont’d)
  - Results about the prediction accuracy
  - Results of Wilcoxon signed-rank test


Case study (6/7)

Results from QUES data set
  [ Variable importance ]
  [ Residual boxplots ]


Case study (7/7)

Results from QUES data set (cont’d)
  - Results about the prediction accuracy
  - Results of Wilcoxon signed-rank test


Conclusion

Contribution
  - Proposes another prediction model
      - MARS is very suitable for modeling complex relationships
  - Shows that MARS can be useful for maintainability prediction
      - Various empirical results were presented

Future work
  - Replicate this study across programming languages
      - This allows the authors to further investigate the capability of MARS
  - Develop more accurate prediction models
      - The authors will combine MARS with other prediction techniques


Discussion

Characteristic
  - Supports the persuasiveness of the approach well
      - By referencing many papers
      - By comparing the results with other methods
  - Defines the scope of the paper clearly

Limitation
  - Data were collected only from systems implemented in Ada
  - On UIMS, MARS did not clearly outperform the other methods
