Challenges on Software Defect Analysis in Smart Grid Applications

Mohsen Anvaari, Daniela S. Cruzes, Reidar Conradi

Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
{mohsena,dcruzes,conradi}@idi.ntnu.no

Abstract: Smart Grid software applications are a kind of ultra-large-scale system (ULSS) where complexity has a profound impact on their quality and defect profiles. Their complexity also adds challenges to the process of designing studies to investigate their complicated software development. In this paper we propose an empirical research agenda to study the relationship between the characteristics of Smart Grid software applications as a ULSS and their software defect profile. We base our discussion on a structured literature review and on an ongoing case study in a software company. Future studies are needed on certain characteristics of Smart Grid software applications that affect their defect profile. For this purpose, not only the software development companies but also the grid utilities should be studied.

Keywords: Smart Grid Applications, Ultra-Large-Scale Systems, Software Defect Analysis, Defect Profile, Software Evolution

I. INTRODUCTION

Smart Grid refers to the application of information and communication technology to evolve the traditional electric power grid into a two-way digitized grid [1]. Such a grid employs different software applications in different domains, from generation to transmission, distribution and consumption. Although each Smart Grid software application may be a single monolithic system on its own, when the applications function together in the whole grid they become a kind of ultra-large-scale system (ULSS) [1]. ULSS have the following characteristics [24]:
• Decentralized control and distributed development;
• Diverse conflicting requirements that come from decentralized stakeholders;
• Continued evolution rather than staged evolution;
• Heterogeneous and changing elements;
• Normal failures and break-downs;
• People are part of the system rather than just being users.

The complexities of ULSS may change the defect profile (number, density, proneness, type and severity of defects) compared with what is known for traditional systems. Knowledge about the specific characteristics of Smart Grid software applications that affect their defect profile would help to develop methods and guidelines that decrease software application defects and increase Smart Grid software reliability.

Correlational studies of software defects are common in the software engineering discipline. They indicate whether there is a relationship between software attributes (such as size or complexity) and the software defect profile. By exploring the software engineering literature, we show which characteristics of systems have been studied most frequently and which ones can be used to perform a study in the Smart Grid context. We found that several variables relevant to the ULSS characteristics have not been properly covered, so the existing defect analysis studies cannot be generalized to Smart Grid software applications.

The main goal of this paper is to propose an empirical research agenda to study the possible relationship between the characteristics of Smart Grid software applications as a ULSS and their software defect profile. Along with the research agenda, we describe the challenges we have been facing in designing such a study in a software company that develops software for Smart Grid.

This paper is organized as follows. Section 2 presents the result of a literature review on correlational studies in software defect analysis. Section 3 presents a research agenda to conduct an empirical study on software defect analysis in Smart Grid applications. Finally, Section 4 includes the conclusions and final remarks.

II. SOFTWARE DEFECT ANALYSIS IN LITERATURE

We searched journal papers published since 2000 within the computer science, engineering and automation control systems areas through the ISI Web of Knowledge website. The search term was "software AND defect" in the title, abstract and keywords. We retrieved 550 papers and checked whether they contained correlational studies of software defects. Fig. 1 presents the process of the review and the number of papers found at each stage. In stages 2 and 3 we excluded papers based on titles and abstracts. After reading the full text of the papers from stage 3 we removed 11 papers that were not relevant. We then hand-searched the references of the remaining papers (snowballing), looking for potential journal articles of interest, which added 11 more papers to the list. We ended up with 22 empirical papers that have conducted correlational studies in software defect analysis.

[Figure 1. Stages of review process. Stage 1: search ISI Web of Knowledge (n = 550). Stage 2: exclude based on titles (n = 56). Stage 3: exclude based on abstracts (n = 22). Stage 4: exclude based on full text (n = 11). Stage 5: manually add the relevant references of the identified papers (n = 22).]

TABLE I. SUMMARY OF LITERATURE REVIEW ON CORRELATIONAL STUDIES IN SOFTWARE DEFECT ANALYSIS

Software Variable (rows): Design/Architecture Attributes; Size; Code Complexity; Reuse Level; Code Changes; Product Age; No. Developers; Work Dependency (b); Evolution Type; Process Compliance (c)

Defect Variable (columns):

No. of Defects a

[6][14] [20] [23][25][26] [11][14][21] [11][14]

Defect Proneness

Defect Density

[2][4][8][9] [3][18] [16][23][27] [11][16] [11]

[14][19] [13][19] [13][19]

Defect Severity, Defect Type

[1]

[3][22] [22] [12][15] [15] [22]

[15]

[4] [17] [22]

a. Bold font indicates the studies conducted on large software (around or more than 1 MLOC).
b. Work dependency indicates the dependency among code modification tasks [4].
c. Process compliance is defined as “the degree to which a documented process is followed in a development project” [22].

Table I indicates which papers have studied the relationship between which software variable and which defect profile characteristic. The software variables that have been studied do not cover the main characteristics of ULSS. Although some of the studied variables, such as number of developers, product age or work dependency, are related to ULSS characteristics, they are not the main focus of the studies (only 14% have studied product age, number of developers, or work dependency).

The table also shows which studies have been conducted on large software (in bold). While one-third of the studies have examined the defect profile of large software, they have explored these systems as single stand-alone software and not as part of a ULSS. Although some of the studied systems are part of a ULSS, these papers classify them as large systems only because of their size, not because they are distributed, contain heterogeneous elements or have other ULSS characteristics. One example is the study by Fenton and Ohlsson [11], where the authors studied empirical data from two releases of a large legacy project developing telecommunication switching systems. The development is done at more than 20 design centers sited in more than 10 countries [11]. The studied system is therefore part of a ULSS, but the variables studied are size and code complexity, and none of the other characteristics related to ULSS were studied.

The last point to discuss about the reviewed literature is that all of the papers have studied software defects at the development phase. Smart Grid software applications are built to interoperate with other software applications, and many of the software defects are detected as more instances of the Smart Grid software are integrated with other software systems. Because of these interoperability issues, this puzzle of systems of systems requires that the defects reported by customers also be considered [5, 7, 10].
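As an illustration of what a correlational study of this kind computes, the following minimal sketch calculates a Spearman rank correlation between one software variable and one defect variable. It is not the method of any of the reviewed papers. The choice of Spearman's coefficient, the function names and the sample data (module size in KLOC and defects per module) are assumptions made for illustration only.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: module size in KLOC and defects reported per module.
size_kloc = [1.2, 3.4, 0.8, 7.5, 2.1]
defects = [3, 9, 1, 20, 4]
rho = spearman(size_kloc, defects)
print(f"Spearman rho between size and number of defects: {rho:.2f}")
```

A rank-based coefficient is used here because defect counts are rarely normally distributed; a real study would also report significance and control for confounding variables such as size.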

III. CHALLENGES ON SOFTWARE DEFECT ANALYSIS IN SMART GRID

To study the evolution of Smart Grid software applications we first need to understand whether the characteristics and dimensions (consistency, distribution, etc.) of ULSS affect the defect profiles of these systems in comparison with single systems. We therefore propose a research agenda for software defect analysis in Smart Grid focusing on “studying Smart Grid software applications as ULSS in order to explore how increasing software dimensions affects software defects profile in such systems.” At the current stage, the main research question for our research group is more specifically: “How do the large-scale dimensions of Smart Grid software applications affect their defect profile in terms of defect density, defect resolution time, defect type and defect severity?” In this context we would investigate differences in the defect profile correlated with:
• Centralized software vs. distributed software
• Homogeneous software vs. heterogeneous software
• Software with few stakeholders vs. software with several stakeholders
• Software with stable requirements vs. software with evolving requirements
• Software interconnected with few other software systems vs. software interconnected with several other software systems

To categorize the software systems of a Smart Grid project along each of these dimensions, criteria should be defined. To define such criteria, a pilot study should be conducted in the Smart Grid company and/or grid utility that will be the cases of the study. We have negotiated with a Norwegian company that is a leading software supplier for the energy and public sectors in Norway and is developing several software products for Smart Grid. Three hundred out of 429 Norwegian municipalities are using at least one of the products of this company. For the first phase of the collaboration the company has given us access to the defect database of two of its products.
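The defect profile named in the research question (defect density, defect resolution time, defect type and defect severity) could be derived from a defect database along the following lines. This is a minimal sketch only: the record fields, the type and severity labels, the dates and the product size are hypothetical and are not taken from the company's database.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class DefectRecord:
    """One defect report; all field names are assumptions for illustration."""
    opened: date
    closed: date
    defect_type: str
    severity: str


def defect_profile(records, size_kloc):
    """Summarize a non-empty list of defect records for one product."""
    resolution_days = [(r.closed - r.opened).days for r in records]
    return {
        # Defects per thousand lines of code.
        "defect_density": len(records) / size_kloc,
        "median_resolution_days": median(resolution_days),
        "by_type": Counter(r.defect_type for r in records),
        "by_severity": Counter(r.severity for r in records),
    }


# Hypothetical defect reports for a product of 150 KLOC.
records = [
    DefectRecord(date(2011, 1, 3), date(2011, 1, 10), "interface", "major"),
    DefectRecord(date(2011, 2, 1), date(2011, 2, 4), "logic", "minor"),
    DefectRecord(date(2011, 3, 15), date(2011, 4, 14), "interface", "critical"),
]
profile = defect_profile(records, size_kloc=150.0)
print(profile)
```

Computing the same summary for each group of products (for example, centralized vs. distributed) would give the profiles whose differences the research question asks about.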
One product is a large legacy system that has been developed and maintained since 1985, and the other is under development for power grid operation. To investigate our research question we are now establishing measures for each dimension, based on our previous experience and on the literature. In this phase we have identified some ways to perform the study, but we have also been facing challenges in the definition and operationalization of these metrics. Fig. 2 shows the relationship between the following dimensions and ULSS characteristics.

[Figure 2. Relationship between ULSS characteristics and our proposed dimensions and metrics. ULSS characteristics: decentralized control and distributed development; heterogeneous and changing elements; diverse conflicting requirements that come from decentralized stakeholders; continued evolution rather than staged evolution; erosion of system/people boundary. Dimensions and sample metrics: Distribution (number of different development sites for a product, number of grid operators using a product); Consistency (programming language, type of software: embedded, enterprise, …); Software interconnections (number of direct/indirect interconnections with other software); Requirement stability (number and type of requirements during development, rate of code changes during evolution); Number and type of stakeholders (number of stakeholders, type of stakeholders).]

Distribution: The scale of ULSS will allow only limited possibilities for centralized control of data, development, evolution, and operation [24]. Therefore, in Smart Grid software applications as a ULSS, not only software development and evolution but also software operation and usage are decentralized. In other words, the development and maintenance of software applications for Smart Grid are distributed, and the developed applications are also used and operated in a decentralized way. In our case study, the company has development sites in several cities in Scandinavia. Furthermore, when the company delivers its products to the market, different Norwegian municipalities use them in a decentralized manner, so the defects reported by customers come from distributed sources. The measures for the distribution dimension that may affect the software defect profile should therefore cover both the development aspect and the operation aspect (see Fig. 2).

Consistency: The elements of ULSS are heterogeneous, inconsistent, and changing because they come from a variety of sources. Parts of the system are written in different languages and tuned for different hardware/software platforms. Many software elements will originate in legacy systems written long before the first ULSS comes into existence [24]. In our ongoing study, the company's legacy product, which has been developed and maintained for 25 years, has several modules written in different languages such as Fortran, C, C++ and C#. From the grid operation point of view, municipalities are integrating different types of products from different companies. The measures for studying the effect of consistency on the software defect profile should therefore cover different aspects, such as programming language and type of product (embedded, enterprise, etc.). To analyze the software defect profile in this context, data should be gathered from both software development companies (e.g. to understand the effect of products with inconsistent programming languages) and grid utilities (e.g. to explore the effect of integrating inconsistent types of software).

Requirements Stability: In Smart Grid, the software applications are developed and run by a large number of companies in different locations.
Such applications are in a never-ending state of flux because of changing expectations from the direct and indirect users of software-driven artifacts. Accordingly, diverse conflicting requirements are one of the ULSS characteristics that may affect the software defect profile in Smart Grid. To measure such an impact, different aspects should be considered. Examples are the number and type of requirements during development and the rate of code changes caused by new requirements during enhancive evolution (Fig. 2). In our study, the software that has been in use for 25 years is a product for designing the grid of an area, including the grid components and their connections. Because grid technology is diverse and changes very fast, the functionality of the product must be upgraded constantly. The requests to enhance the functionality come from different customers, namely Norwegian municipalities. Requirements diversity is thus a clear characteristic of this product, and its impact on the software defect profile should be studied. One of the challenges in conducting the study is the lack of formal requirements specifications.

Software Interconnections: Interoperability is one of the challenges in developing ULSS due to their heterogeneous elements. Compared with some ULSS such as telecommunication systems, interoperability is an even more challenging issue in Smart Grid because there is still no worldwide standard in this area [5]. A possible scenario that illustrates this issue, based on the interviews in our case study, is that neighboring municipalities in Norway buy and integrate their Smart Grid software applications from different vendors. When they launch the whole puzzle, including neighboring municipalities with different software packages, some of the reported software defects are caused by interoperability problems. To explore this possible impact, the number of software interconnections of each software application should be measured. Each software application is directly or indirectly related to a certain number of other software applications (for example, in our case one product imports the maps that are exported from the other product). When the municipalities integrate and launch different software applications together, the applications have different numbers of interconnections. This may affect the defect profile of those that interact with many others. A challenge in studying this dimension, as in our case study, is gaining access to the databases of the municipalities.
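The number of direct and indirect interconnections of an application could, for instance, be counted on a graph of integrations, as in the minimal sketch below. The application names and links are hypothetical, and treating links as undirected is an assumption: the map import/export example above is directional, and a real study might need to keep that direction.

```python
from collections import deque


def interconnection_counts(links, app):
    """Return (direct, indirect) counts of applications connected to `app`.

    `links` is an iterable of (a, b) pairs, each meaning that applications
    a and b exchange data. Links are treated as undirected (an assumption).
    """
    # Build an undirected adjacency map from the pairwise links.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    direct = neighbours.get(app, set())

    # Breadth-first search to find every application reachable from `app`.
    seen = {app}
    queue = deque([app])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)

    # Indirect interconnections are reachable but not directly linked.
    indirect = seen - direct - {app}
    return len(direct), len(indirect)


# Hypothetical integration landscape of a grid utility.
links = [
    ("grid_design", "grid_operation"),
    ("grid_operation", "metering"),
    ("metering", "billing"),
    ("asset_register", "grid_design"),
]
print(interconnection_counts(links, "grid_design"))
```

The two counts could then serve as the software variable in a correlational analysis against the defect profile of each application.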

Number/Type of Stakeholders: Measuring the impact of the number and type of stakeholders on the software defect profile in Smart Grid is important because of two characteristics of ULSS: the constant evolution of requirements (as discussed above) and the erosion of the system/people boundary. The latter means that in ULSS the consumers are not only users of the system but also a part of the system, and they contribute to the system activities. In Smart Grid, given the substantial role of plug-in hybrid electric vehicles and solar panels, people are a part of the system [1]. In this scenario, besides the software defects recognized in the software development companies and the software defects reported by the utilities, there will be software defects reported by consumers. In our case, the company has a customer center that receives the defects reported by customers; if the center considers a report to be a software defect, it is inserted into the defect database. The larger the number and variety of stakeholders, the more complicated the evolution of the product, which will probably affect its defect profile. The challenge in studying this dimension is to determine the exact number and type of stakeholders for each of the company's products, due to lack of documentation.

IV. CONCLUSION

To understand whether the ULSS characteristics of Smart Grid software applications impact their quality, correlational studies of software defects need to be conducted in the Smart Grid context. Such studies determine whether there is a relationship between the dimensions of Smart Grid applications (distribution, consistency, interconnections, requirements stability and number/type of stakeholders) and the software defect profile (density, resolution time, severity and type). In this paper we discussed the importance and challenges of doing such research and proposed some measures for conducting it. Furthermore, we discussed the importance of studying not only the software development companies but also the grid utilities (which integrate the software applications into Smart Grid). Therefore, the main cases of the research should involve one or two companies that develop software for Smart Grid and one or two utilities that integrate Smart Grid software applications. Future work will involve performing case studies to answer the research questions discussed in the paper. The first case study will be based in the Norwegian company described previously.

REFERENCES

[1] Anvaari, M., Conradi, R. and Cruzes, D.S. (2012). Smart Grid Software Applications as an Ultra-Large-Scale System: Challenges for Evolution, Third IEEE PES Conference on Innovative Smart Grid Technologies.
[2] Briand, L. C., Melo, W.L. and Wuest, J. (2002). Assessing the Applicability of Fault-Proneness Models Across Object-Oriented Software Projects, IEEE TSE 28(7): 706-720.
[3] Cartwright, M. and Shepperd, M. (2000). An Empirical Investigation of an Object-Oriented Software System, IEEE TSE 26(8): 786-796.
[4] Cataldo, M., Mockus, A., Roberts, J.A. and Herbsleb, J.D. (2009). Software Dependencies, Work Dependencies, and Their Impact on Failures, IEEE TSE 35(6): 864-878.
[5] Collier, S.E. (2009). Ten Steps To A Smarter Grid, IEEE Rural Electric Power Conference, REPC '09, pp. B2-B2-7.
[6] Eaddy, M., Zimmermann, T., Sherwood, K. D., Garg, V., Murphy, G. C., Nagappan, N. and Aho, A. V. (2008). Do crosscutting concerns cause defects?, IEEE TSE 34(4): 497-515.
[7] Electricity Advisory Committee (2010). Smart Grid: Enabler of the New Energy Economy, US Department of Energy, Available at: www.oe.energy.gov/DocumentsandMedia/final-smart-grid-report.pdf, Last visited date: May 20, 2011.
[8] Emam, K. E., Benlarbi, S., Goel, N. and Rai, S. N. (2001). The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics, IEEE TSE 27(7): 630-650.
[9] Emam, K. E., Melo, W. and Machado, J.C. (2001). The prediction of faulty classes using object-oriented design metrics, JSS 56(2001): 63-75.
[10] EU Commission Task Force for Smart Grids (2010). Functionalities of Smart Grids and Smart Meters, Available at: http://ec.europa.eu/energy/gas_electricity/smartgrids/doc/expert_group1.pdf, Last visited date: May 20, 2011.
[11] Fenton, N.E. and Ohlsson, N. (2000). Quantitative Analysis of Faults and Failures in a Complex Software System, IEEE TSE 26(7): 1-18.
[12] Frakes, W.B. and Succi, G. (2001). An industrial study of reuse, quality and productivity, JSS 57(2001): 99-106.
[13] Graves, T. L., Karr, A.F., Marron, J.S. and Siy, H. (2000). Predicting Fault Incidence Using Software Change History, IEEE TSE 26(7): 653-661.
[14] Gunes Koru, A. and Tian, J. (2003). An empirical comparison and characterization of high defect and high complexity modules, JSS 67(3): 153-163.
[15] Gupta, A., Li, J., Conradi, R., Rønneberg, H. and Landre, E. (2008). A Case Study Comparing Defect Profiles of a Reused Framework and of Applications Reusing It, ESE 14(2): 227-255.
[16] Gyimothy, T., Ferenc, R. and Siket, I. (2005). Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction, IEEE TSE 31(10): 897-910.
[17] Hale, D. P., Hale, J.E. and Smith, R.K. (2011). Evaluation of Work Product Defects during Corrective & Enhancive Software Evolution: A Field Study Comparison, SIGMIS 42(1): 59-73.
[18] Hansen, K. M., Jonasson, K. and Neukirchen, H. (2011). An Empirical Study of Software Architectures’ Effect on Product Quality, JSS 84(7): 1233-1243.
[19] Illes-Seifert, T. and Paech, B. (2010). Exploring the Relationship of a File’s History and Its Fault-Proneness: An Empirical Method and Its Application to Open Source Programs, IST 52(5): 539-558.
[20] Janes, A., Scotto, M., Pedrycz, W., Russo, B., Stefanovic, M. and Succi, G. (2006). Identification of defect-prone classes in telecommunication software systems using design metrics, Information Sciences 176(24): 3711-3734.
[21] Koru, A. G., Zhang, D., El Emam, Kh. and Liu, H. (2009). An Investigation into the Functional Form of the Size-Defect Relationship for Software Modules, IEEE TSE 35(2): 293-304.
[22] Leszak, M., Perry, D. E. and Stoll, D. (2002). Classification and evaluation of defects in a project retrospective, JSS 61(2002): 173-187.
[23] Pai, G. J. and Dugan, J. B. (2007). Empirical analysis of software fault content and fault proneness using Bayesian methods, IEEE TSE 33(10): 675-686.
[24] Software Engineering Institute, Ultra-Large-Scale Systems: The Software Challenge of the Future, Pittsburgh, 2006.
[25] Subramanyam, R. and Krishnan, M. S. (2003). Empirical analysis of CK metrics for object-oriented design complexity: Implications for software defects, IEEE TSE 29(4): 297-310.
[26] Vokac, M. (2004). Defect frequency and design patterns: An empirical study of industrial code, IEEE TSE 30(12): 904-917.
[27] Zhou, Y. and Leung, H. (2006). Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults, IEEE TSE 32(10): 771-789.