The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics


Early prediction is commonly cast as a binary classification problem.³ This is achieved through a quality model that classifies components into either a high- or low-risk category. The definition of a high-risk component varies depending on the context of the study. For example, a high-risk component may be one that contains any faults found during testing [14][75], one that contains any faults found during operation [72], or one that is costly to correct after an error has been found [3][13][1]. The identification of high-risk components allows an organization to take mitigating actions, such as focusing defect detection activities on high-risk components (for example, by optimally allocating testing resources [56]) or redesigning components that are likely to cause field failures or be costly to maintain. This is motivated by evidence showing that most faults are found in only a few of a system's components [86][51][67][91].
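The binary-classification framing described above can be sketched in a few lines. The following is a minimal illustration, not any published model: the logistic coefficients, the use of WMC as the sole predictor, and the component inventory are all hypothetical values chosen for the example.

```python
import math

# Hypothetical coefficients for a simple logistic quality model
# (illustrative values only, not taken from any published study).
INTERCEPT = -3.0
COEF_WMC = 0.15  # weight on a complexity metric such as WMC

def fault_probability(wmc: float) -> float:
    """Predicted probability that a class contains a fault."""
    z = INTERCEPT + COEF_WMC * wmc
    return 1.0 / (1.0 + math.exp(-z))

def classify(wmc: float, cutoff: float = 0.5) -> str:
    """Binary risk classification: 'high' vs. 'low' risk."""
    return "high" if fault_probability(wmc) > cutoff else "low"

# A small (made-up) component inventory: (class name, WMC value).
components = [("Parser", 35), ("Logger", 6), ("Scheduler", 28), ("Utils", 10)]

# Rank components by predicted risk, e.g. to allocate testing resources.
for name, wmc in sorted(components, key=lambda c: fault_probability(c[1]),
                        reverse=True):
    print(name, classify(wmc), round(fault_probability(wmc), 2))
```

Ranking by predicted probability, rather than only thresholding, is what lets an organization direct testing effort at the few components expected to hold most of the faults.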

A number of organizations have integrated quality models and modeling techniques into their overall quality decision-making process. For example, Lyu et al. [81] report on a prototype system to support developers with software quality models, and the EMERALD system is reportedly routinely used for risk assessment at Nortel [62][63]. Ebert and Liedtke describe the application of quality models to control the quality of switching software at Alcatel [46].

The construction of design and programming guidelines can proceed by first showing that there is a relationship between, say, a coupling metric and maintenance cost. Proscriptions on the maximum allowable value of that coupling metric are then defined in order to avoid costly rework and maintenance in the future.⁴ Examples of cases where guidelines were empirically constructed are [1][3]. Guidelines based on anecdotal experience have also been defined [80], and experience-based guidelines are used directly in the context of software product acquisition by Bell Canada [34].

Concordant with the popularity of the object-oriented paradigm, there has been a concerted research effort to develop object-oriented product metrics [8][17][30][80][78][27][24][60][106] and to validate them [4][27][17][19][22][78][32][57][89][106][8][25][10]. For example, in [8] the relationship between a set of new polymorphism metrics and fault-proneness is investigated. A study of the relationship between various design and source code measures using a data set from student systems was reported in [4][17][22][18], and a validation study of a large set of object-oriented metrics on an industrial system was described in [19]. Another industrial study is described in [27], where the authors investigate the relationship between object-oriented design metrics and two dependent variables: the number of defects and size in LOC. Li and Henry [78] report an analysis in which they related object-oriented design and code metrics to the extent of code change, which they use as a surrogate for maintenance effort. Chidamber et al. [32] describe an exploratory analysis in which they investigate the relationship between object-oriented metrics and productivity, rework effort, and design effort on three different financial systems. Tang et al. [106] investigate the relationship between a set of object-oriented metrics and faults found in three systems. Nesi and Querci [89] construct regression models to predict class development effort using a set of new metrics. Finally, Harrison et al. [57] propose a new object-oriented coupling metric and compare its performance with a more established coupling metric.

Despite minor inconsistencies in some of the results, a reading of the object-oriented metrics validation literature would suggest that a number of metrics are indeed ‘validated’ in that they are strongly associated with outcomes of interest (e.g., fault-proneness) and that they can serve as good predictors of high-risk classes. The former is of course a precursor for the latter. For example, it has been stated that some metrics (namely the Chidamber and Kemerer – henceforth CK – metrics of [30]) “have been proven empirically to be useful for the prediction of fault-prone modules” [106]. A recent review of the literature stated that “Existing data suggests that there are important relationships between structural attributes and external quality indicators” [23].

However, almost all of the validation studies that have been performed thus far completely ignore the potential confounding impact of class size. This is the case because the analyses employed are univariate: they only model the relationship between the product metric and the dependent variable of interest. For example, recent studies used the bivariate correlation between object-oriented metrics and …

³ It is not, however, always the case that binary classifiers are used. For example, there have been studies that predict the number of faults in individual components (e.g., [69]) and that produce point estimates of maintenance effort (e.g., [78][66]).

⁴ It should be noted that the construction of guidelines requires the demonstration of a causal relationship rather than a mere association.
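The confounding argument can be illustrated with a small simulation. In the made-up data below, class size drives both a hypothetical design metric and the fault count, while the metric has no direct effect on faults; all coefficients and distributions are invented for the sketch. A univariate (bivariate-correlation) analysis then makes the metric look well validated, but the association largely disappears once size is controlled for via the standard first-order partial correlation.

```python
import math
import random

random.seed(42)

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def partial(xs, ys, zs):
    """Correlation of x and y after controlling for z."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))

# Simulated classes: size drives both the metric and the fault count;
# the metric has NO direct effect on faults (pure confounding).
size   = [random.gauss(200, 60) for _ in range(500)]
metric = [0.05 * s + random.gauss(0, 2) for s in size]  # metric tracks size
faults = [0.02 * s + random.gauss(0, 1) for s in size]  # faults driven by size only

r_uni = pearson(metric, faults)        # looks like a strong validation result
r_ctl = partial(metric, faults, size)  # association largely vanishes
print(round(r_uni, 2), round(r_ctl, 2))
```

The univariate correlation is substantial even though the metric carries no fault information beyond size, which is exactly why a univariate validation analysis can be misleading.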

