The Confounding Effect of Class Size on The Validity of Obje(15)

2021-04-05 08:35


A total of 192 faults were detected in the framework at the time of writing. These faults occurred in 70 out of 174 classes. The dichotomous dependent variable that we used in our study was the detection or non-detection of a fault. If one or more faults are detected, the class is considered faulty; otherwise it is considered not faulty.
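The dichotomization rule can be sketched as follows. The class names and fault counts are hypothetical and serve only to show the mapping from a fault count to the binary dependent variable; the last line uses the totals reported above (70 faulty classes out of 174).

```python
# Hypothetical per-class fault counts (illustrative only, not the study's data).
fault_counts = {"ClassA": 0, "ClassB": 3, "ClassC": 1, "ClassD": 0}

def is_faulty(n_faults: int) -> int:
    """Dichotomize a fault count: 1 if at least one fault was detected, else 0."""
    return 1 if n_faults >= 1 else 0

labels = {name: is_faulty(n) for name, n in fault_counts.items()}
print(labels)  # {'ClassA': 0, 'ClassB': 1, 'ClassC': 1, 'ClassD': 0}

# Study totals: 70 faulty classes, hence 174 - 70 = 104 non-faulty classes.
n_faulty, n_total = 70, 174
print(n_total - n_faulty, round(n_faulty / n_total, 2))
```

Note that a class with three faults and a class with one fault receive the same label, which is exactly the information the dichotomous variable discards.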

3.3 Data Analysis Methods

3.3.1 Testing for a Confounding Effect

It is tempting to use a simple approach to test for a confounding effect of size: examine the association between size and fault-proneness. If this association is not significant at a traditional alpha level, conclude that size does not differ between cases and controls (and hence has no confounding effect), and proceed with the usual univariate analysis.

However, it has been noted that this approach is incorrect [38]. The reason is that traditional significance testing places the burden of proof on rejecting the null hypothesis: one has to prove that the cases and controls do differ in size. In evaluating confounding potential, the burden of proof should run in the opposite direction: before discarding the potential for confounding, the researcher should demonstrate that cases and controls do not differ in size. This means controlling the Type II error rather than the Type I error. Since one usually has no control over the sample size, this means setting the alpha level to 0.25, 0.5, or even larger.
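To see why a larger alpha is needed, one can compute how often a size test would miss a real size difference with the sample at hand. The sketch below is an illustration under stated assumptions, not the test used in the study: it swaps in a two-sided two-sample z-test (normal approximation) on a standardized mean difference, uses the group sizes implied above (70 faulty, 104 non-faulty classes), and assumes a small effect size of d = 0.2.

```python
from statistics import NormalDist

def two_sample_power(effect_size: float, n1: int, n2: int, alpha: float) -> float:
    """Approximate power of a two-sided two-sample z-test for a standardized
    mean difference (Cohen's d), using the normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                   # two-sided critical value
    shift = effect_size * (n1 * n2 / (n1 + n2)) ** 0.5  # mean of the test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Assumed small size difference (d = 0.2); 70 faulty vs. 104 non-faulty classes.
for alpha in (0.05, 0.25, 0.5):
    power = two_sample_power(0.2, 70, 104, alpha)
    print(f"alpha={alpha:.2f}  power={power:.2f}  Type II error={1 - power:.2f}")
```

With the sample size fixed, raising alpha is the only lever in this function that lowers the Type II error, which mirrors the argument in the paragraph above.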

A simpler and more parsimonious approach is as follows. For an unmatched case-control study, a measured confounding variable can be controlled through a regression adjustment [12][99]. A regression adjustment entails including the confounder as another independent variable in a regression model. If the regression coefficient of the object-oriented metric changes dramatically (in magnitude and statistical significance) with and without the size variable, then this is a strong indication that there was indeed a confounding effect [61]. This is further elaborated below.
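A minimal sketch of a regression adjustment, assuming hypothetical simulated data in which class size drives both the metric and fault-proneness while the metric has no effect of its own. The fitter is a generic Newton-Raphson maximum-likelihood logistic regression written in plain Python for self-containment; it is not the software used in the study, and the names `fit_logistic` and `solve` are illustrative.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Unconditional maximum-likelihood LR fit by Newton-Raphson.
    X holds one row of covariates per class (no intercept column); y holds 0/1.
    Returns [beta0, beta1, ...]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # score vector X'(y - pi)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for r, yi in zip(rows, y):
            pi = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = pi * (1.0 - pi)
            for j in range(p):
                grad[j] += (yi - pi) * r[j]
                for k in range(p):
                    info[j][k] += w * r[j] * r[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta

# Hypothetical simulated data: size affects both the metric and the fault risk.
random.seed(1)
size, metric, faulty = [], [], []
for _ in range(2000):
    s = random.gauss(0.0, 1.0)                       # standardized class size
    size.append(s)
    metric.append(0.8 * s + random.gauss(0.0, 0.6))  # metric correlated with size
    p_fault = 1.0 / (1.0 + math.exp(-(-0.5 + 1.5 * s)))  # risk depends on size only
    faulty.append(1 if random.random() < p_fault else 0)

b_uni = fit_logistic([[v] for v in metric], faulty)                   # metric only
b_adj = fit_logistic([[v, s] for v, s in zip(metric, size)], faulty)  # metric + size
print(f"metric coefficient without size: {b_uni[1]:.2f}")
print(f"metric coefficient with size:    {b_adj[1]:.2f}")
```

In this simulated setting the metric's coefficient is clearly positive without size and shrinks toward zero once size is included, which is the pattern the text describes as a strong indication of confounding.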

3.3.2 Logistic Regression Model

Binary logistic regression is used to construct models when the dependent variable can only take on two values, as in our case. It is more convenient to use a logistic regression (henceforth LR) model than the contingency table analysis used earlier for illustration, since the model does not require dichotomization of our product metrics. The general form of an LR model is:

$$\pi = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)}} \qquad \text{(Eqn. 1)}$$

where π is the probability of a class having a fault, and the x_i's are the independent variables. The β parameters are estimated through the (unconditional) maximization of a log-likelihood [61].
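Eqn. 1 and the log-likelihood it feeds into can be written directly as functions. This is a sketch assuming the standard Bernoulli log-likelihood for a binary outcome; the function names are hypothetical, and the maximization itself (which statistical packages perform) is not shown here.

```python
import math

def prob_faulty(beta0, betas, xs):
    """Eqn. 1: pi = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(beta0, betas, X, y):
    """Bernoulli log-likelihood: sum of y*ln(pi) + (1 - y)*ln(1 - pi) over classes."""
    total = 0.0
    for xs, yi in zip(X, y):
        pi = prob_faulty(beta0, betas, xs)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total

# Hypothetical data: two classes with one metric each; the first is faulty.
X, y = [[1.0], [2.0]], [1, 0]
print(log_likelihood(0.0, [0.0], X, y))   # all-zero betas give pi = 0.5 for every class
print(log_likelihood(1.0, [-0.8], X, y))  # a candidate with higher likelihood is preferred
```

Estimation amounts to searching for the β values that make `log_likelihood` as large as possible.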

In a univariate analysis only one x_i, namely x_1, is included in the model, and this is the product metric that is being validated:

$$\pi = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1)}} \qquad \text{(Eqn. 2)}$$
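Rearranging the univariate model shows what β_1 measures. This is standard algebra on the logistic function, added here for clarity rather than taken from the text:

```latex
1 - \pi = \frac{e^{-(\beta_0 + \beta_1 x_1)}}{1 + e^{-(\beta_0 + \beta_1 x_1)}}
\quad\Rightarrow\quad
\frac{\pi}{1 - \pi} = e^{\beta_0 + \beta_1 x_1}
\quad\Rightarrow\quad
\ln\frac{\pi}{1 - \pi} = \beta_0 + \beta_1 x_1
```

So a one-unit increase in x_1 adds β_1 to the log-odds of a class being faulty, i.e. multiplies its odds by e^{β_1}.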

When controlling for size, a second x_i, x_2, which measures size, is included:

$$\pi = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}} \qquad \text{(Eqn. 3)}$$

Conditional logistic regression is used when there has been matching in the case-control study; each matched set is then treated as a stratum in the analysis [12].
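For the matched case, a minimal sketch assuming 1:1 matching (one faulty class per matched non-faulty class). The stratum-specific intercept cancels out of the conditional likelihood, so each pair contributes exp(β·x_case) / (exp(β·x_case) + exp(β·x_control)). The pairs are hypothetical, and the crude grid search is used only to keep the example self-contained; real analyses would use a proper optimizer.

```python
import math

def conditional_loglik_1to1(beta, pairs):
    """Conditional log-likelihood for 1:1 matched case-control pairs.
    Each pair is (x_case, x_control), lists of covariate values. Each pair
    contributes log[1 / (1 + exp(beta . (x_control - x_case)))]."""
    total = 0.0
    for x_case, x_ctrl in pairs:
        diff = sum(b * (xc - xk) for b, xc, xk in zip(beta, x_ctrl, x_case))
        total -= math.log1p(math.exp(diff))
    return total

# Hypothetical pairs: (metric of a faulty class, metric of its matched non-faulty class).
pairs = [([3.0], [1.0]), ([2.0], [2.5]), ([4.0], [2.0]), ([1.5], [1.0])]

# Crude grid search over beta in [-3, 3], for illustration only.
grid = [k / 100 for k in range(-300, 301)]
beta_hat = max(grid, key=lambda b: conditional_loglik_1to1([b], pairs))
print(f"estimated beta for the metric: {beta_hat:.2f}")
```

No intercept is estimated: because each matched set is its own stratum, only within-pair differences in the covariates carry information.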
