The Confounding Effect of Class Size on The Validity of Obje(16)

2021-04-05 08:35

In constructing our models, we could have followed the previous literature and considered neither interaction effects nor any transformations (for example, see [4][8][17][18][19][22][106]). To err on the conservative side, however, we did test for interaction effects between the size metric and the product metric for all product metrics evaluated. In none of the cases was a significant interaction effect identified.¹⁹ Furthermore, we performed a logarithmic transformation on our variables²⁰ and re-evaluated all the models. Our conclusions were unchanged when using the transformed models. Therefore, we present detailed results only for the untransformed model.
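
The interaction check can be framed as a likelihood ratio test between nested logistic models. The minimal Python sketch below is not the authors' code. It fits each model by Newton-Raphson and compares the size-adjusted model (intercept, $x_1$, $x_2$) with one that also contains the product term $x_1 x_2$. It then refers G to a chi-square distribution with one degree of freedom. The helper names (fit_logistic, interaction_lr_test) are hypothetical. Using a likelihood ratio test rather than, say, a Wald test on the product term is our assumption.

```python
import math


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _log1pexp(t):
    # Stable log(1 + exp(t)), used in the Bernoulli log-likelihood.
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def _solve(A, b):
    # Gaussian elimination with partial pivoting for the small system A x = b.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=50):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X: rows that already include a leading 1.0 for the intercept.
    Returns (coefficients, log-likelihood).
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # X'VX, the information matrix
        for row, yi in zip(X, y):
            mu = _sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    info[j][k] += w * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, _solve(info, grad))]
    ll = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        ll += yi * eta - _log1pexp(eta)
    return beta, ll


def interaction_lr_test(x1, x2, y):
    """G and p-value (1 df) for adding x1*x2 to the model containing x1 and size x2."""
    base = [[1.0, a, b] for a, b in zip(x1, x2)]
    full = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    _, ll_base = fit_logistic(base, y)
    _, ll_full = fit_logistic(full, y)
    G = 2.0 * (ll_full - ll_base)
    p_value = math.erfc(math.sqrt(max(G, 0.0) / 2.0))  # upper tail of chi-square(1)
    return G, p_value
```

You could mirror the log-transformed re-analysis by passing transformed inputs such as `[math.log1p(v) for v in x1]`. The text does not specify this; using log(1 + x) to accommodate zero counts is our assumption.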

The magnitude of an association can be expressed in terms of the change in odds ratio as the $x_1$ variable changes by one standard deviation. This quantity is denoted by Ψ and is explained in the appendix (Section 7). We construct two models, shown in Eqn. 2 and Eqn. 3, without and with controlling for size respectively. We therefore denote the corresponding changes in odds ratio by $\Psi_{x_1}$ and $\Psi_{x_1+x_2}$. As suggested in [74], the extent to which the change in odds ratio differs between the two models can be taken as an indication of the extent of confounding. We operationalize this as follows:

$$\Delta\Psi = \frac{\Psi_{x_1} - \Psi_{x_1+x_2}}{\Psi_{x_1+x_2}} \times 100 \qquad \text{(Eqn. 4)}$$

This gives the percent change in $\Psi_{x_1+x_2}$ obtained by removing the size confounder. If this value is large, then we can consider that class size does indeed have a confounding effect. The definition of “large” can be problematic. However, as will be seen in the results, the changes in our study are sufficiently big that, by any reasonable threshold, there is little doubt.
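
The arithmetic of Eqn. 4 is short enough to sketch. The appendix's exact definition of Ψ is not reproduced in this section. The Python below assumes Ψ is the usual odds ratio for a one-standard-deviation increase in $x_1$, that is Ψ = exp(β·σ), where β is the $x_1$ coefficient from the relevant model and σ is the sample standard deviation of $x_1$. The function names are hypothetical.

```python
import math
import statistics


def odds_ratio_per_sd(beta_x1, x1_values):
    """Psi: multiplicative change in the odds when x1 rises by one standard deviation.

    Assumes Psi = exp(beta * sigma) with the sample standard deviation.
    """
    sigma = statistics.stdev(x1_values)
    return math.exp(beta_x1 * sigma)


def percent_change_in_psi(psi_x1, psi_x1_x2):
    """Eqn. 4: percent change in Psi_{x1+x2} obtained by removing the size confounder."""
    return (psi_x1 - psi_x1_x2) / psi_x1_x2 * 100.0
```

To use it, compute Ψ twice with the same σ: once with the coefficient of $x_1$ from the model without size (Eqn. 2), and once with the coefficient from the size-adjusted model (Eqn. 3). Then pass both values to `percent_change_in_psi`. A positive result means that omitting size inflates the apparent association.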

3.3.3 Diagnostics and Hypothesis Testing

The appendix of this paper presents the details of the model diagnostics that were performed and of the approach to hypothesis testing. Here we summarize them.

The diagnostics concerned checking for collinearity and identifying influential observations. We compute the condition number specific to logistic regression, $\eta_{LR}$, to determine whether dependencies amongst the independent variables are affecting the stability of the model (collinearity). The $\Delta\beta$ value provides an indication of which observations are overly influential. For hypothesis testing, we use three statistics:

- the likelihood ratio statistic, G, to test the significance of the overall model;
- the Wald statistic to test the significance of individual model parameters;
- the Hosmer and Lemeshow $R^2$ value as a measure of goodness of fit.

Note that for the univariate model the G statistic and the Wald test are asymptotically equivalent, but we present them both for completeness. All statistical tests were performed at an alpha level of 0.05.
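
The test statistics above can be computed directly from fitted log-likelihoods and coefficient estimates. The Python below is a minimal sketch of that computation, not the appendix's exact procedure. It assumes the Hosmer and Lemeshow $R^2$ takes the form $(LL_0 - LL)/LL_0$, where $LL_0$ is the log-likelihood of the intercept-only model. It also assumes the Wald statistic is the squared ratio of a coefficient to its standard error. The p-value helper covers only 1 df (the univariate model of Eqn. 2) and 2 df (the size-adjusted model of Eqn. 3). All function names are hypothetical.

```python
import math

ALPHA = 0.05  # significance level used throughout the text


def chi2_sf(stat, df):
    """Upper-tail chi-square p-value; only df = 1 or 2 are handled in this sketch."""
    stat = max(stat, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def likelihood_ratio_G(ll_null, ll_model):
    """G = -2 (LL_null - LL_model): improvement over the intercept-only model."""
    return -2.0 * (ll_null - ll_model)


def wald_chi2(beta, se):
    """Wald chi-square (1 df) for a single coefficient: (beta / SE)^2."""
    return (beta / se) ** 2


def hosmer_lemeshow_r2(ll_null, ll_model):
    """R^2 = (LL_null - LL_model) / LL_null, i.e. G / (-2 LL_null)."""
    return likelihood_ratio_G(ll_null, ll_model) / (-2.0 * ll_null)


def is_significant(p_value, alpha=ALPHA):
    return p_value < alpha
```

For the univariate model, `chi2_sf(likelihood_ratio_G(...), 1)` and `chi2_sf(wald_chi2(...), 1)` should give similar p-values in large samples, which reflects the asymptotic equivalence noted above.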

4 Results

4.1 Descriptive Statistics

Box and whisker plots for all the product metrics that we collected are shown in Figure 4. These indicate the median and the 25th and 75th percentiles.²¹ Outliers and extreme points are also shown in the figure.

As is typical with product metrics, their distributions are clearly heavy-tailed. Most of the variables are counts, and therefore their minimum possible value is zero. Variables NOC, NMO, and SIX have fewer than six non-zero observations. Therefore, they were excluded from further analysis. This is the approach followed in [22].
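
The box-plot summary and the exclusion rule are simple to reproduce. The Python below is a minimal sketch. The threshold of six non-zero observations comes from the text; the function names are hypothetical. Quartile conventions differ between tools, so the 25th and 75th percentiles here use `statistics.quantiles` with its default "exclusive" method. That method may not match the software used to draw Figure 4.

```python
import statistics


def summarize(values):
    """Median and 25th/75th percentiles, as drawn in a box-and-whisker plot."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


def metrics_to_keep(metrics, min_nonzero=6):
    """Drop metrics with fewer than `min_nonzero` non-zero observations."""
    return {
        name: vals
        for name, vals in metrics.items()
        if sum(1 for v in vals if v != 0) >= min_nonzero
    }
```

For example, a metric such as NOC that is zero for almost every class would be removed by `metrics_to_keep` before any models are fitted.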

20 Given that product metrics are counts, an appropriate transformation to stabilize the variance would be the logarithm. We wish to thank an anonymous reviewer for making this suggestion.

21 As will be noted, in some cases the minimal value is zero. For metrics such as CBO, WMC, and RFC, this would be because the class was defined in a manner similar to a C struct, with no methods associated with it.
