In constructing our models, we could follow the previous literature and consider neither interaction effects
nor any transformations (for example, see [4][8][17][18][19][22][106]). To err on the conservative
side, however, we did test for interaction effects between the size metric and the product metric for all
product metrics evaluated. In none of the cases was a significant interaction effect identified.
Furthermore, we performed a logarithmic transformation on our variables¹⁹ and re-evaluated all the
models.²⁰ Our conclusions would not be affected by using the transformed models. Therefore, we only
present the detailed results for the untransformed model.
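As an illustration of what such an interaction check involves, the following is a sketch that assumes the usual logistic form for the models. The notation, including the interaction coefficient \beta_3 and the symbol \pi for the modelled probability, is illustrative and is not taken from Eqn. 2 or Eqn. 3.

```latex
\[
\operatorname{logit}(\pi) = \log\frac{\pi}{1-\pi}
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 ,
\qquad H_0 : \beta_3 = 0 .
\]
```

Here x_1 stands for the product metric and x_2 for the size metric. Failing to reject H_0 is consistent with the absence of an interaction effect, in which case the simpler additive model can be retained.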
The magnitude of an association can be expressed in terms of the change in odds ratio as the x1 variable
changes by one standard deviation. This is explained in the appendix (Section 7), and is denoted by
Ψ. Since we construct two models as shown in Eqn. 2 and Eqn. 3 without and with controlling for size
respectively, we will denote the change in odds ratio as Ψx1 and Ψx1+x2 respectively. As suggested
in [74], we can evaluate the extent to which the change in odds ratio changes as an indication of the extent of confounding. We operationalize this as follows:
$$\Delta\Psi = \frac{\Psi_{x_1} - \Psi_{x_1+x_2}}{\Psi_{x_1+x_2}} \times 100 \qquad \text{(Eqn. 4)}$$
This gives the percent change in Ψx1+x2 by removing the size confounder. If this value is large then we
can consider that class size does indeed have a confounding effect. The definition of “large” can be
problematic; however, as will be seen in the results, the changes in our study are sufficiently big that,
by any reasonable threshold, there is little doubt.
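The computation in Eqn. 4 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: it assumes the common definition of the change in odds ratio for a one-standard-deviation increase, exp(beta * sd), and the coefficient and standard deviation values are hypothetical.

```python
import math


def odds_ratio_change(beta: float, sd: float) -> float:
    """Change in odds ratio for a one-standard-deviation increase in x1.

    Assumes the standard logistic regression result exp(beta * sd).
    """
    return math.exp(beta * sd)


def percent_change(psi_x1: float, psi_x1_x2: float) -> float:
    """Percent change in the odds-ratio change, relative to the size-controlled model."""
    return (psi_x1 - psi_x1_x2) / psi_x1_x2 * 100.0


# Hypothetical values: the coefficient of x1 without and with size in the model.
sd_x1 = 10.0
psi_uncontrolled = odds_ratio_change(beta=0.08, sd=sd_x1)  # model without size
psi_controlled = odds_ratio_change(beta=0.01, sd=sd_x1)    # model with size
delta_psi = percent_change(psi_uncontrolled, psi_controlled)

print(f"Psi(x1)    = {psi_uncontrolled:.3f}")
print(f"Psi(x1+x2) = {psi_controlled:.3f}")
print(f"Delta Psi  = {delta_psi:.1f}%")
```

A value of Delta Psi near zero would suggest that controlling for size barely alters the association, whereas a large value points towards confounding.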
3.3.3 Diagnostics and Hypothesis Testing
The appendix of this paper presents the details of the model diagnostics that were performed, and the
approach to hypothesis testing. Here we summarize these.
The diagnostics concerned checking for collinearity and identifying influential observations. We compute
the condition number specific to logistic regression, ηLR, to determine whether dependencies amongst
the independent variables are affecting the stability of the model (collinearity). The Δβ value provides us
with an indication of which observations are overly influential. For hypothesis testing, we use the likelihood
ratio statistic, G, to test the significance of the overall model, the Wald statistic to test for the significance
of individual model parameters, and the Hosmer and Lemeshow R² value as a measure of goodness of
fit. Note that for the univariate model the G statistic and the Wald test are statistically equivalent, but we
present them both for completeness. All statistical tests were performed at an alpha level of 0.05.
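The test statistics named above can be sketched as follows. This is a sketch under stated assumptions rather than the definitions from the appendix: it takes G as twice the difference in log-likelihood between the fitted and the intercept-only model, the Wald statistic as the squared ratio of a coefficient to its standard error, and the Hosmer and Lemeshow R² as one minus the ratio of the two log-likelihoods. All numeric inputs are hypothetical.

```python
import math

ALPHA = 0.05  # significance level used for all tests


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed forms for 1 or 2 degrees of freedom."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_g(ll_null: float, ll_model: float) -> float:
    """G = 2 * (log-likelihood of fitted model - log-likelihood of null model)."""
    return 2.0 * (ll_model - ll_null)


def wald_statistic(beta: float, se: float) -> float:
    """Squared z-ratio, compared against chi-square with 1 degree of freedom."""
    return (beta / se) ** 2


def hosmer_lemeshow_r2(ll_null: float, ll_model: float) -> float:
    """Proportional reduction in log-likelihood (assumed definition)."""
    return 1.0 - ll_model / ll_null


# Hypothetical univariate fit: one predictor, so df = 1 for both tests.
ll_null, ll_model = -60.0, -52.0
g = likelihood_ratio_g(ll_null, ll_model)
wald = wald_statistic(beta=0.08, se=0.025)
r2 = hosmer_lemeshow_r2(ll_null, ll_model)

print(f"G    = {g:.2f}, significant: {chi2_sf(g, 1) < ALPHA}")
print(f"Wald = {wald:.2f}, significant: {chi2_sf(wald, 1) < ALPHA}")
print(f"R^2  = {r2:.3f}")
```

For the model that also includes size, G would be compared against a chi-square distribution with two degrees of freedom, which is why the helper covers both cases.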
4 Results
4.1 Descriptive Statistics
Box and whisker plots for all the product metrics that we collected are shown in Figure 4. These indicate
the median and the 25th and 75th quantiles. Outliers and extreme points are also shown in the figure.²¹
As is typical with product metrics, their distributions are clearly heavy tailed. Most of the variables are
counts, and therefore their minimal value is zero. Variables NOC, NMO, and SIX have fewer than six
observations that are non-zero. Therefore, they were excluded from further analysis. This is the
approach followed in [22].

19 Given that product metrics are counts, an appropriate transformation to stabilize the variance would be the logarithm.
20 We wish to thank an anonymous reviewer for making this suggestion.
21 As will be noted, in some cases the minimal value is zero. For metrics such as CBO, WMC and RFC, this would be because the class was defined in a manner similar to a C struct, with no methods associated with it.
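The descriptive summaries and the screening rule from Section 4.1 can be sketched as follows. This is a minimal illustration with hypothetical metric values. The 1.5 × IQR rule used to flag outliers is an assumption, since the text does not state how outliers and extreme points were defined, and the quartile interpolation method is likewise a choice made for this sketch.

```python
import statistics


def quartile_summary(values):
    """Return the 25th, 50th and 75th quantiles of a list of metric values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}


def flag_outliers(values, k=1.5):
    """Values beyond k * IQR from the quartiles (assumed Tukey-style rule)."""
    s = quartile_summary(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


def keep_metric(values, min_nonzero=6):
    """Exclude a metric that has fewer than min_nonzero non-zero observations."""
    return sum(1 for v in values if v != 0) >= min_nonzero


# Hypothetical per-class measurements for two metrics.
metrics = {
    "WMC": [0, 2, 3, 5, 8, 13, 40],
    "NOC": [0] * 20 + [1, 2, 1],
}

retained = {name: vals for name, vals in metrics.items() if keep_metric(vals)}
wmc_summary = quartile_summary(metrics["WMC"])
wmc_outliers = flag_outliers(metrics["WMC"])

print("retained metrics:", sorted(retained))
print("WMC summary:", wmc_summary)
print("WMC outliers:", wmc_outliers)
```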