variables, for instance, on age and sex. Matching ensures that the cases and controls are similar on the
matching variable and therefore this variable cannot be considered a causal factor in the analysis.
Alternatively, one can have an unmatched case-control study and control for confounding effects during
the analysis stage.
In an unmatched case-control study the determination of an association between the exposure (product
metric) and the disease (fault-proneness) proceeds by calculating a measure of association and
determining whether it is significant. For example, consider the following contingency table that is
obtained from a hypothetical validation study:
                     Coupling
Fault Proneness      HC      LC
Faulty               91      19
Not Faulty           19      91
Table 1: A contingency table showing the results of a hypothetical validation study.
For this particular data set, the odds ratio is 22.9 (see the appendix, Section 7, for a definition of the odds
ratio), which is highly significant, indicating a strong positive association between coupling and fault-
proneness.
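The arithmetic behind this figure can be checked with a short sketch. It assumes the standard cross-product definition of the odds ratio for a 2x2 table. The Wald confidence interval on the log scale is an illustrative choice for gauging significance, not necessarily the test used in this paper, and the function and variable names are hypothetical.

```python
import math

# Counts from Table 1 (rows: Faulty / Not Faulty; columns: HC / LC coupling).
faulty_hc, faulty_lc = 91, 19
not_faulty_hc, not_faulty_lc = 19, 91


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% Wald interval for the odds ratio (assumed method).

    The interval is computed on the log scale and then exponentiated.
    """
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


or_value = odds_ratio(faulty_hc, faulty_lc, not_faulty_hc, not_faulty_lc)
lower, upper = wald_ci(faulty_hc, faulty_lc, not_faulty_hc, not_faulty_lc)

print(f"odds ratio = {or_value:.1f}")
print(f"approximate 95% CI = ({lower:.1f}, {upper:.1f})")
```

An interval that excludes 1 is consistent with a significant association under this approximation. An exact test could be substituted without changing the odds ratio itself.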
2.2.2 The Potential Confounding Effect of Size
One important element that has been ignored in previous validation studies is the potential confounding
effect of class size. This is illustrated in Figure 2.
Figure 2: Path diagram illustrating the confounding effect of size.
The path diagram in Figure 2 depicts a classic textbook example of confounding in a case-control study [99][12] (see footnote 14).
Footnote 14: We make the analogy to a case-control study because it provides us with a well-tested framework for defining and evaluating
confounding effects, as well as for conducting observational studies from which one can make stronger causal claims (if all known
confounders are controlled). However, for the sole purposes of this paper, the characteristics of a confounding effect have been
described and exemplified in [61] without resort to a case-control analogy.
The path (a) represents the current causal beliefs about product metrics being an antecedent