the number of faults to investigate the validity of the metrics [57][10]. Also, univariate logistic regression
models are used as the basis for demonstrating the relationship between object-oriented product metrics
and fault-proneness in [22][19][106]. The importance of controlling for potential confounders in empirical
studies of object-oriented products has been emphasized [23]. However, size, the most obvious potential
confounder, has not been controlled in previous validation studies.
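For orientation, the two kinds of model at issue can be written in the standard logistic regression form. The notation here is an assumption made for illustration, not taken from the studies cited: \pi is the probability that a class contains a fault, x is a single object-oriented metric, and s is a measure of class size.

```latex
\begin{align}
  \pi &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
      && \text{(univariate model)} \\
  \pi &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x + \beta_2 s)}}
      && \text{(model controlling for size)}
\end{align}
```

If size is a confounder, the estimate of \beta_1 from the first model can differ substantially from the estimate obtained from the second.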
The objective of this paper is to investigate the confounding effect of class size on the validation of object-
oriented product metrics. We first demonstrate based on previous work that there is potentially a size
confounding effect in object-oriented metrics validation studies, and present a methodology for empirically
testing this. We then perform an empirical study on an object-oriented telecommunications framework
written in C++ [102]. The metrics we investigate consist of the CK metrics suite [30], and some of the
metrics defined by Lorenz and Kidd [80]. The external metric that we validate against is the occurrence of
a fault, which we term the fault-proneness of the class. In our study a fault is detected due to a field
failure.
Briefly, when we use the commonly employed univariate analyses, our results are consistent with
previous studies. After controlling for the confounding effect of class size, however, none of the
metrics is associated with fault-proneness. This indicates a strong confounding effect of class size on
some common object-oriented metrics. The results cast serious doubt on whether many previous validation
studies demonstrate anything more than that size is associated with fault-proneness.
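The pattern described above, an association that disappears once size is held fixed, can be illustrated with a small stratified analysis. The sketch below is not the analysis used in this study: it replaces logistic regression with a simpler Mantel-Haenszel pooled odds ratio, and the counts, the two size strata, and all names (`Stratum`, `STRATA`, the two functions) are hypothetical and invented purely for illustration.

```python
from collections import namedtuple

# Counts of classes in one size stratum, split by metric level and fault status.
Stratum = namedtuple(
    "Stratum", ["name", "high_faulty", "high_clean", "low_faulty", "low_clean"]
)

# Hypothetical counts, invented for illustration only (not data from the study).
# Within each stratum the fault rate is identical for high- and low-metric
# classes (10% for small, 40% for large); high metric values are simply more
# common among large classes.
STRATA = [
    Stratum("small", high_faulty=2, high_clean=18, low_faulty=8, low_clean=72),
    Stratum("large", high_faulty=32, high_clean=48, low_faulty=8, low_clean=12),
]


def crude_odds_ratio(strata):
    """Odds ratio of a fault for high vs. low metric values, ignoring size."""
    a = sum(s.high_faulty for s in strata)
    b = sum(s.high_clean for s in strata)
    c = sum(s.low_faulty for s in strata)
    d = sum(s.low_clean for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Size-adjusted odds ratio, pooled across strata (Mantel-Haenszel)."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.high_faulty + s.high_clean + s.low_faulty + s.low_clean
        numerator += s.high_faulty * s.low_clean / n
        denominator += s.high_clean * s.low_faulty / n
    return numerator / denominator


if __name__ == "__main__":
    print(f"crude OR:         {crude_odds_ratio(STRATA):.2f}")
    print(f"size-adjusted OR: {mantel_haenszel_odds_ratio(STRATA):.2f}")
```

With these invented counts the crude odds ratio is about 2.70, which would suggest that the metric is associated with fault-proneness, while the size-adjusted odds ratio is 1.00: the apparent association is entirely due to size.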
Perhaps the most important practical implication of these results is that design and programming
guidelines based on previous validation studies are called into question. Efforts to control cost and
quality using object-oriented metrics as early indicators of problems may succeed just as well using early
indicators of size. The implications for research are that data from previous validation studies should be
re-examined to gauge the impact of the size confounding effect, and that future validation studies should
control for size.
In Section 2 we provide the rationale behind the confounding effect of class size and present a framework
for its empirical investigation. Section 3 presents our research method, and Section 4 includes the results
of the study. We conclude the paper in Section 5 with a summary and directions for future work.
2 Background
This section is divided into two parts. First, we present the theoretical and empirical basis of the
object-oriented metrics that we attempt to validate. Second, we demonstrate that there is a potentially strong
size confounding effect in object-oriented metrics validation studies.
2.1 Theoretical and Empirical Basis of Object-Oriented Metrics
2.1.1 Theoretical Basis and Its Empirical Support
The primary reason why there is an interest in the development of product metrics in general is
exemplified by the following justification for a product metric validity study: “There is a clear intuitive basis
for believing that complex programs have more faults in them than simple programs” [87]. However, an
intuitive belief does not make a theory. In fact, the lack of a strong theoretical basis driving the
development of traditional software product metrics has been criticized in the past [68]. Specifically,
Kearney et al. [68] state that “One of the reasons that the development of software complexity measures
is so difficult is that programming behaviors are poorly understood. A behavior must be understood before
what makes it difficult can be determined. To clearly state what is to be measured, we need a theory of
programming that includes models of the program, the programmer, the programming environment, and
the programming task.”

It has been stated that for historical reasons the CK metrics are the most referenced [23]. Most commercial metrics collection tools
available at the time of writing also collect these metrics.