depth per se is not the factor that affects understandability, but the number of methods that have to be
traced.
2.1.1.3 Summary
The current theoretical framework for explaining the effect of the structural properties of object-oriented
programs on external program attributes can be justified empirically. Specifically, the studies performed to date indicate that the distribution of functionality across classes in object-oriented systems, and the way inheritance exacerbates this distribution, potentially makes programs more difficult to understand. This suggests that programs that are highly cohesive, loosely coupled, and make limited use of inheritance are less likely to contain faults. Therefore, metrics that measure these three dimensions of an object-oriented program
would be expected to be good predictors of fault-proneness or the number of faults.
The empirical question is then whether contemporary object-oriented metrics measure the relevant
structural properties well enough to substantiate the above theory. Below we review the evidence on this.
2.1.2 Empirical Validation of Object-Oriented Metrics
In this section we review the empirical studies that investigate the relationship between the ten object-
oriented metrics that we study and fault-proneness (or number of faults). The product metrics cover the
following dimensions: coupling, cohesion, inheritance, and complexity. These dimensions are based on
the definition of the metrics, and may not reflect their actual behavior.
Coupling metrics characterize the static usage dependencies amongst the classes in an object-oriented
system [21]. Cohesion metrics characterize the extent to which the methods and attributes of a class
belong together [16]. Inheritance metrics characterize the structure of the inheritance hierarchy.
Complexity metrics, as used here, are adaptations of traditional procedural paradigm complexity metrics
to the object-oriented paradigm.
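To make these dimensions concrete, the following sketch computes crude, generic indicators along three of them for a hypothetical toy design. It is for illustration only: the class names, the attribute-usage map, and the LCOM-style count are invented here, and they are not the ten operationalized metrics reviewed below.

# Illustration only: generic indicators along the coupling, cohesion, and
# inheritance dimensions, computed for a hypothetical toy design. These are
# not the operationalized metrics reviewed in this paper.
from itertools import combinations

# Coupling: static usage dependencies among classes, given here as a toy map
# from each class to the other classes it refers to.
uses = {
    "Account": {"Logger", "Currency"},
    "Logger": set(),
    "Currency": set(),
}
coupling = {cls: len(deps) for cls, deps in uses.items()}

# Cohesion: which attributes each method of a toy class uses; an LCOM-like
# indicator counts the method pairs that share no attribute.
attribute_usage = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "describe": {"owner"},
}
non_sharing_pairs = sum(
    1 for a, b in combinations(attribute_usage.values(), 2) if not (a & b)
)

# Inheritance: depth of each class in a toy hierarchy (root at depth 0).
parent = {"Asset": None, "Account": "Asset", "SavingsAccount": "Account"}
def depth(cls):
    return 0 if parent[cls] is None else 1 + depth(parent[cls])

print(coupling)                       # {'Account': 2, 'Logger': 0, 'Currency': 0}
print(non_sharing_pairs)              # 2 (deposit/describe and withdraw/describe)
print({c: depth(c) for c in parent})  # {'Asset': 0, 'Account': 1, 'SavingsAccount': 2}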
Current methodological approaches for the validation of object-oriented product metrics are best
exemplified by two articles by Briand et al. [19][22]. These are validation studies of an industrial communications system and of a set of student systems, respectively, in which a considerable number of contemporary object-oriented product metrics were examined. We single out these studies because their
methodological reporting is detailed and because they reflect what can be considered best
methodological practice to date.
The basic approach starts with a data set of product metrics and binary fault data for a complete system
or multiple systems. The important element of the Briand et al. methodology that is of interest to us here
is the univariate analysis that they stipulate should be performed. In fact, the main association between
the product metrics and fault-proneness is established on the basis of the univariate analysis. If the
relationship is statistically significant (and in the expected direction), then a metric is considered validated (Briand et al. use logistic regression, and consider the statistical significance of the regression parameters). For instance, in [22] the authors state a series of hypotheses relating each metric to fault-proneness. They then explain: “Univariate logistic regression is performed, for each individual measure
(independent variable), against the dependent variable to determine if the measure is statistically related,
in the expected direction, to fault-proneness. This analysis is conducted to test the hypotheses ...”
Subsequently, the results of the univariate analysis are used to evaluate the extent of evidence
supporting each of the hypotheses. Reliance on univariate results as the basis for drawing validity
conclusions is common practice (e.g., see [4][10][17][18][57][106]).
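As an illustration of this analysis step, the sketch below fits a univariate logistic regression of binary fault data on a single product metric and inspects the sign and p value of its coefficient. The synthetic data, the variable names, and the use of the statsmodels library are assumptions made for the example; they are not taken from the studies reviewed.

# Minimal sketch of univariate "validation": regress binary fault data on one
# product metric at a time and check whether the coefficient is statistically
# significant and in the expected (positive) direction. Data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
metric = rng.poisson(lam=5, size=n).astype(float)       # e.g., a coupling count per class
logit = -3.0 + 0.4 * metric                             # assumed underlying relationship
faulty = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))  # binary fault-proneness

X = sm.add_constant(metric)                             # intercept plus the single metric
result = sm.Logit(faulty, X).fit(disp=False)

beta, p_value = result.params[1], result.pvalues[1]
print(f"coefficient = {beta:.3f}, p = {p_value:.4f}")
if p_value < 0.05 and beta > 0:
    print("the metric would be considered validated under the univariate criterion")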
In this review we first present the definition of the metrics as we have operationalized them. The
operationalization of some of the metrics is programming language dependent. We then present the
magnitude of the coefficients and p values computed in the various studies. Validation coefficients were either the change in odds ratio from a logistic regression, which measures the magnitude of the association between a metric and fault-proneness (see the appendix, Section 7), or the Spearman correlation
coefficient. Finally, this review focuses only on the fault-proneness or number of faults dependent
variable. Other studies that investigated effort, such as [32][89][78], are not covered as effort is not the
topic of the current paper.
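For reference, a common way to express the change in odds ratio from a univariate logistic regression is sketched below in LaTeX. The per-standard-deviation form shown is one frequently used convention; the exact definition adopted in this paper is the one given in the appendix (Section 7).

% Univariate logistic regression model for fault-proneness:
\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}},
\qquad
\mathrm{odds}(x) = \frac{\pi(x)}{1 - \pi(x)} = e^{\beta_0 + \beta_1 x}

% Change in odds ratio when the metric x increases by one standard deviation \sigma
% (for a one-unit increase, set \sigma = 1):
\Delta\Psi = \frac{\mathrm{odds}(x + \sigma)}{\mathrm{odds}(x)} = e^{\beta_1 \sigma}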