The two ontologies, OmniClass and IfcXML, are first preprocessed. The entity concept terms of both ontologies are extracted, the unique IDs and suffixes of the concepts are removed, and duplicate concepts are discarded. All preprocessed concept terms of OmniClass and IfcXML are then matched to each section of the International Building Code (IBC) XML files. The concept tags,
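The preprocessing and matching steps above can be sketched in Python as follows. This is only an illustrative sketch: the ID pattern, the suffix list, the lowercasing, and the representation of IBC sections as a plain dictionary of section number to text are all assumptions, since the source does not specify the term formats or how the XML files are parsed.

```python
import re

# Assumed formats (not given in the source): a leading numeric code as the
# unique ID, and a small set of trailing suffixes to strip.
ID_PATTERN = re.compile(r"^[\d\-\s.]+")
SUFFIXES = ("Type", "Enum")


def normalize(raw):
    """Strip the assumed ID prefix and suffix, then lowercase the term."""
    term = ID_PATTERN.sub("", raw.strip())
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) > len(suffix):
            term = term[: -len(suffix)]
            break
    return term.strip().lower()


def preprocess(raw_terms):
    """Normalize terms and discard duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for raw in raw_terms:
        term = normalize(raw)
        if term and term not in seen:
            seen.add(term)
            result.append(term)
    return result


def match_to_sections(concepts, sections):
    """Return {concept: {section_id: match_count}} for whole-word matches.

    `sections` is a hypothetical {section_id: text} mapping standing in
    for the parsed IBC XML files.
    """
    counts = {}
    for concept in concepts:
        pattern = re.compile(r"\b" + re.escape(concept) + r"\b", re.IGNORECASE)
        per_section = {}
        for section_id, text in sections.items():
            n = len(pattern.findall(text))
            if n:
                per_section[section_id] = n
        counts[concept] = per_section
    return counts
```

The per-concept dictionaries of section counts produced by `match_to_sections` are the kind of input a co-occurrence comparison between two concepts would need.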
In the example shown on the right, the concepts […] from OmniClass and the concepts […] from IfcXML are matched to the same section, 2209.2, of the IBC. This implies that they may be related in some respect. Their relatedness can be further confirmed by considering their co-occurrence in other sections.
Relatedness Analysis
The number of sections in which two concepts co-occur, and the number of times the two concepts are matched to each of those sections, indicate the semantic similarity between the two concepts. Three relatedness measures are used to compare concepts between OmniClass and IfcXML: the cosine similarity measure, the Jaccard similarity coefficient, and the market basket model. Cosine similarity measures the similarity between two n-dimensional vectors by the cosine of the angle between them. The Jaccard similarity coefficient is a set-theoretic statistical measure of the overlap between two n-dimensional vectors: the size of their intersection relative to the size of their union. The market basket model is a probabilistic data-mining technique for finding item-item correlations.
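A minimal Python sketch of the three measures is given below, assuming each concept is represented as a dictionary mapping section numbers to match counts (each section being one dimension). The source does not state which market basket statistic is used; the sketch shows confidence, the estimated probability that a section containing concept A also contains concept B, as one common choice. All function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two section-count vectors."""
    dot = sum(count * b.get(section, 0) for section, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    """Shared sections divided by all sections matched by either concept."""
    sections_a, sections_b = set(a), set(b)
    union = sections_a | sections_b
    if not union:
        return 0.0
    return len(sections_a & sections_b) / len(union)


def basket_confidence(a, b):
    """Assumed market basket statistic: estimate of P(B in section | A in section)."""
    sections_a, sections_b = set(a), set(b)
    if not sections_a:
        return 0.0
    return len(sections_a & sections_b) / len(sections_a)
```

Note that cosine similarity uses the match counts, whereas the Jaccard coefficient and the confidence value as written here only use whether a concept occurs in a section at all. Confidence is also asymmetric, so `basket_confidence(a, b)` and `basket_confidence(b, a)` generally differ.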
Although in most circumstances related concepts can be captured by treating each section as an independent dimension in the co-occurrence comparison, some related concepts rarely co-occur in the same sections. Examples are Is-A-related concepts (e.g. […]) and concepts of the same scope (e.g. […]). […] considered in order to capture concepts that are related but do not co-occur.
Results
Besides identical concepts such as […], […] conventional term matching techniques, for instance, […] OmniClass and […] analysis. In the test case, the market basket model outperforms the other two relatedness analysis approaches in terms of both root mean square error (RMSE) and F-measure, which combines precision and recall. In fact, the market basket model shows the highest recall and a moderately high precision.
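The two evaluation metrics can be sketched in Python as follows. The sketch assumes that RMSE is computed between predicted relatedness scores and reference scores for the same concept pairs, and that the F-measure is the balanced F1 score (the harmonic mean of precision and recall) over sets of concept pairs; the source does not give these details, and the function names are hypothetical.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length score lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("score lists must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def precision_recall_f1(predicted_pairs, true_pairs):
    """Precision, recall and balanced F-measure over sets of concept pairs."""
    true_positives = len(predicted_pairs & true_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A lower RMSE and a higher F-measure both indicate better agreement with the reference mapping.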
Evaluation results of the three measures using RMSE
Evaluation results of the three measures using F-Measure
Contributions
This research proposes a new approach for comparing and mapping heterogeneous ontologies in order to achieve interoperability between data models. It enables information exchange and sharing among project stakeholders. Once the mapping between ontologies is complete, data can also be updated and checked for consistency even when the data sources use different data models.