Ontology-based Information Retrieval
Jan Paralic
Department of Cybernetics and AI, Technical University of Kosice,
Letna 9, 040 11 Kosice, Slovakia
jan.paralic@tuke.sk
Ivan Kostial
Department of Cybernetics and AI, Technical University of Kosice,
Letna 9, 040 11 Kosice, Slovakia
ivan.kostial@tuke.sk
Abstract: This article presents a new, ontology-based approach to information retrieval (IR). The system is based on a domain knowledge representation schema in the form of an ontology. New resources registered within the system are linked to concepts from this ontology. In this way, resources can be retrieved based on associations, and not only on the partial or exact term matching that the vector model presumes. To evaluate the quality of this retrieval mechanism, experiments measuring retrieval efficiency have been performed with the well-known Cystic Fibrosis collection of medical scientific papers. The ontology-based retrieval mechanism has been compared with traditional full-text search based on the vector IR model, as well as with the Latent Semantic Indexing method.
Keywords: information retrieval, ontology-based retrieval, vector representation, latent semantic indexing
1. INTRODUCTION
A considerable amount of explicit knowledge is scattered throughout various documents within organizations and in the minds of the people working there. In many cases, the ability to access (retrieve) and reuse this knowledge efficiently is limited [3]. As a result, most knowledge is not sufficiently exploited or shared, and is forgotten relatively soon after it has been introduced to, or invented or discovered within, the organization. Therefore, in the approaching information society, it is vitally important for knowledge-intensive organizations to make the best use of information gathered from various information resources inside the organization and from external sources such as the Internet. On the other hand, the tacit knowledge of the documents' authors provides important context for those documents, and this context cannot be captured effectively.
Knowledge management [7] generally deals with several activities relevant to the knowledge life cycle [1]: identification, acquisition, development, dissemination (sharing), use and preservation of an organization's knowledge. Our approach to knowledge management in the e-Government context supports most of these activities. Based on this approach, a Web-based system, Webocrat¹ [6], has been designed and implemented. It is now being tested in pilot applications in Wolverhampton (UK) and Košice (Slovakia). Firstly, it provides tools for capturing and updating tacit knowledge connected with particular explicit knowledge inside documents. This is possible thanks to the ontology model, which is used to represent the organization's domain knowledge. An ontology, together with its syntactic and semantic rules, provides the 'language' through which Webocrat(-like) systems can interact at the knowledge level [5].
The use of an ontology makes it possible to define concepts and relations representing knowledge about a particular document in domain-specific terms. To express the contents of a document explicitly, it is necessary to create links (associations) between the document and the relevant parts of a domain model, i.e. links to those elements of the domain model that are relevant to the contents of the document.
¹ EC-funded project IST-1999-20364 Webocracy (Web Technologies Supporting Direct Participation in Democratic Processes)
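The idea of linking documents to ontology concepts, and then retrieving them through concept associations rather than literal term matching, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration: the toy ontology, the concept and document names, the helper functions `related_concepts`, `retrieve` and `term_match`, and the breadth-first expansion limited by `max_hops`. It is not the retrieval algorithm implemented in Webocrat.

```python
from collections import deque

# Hypothetical toy domain ontology: concept -> set of associated concepts.
# Illustrative only; not the ontology used in Webocrat.
ONTOLOGY = {
    "cystic_fibrosis": {"cftr_gene", "lung_infection"},
    "cftr_gene": {"cystic_fibrosis", "chloride_transport"},
    "lung_infection": {"cystic_fibrosis", "pseudomonas"},
    "chloride_transport": {"cftr_gene"},
    "pseudomonas": {"lung_infection"},
}

# Hypothetical links (associations) between documents and ontology concepts,
# assumed to be created when each document is registered in the system.
DOC_LINKS = {
    "doc1": {"cftr_gene"},
    "doc2": {"pseudomonas"},
    "doc3": {"chloride_transport"},
}

# Hypothetical document texts, used only for the term-matching contrast.
DOC_TEXT = {
    "doc1": "mutations of the CFTR gene",
    "doc2": "Pseudomonas colonisation of the airways",
    "doc3": "chloride transport across epithelia",
}


def related_concepts(start, max_hops):
    """Concepts reachable from `start` in at most `max_hops` association steps (BFS)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        if depth[concept] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in ONTOLOGY.get(concept, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[concept] + 1
                queue.append(neighbour)
    return set(depth)


def retrieve(query_concept, max_hops=1):
    """Documents linked to the query concept or to concepts associated with it."""
    concepts = related_concepts(query_concept, max_hops)
    return sorted(doc for doc, linked in DOC_LINKS.items() if linked & concepts)


def term_match(query_term):
    """Baseline: documents whose text literally contains the query term."""
    term = query_term.lower()
    return sorted(doc for doc, text in DOC_TEXT.items() if term in text.lower())


if __name__ == "__main__":
    print(retrieve("cystic_fibrosis", max_hops=1))
    print(retrieve("cystic_fibrosis", max_hops=2))
    print(term_match("cystic fibrosis"))
```

In this toy setup, a query for `cystic_fibrosis` finds `doc1` through a one-step association and all three documents when two steps are allowed, whereas literal matching of "cystic fibrosis" against the hypothetical texts finds none, because the phrase never appears in them. The hop limit is just one simple, assumed way of controlling how far associations are followed.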