I. Ontology-based Information Retrieval(4)

2021-04-06 05:55

Abstract: In the proposed article a new, ontology-based approach to information retrieval (IR) is presented. The system is based on a domain knowledge representation schema in form of ontology. New resources registered within the system are linked to conce

Figure 1. An ontology based document retrieval

3. EXPERIMENTS

3.1. DOCUMENT COLLECTION

A collection named Cystic Fibrosis was used for our experiments. It consists of 1239 files [1] and is a subset extracted from the large MEDLINE collection using the keyword Cystic Fibrosis. The minimum file size is 0.12 kB, the maximum is 3.8 kB, and the average is 1.045 kB.

A file with 100 queries is also supplied with the document collection, and the set of relevant documents is known for each query. Each document in the answer set is ranked by several experts with respect to its relevance to the query; the ranking can take values from 0 to 8 (see Table 1). In our experiments, a document was considered relevant to a query if its average expert ranking was more than 4.
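The relevance rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `relevant_documents`, the dictionary layout, and the sample rankings are hypothetical and do not come from the collection; only the threshold of 4 is taken from the text.

```python
from statistics import mean

# From the text: a document counts as relevant if its average ranking exceeds 4.
RELEVANCE_THRESHOLD = 4


def relevant_documents(rankings, threshold=RELEVANCE_THRESHOLD):
    """Return ids of documents whose mean expert ranking exceeds the threshold.

    rankings: dict mapping a document id to a list of expert rankings (0..8).
    Documents without any ranking are skipped.
    """
    return {
        doc_id
        for doc_id, scores in rankings.items()
        if scores and mean(scores) > threshold
    }


# Hypothetical rankings for one query (illustrative values, not real data).
example = {
    "doc_a": [6, 5, 7],  # mean 6.0, relevant
    "doc_b": [4, 4, 4],  # mean 4.0, not relevant: the rule is "more than 4"
    "doc_c": [2, 8, 3],  # mean about 4.33, relevant
}

print(sorted(relevant_documents(example)))
```

Note that the comparison is strict, so a document with an average of exactly 4 is not treated as relevant.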

The collection can be viewed as a group of documents and ontology concepts, where every document is assigned to an appropriate set of concepts and, conversely, every concept can “hold” some documents. There are 821 concepts, and the average number of concepts assigned to a document is 2.8. The concepts can be described in the same way: the average number of documents assigned to one concept is 4.2.
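Both averages describe the same set of document-concept links counted from two sides, which is why 1239 × 2.8 and 821 × 4.2 come out roughly equal (about 3450 links, up to rounding). A minimal sketch of this two-way view follows; the helper names `invert` and `average_size` and the tiny sample mapping are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def invert(doc_concepts):
    """Turn a document -> concepts mapping into a concept -> documents mapping."""
    concept_docs = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            concept_docs[concept].add(doc_id)
    return dict(concept_docs)


def average_size(mapping):
    """Average number of items attached to each key (0.0 for an empty mapping)."""
    if not mapping:
        return 0.0
    return sum(len(items) for items in mapping.values()) / len(mapping)


# Hypothetical assignment of concepts to three documents.
doc_concepts = {
    "d1": {"c1", "c2"},
    "d2": {"c1"},
    "d3": {"c1", "c2", "c3"},
}
concept_docs = invert(doc_concepts)

print(average_size(doc_concepts))  # average concepts per document
print(average_size(concept_docs))  # average documents per concept
```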


Name of collection   Relevance   Min num. of documents   Max num. of documents   Average num. of documents
Cystic fibrosis      3           1                       131                     17.95
                     4           1                       121                     15.59
                     5           1                       114                     13.5
                     6           1                       96                      11.03

Table 1. Cystic fibrosis document collection

3.2. COMPARISON OF VARIOUS APPROACHES FOR DOCUMENT RETRIEVAL

In this section we compare document retrieval experiments using three different approaches: full-text search (vector representation approach), latent semantic indexing (LSI), and the ontology-based approach. The first approach was used as described above, with the lower document frequency threshold equal to 0.2% and the upper threshold set to 80%, i.e. only terms whose document frequency falls within this interval were included in the index. The threshold for LSI dimension reduction was set to 100.
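The document frequency filter can be illustrated with a short sketch. It is not the implementation used in the experiments: the function name `select_index_terms`, the pre-tokenized input, and the inclusive interval bounds are assumptions made for this example; only the 0.2% and 80% thresholds come from the text.

```python
from collections import Counter


def select_index_terms(documents, min_df=0.002, max_df=0.8):
    """Keep terms whose document frequency lies in [min_df, max_df].

    documents: list of token lists, one list per document.
    min_df, max_df: fractions of the collection (0.2% and 80% by default).
    Treating both bounds as inclusive is an assumption of this sketch.
    """
    n_docs = len(documents)
    if n_docs == 0:
        return set()
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term
        for term, count in doc_freq.items()
        if min_df <= count / n_docs <= max_df
    }


# Hypothetical toy collection of five tokenized documents.
docs = [
    ["cystic", "fibrosis", "patient"],
    ["cystic", "fibrosis", "gene"],
    ["cystic", "fibrosis", "lung"],
    ["cystic", "fibrosis", "gene"],
    ["cystic", "sweat"],
]

# "cystic" occurs in every document (100% > 80%), so it is dropped.
print(sorted(select_index_terms(docs)))
```

On a collection of 1239 documents, a lower threshold of 0.2% corresponds to a term appearing in at least about 2.5 documents, so in practice terms found in only one or two documents would be excluded.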

Figure 2. Precision-recall curve for three analyzed retrieval approaches.

Precision-recall curves for all of the approaches described above are presented in Figure 2. Our experiments showed that the Webocrat-like approach based on an ontology is very promising, providing better retrieval efficiency than LSI or the standard full-text approach. However, as mentioned above, manual assignment of concepts to queries was used.
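For readers unfamiliar with how such a curve is obtained, the following sketch computes precision-recall points for a single query from a ranked result list. It shows the standard definitions only and is not the evaluation code behind Figure 2; the function name `precision_recall_points` and the sample document ids are hypothetical.

```python
def precision_recall_points(ranked_ids, relevant_ids):
    """Return (recall, precision) pairs at each rank where a relevant document appears.

    ranked_ids: document ids in the order returned by the retrieval system.
    relevant_ids: ids of the documents judged relevant to the query.
    """
    relevant = set(relevant_ids)
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            recall = hits / len(relevant)  # share of relevant documents found so far
            precision = hits / rank        # share of retrieved documents that are relevant
            points.append((recall, precision))
    return points


# Hypothetical ranking for one query; "d5" is relevant but never retrieved.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d3", "d2", "d5"}

for recall, precision in precision_recall_points(ranked, relevant):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

A curve for a whole query set is typically obtained by averaging such per-query values at fixed recall levels.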

4. CONCLUSIONS

In this paper we have presented the results of experiments performed to evaluate the retrieval efficiency of an ontology-based approach, which is implemented within the Webocrat system. We ran a series of experiments comparing it with two other frequently used information retrieval techniques (the vector model with a tf-idf weighting schema and the latent semantic indexing model). The experiments on the well-known Cystic Fibrosis document collection have shown that the ontology-based approach employed in the Webocrat system is very promising and may yield better precision-recall characteristics.

However, there are still open questions related to this approach. Probably the major one is how to transform a user-defined query into a set of concepts from the actual ontology. In our

