Abstract: This article presents a new, ontology-based approach to information retrieval (IR). The system is built on a domain knowledge representation schema in the form of an ontology. New resources registered within the system are linked to conce
where $S$ is the diagonal matrix of singular values and $U$, $V$ are the matrices of left and right singular vectors. If the singular values in $S$ are ordered by size, the first $k$ largest values may be kept and the remaining smaller ones set to zero. The product of the resulting matrices is a matrix approximately equal to $A$, and it is the closest rank-$k$ matrix to $A$ in the least-squares sense.
$$A \approx A_{SVD}, \quad \text{where } A_{SVD} = U_K S_K V_K^T$$
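The rank reduction described above can be sketched in a few lines. This is a minimal illustration, not the system's implementation: it assumes NumPy, a term-by-document matrix with terms in rows and documents in columns, and a made-up toy matrix; the helper name `truncated_svd` is hypothetical.

```python
import numpy as np

def truncated_svd(A, k):
    """Return U_k, S_k (as a diagonal matrix) and V_k^T for the k largest singular values."""
    # np.linalg.svd returns the singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Toy term-by-document matrix (rows: terms, columns: documents); values are illustrative only.
A = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
])

U_k, S_k, Vt_k = truncated_svd(A, k=2)
A_svd = U_k @ S_k @ Vt_k  # rank-2 approximation with the same shape as A
```

The approximation error in the Frobenius norm equals the square root of the sum of the squared discarded singular values, which gives a simple way to judge how much is lost for a given choice of k.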
To determine the similarity between a query and an approximate document vector $D_{i,SVD}$, we need to transform the query vector into the new feature space. (The original query vector is computed with the tf-idf scheme, as described above for the vector model approach.)
$$Q_{SVD} = Q_{TF\text{-}IDF}^T U_K S_K^{-1}$$
and then we can compute similarity in the same way as before, i.e.
$$sim_{SVD}(Q_{SVD}, D_{i,SVD}) = \frac{D_{i,SVD} \cdot Q_{SVD}}{\|D_{i,SVD}\| \, \|Q_{SVD}\|}.$$
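The query transformation and the similarity computation can be illustrated as follows. This is a sketch under stated assumptions: NumPy is used, the similarity is taken to be the cosine of the angle between the two vectors, and each reduced document vector is taken to be the corresponding row of V_K. The function names `fold_in_query` and `cosine` and the toy matrix are hypothetical.

```python
import numpy as np

def fold_in_query(q_tfidf, U_k, s_k):
    """Project a tf-idf query vector into the reduced space: Q^T U_k S_k^{-1}."""
    # Dividing element-wise by the singular values equals multiplying by S_k^{-1}.
    return (q_tfidf @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy term-by-document matrix (rows: terms, columns: documents); values are illustrative only.
A = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
])

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
V_k = Vt[:k, :].T  # row i is the reduced vector of document i (assumption)

q = A[:, 0].copy()  # a query identical to document 0
q_svd = fold_in_query(q, U_k, s_k)
scores = [cosine(V_k[i], q_svd) for i in range(A.shape[1])]
```

Because the query here equals the first document column, its projection coincides with the first row of V_K and its score against that document is 1; other queries yield scores between -1 and 1.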
2.3. ONTOLOGY-BASED APPROACH
This part describes the Webocrat-like approach, which uses an ontology for document retrieval. In the experiments described below, we did not take the type of relation in the ontology into account when calculating the similarity between concepts. Moreover, we assumed that the set of concepts relevant to the query is known. This condition can be satisfied by any technique for assigning ontology concepts to a query, e.g. manual assignment or matching synonyms of the query terms using WordNet or a similar resource.
The way a query is processed by this approach is shown in Figure 1. For a given query, the appropriate concepts are retrieved first (in our case, manually from the user). Then the set of concepts associated with each document is retrieved from the database. Next, these two sets are compared using a simple metric that expresses the similarity between a document $D_i$ and the given query $Q$.
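$$sim_{onto}(Q, D_i) = \begin{cases} |Q_{con} \cap D_{i,con}| & \text{if } |Q_{con} \cap D_{i,con}| \neq 0 \\ k & \text{otherwise} \end{cases}$$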
where $Q_{con}$ is the set of concepts assigned to query $Q$, $D_{i,con}$ is the set of concepts assigned to document $D_i$, and $k$ is a small constant, e.g. 0.1. The resulting number is the ontology-based similarity measure. Better results were achieved when this number was combined with one of the two retrieval approaches described above (i.e. the LSI approach or the vector model). The final similarity is then computed as a product, e.g.
$$sim(Q, D_i) = sim_{onto}(Q, D_i) \cdot sim_{TF\text{-}IDF}(Q, D_i)$$
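The ontology-based measure and its combination with a text-based score can be sketched as below. This is an illustration only, assuming that the measure counts the concepts common to the query and the document and falls back to the constant k when there are none. The function names, concept labels and tf-idf scores are hypothetical and not taken from the experiments.

```python
def sim_onto(query_concepts, doc_concepts, k=0.1):
    """Number of concepts shared by query and document, or k if there are none."""
    shared = set(query_concepts) & set(doc_concepts)
    return len(shared) if shared else k

def combined_similarity(query_concepts, doc_concepts, text_similarity, k=0.1):
    """Product of the ontology-based score and a text-based score (e.g. tf-idf)."""
    return sim_onto(query_concepts, doc_concepts, k) * text_similarity

# Hypothetical concept assignments and precomputed tf-idf similarities.
query_concepts = {"transport", "public_services"}
documents = {
    "doc_a": ({"transport", "public_services", "budget"}, 0.40),
    "doc_b": ({"culture"}, 0.55),
    "doc_c": ({"transport"}, 0.30),
}

# Rank documents by the combined score, highest first.
ranking = sorted(
    documents,
    key=lambda d: combined_similarity(query_concepts, *documents[d]),
    reverse=True,
)
```

In this toy data the document with the highest tf-idf score shares no concept with the query, so the factor k pushes it to the bottom of the ranking without removing it entirely.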