Toward Large-Scale Information Retrieval Using Latent Semant

2020-12-24 20:52

I am deeply indebted to Dr. Michael Berry, my major advisor, for his kind guidance and support. I also thank Dr. Susan Dumais, director of the Information Sciences Research Group at Bellcore, for her technical advice. In addition, she graciously allowed us

compared to the other arrays in the space. Since documents sharing similar notions would presumably lie near each other in the notion-space, a similarity measure could be used to find documents relevant to the query. The results of the search could then be used to better specify the query, allowing better retrieval performance.
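To illustrate how a similarity measure could rank documents in a notion-space, the sketch below represents each document and the query as an array of notion weights. It then ranks the documents by cosine similarity. Cosine similarity is an assumed choice, because the passage does not name a specific measure. The notion arrays and document names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length notion arrays."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero array is treated as unrelated to everything
    return dot / (norm_a * norm_b)

def rank_documents(query, documents):
    """Return (name, score) pairs, most similar to the query first."""
    scores = [(name, cosine(query, vec)) for name, vec in documents.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical notion-space with three notions; values are notion counts.
documents = {
    "doc_a": [2, 0, 1],
    "doc_b": [0, 3, 0],
    "doc_c": [1, 0, 1],
}
query = [1, 0, 1]
for name, score in rank_documents(query, documents):
    print(f"{name}: {score:.3f}")
```

Documents that lie close to the query in the notion-space (here `doc_c`, then `doc_a`) rank first. Their arrays could then be used to refine the query for a second search.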

To aid in defining the notions for a document collection, Luhn suggested two dictionaries were needed. The first dictionary would contain a listing of the notions used to define the documents’ position in the space, along with their corresponding index numbers. The second dictionary would contain an alphabetical listing of the words worth indexing in the collection and the notions to which the words belonged. By examining the two dictionaries, the arrays used to represent queries and new documents could be constructed automatically.
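The following is a minimal sketch of the two-dictionary scheme, assuming Python dictionaries stand in for Luhn's dictionaries. The hypothetical `notion_index` plays the role of the first dictionary and maps each notion to its index number. The hypothetical `word_notions` plays the role of the second dictionary and maps each indexable word to the notions it belongs to. Counting notion occurrences is an assumed weighting, because the passage says only that the arrays could be constructed automatically.

```python
import re

# First dictionary (hypothetical): each notion and its index number in the array.
notion_index = {"vehicle": 0, "finance": 1, "weather": 2}

# Second dictionary (hypothetical): indexable words, alphabetically, and their notions.
word_notions = {
    "automobile": ["vehicle"],
    "bank": ["finance"],
    "car": ["vehicle"],
    "loan": ["finance"],
    "rain": ["weather"],
}

def build_array(text):
    """Construct a notion array for a query or new document by dictionary lookup."""
    array = [0] * len(notion_index)
    for word in re.findall(r"[a-z]+", text.lower()):
        # Words absent from the second dictionary are not worth indexing; skip them.
        for notion in word_notions.get(word, []):
            array[notion_index[notion]] += 1
    return array

print(build_array("Car loan from the bank"))  # [1, 2, 0]
```

In this sketch, "car" and "automobile" map to the same notion. A query using either word therefore produces the same array.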

Although Luhn’s proposed vector-space representation lacked many important implementation details, it provided an intuitive explanation of the purpose of the vector-space model and laid the foundation for later implementations and improvements. Modern, more complex retrieval systems are at least partially based on the ideas Luhn presented in the 1950’s. However, it is interesting to note that some of the issues he raised over 40 years ago (for example, how to overcome the difficulties of synonymy and polysemy) have not yet been fully resolved.

LSI was developed to solve many of the information retrieval problems Luhn anticipated in the 1950’s. The LSI model will be discussed in Section 2.3.

2.2.2 Borko and Bernick on Reduced-Space Document Classification

A short time after Luhn’s ideas were published, H. Borko and M. Bernick [BB63] presented a method by which documents could automatically be classified into predefined categories. Although document classification has different goals from information retrieval, Borko and Bernick’s approach to document classification can be viewed as a special case of information retrieval. As in all vector-space approaches, Borko and Bernick assumed the terms in a document were a fairly reliable indicator of the semantic content of the document. They hoped documents belonging to the same classification
