2012--Super-Bit Locality-Sensitive Hashing(8)

2021-01-20 18:04

been proposed recently. [1] proposed a near-optimal LSH method for Euclidean distance. Li et al. [16] proposed b-bit minwise hashing, which improves the original min-hash in terms of compactness. [17] shows that b-bit minwise hashing can be integrated into linear learning algorithms for large-scale learning tasks. [14] reduces the variance of random projections by taking advantage of marginal norms, and compares the variance of SRP with that of regular random projections when the margins are taken into account. [15] proposed very sparse random projections for accelerating random projections and SRP.

Prior to SBLSH, SRP-LSH [3] was the only hashing method proven to provide an unbiased estimate of angular similarity. The proposed SBLSH method is the first data-independent method that outperforms SRP-LSH in accuracy when estimating angular similarity.
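The unbiased estimate that SRP-LSH provides rests on one fact: for a projection vector v with i.i.d. standard normal entries, the bit sign(v·x) differs between two points x and y with probability θ/π, where θ is the angle between them. Averaging over K such bits therefore gives an unbiased estimate of θ. The following is a minimal pure-Python sketch under that assumption, not the authors' code; the helper names (`dot`, `srp_hash`, `estimate_angle`) are hypothetical, and reporting angular similarity as 1 − θ/π is an assumed convention.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def srp_hash(x, projections):
    """One bit per projection vector v: 1 if v . x >= 0, else 0."""
    return [1 if dot(v, x) >= 0 else 0 for v in projections]


def estimate_angle(code_x, code_y):
    """Since P[bits differ] = theta / pi, pi * (Hamming distance / K)
    is an unbiased estimate of theta."""
    hamming = sum(bx != by for bx, by in zip(code_x, code_y))
    return math.pi * hamming / len(code_x)


if __name__ == "__main__":
    rng = random.Random(0)
    d, K = 16, 2048
    # K projection vectors with i.i.d. N(0, 1) entries
    projections = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]
    x = [1.0] + [0.0] * (d - 1)
    y = [1.0, 1.0] + [0.0] * (d - 2)  # true angle between x and y is pi/4
    theta_hat = estimate_angle(srp_hash(x, projections), srp_hash(y, projections))
    print(f"true angle {math.pi / 4:.3f}, estimate {theta_hat:.3f}")
    print(f"angular similarity 1 - theta/pi: {1 - theta_hat / math.pi:.3f}")
```

Every bit is an independent Bernoulli(θ/π) draw here, so the estimate's variance is π²·p(1 − p)/K with p = θ/π; this is the quantity SBLSH aims to reduce.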

On the other hand, data-dependent hashing methods have been extensively studied. Spectral hashing [23] is a data-dependent unsupervised method for Euclidean distance. Kulis et al. [13] proposed kernelized locality-sensitive hashing (KLSH), which builds on SRP-LSH to approximate the angular similarity in the very high- or even infinite-dimensional space induced by any given kernel, accessing the data only through the kernel. There is also a large body of work on semi-supervised and supervised hashing methods [10,19,21,22], which try to capture not only the geometry of the original data but also its semantic relations.

5 Discussion

Instead of the Gram-Schmidt process, other methods can be used to orthogonalize the projection vectors, e.g. the Householder transformation, which is numerically more stable. The advantage of the Gram-Schmidt process is its simplicity, both in describing the algorithm and in building up theoretical guarantees. In this paper we did not test the method on very high-dimensional data. When the dimension d is high and N is not small, the Gram-Schmidt process becomes computationally expensive, since orthogonalizing a batch of N vectors of dimension d costs on the order of N²d operations. In fact, when the data dimension is very high, the random normal projection vectors $\{v_i\}_{i=1,2,\dots,K}$ tend to be nearly orthogonal to each other, so it may not be necessary to orthogonalize them deliberately. From Corollary 2 and the results in Section 3.1.2, we can see that the closer the Super-Bit depth N is to the data dimension d, the larger the variance reduction SBLSH achieves over SRP-LSH.
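The orthogonalization step discussed above can be illustrated with a minimal pure-Python sketch. It assumes K = N·L projection vectors with i.i.d. standard normal entries, orthonormalized in L independent batches of N, where N is the Super-Bit depth and N ≤ d. It uses the modified Gram-Schmidt variant, which is numerically better behaved than the classical one; a Householder-based alternative would be a QR factorization such as NumPy's `numpy.linalg.qr`. The helper names are hypothetical. Hash bits would then be computed exactly as in SRP-LSH, one sign bit per projection vector. The final lines probe the claim that raw Gaussian vectors become nearly orthogonal in high dimension: their pairwise cosines shrink roughly like 1/√d.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def gram_schmidt(vectors):
    """Modified Gram-Schmidt: orthonormalize linearly independent vectors in order."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)  # project the partially reduced w (the "modified" variant)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def super_bit_projections(d, N, L, rng):
    """K = N * L projection vectors: L batches of N Gaussian vectors,
    each batch orthonormalized (N is the Super-Bit depth, 1 <= N <= d)."""
    if not 1 <= N <= d:
        raise ValueError("Super-Bit depth N must satisfy 1 <= N <= d")
    projections = []
    for _ in range(L):
        batch = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
        projections.extend(gram_schmidt(batch))
    return projections


def max_abs_cosine(vectors):
    """Largest |cos| between any two distinct vectors in the list."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            a, b = vectors[i], vectors[j]
            cos = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
            worst = max(worst, abs(cos))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    P = super_bit_projections(d=8, N=8, L=2, rng=rng)
    print("within-batch max |cos| after Gram-Schmidt:", max_abs_cosine(P[:8]))
    for d in (10, 1000, 10000):
        raw = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(5)]
        print(f"d = {d:5d}: raw Gaussian max |cos| = {max_abs_cosine(raw):.3f}")
```

Vectors from different batches are not orthogonalized against each other, so only the N bits within a batch are decorrelated; this is why the Super-Bit depth N, rather than the code length K, governs the variance reduction.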

A technical report by Li et al.³ shows that b-bit minwise hashing almost always has a smaller variance than SRP in estimating the Jaccard coefficient on binary data. The comparison of SBLSH with b-bit minwise hashing for the Jaccard coefficient is left for future work.
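For reference, the competing estimator can be sketched as follows: minwise hashing estimates the Jaccard coefficient R of two sets by how often their minimum hashed values coincide, and b-bit minwise hashing stores only the lowest b bits of each minimum. This is a minimal illustration under stated assumptions, not Li et al.'s implementation. True random permutations are replaced by hypothetical random linear maps x ↦ (a·x + c) mod p with p = 2^31 − 1, and the b-bit estimate inverts the sparse-data approximation P[lowest b bits agree] ≈ 2^(−b) + (1 − 2^(−b))·R, which assumes the sets are tiny relative to the universe. All function names are hypothetical.

```python
import random

MERSENNE_PRIME = (1 << 31) - 1  # p = 2^31 - 1, a prime


def make_hashes(k, rng):
    """k random maps x -> (a*x + c) mod p. Each permutes [0, p) when a != 0
    and stands in for a random permutation (an assumption, see lead-in)."""
    return [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(MERSENNE_PRIME))
            for _ in range(k)]


def minhash_signature(items, hashes):
    """Minimum hashed value of the set under each of the k hash maps."""
    return [min((a * x + c) % MERSENNE_PRIME for x in items) for a, c in hashes]


def b_bit(signature, b):
    """Keep only the lowest b bits of each min-hash value."""
    mask = (1 << b) - 1
    return [h & mask for h in signature]


def match_rate(sig_a, sig_b):
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def estimate_jaccard_b_bit(bits_a, bits_b, b):
    """Invert P[match] ~ 2^-b + (1 - 2^-b) * R (sparse-data approximation)."""
    base = 2.0 ** (-b)
    return (match_rate(bits_a, bits_b) - base) / (1.0 - base)


if __name__ == "__main__":
    rng = random.Random(42)
    elems = rng.sample(range(10**6), 40)
    A, B = set(elems[:30]), set(elems[10:])  # |A & B| = 20, |A | B| = 40, J = 0.5
    hashes = make_hashes(512, rng)
    sig_a, sig_b = minhash_signature(A, hashes), minhash_signature(B, hashes)
    print("full min-hash estimate:", match_rate(sig_a, sig_b))
    for b in (1, 2):
        est = estimate_jaccard_b_bit(b_bit(sig_a, b), b_bit(sig_b, b), b)
        print(f"{b}-bit estimate: {est:.3f}")
```

Keeping only b bits saves storage at the cost of accidental matches (probability about 2^(−b) when the minima differ), which the subtraction of the base rate corrects for on average.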

6 Conclusion and Future Work

The proposed SBLSH is a data-independent hashing method that significantly outperforms SRP-LSH. We have theoretically proved that SBLSH provides an unbiased estimate of angular similarity and has a smaller variance than SRP-LSH when the angle to estimate is in (0, π/2]. The algorithm is simple, easy to implement, and can be integrated as a basic module in more complicated algorithms. Experiments show that, with the same length of binary code, SBLSH achieves over 30% mean squared error reduction over SRP-LSH in estimating angular similarity when the Super-Bit depth N is close to the data dimension d. Moreover, SBLSH performs best among several widely used data-independent LSH methods in approximate nearest neighbor retrieval experiments. A theoretical analysis of the variance of SBLSH when the angle is in (π/2, π] is left for future work.

Acknowledgments

This work was supported by the National Basic Research Program (973 Program) of China (Grant Nos. 2013CB329403 and 2012CB316301), the National Natural Science Foundation of China (Grant Nos. 91120011 and 61273023), the Tsinghua University Initiative Scientific Research Program (No. 20121088071), and the NExT Research Center, funded by MDA, Singapore, under research grant WBS. R-252-300-001-490. It was also supported in part, for Dr. Qi Tian, by ARO grant W911BF-12-1-0057, NSF IIS 1052851, and Faculty Research Awards by Google, FXPAL, and NEC Laboratories of America.

³ www.stat.cornell.edu/ li/hashing/RP_minwise.pdf
