been proposed recently. [1] proposed a near-optimal LSH method for Euclidean distance. Li et al. [16] proposed b-bit minwise hash, which improves the original min-hash in terms of compactness. [17] shows that b-bit minwise hash can be integrated in linear learning algorithms for large-scale learning tasks. [14] reduces the variance of random projections by taking advantage of marginal norms, and compares the variance of SRP with regular random projections considering the margins. [15] proposed very sparse random projections for accelerating random projections and SRP.
Prior to SBLSH, SRP-LSH [3] was the only hashing method proven to provide an unbiased estimate of angular similarity. The proposed SBLSH method is the first data-independent method that outperforms SRP-LSH in terms of accuracy in estimating angular similarity.
On the other hand, data-dependent hashing methods have been extensively studied. Spectral hashing [23] is a data-dependent unsupervised method for Euclidean distance. Kulis et al. [13] proposed kernelized locality-sensitive hashing (KLSH), which is based on SRP-LSH, to approximate the angular similarity in a very high or even infinite dimensional space induced by any given kernel, with access to data only via kernels. There are also a number of works devoted to semi-supervised or supervised hashing methods [10, 19, 21, 22], which try to capture not only the geometry of the original data, but also the semantic relations.
5 Discussion
Instead of the Gram-Schmidt process, we can use other methods to orthogonalize the projection vectors, e.g. the Householder transformation, which is numerically more stable. The advantage of the Gram-Schmidt process is its simplicity in describing the algorithm and building up theoretical guarantees. In the paper we did not test the method on data of very high dimension. When the dimension is high and N is not small, the Gram-Schmidt process will be computationally expensive. In fact, when the dimension of the data is very high, the random normal projection vectors $\{v_i\}_{i=1,2,\ldots,K}$ will tend to be orthogonal to each other, so it may not be necessary to orthogonalize the vectors deliberately. From Corollary 2 and the results in Section 3.1.2, we can see that the closer the Super-Bit depth N is to the data dimension d, the larger the variance reduction SBLSH achieves over SRP-LSH.
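The two observations above can be illustrated with a minimal sketch, which is not the implementation used in the experiments. It assumes NumPy and uses `np.linalg.qr`, whose LAPACK backend is Householder-based, as a stand-in for the Gram-Schmidt step. The function names `super_bit_projections`, `estimate_angle` and `mean_abs_cosine`, and the parameter `L` (taken here to be the number of batches, so that the code length is K = N * L), are hypothetical names introduced for illustration. The angle estimate is the standard one based on Hamming distance, π · d_Hamming / K.

```python
import numpy as np


def super_bit_projections(d, N, L, rng):
    """Return a (d, N*L) matrix of projection vectors.

    Columns are orthonormal within each batch of N (the Super-Bit depth).
    Assumption: QR (Householder-based in LAPACK) replaces Gram-Schmidt;
    both produce an orthonormal basis of the same nested subspaces,
    possibly differing in the sign of individual vectors.
    """
    if N > d:
        raise ValueError("Super-Bit depth N must not exceed the data dimension d")
    batches = []
    for _ in range(L):
        H = rng.standard_normal((d, N))   # random normal vectors as columns
        Q, _ = np.linalg.qr(H)            # reduced QR: Q has shape (d, N)
        batches.append(Q)
    return np.hstack(batches)


def estimate_angle(a, b, P):
    """Estimate the angle between a and b from the Hamming distance of their codes."""
    code_a = (P.T @ a) >= 0
    code_b = (P.T @ b) >= 0
    hamming = np.count_nonzero(code_a != code_b)
    return np.pi * hamming / P.shape[1]


def mean_abs_cosine(d, K, rng):
    """Mean |cosine| between distinct pairs of K random normal vectors in R^d."""
    V = rng.standard_normal((d, K))
    V /= np.linalg.norm(V, axis=0)        # normalize each column
    G = V.T @ V                           # pairwise cosines
    off_diagonal = G[~np.eye(K, dtype=bool)]
    return np.abs(off_diagonal).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = super_bit_projections(d=16, N=16, L=64, rng=rng)
    a = np.zeros(16)
    a[0] = 1.0
    b = np.zeros(16)
    b[0], b[1] = np.cos(np.pi / 3), np.sin(np.pi / 3)
    print("true angle:", np.pi / 3, "estimate:", estimate_angle(a, b, P))
    # Random normal vectors become closer to mutually orthogonal as d grows.
    for dim in (10, 100, 10000):
        print(dim, mean_abs_cosine(dim, 20, rng))
```

A sign flip of a projection vector flips the corresponding bit for every data point, so the Hamming distances, and hence the angle estimates, are unaffected by the sign differences between QR and Gram-Schmidt. The last loop only illustrates the trend described above: the mean absolute cosine between random normal vectors shrinks as the dimension grows.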
A technical report³ (Li et al.) shows that b-bit minwise hashing almost always has a smaller variance than SRP in estimating the Jaccard coefficient on binary data. The comparison of SBLSH with b-bit minwise hashing for the Jaccard coefficient is left for future work.
6 Conclusion and Future Work
The proposed SBLSH is a data-independent hashing method which significantly outperforms SRP-LSH. We have theoretically proved that SBLSH provides an unbiased estimate of angular similarity, and has a smaller variance than SRP-LSH when the angle to estimate is in (0, π/2]. The algorithm is simple, easy to implement and can be integrated as a basic module in more complicated algorithms. Experiments show that with the same length of binary code, SBLSH achieves over 30% mean squared error reduction over SRP-LSH in estimating angular similarity, when the Super-Bit depth N is close to the data dimension d. Moreover, SBLSH performs best among several widely used data-independent LSH methods in approximate nearest neighbor retrieval experiments. Theoretically exploring the variance of SBLSH when the angle is in (π/2, π] is left for future work.

Acknowledgments
This work was supported by the National Basic Research Program (973 Program) of China (Grant Nos. 2013CB329403 and 2012CB316301), National Natural Science Foundation of China (Grant Nos. 91120011 and 61273023), and Tsinghua University Initiative Scientific Research Program No. 20121088071, and NExT Research Center funded under the research grant WBS. R-252-300-001-490 by MDA, Singapore. And it was supported in part to Dr. Qi Tian by ARO grant W911BF-12-1-0057, NSF IIS 1052851, Faculty Research Awards by Google, FXPAL, and NEC Laboratories of America, respectively.
³ www.stat.cornell.edu/~li/hashing/RP_minwise.pdf