than SRP-LSH, which verifies the correctness of Corollary 3 to some extent. Furthermore, Figure 2 shows that even when $\theta_{a,b} \in (\pi/2, \pi]$, SBLSH still has a smaller variance.
2.3 Discussion
From Corollary 1, SBLSH provides an unbiased estimate of angular similarity. From Corollary 3, when $\theta_{a,b} \in (0, \pi/2]$, with the same length of binary code, the variance of SBLSH is strictly smaller than that of SRP-LSH. In real applications, many vector representations are confined to the non-negative orthant, with all vector entries being non-negative, e.g., bag-of-words representations of documents and images, and histogram-based representations like the SIFT local descriptor [18]. Usually they are normalized to unit length, with only their orientations maintained. For this kind of data, the angle between any two different samples is limited to $(0, \pi/2]$, and thus SBLSH will provide more accurate estimation than SRP-LSH on such data. In fact, our later experiments show that even when $\theta_{a,b}$ is not constrained to $(0, \pi/2]$, SBLSH still gives a much more accurate estimate of angular similarity.

3 Experimental Results
We conduct two sets of experiments, angular similarity estimation and approximate nearest neighbor (ANN) retrieval, to evaluate the effectiveness of our proposed SBLSH method. In the first set of experiments we directly measure the accuracy in estimating pairwise angular similarity. The second set of experiments then tests the performance of SBLSH in real retrieval applications.
3.1 Angular Similarity Estimation
In this experiment, we evaluate the accuracy of estimating pairwise angular similarity on several datasets. Specifically, we test the effect on the estimation accuracy when the Super-Bit depth N varies and the code length K is fixed, and vice versa. For each preprocessed dataset $D$, we obtain $D_{LSH}$ after performing SRP-LSH and $D_{SBLSH}$ after performing the proposed SBLSH. We compute the angle between each pair of samples in $D$, and the corresponding Hamming distances in $D_{LSH}$ and $D_{SBLSH}$. We then compute the mean squared error between the true angles and the approximated angles from $D_{LSH}$ and $D_{SBLSH}$ respectively. Note that after computing the Hamming distance, we divide the result by $C = K/\pi$ to get the approximated angle.
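The estimation procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, plain Gaussian sign random projections are used as the hash functions, and small random vectors stand in for the real descriptors. The only step taken directly from the text is the conversion of a Hamming distance into an angle by dividing by $C = K/\pi$.

```python
import math
import random


def srp_hash(x, projections):
    """Sign-random-projection code: one bit per projection vector."""
    return [1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0
            for w in projections]


def hamming(code_a, code_b):
    """Number of positions where the two binary codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


def estimated_angle(code_a, code_b):
    """Hamming distance divided by C = K / pi, as described in the text."""
    C = len(code_a) / math.pi
    return hamming(code_a, code_b) / C


def true_angle(a, b):
    """Exact angle between two vectors, in radians."""
    dot = sum(a_i * b_i for a_i, b_i in zip(a, b))
    norm_a = math.sqrt(sum(a_i * a_i for a_i in a))
    norm_b = math.sqrt(sum(b_i * b_i for b_i in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard rounding
    return math.acos(cosine)


def angle_mse(data, projections):
    """Mean squared error of the estimated angle over all sample pairs."""
    codes = [srp_hash(x, projections) for x in data]
    squared_errors = []
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            err = estimated_angle(codes[i], codes[j]) - true_angle(data[i], data[j])
            squared_errors.append(err * err)
    return sum(squared_errors) / len(squared_errors)


# Toy usage: 20 random non-negative vectors in 8 dimensions, K = 256 bits.
rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(20)]
projections = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
print(angle_mse(data, projections))
```

Swapping in a different set of projection vectors (for example, orthogonalized ones) changes only the `projections` argument, so the same `angle_mse` routine can compare two hashing schemes on identical data.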
3.1.1 Datasets and Preprocessing
We conduct the experiment on the following datasets:
1) Photo Tourism patch dataset¹ [24], Notre Dame, which contains 104,106 patches, each of which is represented by a 128D SIFT descriptor (Photo Tourism SIFT); and
2) MIR-Flickr², which contains 25,000 images, each of which is represented by a 3125D bag-of-SIFT-feature histogram.
For each dataset, we further conduct a simple preprocessing step as in [12], i.e., mean-centering each data sample, so as to obtain additional mean-centered versions of the above datasets, Photo Tourism SIFT (mean) and MIR-Flickr (mean). The experiment on these mean-centered datasets tests the performance of SBLSH when the angles of data pairs are not constrained to $(0, \pi/2]$.
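A small example shows why mean-centering removes the angle constraint. The sketch below assumes that mean-centering means subtracting the per-dimension dataset mean from every sample; the text does not spell out the exact procedure of [12], so this reading and the helper names are assumptions.

```python
import math


def mean_center(data):
    """Subtract the per-dimension dataset mean from every sample (assumed reading)."""
    n, dim = len(data), len(data[0])
    mean = [sum(x[d] for x in data) / n for d in range(dim)]
    return [[x[d] - mean[d] for d in range(dim)] for x in data]


def angle(a, b):
    """Angle between two vectors, in radians."""
    dot = sum(a_i * b_i for a_i, b_i in zip(a, b))
    norm_a = math.sqrt(sum(a_i * a_i for a_i in a))
    norm_b = math.sqrt(sum(b_i * b_i for b_i in b))
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))


# Two non-negative samples: the angle between them cannot exceed pi/2.
raw = [[1.0, 0.0], [0.0, 1.0]]
centered = mean_center(raw)

print(angle(raw[0], raw[1]))            # angle before centering
print(angle(centered[0], centered[1]))  # angle after centering
```

In this toy case the two raw samples are orthogonal, while their centered versions point in opposite directions, so the angle moves from the boundary of $(0, \pi/2]$ into $(\pi/2, \pi]$.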
3.1.2 The Effect of Super-Bit Depth N and Code Length K
In each dataset, for each (N, K) pair, i.e., Super-Bit depth N and code length K, we randomly sample 10,000 data points, which yield about 50,000,000 data pairs, and randomly generate SRP-LSH functions, together with SBLSH functions obtained by orthogonalizing the generated SRP vectors in batches. We repeat the test 10 times, and compute the mean squared error (MSE) of the estimation.
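The batch orthogonalization step can be illustrated as follows. This is a sketch under stated assumptions, not the authors' implementation: it assumes the K projection vectors are split into K/N consecutive batches of N vectors each (so N must divide K), uses Gram-Schmidt as the orthogonalization routine, and all function names are hypothetical. Orthogonalizing N vectors requires N to be no larger than the data dimension.

```python
import math
import random


def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            coeff = sum(w_i * u_i for w_i, u_i in zip(w, u))
            w = [w_i - coeff * u_i for w_i, u_i in zip(w, u)]
        norm = math.sqrt(sum(w_i * w_i for w_i in w))
        basis.append([w_i / norm for w_i in w])
    return basis


def super_bit_projections(dim, N, K, rng):
    """Draw K Gaussian vectors and orthogonalize them in batches of N."""
    if N > dim:
        raise ValueError("Super-Bit depth N cannot exceed the data dimension")
    if K % N != 0:
        raise ValueError("this sketch assumes that N divides K")
    projections = []
    for _ in range(K // N):
        batch = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(N)]
        projections.extend(gram_schmidt(batch))
    return projections


# Toy usage: 6-dimensional data, depth N = 3, code length K = 6 (two batches).
rng = random.Random(0)
P = super_bit_projections(dim=6, N=3, K=6, rng=rng)
print(len(P))
```

With N = 1 each batch contains a single normalized Gaussian vector, so the sign bits coincide with those of ordinary sign random projections. Vectors from different batches are drawn independently and are not orthogonal to each other in general.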
To test the effect of Super-Bit depth N, we fix K = 120 for Photo Tourism SIFT and K = 3000 for MIR-Flickr, and to test the effect of code length K, we fix N = 120 for Photo Tourism SIFT and N = 3000 for MIR-Flickr. We repeat the experiment on the mean-centered versions of these datasets, and denote the methods by Mean+SRP-LSH and Mean+SBLSH respectively.

¹ http://phototour.cs.washington.edu/patches/default.htm
² http://users.ecs.soton.ac.uk/jsh2/mirflickr/