Figure 3: The effect of Super-Bit depth N (1 < N ≤ min(d, K)) with fixed code length K (K = N × L), and the effect of code length K with fixed Super-Bit depth N. (Plot legend: SRP-LSH, SBLSH, Mean+SRP-LSH, Mean+SBLSH.)
Table 1: ANN retrieval results, measured by the proportion of good neighbors within the query's Hamming ball of radius 3. Note that the code length K = 30.

Data       Notre Dame        Half Dome         Trevi
E2LSH      0.4675 ± 0.0900   0.4503 ± 0.0712   0.4661 ± 0.0849
SRP-LSH    0.7500 ± 0.0525   0.7137 ± 0.0413   0.7591 ± 0.0464
SBLSH      0.7845 ± 0.0352   0.7535 ± 0.0276   0.7891 ± 0.0329
Figure 3 shows that with a fixed code length K, as the Super-Bit depth N grows (1 < N ≤ min(d, K)), the MSE of SBLSH decreases and the gap between SBLSH and SRP-LSH widens. In particular, when N = K, an MSE reduction of over 30% is observed on all the datasets. This verifies Corollary 2: when applying SBLSH, the best strategy is to set the Super-Bit depth N as large as possible, i.e. min(d, K). An informal explanation of this phenomenon is that as the degree of orthogonality of the random projections increases, the code becomes more informative and thus provides a better estimate. In addition, the performance on the mean-centered datasets is similar to that on the original datasets. This shows that even when the angle between each data pair is not constrained to (0, π/2], SBLSH still gives a much more accurate estimate.
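The construction discussed above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it draws K = N × L Gaussian projection vectors, orthonormalizes them in L batches of N by Gram-Schmidt, hashes by sign, and estimates the angle as π times the normalized Hamming distance. All function names and the `seed` parameter are hypothetical. With `n_depth = 1` the batches are single vectors, so the codes coincide with those of plain sign-random-projection hashing.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors):
    """Gram-Schmidt: return an orthonormal list spanning the input vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)  # component of w along an already accepted direction
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def super_bit_projections(d, n_depth, n_batches, seed=0):
    """Return K = n_depth * n_batches vectors, orthonormal within each batch.

    Assumes 1 <= n_depth <= d so that each batch can be orthogonalized.
    """
    if not 1 <= n_depth <= d:
        raise ValueError("Super-Bit depth must satisfy 1 <= N <= d")
    rng = random.Random(seed)
    projections = []
    for _ in range(n_batches):
        batch = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_depth)]
        projections.extend(orthonormalize(batch))
    return projections


def hash_code(x, projections):
    """One bit per projection: 1 if the projection is non-negative, else 0."""
    return [1 if dot(w, x) >= 0 else 0 for w in projections]


def estimate_angle(code_a, code_b):
    """Angle estimate: pi times the fraction of differing bits."""
    hamming = sum(a != b for a, b in zip(code_a, code_b))
    return math.pi * hamming / len(code_a)
```

For example, `super_bit_projections(128, 30, 1)` would correspond to the setting N = K = 30 on 128-dimensional inputs, under the assumptions stated above.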
Figure 3 also shows that with a fixed Super-Bit depth N, SBLSH significantly outperforms SRP-LSH. As the code length K increases, the accuracies of SBLSH and SRP-LSH both increase. The performance on the mean-centered datasets is again similar to that on the original datasets.
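As a brief aside on why accuracy improves with K, consider the SRP-LSH baseline. Its K bits are independent, and each bit differs between two vectors at angle θ with probability θ/π, so the Hamming distance d_H is binomial. The notation θ̂ for the rescaled estimate is introduced here for illustration only:

```latex
d_H \sim \mathrm{Binomial}\!\left(K, \tfrac{\theta}{\pi}\right), \qquad
\hat{\theta} = \frac{\pi}{K}\, d_H, \qquad
\mathbb{E}[\hat{\theta}] = \theta, \qquad
\mathrm{Var}[\hat{\theta}] = \frac{\pi^2}{K^2} \cdot K \cdot \frac{\theta}{\pi}\left(1 - \frac{\theta}{\pi}\right)
= \frac{\theta(\pi - \theta)}{K}.
```

The variance of the baseline estimate therefore decays as 1/K, which is consistent with the accuracy of SRP-LSH improving as the code length grows.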
3.2 Approximate Nearest Neighbor Retrieval
In this subsection, we conduct an ANN retrieval experiment that compares SBLSH with two other widely used data-independent binary LSH methods: SRP-LSH and E2LSH (we use the binary version in [23]; the original version is in [1]). We use the datasets Notre Dame, Half Dome and Trevi from the Photo Tourism patch dataset [24], which is also used in [12, 10, 13] for ANN retrieval. We use the 128-D SIFT representation and normalize the vectors to unit norm. For each dataset, we randomly pick 1,000 samples as queries and use the remaining samples (around 100,000) as the corpus for the retrieval task. We define the good neighbors of a query q as the samples within the top 5% nearest neighbors (measured in Euclidean distance) of q. We adopt the evaluation criterion used in [12, 23], i.e. the proportion of good neighbors among the returned samples that are within the query's Hamming ball of radius r. We set r = 3. Using code length K = 30, we repeat the experiment 10 times and take the mean of the results. For SBLSH, we fix the Super-Bit depth N = K = 30. Table 1 shows that SBLSH performs best among all these data-independent hashing methods.
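The evaluation criterion described above can be sketched as follows. This is an illustrative reading of the protocol rather than the authors' code: the function names are hypothetical, and returning `None` when the Hamming ball is empty is an assumption, since the text does not say how such queries are handled.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def top_fraction_neighbors(query, corpus, fraction=0.05):
    """Indices of the closest `fraction` of the corpus in Euclidean distance."""
    n_good = max(1, int(len(corpus) * fraction))
    order = sorted(range(len(corpus)), key=lambda i: math.dist(query, corpus[i]))
    return set(order[:n_good])


def good_neighbor_proportion(query_code, corpus_codes, good_ids, radius=3):
    """Fraction of samples inside the query's Hamming ball that are good neighbors.

    Returns None when no corpus code lies within the ball (an assumption).
    """
    returned = [
        i for i, code in enumerate(corpus_codes)
        if hamming_distance(query_code, code) <= radius
    ]
    if not returned:
        return None
    hits = sum(1 for i in returned if i in good_ids)
    return hits / len(returned)
```

In the experiment as described, this score would be computed for each of the 1,000 queries with `radius=3` and then averaged; the averaging step is omitted here.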
4 Relations to Other Hashing Methods
There exist different kinds of LSH methods, e.g. bit-sampling LSH [9, 7] for Hamming distance and ℓ1-distance, min-hash [2] for the Jaccard coefficient, and p-stable-distribution LSH [6] for ℓp-distance when p ∈ (0, 2]. These data-independent methods are simple, and thus easy to integrate as a module in more complicated algorithms involving pairwise distance or similarity computation, e.g. nearest neighbor search. New data-independent methods for improving these original LSH methods have