v12
v4v9
3
8w12
51110
istheGram-Schmidtprocess,whichprojectsthecurrentvectororthogonallyontotheorthogonalcomplementofthesubspacespannedbythepreviousvectors.Afterorthogonalization,theseNrandomvectorscannolongerbeviewedasindependentlysampled,thuswegrouptheirresultingbitstogetherasanN-Super-Bit.WecallNtheSuper-Bitdepth.
However,whenthecodelengthK>d,itisimpossibletoorthogonalizeallKvectors.AssumethatK=N×Lwithoutlossofgenerality,and1≤N≤d,thenwecanperformtheGram-SchmidtprocesstoorthogonalizetheminLbatches.Formally,Krandomvectors{v1,v2...,vK}areindependentlysampledfromthenormaldistributionN(0,Id),andthendividedintoLbatcheswithNvectorseach.ByperformingtheGram-SchmidtprocesstotheseLbatchesofNvectorsrespectively,wegetK=N×Lprojectionvectors{w1,w2...,wK}.ThisresultsinKSBLSHTfunctions(hw1,hw2...,hwK),wherehwi(x)=sgn(wix).TheseKfunctionsproduceLN-Super-BitsandaltogetherproducebinarycodesoflengthK.Figure1showsanexampleofgenerating12SBLSHprojectionvectors.Algorithm1liststhealgorithmforgeneratingSBLSHprojectionvectors.NotethatwhentheSuper-BitdepthN=1,SBLSHbecomesSRP-LSH.Inotherwords,SRP-LSHisaspecialcaseofSBLSH.ThealgorithmcanbeeasilyextendedtothecasewhenthecodelengthKisnotamultipleoftheSuper-BitdepthN.InfactonecanevenusevariableSuper-BitdepthNiaslongas1≤Ni≤d.Alsonotethatthenormalizationof{wi}tounitnormisnotnecessary,sincethe 2normofaprojectionvectorwidoesnotaffecttheresultofhwi(·).Withthesamecodelength,SBLSHhasthesamerunningtimeO(Kd)asSRP-LSHinon-lineprocessing,i.e.generatingbinarycodeswhenapplyingto
data.
Figure1:Anillustrationof12SBLSHprojectionvectors{wi}generatedbyorthogonalizing{vi}in4batches.
Algorithm1GeneratingSuper-BitLocality-SensitiveHashingProjectionVectors
resultingcodelengthK=N×L.
GeneratearandommatrixHwitheachelementsampledindependentlyfromthenormaldistribu-tionN(0,1),witheachcolumnnormalizedtounitlength.DenoteH=[v1,v2,...,vK].
fori=0toL 1do
forj=1toNdo
wiN+j=viN+j.
fork=1toj 1doTwiN+j=wiN+j viN+kviN+kviN+j.
endfor
endfor
endfor =[w1,w2,...,wK].Output:H
2.1UnbiasedEstimate
InthissubsectionweprovethatSBLSHprovidesanunbiasedestimateofθa,bofa,b∈Rd.
Lemma1.([8],Lemma3.2)LetSd 1denotetheunitsphereinRd.GivenarandomvectorvuniformlysampledfromSd 1,wehavePr[hv(a)=hv(b)]=θa,b/π.
Lemma2.Ifv∈Rdfollowsanisotropicdistribution,thenv¯=v/ v isuniformlydistributedond 1S.