2012--Super-Bit Locality-Sensitive Hashing

2021-01-20 18:04

Super-BitLocality-SensitiveHashing

JianqiuJi ,JianminLi ,ShuichengYan ,BoZhang ,QiTian

StateKeyLaboratoryofIntelligentTechnologyandSystems,

TsinghuaNationalLaboratoryforInformationScienceandTechnology(TNList),

DepartmentofComputerScienceandTechnology,

TsinghuaUniversity,Beijing100084,China

jijq10@http://www.77cn.com.cn,

{lijianmin,dcszb}@http://www.77cn.com.cn

DepartmentofElectricalandComputerEngineering,

NationalUniversityofSingapore,Singapore,117576

eleyans@nus.edu.sg

DepartmentofComputerScience,UniversityofTexasatSanAntonio,

OneUTSACircle,UniversityofTexasatSanAntonio,SanAntonio,TX78249-1644

qitian@cs.utsa.edu

Abstract

Sign-random-projectionlocality-sensitivehashing(SRP-LSH)isaprobabilisticdimensionreductionmethodwhichprovidesanunbiasedestimateofangularsim-ilarity,yetsuffersfromthelargevarianceofitsestimation.Inthiswork,wepro-posetheSuper-Bitlocality-sensitivehashing(SBLSH).Itiseasytoimplement,whichorthogonalizestherandomprojectionvectorsinbatches,anditistheoreti-callyguaranteedthatSBLSHalsoprovidesanunbiasedestimateofangularsim-ilarity,yetwithasmallervariancewhentheangletoestimateiswithin(0,π/2].Theextensiveexperimentsonrealdatawellvalidatethatgiventhesamelengthofbinarycode,SBLSHmayachievesigni cantmeansquarederrorreductioninestimatingpairwiseangularsimilarity.Moreover,SBLSHshowsthesuperiorityoverSRP-LSHinapproximatenearestneighbor(ANN)retrievalexperiments.1Introduction

Locality-sensitivehashing(LSH)methodaimstohashsimilardatasamplestothesamehashcodewithhighprobability[7,9].ThereexistvariouskindsofLSHforapproximatingdifferentdistancesorsimilarities,e.g.,bit-samplingLSH[9,7]forHammingdistanceand 1-distance,min-hash[2,5]forJaccardcoef cient.AmongthemaresomebinaryLSHschemes,whichgeneratebinarycodes.BinaryLSHapproximatesacertaindistanceorsimilarityoftwodatasamplesbycomputingtheHammingdistancebetweenthecorrespondingcompactbinarycodes.SincecomputingHammingdistanceinvolvesmainlybitwiseoperations,itismuchfasterthandirectlycomputingotherdis-tances,e.g.Euclidean,cosine,whichrequiremanyarithmeticoperations.Ontheotherhand,thestorageissubstantiallyreducedduetotheuseofcompactbinarycodes.Inlarge-scaleapplications

[20,11,5,17],e.g.near-duplicateimagedetection,objectandscenerecognition,etc.,weareoftenconfrontedwiththeintensivecomputingofdistancesorsimilaritiesbetweensamples,thenbinaryLSHmayactasascalablesolution.

1.1Locality-SensitiveHashingforAngularSimilarity

Formanydatarepresentations,thenaturalpairwisesimilarityisonlyrelatedwiththeanglebetweenthedata,e.g.,thenormalizedbag-of-wordsrepresentationfordocuments,images,andvideos,andthenormalizedhistogram-basedlocalfeatureslikeSIFT[18].Inthesecases,angularsimilarity


2012--Super-Bit Locality-Sensitive Hashing.doc 将本文的Word文档下载到电脑 下载失败或者文档不完整,请联系客服人员解决!

下一篇:2020年公需科目当代科学技术前沿知识考题及答案6

相关阅读
本类排行
× 注册会员免费下载(下载后可以自由复制和排版)

马上注册会员

注:下载文档有可能“只有目录或者内容不全”等情况,请下载之前注意辨别,如果您已付费且无法下载或内容有问题,请联系我们协助你处理。
微信: QQ: