IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 45, NO. 1, JANUARY 2010
A 201.4 GOPS 496 mW Real-Time Multi-Object Recognition Processor With Bio-Inspired Neural Perception Engine
Joo-Young Kim, Student Member, IEEE, Minsu Kim, Student Member, IEEE, Seungjin Lee, Student Member, IEEE, Jinwook Oh, Student Member, IEEE, Kwanho Kim, Student Member, IEEE, and Hoi-Jun Yoo, Fellow, IEEE
Abstract—A 201.4 GOPS real-time multi-object recognition processor is presented with a three-stage pipelined architecture. A visual-perception-based multi-object recognition algorithm is applied to give multiple attentions to multiple objects in the input image. For human-like multi-object perception, a neural perception engine is proposed with biologically inspired neural networks and fuzzy logic circuits. In the proposed hardware architecture, three recognition tasks (visual perception, descriptor generation, and object decision) are directly mapped to the neural perception engine, 16 SIMD processors including 128 processing elements, and a decision processor, respectively, and are executed in a pipeline to maximize the throughput of object recognition. For efficient task pipelining, the proposed task/power manager balances the execution times of the three stages based on intelligent workload estimations. In addition, a 118.4 GB/s multi-casting network-on-chip is proposed as the communication architecture, incorporating a total of 21 IP blocks. For low-power object recognition, workload-aware dynamic power management is performed at the chip level. The 49 mm² chip is fabricated in a 0.13 μm 8-metal CMOS process and contains 3.7 M gates and 396 KB of on-chip SRAM. It achieves 60 frame/sec multi-object recognition of up to 10 different objects for VGA (640 × 480) video input while dissipating 496 mW at 1.2 V. The obtained 8.2 mJ/frame energy efficiency is 3.2 times higher than that of the state-of-the-art recognition processor.
Index Terms—Multi-casting network-on-chip, multimedia processor, multi-object recognition, neural perception engine, visual perception, workload-aware dynamic power management, three-stage pipelined architecture.
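To illustrate why the abstract stresses both pipelining and balancing the three stages, the sketch below compares serial and frame-level pipelined execution. It is a simplified model under stated assumptions: the `Stage` class, the helper names, and the stage times are hypothetical placeholders, not measurements from the chip. In a pipeline, a new frame can start only once every stage has finished its current frame, so throughput is set by the slowest stage rather than by the sum of all stages.

```python
"""Hypothetical model: serial vs. frame-level pipelined execution of three stages.

Stage times are illustrative placeholders, not values reported for the chip.
"""
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    time_ms: float  # assumed per-frame execution time of this stage


def serial_fps(stages):
    """Frames per second when each frame runs all stages back to back."""
    return 1000.0 / sum(s.time_ms for s in stages)


def pipelined_fps(stages):
    """Frames per second when stages overlap on different frames:
    one frame completes every max(stage time) milliseconds."""
    return 1000.0 / max(s.time_ms for s in stages)


def slack(stages):
    """Idle time (ms) of each stage within one pipeline period.

    Stages with slack could, in principle, run at a lower clock rate;
    this is only an assumed illustration of workload-aware power
    management, not the chip's actual policy.
    """
    period = max(s.time_ms for s in stages)
    return {s.name: period - s.time_ms for s in stages}


if __name__ == "__main__":
    stages = [
        Stage("visual perception", 10.0),
        Stage("descriptor generation", 16.0),
        Stage("object decision", 12.0),
    ]
    print(f"serial:    {serial_fps(stages):.1f} frame/s")
    print(f"pipelined: {pipelined_fps(stages):.1f} frame/s")
    print(slack(stages))
```

With these placeholder times, pipelining raises throughput from 1000/38 to 1000/16 frame/s. Any further gain has to come from shrinking the slowest stage, which is why balancing the stage execution times matters.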
Manuscript received May 04, 2009; revised July 22, 2009 and September 01, 2009. Current version published December 23, 2009. This paper was approved by Guest Editor Kazutami Arimoto.

The authors are with the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea (e-mail: trample7@eeinfo.kaist.ac.kr).

Color versions of one or more of the figures in this paper are available online.

Digital Object Identifier 10.1109/JSSC.2009.2031768

I. INTRODUCTION

Object recognition is a fundamental technology for intelligent vision applications such as autonomous cruise control, mobile robot vision, and surveillance systems [1]–[5]. It usually involves not only pixel-based image processing for object feature extraction but also vector database matching for the final object decision [6]. In object recognition, various scale spaces are first generated by cascaded filtering of the input video stream. Then, key-points are extracted among neighboring scale spaces by a local maxima/minima search, [...]. The final recognition is made by nearest-neighbor matching against a predefined object database that generally includes over ten thousand object descriptor vectors. Since each stage of object recognition requires a huge amount of computation, real-time operation is hard to achieve with a single general-purpose CPU [3].

To achieve real-time performance of over 20 frame/sec with power consumption under 1 W, many multi-core-based vision processors have been developed [1]–[5]. In massively parallel single instruction multiple data (SIMD) processors [1], [2], hundreds of processing elements (PEs) are employed to maximize data-level parallelism for per-pixel image operations such as image filtering and histogram computation. However, their identical operations are not suitable for key-point- or object-level operations such as descriptor vector generation and database matching. On the other hand, the multi-core processor of [3] exploits coarse-grained PEs and a memory-centric network-on-chip (NoC) for task-level parallelism rather than data-level parallelism; however, it cannot provide enough computing power for real-time object recognition because of its data synchronization overhead. Unlike the previous processors, the NoC-based parallel processor of [4] adopts a visual attention engine (VAE) [7] to reduce the computational complexity of object recognition. Motivated by the human visual system, the VAE selects meaningful key-points out of the extracted ones and gives attention to them before the main object recognition processing described above. Although this reduces the execution time of the whole object recognition, its performance is still limited because visual attention, object feature extraction and descriptor generation, and database matching are performed serially in the time domain owing to their unbalanced workloads.
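Two steps of the generic recognition flow described at the start of this section, the key-point search over neighboring scale spaces and the nearest-neighbor database match, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the processor's algorithm. It assumes the scale spaces are already computed and given as equally sized 2-D layers (for example, difference-of-Gaussian layers as in SIFT-style flows), treats a point as a key-point when it is a strict maximum or minimum of its 3 × 3 × 3 neighborhood, and matches descriptors by brute-force Euclidean search. The function names, the threshold parameter, and the sample data are hypothetical.

```python
"""Toy sketch (hypothetical names and data) of two recognition steps:
1) key-points as local maxima/minima across neighboring scale spaces;
2) nearest-neighbor matching of a descriptor against an object database.
"""
import math


def local_extrema(layers, threshold=0.0):
    """Return (scale, y, x) for every point that is a strict max or min
    of its 3x3x3 neighborhood in scales s-1, s, s+1.

    layers: list of equally sized 2-D lists, one per scale (assumed
    precomputed, e.g. difference-of-Gaussian layers).
    Points whose magnitude does not exceed `threshold` are skipped.
    """
    points = []
    height, width = len(layers[0]), len(layers[0][0])
    for s in range(1, len(layers) - 1):          # need a scale above and below
        for y in range(1, height - 1):           # skip image borders
            for x in range(1, width - 1):
                v = layers[s][y][x]
                if abs(v) <= threshold:
                    continue
                neighbors = [
                    layers[s + ds][y + dy][x + dx]
                    for ds in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                    if (ds, dy, dx) != (0, 0, 0)
                ]
                if v > max(neighbors) or v < min(neighbors):
                    points.append((s, y, x))
    return points


def nearest_neighbor(query, database):
    """Brute-force search: return (label, distance) of the database vector
    closest to `query` in Euclidean distance.

    database: list of (label, descriptor_vector) pairs.
    """
    best_label, best_dist = None, math.inf
    for label, vector in database:
        d = math.dist(query, vector)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


if __name__ == "__main__":
    # Three 5x5 scale layers with a single peak in the middle layer.
    layers = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
    layers[1][2][2] = 1.0
    print(local_extrema(layers))  # [(1, 2, 2)]

    db = [("cup", [0.0, 1.0]), ("book", [1.0, 0.0])]
    print(nearest_neighbor([0.9, 0.2], db))
```

The linear scan in `nearest_neighbor` costs one distance computation per stored vector for every query descriptor. With a database of over ten thousand descriptors, this cost illustrates why database matching is a heavy stage in its own right.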
In this work, we propose a real-time low-power multi-object recognition processor with a three-stage pipelined architecture. The previous visual attention is extended to visual perception, which gives multiple attentions to multiple objects in the input image. For human-like multi-object perception, a neural perception engine is proposed with biologically inspired neural networks and fuzzy logic circuits. In the proposed processor, the three-stage pipelined architecture is adopted to maximize the throughput of object recognition. The three object recognition tasks mentioned above are pipelined at the frame level, and their execution times are balanced based on intelligent workload estimations to improve
0018-9200/$26.00 © 2009 IEEE