A 201.4 GOPS real-time multi-object recognition processor is presented with a three-stage pipelined architecture. A visual-perception-based multi-object recognition algorithm is applied to give multiple attentions to multiple objects in the input image. For human-like multi-object perception, a neural perception engine is proposed with biologically inspired neural networks and fuzzy logic circ
KIM et al.: A 201.4 GOPS 496 mW REAL-TIME MULTI-OBJECT RECOGNITION PROCESSOR WITH BIO-INSPIRED NEURAL PERCEPTION ENGINE
Fig. 12. Four-stage pipelined multi-casting switch and its multi-casting port.
packet, a multi-casting input port sends multiple requests to all destination arbiters at the same time and waits until all grant signals are returned. To this end, inside the multi-casting input port, a multi-port requester decodes the 16-bit RI and generates the corresponding request signals. A grant checker then holds the multi-casting packet until the registered request signals are equal to the received grant signals. After all grants are gathered, multi-casting is performed over the existing broadcast wires of the crossbar fabric without any additional wires. A variable-strength driver is employed in the multi-casting port to provide sufficient driving strength for multi-casting. As a result, the MC-NoC's multi-casting capability accelerates the program kernel distribution and image data download tasks of the target object recognition by 6.56× and 1.22×, respectively.
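The request/grant handshake of the multi-casting input port can be modeled behaviorally as below. This is a minimal Python sketch and not the authors' RTL. It assumes that the 16-bit RI is a bitmask with one bit per destination (bit *i* set means destination port *i* is a target). The names `decode_routing_info` and `MulticastInputPort` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NUM_PORTS = 16  # assumption: one RI bit per destination port


def decode_routing_info(ri: int) -> set[int]:
    """Multi-port requester: turn the 16-bit RI bitmask into destination ports."""
    if not 0 <= ri < (1 << NUM_PORTS):
        raise ValueError("RI must be a 16-bit value")
    return {port for port in range(NUM_PORTS) if (ri >> port) & 1}


@dataclass
class MulticastInputPort:
    requests: set[int] = field(default_factory=set)  # registered request signals
    grants: set[int] = field(default_factory=set)    # grant signals received so far

    def load_packet(self, ri: int) -> set[int]:
        # Raise requests to all destination arbiters at the same time.
        self.requests = decode_routing_info(ri)
        self.grants = set()
        return self.requests

    def receive_grant(self, port: int) -> None:
        # Only grants for ports that were actually requested are recorded.
        if port in self.requests:
            self.grants.add(port)

    def can_send(self) -> bool:
        # Grant checker: hold the packet until every request has been granted.
        return bool(self.requests) and self.grants == self.requests
```

Holding the packet until the grants equal the requests means the payload is driven onto the crossbar only once, when every destination has already been granted.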
VI. LOW-POWER TECHNIQUES
To reduce power consumption during object recognition processing, the STM performs chip-level power management. Fig. 13 shows the power management architecture of the proposed processor and its workload-aware dynamic power management. In the chip, the 16 SPUs are divided into four power domains, each independently controlled by the STM. To control these power domains at low implementation cost, an off-chip power gating method [19] is employed, with one external regulator and enable signal per power domain. The remaining parts of the chip (the NPE, STM, DP, and NoC) are placed in an always-on domain. For efficient power gating, workload-aware power gating (WAPG) is combined with workload-aware task scheduling (WATS). The STM measures the SPU workload from the number of ROI grid-tiles and determines how many SPUs to activate. At the same time, it determines the number of activated power domains in proportion to the workload, as shown in the flowchart of Fig. 13. The STM then sends request signals to the external regulators to gate the unused SPU power domains before it assigns the ROI grid-tile tasks to the SPUs. Because the external regulators have a settling time of a few hundred μs, power gating requests are issued only once per frame. With WAPG, the number of activated power domains adapts to the workload of each input frame, as shown in Fig. 13.
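The per-frame decision flow of WAPG with WATS can be sketched as follows. This is an illustrative Python model under stated assumptions. The 16 SPUs are assumed to be split evenly into four domains of four SPUs each. `tiles_per_spu` is a hypothetical workload-to-SPU ratio, since the exact mapping from ROI grid-tile count to active SPUs is not given here. The round-robin dispatch in `assign_tiles` is a placeholder for WATS and does not represent its actual policy.

```python
from __future__ import annotations

import math

NUM_SPUS = 16
NUM_DOMAINS = 4
SPUS_PER_DOMAIN = NUM_SPUS // NUM_DOMAINS  # assumption: equal split, 4 SPUs per domain


def plan_frame(num_roi_tiles: int, tiles_per_spu: int = 4) -> tuple[int, list[bool]]:
    """Return (active SPU count, per-domain regulator enables) for one frame.

    tiles_per_spu is a hypothetical knob; the real STM policy may differ.
    """
    if num_roi_tiles < 0 or tiles_per_spu <= 0:
        raise ValueError("invalid workload parameters")
    active_spus = min(NUM_SPUS, math.ceil(num_roi_tiles / tiles_per_spu))
    # Power up just enough domains to host the active SPUs (lowest indices first).
    active_domains = math.ceil(active_spus / SPUS_PER_DOMAIN)
    enables = [d < active_domains for d in range(NUM_DOMAINS)]
    return active_spus, enables


def assign_tiles(tile_ids: list[int], active_spus: int) -> dict[int, list[int]]:
    """Hypothetical round-robin dispatch of ROI grid-tiles to the active SPUs."""
    schedule: dict[int, list[int]] = {spu: [] for spu in range(active_spus)}
    for i, tile in enumerate(tile_ids):
        schedule[i % active_spus].append(tile)
    return schedule
```

In this sketch, `plan_frame` is evaluated once per frame and its enable list drives the external regulators before `assign_tiles` hands out ROI grid-tiles. Evaluating it more often would gain little given the regulators' settling time. Active SPUs take the lowest indices, so under the equal-split assumption they always fall inside an enabled domain.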
To further reduce dynamic power in the activated power domains, software-controlled clock gating is applied to each operating SPU, as shown in Fig. 14. The clock of an SPU can be gated by two software requests: an end request and a wait request. Each request is made when the SPU writes to a pre-defined address. The end request occurs when the SPU has finished its assigned task. The wait request, on the other hand, is generated when the SPU must stop its operation and wait for another module's operation. To this end, the SPU writes the index value at the pre-defined wait address to notify the index of wait