A 201.4 GOPS real-time multi-object recognition processor is presented with a three-stage pipelined architecture. A visual perception based multi-object recognition algorithm is applied to give multiple attentions to multiple objects in the input image. For human-like multi-object perception, a neural perception engine is proposed with biologically inspired neural networks and fuzzy logic circ
42 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 45, NO. 1, JANUARY 2010
Fig. 13. Workload-aware dynamic power management.
Fig. 14. Software controlled clock gating.
Fig. 15. Chip micrograph.
conditions to be resolved. In this case, the clock is automatically restored when all the wait conditions are resolved. With the WAPG and software controlled clock gating, the power consumption of the 16 SPUs is reduced by 38%, from 542 mW to 336 mW, while the power consumption of the overall processor amounts to 496 mW at 60 frame/sec frame-rate.
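The power figures above can be cross-checked with a few lines of arithmetic. The following is only a sketch using the numbers quoted in the text; the variable names are illustrative, and the energy-per-frame value is a plain ratio of average power to frame rate, which is an assumption about how such a figure could be derived rather than the normalization used in the paper's own comparison.

```python
# Figures quoted in the text (all names here are illustrative, not from the paper).
spu_power_before_mw = 542.0   # 16 SPUs without WAPG / software clock gating
spu_power_after_mw = 336.0    # 16 SPUs with WAPG and software clock gating
total_power_mw = 496.0        # whole processor, average
frame_rate_fps = 60.0         # frames per second


def fractional_reduction(before: float, after: float) -> float:
    """Return the relative saving (before - after) / before."""
    return (before - after) / before


spu_reduction = fractional_reduction(spu_power_before_mw, spu_power_after_mw)

# Assumption: energy per frame is taken as average power divided by frame rate.
# mW divided by frames/s gives mJ per frame.
energy_per_frame_mj = total_power_mw / frame_rate_fps

print(f"SPU power reduction: {spu_reduction:.1%}")
print(f"Energy per frame:    {energy_per_frame_mj:.2f} mJ")
```

Under these inputs the SPU saving comes out at roughly 38%, in line with the stated reduction.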
VII. CHIP IMPLEMENTATION AND EVALUATION
The proposed recognition processor is fabricated in a 0.13 µm 1-poly 8-metal CMOS technology and its […] mm chip contains 36.4M transistors including 3.7M logic gates and 396 KB on-chip SRAM. Fig. 15 shows the chip micrograph and Table I summarizes its features. The operating frequency is 200 MHz for IP blocks and 400 MHz for the NoC. Its peak performance amounts to 201.4 giga operations per second (GOPS) when 695 mW is dissipated. Specifically, the 128 PEs of the 16 SPUs, each of which performs up to five operations per cycle with a two-way MAC instruction, perform 128 GOPS. The NPE performs 54 GOPS; 40 linear PEs of the VAE perform 24 GOPS, four parallel analog-digital mixed datapaths of the ODE perform 20 GOPS, parallel SAD units of the ME perform 9.8 GOPS, and a control RISC performs 0.2 GOPS. The DP performs 19.4 GOPS using its 32 16-bit SAD distance calculation and compare units. The average power consumption of the processor is 496 mW at the supply voltage of 1.2 V while the proposed multi-object recognition is running at 60 frame/sec frame-rate. Table II shows the power break-down of the proposed processor. The 16 SPUs account for about two thirds of the overall power consumption.
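The throughput budget in this paragraph can be reproduced from the quoted figures. The sketch below assumes that the 54 GOPS of the NPE is the sum of the VAE, ODE, ME and control RISC contributions (the numbers are consistent with that reading), and that the SPU figure follows from 128 PEs at five operations per cycle on the 200 MHz IP clock. All identifiers are illustrative.

```python
# Peak-throughput budget rebuilt from the figures quoted in the text.
IP_CLOCK_HZ = 200e6          # operating frequency of the IP blocks
NUM_PES = 128                # PEs across the 16 SPUs
OPS_PER_PE_PER_CYCLE = 5     # with the two-way MAC instruction

# SPU array: PEs x operations per cycle x clock, expressed in GOPS.
spu_gops = NUM_PES * OPS_PER_PE_PER_CYCLE * IP_CLOCK_HZ / 1e9

# Assumption: the NPE total is the sum of these four quoted components.
npe_components_gops = {"VAE": 24.0, "ODE": 20.0, "ME": 9.8, "RISC": 0.2}
npe_gops = sum(npe_components_gops.values())

dp_gops = 19.4               # decision processor, as quoted

peak_gops = spu_gops + npe_gops + dp_gops

# Power efficiency at peak: the 201.4 GOPS figure is quoted at 695 mW.
peak_power_w = 0.695
gops_per_watt = peak_gops / peak_power_w

# Share of average power taken by the 16 SPUs (336 mW of 496 mW).
spu_power_share = 336.0 / 496.0

print(f"SPUs: {spu_gops:.1f} GOPS, NPE: {npe_gops:.1f} GOPS, DP: {dp_gops:.1f} GOPS")
print(f"Peak: {peak_gops:.1f} GOPS, {gops_per_watt:.0f} GOPS/W")
print(f"SPU share of average power: {spu_power_share:.0%}")
```

With these inputs the three stages add up to about 201.4 GOPS, the efficiency is close to 290 GOPS/W, and the SPU share of roughly 68% matches the "about two thirds" stated above.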
Fig. 16 shows performance comparisons of the proposed processor with previous vision processors [2]–[4], [20]. Fig. 16(a) shows the power efficiency comparison. The GOPS/W, which normalizes the GOPS performance with the power, is adopted as a performance index where 1 operation means a 16-bit fixed-point operation. The proposed processor achieves 290 GOPS/W, which is 1.36 times higher than the previous vision processors. Fig. 16(b) shows the energy efficiency comparison in object recognition, which is obtained by energy consumption per each frame. With 60 frame/sec operation by the pipelined architecture and under 0.5 W power consumption by the workload-aware dynamic power management, the proposed