Data dependences, which relate statements that compute data values to statements that use those values, are useful for automating a variety of program-comprehension-related activities, such as reverse engineering, impact analysis, and debugging. Unfortunat
Technical Report GIT-CC-00-33, December 2000
Effects of Pointers on Data Dependences
Alessandro Orso, Saurabh Sinha, and Mary Jean Harrold
College of Computing
Georgia Institute of Technology
801 Atlantic Drive, Atlanta, GA 30332
{orso,sinha,harrold}@cc.gatech.edu
Abstract
Data dependences, which relate statements that compute data values to statements that use those values, are useful for automating a variety of program-comprehension-related activities, such as reverse engineering, impact analysis, and debugging. Unfortunately, data dependences are difficult to compute and understand in the presence of commonly-used language features such as pointers, arrays, and structures. To facilitate the comprehension of data dependences in programs that use such features, we define a technique for computing and classifying data dependences that takes into account the complexities introduced by specific language constructs. The classification that we present is finer-grained than previously proposed classifications. Moreover, unlike previous work, we present empirical results that illustrate the distribution of data dependences for a set of C subjects. We also present a potential application for the proposed classification: program slicing. We propose a technique that allows for computing slices based on data-dependence types. This technique facilitates the use of slicing for understanding a program because a user can either incrementally augment a slice by incorporating data dependences based on their relevance, or focus on specific kinds of dependences. Finally, we present a case study that shows how the incremental computation of slices can (1) highlight subtle data dependences within a program, and (2) provide useful information about those dependences.
Keywords: Data dependences, pointer analysis, program slicing, program comprehension.
1 Introduction
Understanding data dependences within programs is a prerequisite to several program-comprehension related activities, such as maintenance, reuse, reengineering, and debugging. In particular, slicing techniques, which are often used in the context of program understanding, highly depend on the availability of reliable information about dependences among program variables. Such dependences can be identified by computing definition-use (def-use) associations, which relate statements that assign values to variables to statements that use those values. The problem of computing def-use associations in the absence of pointers is relatively straightforward. In such a case, definitions and uses of variables can be identified by using only syntactic information. Once definitions and uses are known, def-use associations can be computed using a traditional data-flow analysis algorithm [2].
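As a small illustration of the pointer-free case, the following sketch annotates a C function with its definitions and uses. The function name scaled_sum and the statement labels s1 to s4 are hypothetical and chosen only for this example; they do not come from the report.

```c
/* Pointer-free code: every definition and use of x is visible in the
 * program text, so both can be found from syntax alone.
 * (Hypothetical example; names and labels are illustrative.) */
int scaled_sum(int a, int b)
{
    int x = a + b;   /* s1: definition of x; uses of a and b           */
    if (a > 0)       /* s2: use of a                                   */
        x = x * 2;   /* s3: use of x (reached by s1), redefinition of x */
    return x;        /* s4: use of x, reached by s1 or by s3           */
}
```

For x, the def-use associations here are (s1, s3), (s1, s4), and (s3, s4): the definition at s1 reaches s4 only along the path that skips s3, which is the kind of fact a reaching-definitions analysis computes. For instance, scaled_sum(1, 2) follows the path through s3 and returns 6, while scaled_sum(-1, 2) skips it and returns 1.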
Unfortunately, traditional approaches for computing def-use associations are inadequate in the presence of programming language constructs such as pointers, arrays, and structures. The possibility of directly accessing memory locations, in languages such as C, complicates the identification of definitions and uses in the code. For example, a variable may be accessed at a given statement without syntactically appearing in it, if the access occurs through a pointer dereference. Therefore, syntactic information is not sufficient in the presence of pointers, and the set of memory locations that can be accessed through a dereference
must be determined prior to the computation of def-use associations. Moreover, because an assignment or use through the dereference of a pointer can potentially assign a value to, or use the value of, one of several variables, these indirect assignments and uses must be treated differently from direct (i.e., syntactic) assignments.
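The effect of a dereference on definitions can be seen in a minimal sketch such as the one below. The function indirect_def, its parameter flag, and the labels s1 to s5 are assumptions made for illustration, not code from the report.

```c
/* A pointer dereference can define a variable whose name does not
 * appear in the statement. (Hypothetical example; names and labels
 * are illustrative.) */
int indirect_def(int flag)
{
    int x = 1;                 /* s1: direct (syntactic) definition of x */
    int y = 2;                 /* s2: direct definition of y             */
    int *p = flag ? &x : &y;   /* s3: p may point to x or to y           */
    *p = 10;                   /* s4: defines x or y, naming neither     */
    return x + y;              /* s5: uses of x and y                    */
}
```

A purely syntactic scan would report no definition of x or y at s4. Once the set of locations that p can reference is known to be {x, y}, s4 has to be treated as a definition that may, but need not, write each of them, so the definitions at s1 and s2 can still reach s5 on some executions. For example, indirect_def(1) overwrites x and returns 12, whereas indirect_def(0) overwrites y and returns 11.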
Most of the previous research that has used def-use associations makes conservative (safe) approximations that are too simplistic and, therefore, can be very imprecise. In the first part of this paper, we extend previously presented classification schemes to allow for a more fine-grained taxonomy of def-use associations. In our scheme, a def-use association is classified into one of 24 categories. This classification is based on the kind of definition and use (either definite or possible) in the def-use association, and on the types of the paths occurring between such definition and use. In this way, each def-use association corresponds to a specific kind of data dependence. We extend the traditional reaching-definition-based algorithm to compute and classify def-use associations according to our classification scheme. We also present and discuss empirical results, for a set of C subjects, about the distribution of def-use associations into the various categories.
In the second part of the paper, we present some possible applications of the proposed classification. In particular, we evaluate the effects of classifying data dependences on program slicing: we introduce a slicing paradigm in which slices are computed by following only specified types of data dependences. Based on this paradigm, we present an incremental slicing technique. The technique can start the analysis of a program by computing slices that consider only “strong” (i.e., definite) data dependences, and then augment the slices incrementally by incorporating additional, “weaker,” data dependences in an efficient way. This slicing approach lets the user first focus on a smaller, and thus easier to understand, subset of the program, and then consider increasingly bigger parts of the code. The technique also provides a way to isolate the data dependences that are caused by the presence of pointers. In this way, it is possible to highlight subtle data dependences that can affect the behavior of the program in possibly unforeseen ways, and provide useful information about those dependences. Finally, the technique offers a way of controlling the size of a slice by eliminating certain data dependences from the slices. We also present a case study that we performed to investigate the practical usability of the presented technique.
The main contributions of the paper are: