(a) Method declarations, excluding overrides. (b) Class (Type) declarations.
Figure 7: Evaluation of single-point suggestions for declarations at k=1. Overridden method declarations are easier to predict, so we exclude them. The "features" model achieves higher F1 scores for method declarations, but lacks confidence in its suggestions (where the line stops). In contrast, the "subtoken" model achieves a good F1 score and accurately suggests class names.
…names and is the only model … all method names [18]. Many studies of naming have also been conducted, giving us insight into its importance. Butler et al. found that "flawed" identifier names (those that violate naming conventions or do not follow coding practice guidelines) are related to certain types of defects [14]. Later they also examined the most frequent grammatical structures of method names using part-of-speech tagging [15]. Lawrie et al. [29] and Takang et al. [50] both conducted empirical studies and concluded that the quality of identifier names in code has a profound effect on program comprehension. Liblit et al. explored how names in code "combine together to form larger phrases that convey additional meaning about the code" [30]. Arnaoudova et al. [6] studied identifier renamings, showing that naming is an important part of software construction. Additionally, in a survey of 94 developers, they found that about 68% of developers think that recommending identifiers would be useful. These studies highlight the importance of our work, which is able to suggest quality names or parts of names.
As method and class names are expected to indicate their semantics, they can be viewed as a special case of code summarization. Haiduc et al. showed that NL text summarization does not work well for code [23] and that such techniques must be adapted to be effective. They later developed summaries that are used to improve comprehension [22]. Sridhara et al. used idioms and structure in the code of methods to generate high-level abstract summaries. While they don't suggest method names, they discuss how their approach may be extended to provide them [47]. Sridhara also showed how to generate code summaries appropriate for comments within the code (e.g., as method headers) [46, 45]. For more work in this area, Eddy et al. provide a survey of code summarization methods [20]. We note that most studies and approaches in this area focus on names of variables, fields, and methods. Although some examine all identifiers in the code, we are unaware of any work that focuses on type (class) names as we do.
Language Models in Software Engineering. Probabilistic models of source code have been applied in software engineering. Hindle et al. and Nguyen et al. [25, 41] used n-gram models to improve code autocompletion. Allamanis and Sutton [3] present an application of code n-gram models at scale. Maddison and Tarlow [31] built a more sophisticated generative model of source code using log-bilinear models that reflects the syntactic structure of the code. Although the machine learning principles we use are similar, their model differs significantly from ours, because their purpose is to build models that generate source code rather than improve existing code. In other words, our model is discriminative rather than generative. Mou et al. [40] use a convolutional neural network to classify code from programming competition problems. Karaivanov et al. [27] combine LMs with static program analysis to suggest method calls and fill in gaps. Other applications of probabilistic source code models are extracting code idioms [4] and code migration [27]. Closely related to this work is our previous work where we infer formatting and naming conventions [2], using n-gram LMs to suggest natural renamings. Raychev et al. [43] present a discriminative probabilistic model to predict types and names of variables in JavaScript. In contrast, our current work introduces a log-bilinear model that greatly improves on the n-gram LM, especially on method and class naming, proposing neologisms by taking into account subtokens and non-local context.
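As a concrete illustration of the n-gram approach discussed above (a simplified sketch, not the cited authors' actual implementations), a trigram LM over code tokens reduces to counting token triples and smoothing the resulting relative frequencies:

```python
from collections import Counter

def train_trigram_lm(tokens):
    """Count trigram and bigram frequencies from a code token stream."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigrams = Counter(zip(tokens, tokens[1:]))
    return trigrams, bigrams

def trigram_prob(trigrams, bigrams, w1, w2, w3, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(w3 | w1, w2)."""
    return (trigrams[(w1, w2, w3)] + alpha) / (bigrams[(w1, w2)] + alpha * vocab_size)

# A toy "corpus" of lexed code; real systems train on millions of tokens.
tokens = "if ( x ) return ; if ( y ) return ;".split()
tri, bi = train_trigram_lm(tokens)
vocab = len(set(tokens))
```

An autocompletion engine would rank candidate next tokens by this probability; here the frequently observed continuation `;` after `) return` scores higher than an unseen one.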
Other Applications of Neural Log-bilinear Models. Neural log-bilinear models have been used in NLP for LMs [37, 39] and for describing images with NL [28]. Log-bilinear models have been shown in NLP to produce semantically consistent and interesting vector space representations (embeddings). Notable systems include word2vec [35, 36] and GloVe [42]. In contrast to these approaches, we use a rich notion of non-local context by incorporating features specific to source code, while we produce similar vector space models for method names, variables, and types. Additionally, we present a novel subtoken model. Related to our subtoken model is the work of Botha and Blunsom [12], who integrate compositional morphological representations of words into a log-bilinear LM, but the morphological features are only used in the context of an LM.
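To make "semantically consistent vector space representations" concrete, here is a toy sketch: the vectors below are hand-picked for illustration (not learned, as they would be by word2vec, GloVe, or our model), but they show the geometry these systems aim for, where related name subtokens such as `get` and `read` sit close together under cosine similarity while `set` points elsewhere.

```python
import numpy as np

# Hand-picked 2-D "embeddings" for three name subtokens; real systems
# learn vectors of hundreds of dimensions from large corpora.
emb = {
    "get":  np.array([1.0, 0.1]),
    "read": np.array([0.9, 0.2]),
    "set":  np.array([-1.0, 0.1]),
}

def cosine(a, b):
    """Cosine similarity: the standard closeness measure in embedding spaces."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Under this measure, `cosine(emb["get"], emb["read"])` exceeds `cosine(emb["get"], emb["set"])`, mirroring the semantic intuition.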
7. CONCLUSION
We introduced the method naming problem, that of automatically determining a functionally descriptive name for a method or class. Previous work on automatically assigning names [2, 43] focuses on local variables and relies on relatively local context. Naming methods is more difficult because it requires integrating non-local information from the body of the method or class. We presented a first solution using a log-bilinear neural language model, which includes feature functions that capture long-distance context and a subtoken model that can predict neologisms, names that did not appear in the training set. The model embeds each token into a high-dimensional continuous space.
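The two ingredients named above can be sketched as follows. This is a minimal illustration of the mechanics only: all parameters are random toy values rather than a trained model, and the vocabulary, context tokens, and greedy decoding loop are assumptions for the example. A log-bilinear step linearly combines context embeddings into a prediction vector and scores each candidate subtoken by an inner product; emitting subtokens one at a time is what allows never-seen combinations (neologisms) to be proposed.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension (toy value)
subtokens = ["get", "set", "count", "total", "END"]
# Toy parameters, randomly initialized here; a real model learns them.
out_emb = {s: rng.normal(size=D) for s in subtokens}
ctx_emb = {s: rng.normal(size=D) for s in subtokens + ["counter", "return"]}

def predict_next(context):
    """Log-bilinear step: average context embeddings into a prediction
    vector, score every candidate subtoken by an inner product, and
    softmax-normalize into a distribution."""
    r = np.mean([ctx_emb[c] for c in context], axis=0)
    scores = np.array([out_emb[s] @ r for s in subtokens])
    p = np.exp(scores - scores.max())
    return dict(zip(subtokens, p / p.sum()))

def suggest_name(context, max_parts=4):
    """Greedily emit subtokens until END, possibly forming a neologism
    (a subtoken sequence never seen as a whole name in training)."""
    parts = []
    while len(parts) < max_parts:
        probs = predict_next(context + parts)
        best = max(probs, key=probs.get)
        if best == "END":
            break
        parts.append(best)
    return parts
```

For instance, `suggest_name(["return", "counter"])` composes a name subtoken-by-subtoken from the context, rather than selecting a whole name from a closed vocabulary.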
Continuous embeddings of identifiers have many other potential applications in software engineering, such as rejecting commits whose names violate project conventions; exploration of linguistic anti-patterns, such as a getter starting with set [5]; and feature localization. Finally, a problem similar to method naming arises in NLP, namely the problem of generating a headline from the text of an article [8, 19]. It is possible that models similar to ours could shed light on that problem as well.
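The getter-starting-with-set anti-pattern admits a trivial lexical baseline, sketched below; the helper name and its inputs are hypothetical, not from the cited work, and an embedding-based detector would generalize beyond this exact-prefix rule.

```python
def getter_named_like_setter(method_name: str, returns_value: bool) -> bool:
    """Flag a method that returns a value (getter-like behaviour) but
    whose name begins with 'set' — a linguistic anti-pattern [5].
    Purely lexical; illustration only."""
    return returns_value and method_name.startswith("set")
```

For example, a value-returning method named `setName` would be flagged, while `getName` would not.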
8. ACKNOWLEDGEMENTS
This work was supported by Microsoft Research through its PhD Scholarship Programme. Charles Sutton was supported by the Engineering and Physical Sciences Research Council [grant number EP/K024043/1].