(a) Method declarations, excluding overrides. (b) Class (Type) declarations.
Figure 7: Evaluation of single-point suggestions for declarations at k=1. Overridden method declarations are easier to predict, so we exclude them. The "features" model achieves higher F1 scores for method declarations, but lacks confidence in its suggestions (where the line stops). In contrast, the "subtoken" model achieves a good F1 score and accurately suggests class names.
…names and is the only model … all method names [18]. Many studies of naming have also been conducted, giving us insight into its importance. Butler et al. found that "flawed" identifier names (those that violate naming conventions or do not follow coding practice guidelines) are related to certain types of defects [14]. Later they also examined the most frequent grammatical structures of method names using part-of-speech tagging [15]. Lawrie et al. [29] and Takang et al. [50] both conducted empirical studies and concluded that the quality of identifier names in code has a profound effect on program comprehension. Liblit et al. explored how names in code "combine together to form larger phrases that convey additional meaning about the code" [30]. Arnaoudova et al. [6] studied identifier renamings, showing that naming is an important part of software construction. Additionally, in a survey of 94 developers, they found that about 68% of developers think that recommending identifiers would be useful. These studies highlight the importance of our work, which is able to suggest quality names or parts of names.
As method and class names are expected to indicate their semantics, they can be viewed as a special case of code summarization. Haiduc et al. showed that NL text summarization does not work well for code [23] and that such techniques must be adapted to be effective. They later developed summaries that are used to improve comprehension [22]. Sridhara et al. used idioms and structure in the code of methods to generate high-level abstract summaries. While they don't suggest method names, they discuss how their approach may be extended to provide them [47]. Sridhara also showed how to generate code summaries appropriate for comments within the code (e.g., as method headers) [46, 45]. For more work in this area, Eddy et al. provide a survey of code summarization methods [20]. We note that most studies and approaches in this area focus on names of variables, fields, and methods. Although some examine all identifiers in the code, we are unaware of any work that focuses on type (class) names as we do.
Language Models in Software Engineering. Probabilistic models of source code have been applied in software engineering. Hindle et al. and Nguyen et al. [25, 41] used n-gram models to improve code autocompletion. Allamanis and Sutton [3] present an application of code n-gram models at scale. Maddison and Tarlow [31] built a more sophisticated generative model of source code using log-bilinear models that reflects the syntactic structure of the code. Although the machine learning principles we use are similar, their model differs significantly from ours, because their purpose is to build models that generate source code rather than improve existing code. In other words, our model is discriminative rather than generative. Mou et al. [40] use a convolutional neural network to classify code from programming competition problems. Karaivanov et al. [27] combine LMs with static program analysis to suggest method calls and fill in gaps. Other applications of probabilistic source code models are extracting code idioms [4] and code migration [27]. Closely related to this work is our previous work where we infer formatting and naming conventions [2], using n-gram LMs to suggest natural renamings. Raychev et al. [43] present a discriminative probabilistic model to predict types and names of variables in JavaScript. In contrast, our current work introduces a log-bilinear model that greatly improves on the n-gram LM, especially on method and class naming, proposing neologisms by taking into account subtokens and non-local context.
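As a concrete illustration of the n-gram approach discussed above (a simplified sketch, not the cited authors' actual implementations), a trigram LM over code tokens reduces to counting token triples and smoothing the resulting relative frequencies:

```python
from collections import Counter

def train_trigram_lm(tokens):
    """Count trigram and bigram frequencies from a code token stream."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigrams = Counter(zip(tokens, tokens[1:]))
    return trigrams, bigrams

def trigram_prob(trigrams, bigrams, w1, w2, w3, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(w3 | w1, w2)."""
    return (trigrams[(w1, w2, w3)] + alpha) / (bigrams[(w1, w2)] + alpha * vocab_size)

# A toy "corpus" of lexed code; real systems train on millions of tokens.
tokens = "if ( x ) return ; if ( y ) return ;".split()
tri, bi = train_trigram_lm(tokens)
vocab = len(set(tokens))
```

An autocompletion engine would rank candidate next tokens by this probability; here the frequently observed continuation `;` after `) return` scores higher than an unseen one.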
Other Applications of Neural Log-bilinear Models. Neural log-bilinear models have been used in NLP for LMs [37, 39] and for describing images with NL [28]. Log-bilinear models have been shown in NLP to produce semantically consistent and interesting vector space representations (embeddings). Notable systems include word2vec [35, 36] and GloVe [42]. In contrast to these approaches, we use a rich notion of non-local context by incorporating features specific to source code, while we produce similar vector space models for method names, variables, and types. Additionally, we present a novel subtoken model. Related to our subtoken model is the work of Botha and Blunsom [12], who integrate compositional morphological representations of words into a log-bilinear LM, but the morphological features are only used in the context of an LM.
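To make "semantically consistent vector space representations" concrete, here is a toy sketch: the vectors below are hand-picked for illustration (not learned, as they would be by word2vec, GloVe, or our model), but they show the geometry these systems aim for, where related name subtokens such as `get` and `read` sit close together under cosine similarity while `set` points elsewhere.

```python
import numpy as np

# Hand-picked 2-D "embeddings" for three name subtokens; real systems
# learn vectors of hundreds of dimensions from large corpora.
emb = {
    "get":  np.array([1.0, 0.1]),
    "read": np.array([0.9, 0.2]),
    "set":  np.array([-1.0, 0.1]),
}

def cosine(a, b):
    """Cosine similarity: the standard closeness measure in embedding spaces."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Under this measure, `cosine(emb["get"], emb["read"])` exceeds `cosine(emb["get"], emb["set"])`, mirroring the semantic intuition.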
7. CONCLUSION
We introduced the method naming problem, that of automatically determining a functionally descriptive name for a method or class. Previous work on automatically assigning names [2, 43] focuses on local variables and relies on relatively local context. Naming methods is more difficult because it requires integrating non-local information from the body of the method or class. We presented a first solution using a log-bilinear neural language model, which includes feature functions that capture long-distance context and a subtoken model that can predict neologisms, names that did not appear in the training set. The model embeds each token into a high-dimensional continuous space.
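The two ingredients named above can be sketched as follows. This is a minimal illustration of the mechanics only: all parameters are random toy values rather than a trained model, and the vocabulary, context tokens, and greedy decoding loop are assumptions for the example. A log-bilinear step linearly combines context embeddings into a prediction vector and scores each candidate subtoken by an inner product; emitting subtokens one at a time is what allows never-seen combinations (neologisms) to be proposed.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension (toy value)
subtokens = ["get", "set", "count", "total", "END"]
# Toy parameters, randomly initialized here; a real model learns them.
out_emb = {s: rng.normal(size=D) for s in subtokens}
ctx_emb = {s: rng.normal(size=D) for s in subtokens + ["counter", "return"]}

def predict_next(context):
    """Log-bilinear step: average context embeddings into a prediction
    vector, score every candidate subtoken by an inner product, and
    softmax-normalize into a distribution."""
    r = np.mean([ctx_emb[c] for c in context], axis=0)
    scores = np.array([out_emb[s] @ r for s in subtokens])
    p = np.exp(scores - scores.max())
    return dict(zip(subtokens, p / p.sum()))

def suggest_name(context, max_parts=4):
    """Greedily emit subtokens until END, possibly forming a neologism
    (a subtoken sequence never seen as a whole name in training)."""
    parts = []
    while len(parts) < max_parts:
        probs = predict_next(context + parts)
        best = max(probs, key=probs.get)
        if best == "END":
            break
        parts.append(best)
    return parts
```

For instance, `suggest_name(["return", "counter"])` composes a name subtoken-by-subtoken from the context, rather than selecting a whole name from a closed vocabulary.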
Continuous embeddings of identifiers have many other potential applications in software engineering, such as rejecting commits whose names violate project conventions; exploration of linguistic anti-patterns, such as a getter starting with set [5]; and feature localization. Finally, a problem similar to method naming arises in NLP, namely the problem of generating a headline from the text of an article [8, 19]. It is possible that models similar to ours could shed light on that problem as well.
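The getter-starting-with-set anti-pattern admits a trivial lexical baseline, sketched below; the helper name and its inputs are hypothetical, not from the cited work, and an embedding-based detector would generalize beyond this exact-prefix rule.

```python
def getter_named_like_setter(method_name: str, returns_value: bool) -> bool:
    """Flag a method that returns a value (getter-like behaviour) but
    whose name begins with 'set' — a linguistic anti-pattern [5].
    Purely lexical; illustration only."""
    return returns_value and method_name.startswith("set")
```

For example, a value-returning method named `setName` would be flagged, while `getName` would not.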
8. ACKNOWLEDGEMENTS
This work was supported by Microsoft Research through its PhD Scholarship Programme. Charles Sutton was supported by the Engineering and Physical Sciences Research Council [grant number EP/K024043/1].