    private void createDefaultShader() {
        String vertexShader = "literal_1";
        String fragmentShader = "literal_2";
        shader = new ShaderProgram(vertexShader,
                fragmentShader);
        if (shader.isCompiled() == false)
            throw new IllegalArgumentException(
                    "literal_3" + shader.getLog());
    }
Figure 1: A method from libgdx's CameraGroupStrategy. A model that understands that its name should start with create suggests createShaders.

[...] programmer [...] named it; automatically naming it requires inventing a neologism, a very hard inference problem. Our subtoken [...] and can better exploit the structure of code, taking into account long-range dependencies and modeling the context surrounding their definitions more precisely than at the token level, while minimizing the effects of data sparsity.
This paper tackles the method naming problem with a novel neural log-bilinear context model for code, inspired by neural probabilistic language models for natural language, which have seen many recent successes [37, 28, 35, 31]. A particularly impressive success of these models has been that they assign words to continuous vectors that support analogical reasoning. For example, vector('king') - vector('man') + vector('woman') results in a vector close to vector('queen') [35, 36]. Although many of the basic ideas have a long history [10], this class of model is receiving increasing interest because of increased computational power from GPUs and because of more efficient learning algorithms such as noise contrastive estimation [21, 39].
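The analogy arithmetic mentioned above can be made concrete with a minimal sketch. The two-dimensional vectors below are hypothetical values chosen by hand for illustration (one axis loosely standing for "royalty", the other for "gender"); they are not learned embeddings, and the class and method names are assumptions of this sketch rather than part of any published implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AnalogyDemo {
    // Hypothetical hand-picked embeddings; real models learn these from data.
    static final Map<String, double[]> EMB = new LinkedHashMap<>();
    static {
        EMB.put("king", new double[] {1.0, 1.0});
        EMB.put("queen", new double[] {1.0, -1.0});
        EMB.put("man", new double[] {0.0, 1.0});
        EMB.put("woman", new double[] {0.0, -1.0});
        EMB.put("prince", new double[] {0.8, 1.0});
    }

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the word closest to vector(a) - vector(b) + vector(c), excluding a, b and c. */
    static String analogy(String a, String b, String c) {
        double[] va = EMB.get(a), vb = EMB.get(b), vc = EMB.get(c);
        double[] target = new double[va.length];
        for (int i = 0; i < target.length; i++) {
            target[i] = va[i] - vb[i] + vc[i];
        }
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : EMB.entrySet()) {
            String word = e.getKey();
            if (word.equals(a) || word.equals(b) || word.equals(c)) {
                continue; // the query words themselves are not valid answers
            }
            double sim = cosine(target, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(analogy("king", "man", "woman")); // prints "queen"
    }
}
```

With these toy values the target vector is exactly (1, -1), so the nearest remaining word is queen; with learned embeddings the match is only approximate.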
Intuitively, our model assigns to every identifier name used in a project a continuous vector in a high-dimensional space, in such a way that identifiers with similar vectors, or "embeddings", tend to appear in similar contexts. Then, to name a method (or a class), we select the name that is most similar in this embedding space to those in the function body. In this way, our model realizes Firth's famous dictum, "You shall know a word by the company it keeps". This slogan encapsulates the distributional hypothesis, that semantically similar words tend to co-occur with the same other words. Two words are distributionally similar if they have similar distributions over surrounding words. For example, even if the words "hot" and "cold" never appear in the same sentence, they will be distributionally similar if they both often co-occur with words like "weather" and "tea". The distributional hypothesis is a cornerstone of much work in computational linguistics, but we are unaware of previous work that explores whether this hypothesis holds in source code. Earlier work on the naturalness of code [25] found that code tends to repeat constructs and exploited this repetition for prediction, but did not consider the semantics of tokens. In contrast, the distributional hypothesis states that you shall recognize semantically similar tokens because they tend also to be distributionally similar.
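The selection step described above, picking the name closest in embedding space to the identifiers in the body, can be sketched as follows. This is a deliberate simplification and not the log-bilinear model itself: it averages the context vectors and ranks candidates by dot product. The embedding values, the candidate list, and all class and method names are hypothetical; the context identifiers are taken from the method in Figure 1.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NearestNameDemo {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Averages the embeddings of the context identifiers that have one. */
    static double[] average(List<String> context, Map<String, double[]> emb, int dim) {
        double[] mean = new double[dim];
        int known = 0;
        for (String id : context) {
            double[] v = emb.get(id);
            if (v == null) {
                continue; // identifier never seen in training: skip it
            }
            for (int i = 0; i < dim; i++) {
                mean[i] += v[i];
            }
            known++;
        }
        if (known == 0) {
            throw new IllegalArgumentException("no known identifiers in context");
        }
        for (int i = 0; i < dim; i++) {
            mean[i] /= known;
        }
        return mean;
    }

    /** Returns the candidate name whose vector has the largest dot product with the mean context. */
    static String suggest(List<String> context, Map<String, double[]> contextEmb,
                          Map<String, double[]> nameEmb, int dim) {
        double[] c = average(context, contextEmb, dim);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : nameEmb.entrySet()) {
            double score = dot(c, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    /** Runs the sketch on hypothetical 2-d embeddings. */
    static String demo() {
        Map<String, double[]> contextEmb = new LinkedHashMap<>();
        contextEmb.put("vertexShader", new double[] {0.9, 0.1});
        contextEmb.put("fragmentShader", new double[] {0.9, 0.0});
        contextEmb.put("shader", new double[] {1.0, 0.1});
        contextEmb.put("isCompiled", new double[] {0.6, 0.3});

        Map<String, double[]> nameEmb = new LinkedHashMap<>();
        nameEmb.put("getWidth", new double[] {0.1, 1.0});
        nameEmb.put("createShader", new double[] {1.0, 0.2});
        nameEmb.put("setPad", new double[] {0.0, 0.9});

        List<String> body = Arrays.asList("vertexShader", "fragmentShader", "shader", "isCompiled");
        return suggest(body, contextEmb, nameEmb, 2);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "createShader"
    }
}
```

Averaging treats every context identifier equally; a learned model would instead weight contexts by position and kind, which is what distinguishes it from this sketch.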
Indeed, we qualitatively show in Section 4 that our context model produces embeddings that demonstrate implicit semantic knowledge about the similarity of identifiers. For instance, it successfully distinguishes getters and setters, assigns function names with similar functionality (like grow and resize) to similar locations, and discovers matching components of names, which we call subtokens, like min and max, and height and width.
Furthermore, to allow us to suggest neologisms, we introduce a new subtoken context model that exploits the internal structure of identifier names. In this model, we predict names by breaking them into parts, which we call subtokens, such as get, create, and Height, and then predicting names one subtoken at a time. The subtoken model automatically infers conventions about the internal structure of variable names, such as "an interface starts with an I", or "an abstract class starts with Abstract". Our subtoken model also learns conventions like prefixing names of boolean methods with is or has. This model also allows us to propose neologisms, by proposing sequences of subtokens that have not been seen before. Consider Figure 1; our subtoken model builds and explores an embedding space that allows it to suggest createShaders, which is usefully close to the name a programmer actually chose. Our contributions follow:
• We introduce a log-bilinear neural network to model code contexts that, unlike standard language models in NLP, integrates information from preceding, succeeding, and non-local tokens;
• We are the first to apply a neural context model to the method naming problem; and
• We demonstrate that our models can accurately suggest names: for the simpler variable naming problem, they improve on the state of the art, and for class and method naming, our best model achieves F1 scores of 60% on method names and 55% on class names, when required to predict names for 20% of method and class declarations. Additionally, our subtoken model, which can suggest previously unseen names, achieves an F1 of 50% when required to suggest names for 50% of the classes.

Example Suggestions. To illustrate our model's capabilities, we present a few examples of names suggested by the model (for quantitative results, see Section 5). When evaluated on libgdx, a graphics library for Android, and asked to suggest a name for the variable that programmers had named isLooping, our model had learned that the name should start with is, although its confidence was low. For multipart method names like getPersistentManifoldPool, it understood that get was a likely prefix, suggesting it with 38% confidence, and that Manifold was important, assigning its inclusion a probability of 28%; it even included getManifoldPool among its top five suggestions. On shorter agglutinations, like setPad, it performed better: all five top-ranked suggestions started with set, four of its suggestions included the root Pad, and it ranked setPad, the actual name, third. Its handling of class names was similar. It learned that the name of an exception class should end with Exception and inferred that the names of Action and Test subclasses should end in Action and Test. A particularly interesting suggestion our model made was AndroidAudio for the class AndroidMusic.
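The subtokens discussed above (get, Manifold, Pool, set, Pad) can be obtained by splitting identifiers at camelCase boundaries. The sketch below assumes that convention; the exact tokenization rules used by the model are not stated in this section, and the class and method names here are hypothetical. Joining subtokens in a new order is what makes a neologism such as createShaders possible.

```java
import java.util.Arrays;
import java.util.List;

final class SubtokenDemo {
    // Zero-width boundaries: lower-case or digit followed by upper-case ("setPad"),
    // or the end of an acronym followed by a capitalized word ("HTTPServer").
    private static final String BOUNDARY =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    /** Splits a camelCase identifier into its subtokens, preserving case. */
    static List<String> split(String identifier) {
        return Arrays.asList(identifier.split(BOUNDARY));
    }

    /** Concatenates subtokens back into a single identifier. */
    static String join(List<String> subtokens) {
        return String.join("", subtokens);
    }

    public static void main(String[] args) {
        System.out.println(split("getPersistentManifoldPool"));
        System.out.println(split("setPad"));
        // A name assembled from known subtokens, even if the whole name was never seen.
        System.out.println(join(Arrays.asList("create", "Shaders")));
    }
}
```

A model that predicts one subtoken at a time only needs each part to be in its vocabulary, not the full name.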
Use Cases. Our suggestion model can be embedded within a variety of tools to support code development and code review. During development, suppose that the developer is adding a method or a class to an existing project. After writing the body, the developer may be unsure if the name she chose is descriptive and conventional within the project. Our model suggests alternative names from patterns it learned from other methods in the project. During code review, our model can highlight those names to which it assigns a low score. In either case, the system has two phases: a training phase, which takes as input a training set of source files (e.g. the current revision of the project) and returns a neural network model that can suggest names; and a testing or deployment phase, in which the input is a trained neural network and the source code of a method or class, and the output is a ranked list of suggested names. Any suggestion system has the potential to suffer from what we have called the "Clippy effect" [2], in which too many low-quality suggestions alienate the user. To prevent this, our suggestion model also returns a numeric score that reflects its degree of confidence in its suggestion; practical tools would only make a suggestion to the user if the confidence were sufficiently high.
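The confidence gating described above could look roughly like the following in a tool built around the model. This is only a sketch: the Suggestion type, the scores, and the 0.25 threshold are hypothetical, and how a real tool would calibrate the threshold is not specified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ConfidenceFilterDemo {
    /** A suggested name together with the model's confidence score (hypothetical type). */
    static final class Suggestion {
        final String name;
        final double score;

        Suggestion(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Keeps suggestions scoring at least the threshold, ranked from most to least confident. */
    static List<Suggestion> confident(List<Suggestion> suggestions, double threshold) {
        List<Suggestion> kept = new ArrayList<>();
        for (Suggestion s : suggestions) {
            if (s.score >= threshold) {
                kept.add(s);
            }
        }
        kept.sort(Comparator.comparingDouble((Suggestion s) -> s.score).reversed());
        return kept;
    }

    /** Runs the filter on made-up scores with a made-up threshold of 0.25. */
    static List<Suggestion> demo() {
        List<Suggestion> raw = Arrays.asList(
                new Suggestion("setPadding", 0.15),
                new Suggestion("setPad", 0.55),
                new Suggestion("getPad", 0.30));
        return confident(raw, 0.25);
    }

    public static void main(String[] args) {
        for (Suggestion s : demo()) {
            System.out.println(s.name + " " + s.score);
        }
    }
}
```

If no suggestion clears the threshold, the list is empty and the tool stays silent, which is the behavior that avoids the "Clippy effect".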
2. NEURAL CONTEXT MODELS OF CODE

In this section, we introduce four language models of code, starting with the n-gram model to build intuition. Then we introduce