Suggesting Accurate Method and Class Names (FSE 2015)

2020-12-24 16:46

    private void createDefaultShader() {
        String vertexShader = "literal_1";
        String fragmentShader = "literal_2";
        shader = new ShaderProgram(vertexShader, fragmentShader);
        if (shader.isCompiled() == false)
            throw new IllegalArgumentException("literal_3" + shader.getLog());
    }

Figure 1: A method from libgdx's CameraGroupStrategy. A subtoken model that suggests createShaders understands that its name should start with create.

[...] programmer named it; automatically naming it requires inventing a neologism, a very hard inference problem. Our [...] and can better exploit the structure of code, taking into account long-range dependencies and modeling the context surrounding their definitions more precisely than at the token level, while minimizing the effects of data sparsity.

This paper tackles the method naming problem with a novel, neural log-bilinear context model for code, inspired by neural probabilistic language models for natural language, which have seen many recent successes [37, 28, 35, 31]. A particularly impressive success of these models has been that they assign words to continuous vectors that support analogical reasoning. For example, vector('king') - vector('man') + vector('woman') results in a vector close to vector('queen') [35, 36]. Although many of the basic ideas have a long history [10], this class of model is receiving increasing recent interest because of increased computational power from GPUs and because of more efficient learning algorithms such as noise contrastive estimation [21, 39].
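The analogy arithmetic above can be illustrated with a minimal Java sketch. The two-dimensional vectors are hand-picked toy values (an assumption for illustration, not learned embeddings), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AnalogyDemo {
    // Toy 2-D "embeddings" (assumed, not learned): axis 0 ~ royalty, axis 1 ~ gender.
    static final Map<String, double[]> VEC = new LinkedHashMap<>();
    static {
        VEC.put("king",   new double[] {1.0,  1.0});
        VEC.put("queen",  new double[] {1.0, -1.0});
        VEC.put("man",    new double[] {0.0,  1.0});
        VEC.put("woman",  new double[] {0.0, -1.0});
        VEC.put("prince", new double[] {0.8,  1.0});
    }

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        double denom = Math.sqrt(na * nb);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    // Returns the word closest (by cosine) to vec(a) - vec(b) + vec(c),
    // excluding the three query words themselves.
    static String analogy(String a, String b, String c) {
        double[] va = VEC.get(a), vb = VEC.get(b), vc = VEC.get(c);
        double[] target = new double[va.length];
        for (int i = 0; i < target.length; i++) {
            target[i] = va[i] - vb[i] + vc[i];
        }
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : VEC.entrySet()) {
            String w = e.getKey();
            if (w.equals(a) || w.equals(b) || w.equals(c)) continue;
            double sim = cosine(target, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(analogy("king", "man", "woman")); // prints "queen"
    }
}
```

With these toy values the target vector is exactly (1, -1), so "queen" wins; with learned embeddings the match would only be approximate.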

Intuitively, our model assigns to every identifier name used in a project a continuous vector in a high dimensional space, in such a way that identifiers with similar vectors, or "embeddings", tend to appear in similar contexts. Then, to name a method (or a class), we select the name that is most similar in this embedding space to those in the function body. In this way, our model realizes Firth's famous dictum, "You shall know a word by the company it keeps". This slogan encapsulates the distributional hypothesis, that semantically similar words tend to co-occur with the same other words. Two words are distributionally similar if they have similar distributions over surrounding words. For example, even if the words "hot" and "cold" never appear in the same sentence, they will be distributionally similar if they both often co-occur with words like "weather" and "tea". The distributional hypothesis is a cornerstone of much work in computational linguistics, but we are unaware of previous work that explores whether this hypothesis holds in source code. Earlier work on the naturalness of code [25] found that code tends to repeat constructs and exploited this repetition for prediction, but did not consider the semantics of tokens. In contrast, the distributional hypothesis states that you shall recognize semantically similar tokens because they tend also to be distributionally similar.
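The idea of choosing the name whose embedding is most similar to the tokens in the body can be sketched as follows. This is a deliberate simplification and not the model the paper defines: the embeddings are hand-picked toy values, the context is a plain sum of token vectors, and similarity is cosine. All names in the sketch (NearestNameDemo, TOKEN_EMB, NAME_EMB, suggest) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NearestNameDemo {
    // Toy embeddings (assumed): axis 0 ~ read(+)/write(-), axis 1 ~ height(+)/width(-).
    static final Map<String, double[]> TOKEN_EMB = new LinkedHashMap<>();
    static final Map<String, double[]> NAME_EMB = new LinkedHashMap<>();
    static {
        TOKEN_EMB.put("return", new double[] { 1.0,  0.0});
        TOKEN_EMB.put("=",      new double[] {-1.0,  0.0});
        TOKEN_EMB.put("height", new double[] { 0.0,  1.0});
        TOKEN_EMB.put("width",  new double[] { 0.0, -1.0});
        NAME_EMB.put("getHeight", new double[] { 1.0,  1.0});
        NAME_EMB.put("setHeight", new double[] {-1.0,  1.0});
        NAME_EMB.put("getWidth",  new double[] { 1.0, -1.0});
        NAME_EMB.put("setWidth",  new double[] {-1.0, -1.0});
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        double denom = Math.sqrt(na * nb);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    // Sums the embeddings of the known body tokens and returns the candidate
    // name whose embedding has the highest cosine similarity to that sum.
    static String suggest(List<String> bodyTokens) {
        double[] context = new double[2];
        for (String t : bodyTokens) {
            double[] v = TOKEN_EMB.get(t);
            if (v == null) continue; // tokens without an embedding are skipped
            for (int i = 0; i < context.length; i++) context[i] += v[i];
        }
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : NAME_EMB.entrySet()) {
            double sim = cosine(context, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Body "return height;" and body "this.width = width;"
        System.out.println(suggest(Arrays.asList("return", "height", ";")));           // getHeight
        System.out.println(suggest(Arrays.asList("this", ".", "width", "=", "width"))); // setWidth
    }
}
```

Because cosine similarity ignores vector length, summing the context vectors ranks candidates the same way averaging them would.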

Indeed, we qualitatively show in Section 4 that our context model produces embeddings that demonstrate implicit semantic knowledge about the similarity of identifiers. For instance, it successfully distinguishes getters and setters, assigns function names with similar functionality (like grow and resize) to similar locations, and discovers matching components of names, which we call subtokens, like min and max, and height and width.

Furthermore, to allow us to suggest neologisms, we introduce a new subtoken context model that exploits the internal structure of identifier names. In this model, we predict names by breaking them into parts, which we call subtokens, such as get, create, and Height, and then predicting names one subtoken at a time. The subtoken model automatically infers conventions about the internal structure of variable names, such as "an interface starts with an I", or "an abstract class starts with Abstract". Our subtoken model also learns conventions like prefixing names of boolean methods with is or has. This model also allows us to propose neologisms, by proposing sequences of subtokens that have not been seen before. Consider Figure 1; our subtoken model builds and explores an embedding space that allows it to suggest createShaders, which is usefully close to the name a programmer actually chose.

Our contributions follow:

• We introduce a log-bilinear neural network to model code contexts that, unlike standard language models in NLP, integrates information from preceding, succeeding, and non-local tokens;

• We are the first to apply a neural context model to the method naming problem; and

• We demonstrate that our models can accurately suggest names: for the simpler variable naming problem, they improve on the state of the art, and for class and method naming, our best model achieves F1 scores of 60% on method names and 55% on class names, when required to predict names for 20% of method and class declarations. Additionally, our subtoken model, which can suggest previously unseen names, achieves an F1 of 50% when required to suggest names for 50% of the classes.

Example Suggestions. To illustrate our model's capabilities, we present a few examples of names suggested by the model (for quantitative results, see Section 5). When evaluated on libgdx, a graphics library for Android, and asked to suggest a name for the variable that programmers had named isLooping, our model had learned that the name should start with is, although its confidence was low. For multipart method names like getPersistentManifoldPool, it understood that get was a likely prefix, suggesting it with 38% confidence, and that Manifold was important, assigning its inclusion a probability of 28%; it even included getManifoldPool among its top five suggestions. On shorter agglutinations, like setPad, it performed better: all five top-ranked suggestions started with set, four of its suggestions included the root Pad, and it ranked setPad, the actual name, third. Its handling of class names was similar. It learned that the name of an exception class should end with Exception and inferred that the names of Action and Test subclasses should end in Action and Test. A particularly interesting suggestion that caught our eye was AndroidAudio for the class AndroidMusic.
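The subtokens discussed here (get, Manifold, Pad, and so on) can be recovered from an identifier with a simple splitter. The sketch below is an assumption about one reasonable splitting rule (camelCase boundaries and underscores); this section of the paper does not specify the exact rule, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SubtokenDemo {
    // Splits an identifier into subtokens at lower-to-upper case boundaries
    // and at underscores. Assumed convention, not taken from the paper.
    static List<String> split(String identifier) {
        List<String> parts = new ArrayList<>();
        for (String p : identifier.split("(?<=[a-z0-9])(?=[A-Z])|_")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }

    // Concatenates subtokens back into a single name; a sequence of subtokens
    // never seen together before yields a neologism.
    static String join(List<String> subtokens) {
        return String.join("", subtokens);
    }

    public static void main(String[] args) {
        System.out.println(split("getPersistentManifoldPool")); // [get, Persistent, Manifold, Pool]
        System.out.println(split("setPad"));                    // [set, Pad]
        System.out.println(join(List.of("create", "Shaders"))); // createShaders
    }
}
```

Predicting one subtoken at a time then amounts to choosing each element of such a list in order, which is what lets a model emit a name like createShaders even if that exact name never occurred in training.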

Use Cases. Our suggestion model can be embedded within a variety of tools to support code development and code review. During development, suppose that the developer is adding a method or a class to an existing project. After writing the body, the developer may be unsure if the name she chose is descriptive and conventional within the project. Our model suggests alternative names from patterns it learned from other methods in the project. During code review, our model can highlight those names to which it assigns a low score. In either case, the system has two phases: a training phase, which takes as input a training set of source files (e.g. the current revision of the project) and returns a neural network model that can suggest names; and a testing or deployment phase, in which the input is a trained neural network and the source code of a method or class, and the output is a ranked list of suggested names. Any suggestion system has the potential to suffer from what we have called the "Clippy effect" [2], in which too many low quality suggestions alienate the user. To prevent this, our suggestion model also returns a numeric score that reflects its degree of confidence in its suggestion; practical tools would only make a suggestion to the user if the confidence were sufficiently high.
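The confidence gate described above can be sketched as a small filter over scored candidates. Everything here is assumed for illustration: the class and method names are hypothetical, the scores are invented rather than produced by any model, and the 0.5 threshold is arbitrary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SuggestionFilterDemo {
    // Invented example scores standing in for a trained model's output.
    static Map<String, Double> demoScores() {
        Map<String, Double> scored = new LinkedHashMap<>();
        scored.put("initShader", 0.10);
        scored.put("createShader", 0.55);
        scored.put("createShaders", 0.80);
        scored.put("setup", 0.05);
        return scored;
    }

    // Keeps candidates whose confidence is at least the threshold and
    // returns their names ranked from most to least confident.
    static List<String> confidentNames(Map<String, Double> scored, double threshold) {
        List<Map.Entry<String, Double>> kept = new ArrayList<>();
        for (Map.Entry<String, Double> e : scored.entrySet()) {
            if (e.getValue() >= threshold) {
                kept.add(e);
            }
        }
        kept.sort((x, y) -> Double.compare(y.getValue(), x.getValue())); // descending
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Double> e : kept) {
            names.add(e.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(confidentNames(demoScores(), 0.5)); // [createShaders, createShader]
        System.out.println(confidentNames(demoScores(), 0.9)); // [] (stay silent)
    }
}
```

An empty result is the useful case for avoiding the "Clippy effect": when nothing clears the threshold, the tool simply shows no suggestion.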

2. NEURAL CONTEXT MODELS OF CODE

In this section, we introduce four language models of code, starting with the n-gram model to build intuition. Then we introduce
