Undergraduate Thesis: Data Mining, Implementation of the K-means Algorithm (5)

2020-04-21 00:49

References

[1] Zhang T, Ramakrishnan R, Livny M. An efficient data clustering method for very large databases. In: Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data, Montreal, Canada, June 1996: 103-114.

[2] Shao Fengjing, Yu Zhongqing, Wang Jinlong, Sun Rencheng. Principles and Algorithms of Data Mining (2nd ed.). Beijing: Science Press, 2009. ISBN 978-7-03-025440-5.

[3] Zhang Jianhui. Research and Application of the K-means Clustering Algorithm [Master's thesis]. Wuhan: Wuhan University of Technology, 2007.

[4] Feng Chao. Research on K-means-type Algorithms [Master's thesis]. Dalian: Dalian University of Technology, 2007.

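[5] Zeng Zhixiong. An effective hybrid clustering algorithm based on partitioning and hierarchy. Journal of Computer Applications, 2007, 27(7): 1692-1695.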

[6] Fan Guangping. Research on a Genetic K-means Algorithm Based on Variable-length Encoding [Master's thesis]. Hangzhou: Zhejiang University, 2007.

[7] Sun Shibao, Qin Keyun. Research on an improved K-means clustering algorithm. Computer Engineering, 2007, 33(13): 200-202.

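[8] Sun Ke, Liu Jie, Wang Xueying. Improvement of initial centroid selection in the K-means clustering algorithm. Journal of Shenyang Normal University, 2009, 27(4): 448-450.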

[9] Jain A K, Duin R P W, Mao J C. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(1): 4-37.

[10] Sambasivam S, Theodosopoulos N. Advanced data clustering methods of mining web documents. Issues in Informing Science and Information Technology, 2006, 8(3): 563-579.

[11] Huang Z. Extensions to the K-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 1998, 2(3): 283-304.

[12] Ester M, Kriegel H P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining, Portland, Aug. 1996: 226-231.

[13] Mao Guojun, Duan Lijuan, Wang Shi, et al. Principles and Algorithms of Data Mining (2nd ed.). Beijing: Tsinghua University Press, 2007.

[14] Wang W, Yang J, Muntz R. A statistical information grid approach to spatial data mining. In: Proc. 1997 Int. Conf. Very Large Data Bases, Athens, Greece, Aug. 1997: 186-195.

[15] Wu K L, Yang M S. Alternative fuzzy c-means clustering algorithms. Pattern Recognition, 2002, 35: 2267-2278.

[16] Hamerly G, Elkan C. Alternatives to the k-means algorithm that find better clusterings. In: Proc. of the 11th Int. Conf. on Information and Knowledge Management, 2002: 600-607.

[17] Alsabti K, Ranka S, Singh K. An efficient K-means clustering algorithm. In: Proceedings of IPPS/SPDP Workshop on High Performance Data Mining, 1997: 34-39.

[18] Lozano J A, Peña J M, Larrañaga P. An empirical comparison of four initialization methods for the K-means algorithm. Pattern Recognition Letters, 1999, 20: 1027-1040.

[19] Likas A, Vlassis N, Verbeek J J. The global k-means clustering algorithm. Pattern Recognition, 2003, 36: 451-461.

[20] Wagstaff K, Cardie C, Rogers S, Schroedl S. Constrained K-means clustering with background knowledge. In: Proceedings of the Eighteenth International Conference on Machine Learning, 2001: 577-584.

[21] Ng A Y, Jordan M I, Weiss Y. On spectral clustering: Analysis and an algorithm. In: Advances in Neural Information Processing Systems 14, 2001: 849-856.

[22] Higham D, Kibble K. A unified view of spectral clustering. Technical Report 02, Department of Mathematics, University of Strathclyde, 2004.

[23] Alpert C, Kahng A, Yao S. Spectral partitioning: The more eigenvectors, the better. Discrete Applied Mathematics, 1999, 90: 3-26.

Implementation of the K-means Algorithm

Abstract: With the rapid development of Internet technology, people now deal every day with data in many forms, such as text, images, video, and audio, and the volume of this data is enormous. How to mine such large amounts of data quickly and efficiently, and extract the valuable information they contain, is a problem of particular concern that urgently needs solving. Data mining (DM) emerged in response to this need. After a short period of rapid development, data mining has produced a large body of theoretical and practical results and provides many tools and effective methods for solving this problem. Cluster analysis is a very important research area within data mining: it groups data according to their type or characteristics. Clustering has been applied successfully in biological research, commerce and trade, image analysis, web content classification, and many other areas of daily life.

Depending on the data type, the purpose of the analysis, and the requirements of the application, clustering algorithms can be roughly divided into the following categories: partition-based algorithms, hierarchy-based algorithms, density-based algorithms, model-based algorithms, and grid-based algorithms. This thesis studies the K-means algorithm, the most mature and classic of the partition-based clustering algorithms. The K-means algorithm has a particularly wide range of applications, including speech compression and image and text clustering, and it also plays an important role in data preprocessing and in task decomposition for neural network structures. The work done in this thesis is as follows:

Part I: Describes the background and purpose of the thesis and the considerations behind the choice of topic. It then summarizes the current state of cluster analysis research in China and abroad, and finally outlines the content of this thesis and its overall organization.

Part II: First describes the origins and development of data mining and defines its basic concepts. It then introduces the fundamentals of cluster analysis, including the principles of clustering and the characteristics required of clustering algorithms. It describes several current clustering methods in detail and summarizes and compares the strengths and weaknesses of each. Finally, it discusses several partition-based clustering algorithms in further detail.

Part III: This is the focus of the thesis. It introduces the K-means algorithm in detail, covering its concept, basic idea, and procedure, and analyzes its weaknesses. The K-means algorithm is sensitive to the choice of initial values, and the input order of the data also affects the clustering. We verified these problems experimentally to determine which factors affect the clustering result. The experiments show that the K-means algorithm is sensitive to both the initial values and the data input order, but the two factors affect the clustering results differently. Analysis of six experiments shows that changing the initial points has little effect on the clustering result but does change the number of iterations. Choosing the first several consecutive data points as initial points gives the fewest iterations. Initial points spaced a few records apart can also give the minimum number of iterations, but this choice is too uncertain. Therefore the first several data points are used as the initial cluster centers. Changing the input order of the data set, by contrast, changes the clustering results substantially, which shows that the input order affects not only the clustering result but also the number of iterations. These conclusions provide a useful reference for future users of the K-means algorithm.
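Part III concludes that the first several records serve as the initial cluster centers. Under that seeding rule, reordering the input changes which records become the initial centers. This is one plausible way that input order can alter both the clustering and the iteration count.

The following is a minimal Python sketch of batch (Lloyd-style) K-means under that assumption. It is not the thesis's own implementation. The function names `kmeans`, `sse`, `dist2`, and `mean` and the toy data are hypothetical. The sketch alternates assignment and update steps to reduce the within-cluster sum of squared errors (SSE), and it reports how many assignment passes were needed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty sequence of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point], int]:
    """Batch K-means seeded with the first k records (assumed seeding rule).

    Returns (labels, centers, passes), where passes counts assignment passes,
    including the final pass that confirms no point changed cluster.
    """
    if not 0 < k <= len(data):
        raise ValueError("k must satisfy 0 < k <= len(data)")
    centers: List[Point] = [tuple(p) for p in data[:k]]
    labels: List[int] = [-1] * len(data)
    for passes in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center (ties -> lower index).
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in data]
        if new_labels == labels:
            return labels, centers, passes  # converged: no reassignment
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return labels, centers, max_iter


def sse(data: Sequence[Point], labels: Sequence[int], centers: Sequence[Point]) -> float:
    """Within-cluster sum of squared errors, the quantity K-means tries to reduce."""
    return sum(dist2(p, centers[lab]) for p, lab in zip(data, labels))


if __name__ == "__main__":
    # The same six 1-D values in two input orders (hypothetical toy data).
    order_a = [(0.0,), (10.0,), (20.0,), (1.0,), (11.0,), (21.0,)]
    order_b = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
    for name, data in (("A", order_a), ("B", order_b)):
        labels, centers, passes = kmeans(data, k=3)
        print(name, labels, centers, passes, sse(data, labels, centers))
```

Both orders converge in two passes, but the results differ:

- **Order A** (0, 10, 20, 1, 11, 21) is seeded with 0, 10, and 20. It finds the three natural pairs with SSE 1.5.
- **Order B** (0, 1, 10, 11, 20, 21) is seeded with 0, 1, and 10. It keeps 0 and 1 as separate clusters and merges the other four values into one cluster, with SSE 101.

With a fixed set of initial centers, batch K-means itself does not depend on record order. The sensitivity shown here comes entirely from the first-k seeding rule.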

Keywords: data mining, cluster analysis, K-means algorithm, experimental verification
