A Survey of Cluster Analysis Algorithms in Pattern Recognition (Thesis) (6)

2019-01-19 13:10

Graduation Project (Thesis), page 21

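Clustering Based on Genetic Algorithms

Genetic algorithms (GAs) are simple, robust, and efficient optimization methods guided by the principles of natural selection and survival of the fittest found in natural genetics. GA-based algorithms have been applied to cluster analysis for some time. In the paper, the genetic algorithm optimizes the similarity measure of the clustering result by selecting appropriate cluster centers. Such methods are clearly centroid-based and tend to produce hyperspherical clusters. The paper proposes a graph-based method for gene-order analysis and cluster analysis.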

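A GA-based algorithm starts from an initial population whose individuals represent feasible solutions to the k-clustering problem. Ujjwal Maulik encodes the k cluster centers as a string of real numbers; for an N-dimensional space, the chromosome of an individual therefore has length N × k. Hwei-Jen Lin encodes the k cluster centers in binary rather than as a string, in order to speed up fitness evaluation. Another encoding scheme in the literature uses group numbers to indicate which cluster each object belongs to.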
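The two encodings can be illustrated with a few lines of MATLAB. This is a minimal sketch, assuming the k centers are stored one after another in a single row vector; the variable names and the toy values are illustrative and are not taken from the cited papers.

```matlab
% Assumed layout: chromosome = [c1(1..N), c2(1..N), ..., ck(1..N)]
N = 2;                                  % dimensionality of the feature space
k = 3;                                  % number of clusters
centers = [1 2; 5 6; 9 1];              % k-by-N matrix, one center per row

% Encode: concatenate the rows into one vector of length N*k
chromosome = reshape(centers', 1, []);  % gives [1 2 5 6 9 1]

% Decode: recover the k-by-N matrix of centers
decoded = reshape(chromosome, N, k)';

% Group-number encoding: one gene per object, holding the index of the
% cluster that the object belongs to (six objects, three clusters)
labels = [1 1 2 3 3 2];
```

With the center-based encoding the chromosome length depends on N and k; with the group-number encoding it depends on the number of objects.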

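Fitness measures how good a solution is and serves as the basis for selecting individuals for reproduction. Every individual is assigned a fitness value; the higher the value, the better the solution. Most existing GA-based k-clustering algorithms build their fitness function on the sum of squared Euclidean distances from each object to its cluster center (because a smaller sum means a better clustering, the fitness is typically a decreasing function of this sum).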
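As a small worked example of such a fitness function, the sketch below computes the sum of squared Euclidean distances for a toy data set and converts it into a fitness value. The mapping `1 / (1 + sse)` is an assumption chosen so that a larger fitness means a better solution; the source does not state which mapping the surveyed algorithms use.

```matlab
% Toy data: six objects in 2-D, one object per row
X = [0 0; 0 1; 1 0; 9 9; 9 10; 10 9];
centers = [0 0; 9 9];                   % candidate solution (decoded chromosome)

k = size(centers, 1);
n = size(X, 1);
d2 = zeros(n, k);                       % squared distance of every object to every center
for c = 1:k
    delta = X - repmat(centers(c, :), n, 1);
    d2(:, c) = sum(delta .^ 2, 2);
end

% Each object contributes the squared distance to its nearest center
[nearest, labels] = min(d2, [], 2);
sse = sum(nearest);                     % sse = 4 for this data

% Assumed mapping from the distance sum to a fitness value
fitness = 1 / (1 + sse);                % fitness = 0.2
```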

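In each generation, individuals are selected according to the fitness values of the current population. The most widely used selection mechanisms are roulette-wheel selection and tournament selection. Crossover and mutation are then applied to form a new population.

This process is iterated for a number of generations. At the end, the best solution in the population is chosen as the clustering solution.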
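The operators named here can be sketched as follows. The fitness values, the population, the tournament size of 2, single-point crossover, and the mutation rate `pm` are all assumptions made for illustration; the surveyed papers may use different variants.

```matlab
rng(0);                                      % fixed seed so the sketch is repeatable
fitness = [0.1 0.4 0.2 0.3];                 % one fitness value per individual
pop = [1 1 1 1; 2 2 2 2; 3 3 3 3; 4 4 4 4];  % four chromosomes of length 4

% Roulette-wheel selection: individual i is picked with probability
% fitness(i) / sum(fitness)
wheel = cumsum(fitness) / sum(fitness);      % cumulative probabilities
wheel(end) = 1;                              % guard against rounding error
parentA = find(rand() <= wheel, 1, 'first');

% Tournament selection (size 2): draw two individuals, keep the fitter one
cand = randi(numel(fitness), 1, 2);
[~, best] = max(fitness(cand));
parentB = cand(best);

% Single-point crossover: head of parent A followed by tail of parent B
cut = randi(size(pop, 2) - 1);               % cut position between 1 and L-1
child = [pop(parentA, 1:cut), pop(parentB, cut+1:end)];

% Mutation: perturb each gene with a small probability
pm = 0.1;                                    % assumed mutation rate
mask = rand(size(child)) < pm;
child(mask) = child(mask) + 0.5 * randn(1, nnz(mask));
```

Roulette-wheel selection favours individuals in proportion to their fitness, while tournament selection depends only on the ranking within each draw.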
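Putting the steps together, a complete run might look like the sketch below. It is a toy implementation under stated assumptions, not the method of any cited paper: real-valued center encoding, fitness `1 / (1 + sse)`, roulette-wheel selection, single-point crossover, Gaussian mutation, and hypothetical parameter values. It also records the best individual seen in any evaluated generation, which is a common simplification.

```matlab
rng(1);
X = [randn(20, 2); randn(20, 2) + 6];        % toy data: two separated groups
k = 2;
[n, N] = size(X);
popSize = 20;  nGen = 30;  pm = 0.05;        % assumed GA parameters

% Initial population: each chromosome holds k centers copied from random objects
pop = zeros(popSize, N * k);
for p = 1:popSize
    pop(p, :) = reshape(X(randperm(n, k), :)', 1, []);
end

bestSse = inf;
for g = 1:nGen
    % Evaluate every individual by its sum of squared distances
    sse = zeros(popSize, 1);
    for p = 1:popSize
        C = reshape(pop(p, :), N, k)';       % decode the k centers
        d2 = zeros(n, k);
        for c = 1:k
            d2(:, c) = sum((X - repmat(C(c, :), n, 1)) .^ 2, 2);
        end
        sse(p) = sum(min(d2, [], 2));
    end
    fit = 1 ./ (1 + sse);

    % Remember the best solution seen so far
    [genBest, gi] = min(sse);
    if genBest < bestSse
        bestSse = genBest;
        bestCenters = reshape(pop(gi, :), N, k)';
    end

    % Build the next population by selection, crossover, and mutation
    wheel = cumsum(fit) / sum(fit);
    wheel(end) = 1;
    newPop = zeros(size(pop));
    for p = 1:popSize
        a = find(rand() <= wheel, 1, 'first');
        b = find(rand() <= wheel, 1, 'first');
        cut = randi(N * k - 1);
        child = [pop(a, 1:cut), pop(b, cut+1:end)];
        mask = rand(1, N * k) < pm;
        child(mask) = child(mask) + 0.1 * randn(1, nnz(mask));
        newPop(p, :) = child;
    end
    pop = newPop;
end
```

After the loop, `bestCenters` holds the k-by-N centers of the best evaluated individual and `bestSse` its distance sum.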


Translation 2: Analysis of Landsat 5 TM Data of Malaysian Land Covers Using ISODATA Clustering Technique

Studies on the classification of remote sensing data have long been carried out by numerous researchers worldwide, with more effort made regionally than globally. Many regional studies have been carried out in places such as Europe and America, owing to their up-to-date remote sensing facilities and ground truth information. There is also increasing interest in carrying out such studies in climate-affected regions such as Africa and in highly populated regions such as India and China. Nonetheless, not much effort has been expended in tropical countries such as Malaysia, despite their recent promising developments in remote sensing capabilities.

Two commonly used types of method are supervised and unsupervised classification. Supervised classification classifies pixels based on known properties of each cover type; it therefore requires representative land cover information in the form of training pixels. In unsupervised classification, on the other hand, the clustering process produces clusters that are statistically separable, giving a natural grouping of the pixels. This approach is useful when reliable training data are limited or expensive, and when there is insufficient a priori information about the data.

Two commonly used unsupervised classification methods are K-means and ISODATA. K-means is a simple clustering procedure that attempts to find the cluster centres in the data and then assigns the full set of pixels to K clusters. Its main disadvantage is that it requires the number of clusters to be known a priori. The main advantage of ISODATA over K-means is that ISODATA allows the number of clusters to vary between a specified minimum and maximum, and it is therefore more adaptable and flexible than K-means. This study presents a detailed analysis of ISODATA clustering for Malaysian land covers using Thematic Mapper (TM), a medium-resolution multispectral sensor on board the Landsat 5 satellite. The analysis uses both qualitative and quantitative approaches. Hopefully this analysis, although limited to a single scene, will provide some insight into the application of ISODATA to multispectral image classification.
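The K-means procedure described above can be sketched in a few lines of MATLAB. This is a minimal illustration on synthetic two-dimensional data standing in for pixel feature vectors; the data, the random initialisation, and the iteration cap of 100 are assumptions and are not taken from the study.

```matlab
rng(2);
X = [randn(30, 2); randn(30, 2) + 8];    % stand-in for pixel feature vectors
K = 2;                                   % K-means needs K fixed in advance
n = size(X, 1);

centers = X(randperm(n, K), :);          % initialise centers from random samples
labels = zeros(n, 1);
for iter = 1:100
    % Assignment step: attach every sample to its nearest center
    d2 = zeros(n, K);
    for c = 1:K
        d2(:, c) = sum((X - repmat(centers(c, :), n, 1)) .^ 2, 2);
    end
    [~, newLabels] = min(d2, [], 2);
    if isequal(newLabels, labels)
        break;                           % assignments are stable, so stop
    end
    labels = newLabels;

    % Update step: move each center to the mean of its members
    for c = 1:K
        if any(labels == c)
            centers(c, :) = mean(X(labels == c, :), 1);
        end
    end
end
```

ISODATA follows the same assign-and-update cycle but additionally splits and merges clusters between iterations, which is how the number of clusters can move between the specified minimum and maximum.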


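Analysis of Landsat 5 TM Data of Malaysian Land Covers Using the ISODATA Clustering Method

Researchers around the world have long studied the classification of remote sensing data, although more of this work has been regional than global. Many regional studies have been carried out in places such as Europe and America, because these regions have advanced remote sensing facilities and ground truth information. There is also growing interest in such studies in climate-affected regions such as Africa and in densely populated regions such as China and India. However, tropical countries such as Malaysia have not devoted much effort to this kind of research, despite the recent development of their remote sensing capabilities. Two commonly used classification methods are supervised and unsupervised classification. Supervised classification classifies pixels based on the known properties of each cover type, so it needs representative land cover information in the form of training pixels. In unsupervised classification, by contrast, the clustering process produces clusters that are statistically separable, giving a natural grouping of the pixels. This approach is useful when reliable training data are limited or expensive and when there is insufficient a priori information about the data. Two commonly used unsupervised classification methods are the K-means algorithm and the ISODATA algorithm. K-means is a simple clustering method that tries to find the cluster centers in the data and then divides the full set of pixels into K clusters; its main disadvantage is that the number of clusters must be known in advance. The main advantage of ISODATA over K-means is that ISODATA allows the number of clusters to vary between a specified minimum and maximum, which makes it more adaptable and flexible than K-means. This study presents a detailed analysis of ISODATA clustering of Malaysian land covers using the Thematic Mapper (TM), a medium-resolution multispectral sensor on board the Landsat 5 satellite. Both qualitative and quantitative approaches are used. We hope that this analysis, although limited to a single scene, will provide some insight into the application of ISODATA to multispectral image classification.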


Appendix B

MATLAB code for the shortest-distance method

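function [m_pattern] = C_ZuiDuanJuLi(m_pattern, patternNum)
% Shortest-distance clustering: repeatedly merge the two classes that
% contain the closest pair of patterns, until that distance exceeds T.
disType = DisSelDlg();                               % distance type chosen by the user
T = InputThreshDlg(m_pattern, patternNum, disType);  % merge threshold
for i = 1:patternNum
    m_pattern(i).category = i;                       % each pattern starts as its own class
end
while true
    minDis = inf;
    pi = 0;
    pj = 0;
    % Find the closest pair of patterns that belong to different classes
    for i = 1:patternNum-1
        for j = i+1:patternNum
            if m_pattern(i).category ~= m_pattern(j).category
                tempDis = GetDistance(m_pattern(i), m_pattern(j), disType);
                if tempDis < minDis
                    minDis = tempDis;
                    pi = m_pattern(i).category;
                    pj = m_pattern(j).category;
                end
            end
        end
    end
    if minDis <= T
        % Make pi the smaller label so that class pj is merged into class pi
        if pi > pj
            temp = pi;
            pi = pj;
            pj = temp;
        end
        for i = 1:patternNum
            if m_pattern(i).category == pj
                m_pattern(i).category = pi;
            elseif m_pattern(i).category > pj
                % Shift higher labels down to keep them consecutive
                m_pattern(i).category = m_pattern(i).category - 1;
            end
        end
    else
        break;
    end
end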
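The helper functions `DisSelDlg`, `InputThreshDlg`, and `GetDistance` are not listed in this appendix, so the function above cannot be run on its own. The following dialog-free sketch applies the same merge rule to the rows of a numeric matrix. It assumes Euclidean distance and a fixed threshold `T`; the names `X`, `category`, `lowLabel`, and `highLabel` are hypothetical and do not appear in the thesis code.

```matlab
X = [0 0; 0 1; 5 5; 5 6; 20 20];    % toy patterns, one per row
T = 2;                              % merge threshold (stands in for InputThreshDlg)
n = size(X, 1);
category = (1:n)';                  % every pattern starts in its own class

while true
    minDis = inf;  ci = 0;  cj = 0;
    % Closest pair of patterns lying in different classes
    for i = 1:n-1
        for j = i+1:n
            if category(i) ~= category(j)
                d = norm(X(i, :) - X(j, :));   % Euclidean stand-in for GetDistance
                if d < minDis
                    minDis = d;  ci = category(i);  cj = category(j);
                end
            end
        end
    end
    if minDis > T
        break;                      % nearest classes are farther apart than T
    end
    lowLabel = min(ci, cj);
    highLabel = max(ci, cj);
    category(category == highLabel) = lowLabel;     % merge the two classes
    shift = category > highLabel;
    category(shift) = category(shift) - 1;          % keep labels consecutive
end
% category' is now [1 1 2 2 3]
```

The first two patterns and the next two are each 1 apart and are merged, while the last pattern stays alone because its nearest neighbour is farther away than `T`.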

MATLAB code for the longest-distance method

function [m_pattern] = C_ZuiChangJuLi(m_pattern, patternNum)
disType = DisSelDlg();                               % distance type chosen by the user
T = InputThreshDlg(m_pattern, patternNum, disType);  % merge threshold
for i = 1:patternNum
    m_pattern(i).category = i;                       % each pattern starts as its own class
end
centerNum = patternNum;                              % current number of classes
while true
    minDis = inf;
    pi = 0;
    pj = 0;
    for i = 1:patternNum-1
        for j = i+1:centerNum
            maxDis = -1;
            % Examine every pair of patterns with one member in class i
            % and the other in class j
            for m = 1:patternNum-1
                for n = m+1:patternNum
                    if ((m_pattern(m).category == i) && (m_pattern(n).category == j)) || ...
                       ((m_pattern(m).category == j) && (m_pattern(n).category == i))
                        tempDis = GetDistance(m_pattern(m), m_pattern(n),
