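Thesis title (Chinese): | 一个二阶段覆盖聚类算法及其应用 |
Author: | |
Student ID: | 200301263 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 070103 |
Discipline name: | Probability Theory and Mathematical Statistics |
Student type: | Master's |
Degree: | Master's |
University: | Yanbian University |
Department: | |
Major: | |
First advisor name: | |
First advisor institution: | |
Second advisor name: | |
Thesis completion date: | 2006-05-03 |
Thesis defense date: | 2006-06-03 |
Thesis title (foreign language): | A two-step covering cluster algorithm and application |
Keywords (Chinese): | |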
Keywords (foreign language): | principal component analysis; k-means algorithm; covering cluster algorithm; cluster analysis |
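Abstract (translated from Chinese): |
Cluster analysis is a multivariate statistical and data-analysis method for classifying samples when no training set is available. When it is applied to a given data set, the class membership of the samples is not known in advance; the samples are grouped automatically according to how "similar" they are to one another. Its main purpose is to partition the data set, under some rule, into a series of meaningful subsets, called clusters, such that samples within the same cluster are as similar as possible while samples in different clusters differ as much as possible. A good clustering result serves two purposes. First, the characteristics of the data can be grasped through the clusters formed according to its inherent properties, which condenses the scale of the original data. Second, structurally complex raw data is reduced to simpler and more intuitive data, which facilitates further analysis and study of the problem at hand. Drawing on the idea of the covering algorithm, this thesis proposes a two-stage covering clustering algorithm. For clustering problems in which some feature indicators are highly correlated, principal component analysis is applied during the analysis to overcome, as far as possible, the effect of that correlation on the stability of the clustering result. The aim is both to speed up the clustering algorithm and to preserve the validity of its results. Several worked examples are used to partially illustrate and test the feasibility and effectiveness of the proposed method.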
|
Abstract (foreign language): |
Cluster analysis is a multivariate statistical and data-analysis method for classifying samples without a training set. When this method is used to classify given data, the types of the samples are not known in advance; the samples are classified automatically according to their similarity. The main objective of cluster analysis is to discover natural groupings of the data, so that the observations within each group are relatively similar (i.e., they possess largely the same characteristics) and the observations in different groups are relatively dissimilar. A good clustering result, on the one hand, lets us grasp the characteristics of each group into which the data fall according to their inherent nature, and thereby condenses the original data; on the other hand, it yields a simpler and more intuitive structure that is useful for further discussion and research on the given problem. In this thesis, we propose a two-step covering cluster algorithm based on the idea of the covering algorithm. In addition, for clustering problems in which some feature variables are highly correlated, we use principal component analysis to reduce the adverse effect of that correlation on the stability of the clustering. We aim to increase the speed of the clustering algorithm while preserving the validity of the clustering result, and we use several examples to illustrate and examine the feasibility and effectiveness of the proposed method.
|
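The abstract's idea of applying principal component analysis before clustering, so that highly correlated indicators do not distort the result, can be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not describe the two-step covering algorithm itself, so plain k-means (which appears among the keywords) stands in for the clustering step, and the function names `pca_scores` and `kmeans`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project standardized data onto its leading principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize each indicator so the covariance of Z is the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    explained = eigvals[order] / eigvals.sum()
    return Z @ eigvecs[:, order], explained


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means; a stand-in, NOT the thesis's covering algorithm."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: two groups of 20 samples, three strongly correlated indicators
# that are all driven by one latent variable t.
rng = np.random.default_rng(42)
t = np.concatenate([rng.normal(0.0, 1.0, 20), rng.normal(10.0, 1.0, 20)])
X = np.column_stack([t, 2.0 * t, -t]) + rng.normal(0.0, 0.3, size=(40, 3))

scores, ratio = pca_scores(X, n_components=1)  # one component absorbs the shared variation
labels, centers = kmeans(scores, k=2)
print("explained variance ratio of first component:", ratio[0])
print("cluster labels:", labels)
```

Clustering on the component scores rather than on the raw indicators means the three redundant columns count as one direction of variation instead of being weighted three times in the distance computation.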
Chinese Library Classification (CLC) number: | O21 X 8 |
Release date: | 2006-05-03 |