
A novel voting method for soft clustering and its parallel implementation

  • Abstract: Clustering ensembles are an important tool in data mining and have been widely recognized and studied. Building on the voting approach, this paper proposes a novel voting method for soft clustering (VMSC). The ensemble process has two steps: first, the average membership matrix is computed; second, that matrix is used as the input to an iterative optimization. The method removes the influence of noise points and is stable. Because the Spark cloud computing platform handles big data efficiently, the VMSC algorithm was also parallelized on Spark so that it can be applied to large datasets. VMSC was evaluated on 12 UCI datasets and compared with four other soft clustering ensemble algorithms: sCSPA, sMCLA, sHGBF and SVCE. The results indicate that VMSC ensembles soft clusterings effectively, and the parallel experiments show that the Spark implementation parallelizes well and processes big data efficiently.
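The first step named in the abstract, computing an average membership matrix, can be illustrated with a minimal sketch. The abstract does not say how cluster labels are matched across ensemble members or how the iterative optimization works, so the brute-force column alignment below is an assumption made only so that the average is meaningful, and the second step is left out. The function names `align_columns` and `average_membership` are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def align_columns(reference, member):
    """Permute the cluster columns of `member` to best match `reference`.

    Assumption: labels are matched by brute force over all permutations,
    which is only practical for a small number of clusters.
    """
    k = len(reference[0])

    def overlap(perm):
        # Sum of products of matched membership degrees over all samples.
        return sum(
            ref_row[j] * row[perm[j]]
            for ref_row, row in zip(reference, member)
            for j in range(k)
        )

    best = max(permutations(range(k)), key=overlap)
    return [[row[best[j]] for j in range(k)] for row in member]


def average_membership(memberships):
    """Average soft membership matrices after aligning them to the first one.

    Each matrix is a list of rows, one row per sample, one column per cluster.
    """
    reference = memberships[0]
    aligned = [reference] + [align_columns(reference, m) for m in memberships[1:]]
    n, k = len(reference), len(reference[0])
    return [
        [sum(m[i][j] for m in aligned) / len(aligned) for j in range(k)]
        for i in range(n)
    ]


# Two soft clusterings of the same two samples; the second has its labels swapped.
first = [[0.9, 0.1], [0.2, 0.8]]
second = [[0.2, 0.8], [0.9, 0.1]]
average = average_membership([first, second])
```

If every input row sums to 1, every row of the averaged matrix does too, so the result is still a valid soft clustering that a later optimization step could refine.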

     
