ISSN 0253-2778

CN 34-1054/N

Open AccessOpen Access JUSTC Original Paper

Application of sparse spectral clustering algorithm in high-dimensional data

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2017.04.005
  • Received Date: 28 August 2016
  • Rev Recd Date: 08 December 2016
  • Publish Date: 30 April 2017
  • A new sparse spectral clustering algorithm——high-dimensional sparse spectral clustering based on partitioning around medoids (HSSPAM) was proposed, which takes advantage of the sparse similarity matrix in computation as well as the superiority of the PAM algorithm over K-means. To reduce or even eliminate the impact of “dimensionality curse” on high dimensional data processing, the high correlation filter (HCF) and the principal component analysis (PCA) method are also investigated in the algorithm. The proposed method has higher precision and more stable clustering results than the algorithms introduced in this paper for comparison in the real high-dimensional gene data under different clustering evaluation criteria.
    A new sparse spectral clustering algorithm——high-dimensional sparse spectral clustering based on partitioning around medoids (HSSPAM) was proposed, which takes advantage of the sparse similarity matrix in computation as well as the superiority of the PAM algorithm over K-means. To reduce or even eliminate the impact of “dimensionality curse” on high dimensional data processing, the high correlation filter (HCF) and the principal component analysis (PCA) method are also investigated in the algorithm. The proposed method has higher precision and more stable clustering results than the algorithms introduced in this paper for comparison in the real high-dimensional gene data under different clustering evaluation criteria.
  • loading
  • [1]
    MACQUEEN J. Some methods for classification and analysis of multivariate observations[C]// Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA: AMS Press, 1967: 281-297.
    [2]
    JEFF WU C F. On the convergence properties of the EM algorithm[J]. Annals of Statistics, 1982, 11(1): 95-103.
    [3]
    PARK H S, JUN C H. A simple and fast algorithm for K-medoids clustering[J]. Expert Systems with Applications, 2009, 36(2): 3336-3341.
    [4]
    DING S F, ZHANG L W, ZHANG Y. Research on spectral clustering algorithms and prospects[C]// Proceedings of the 2nd International Conference on Computer Engineering and Technology. ChengDu, China: IEEE Press, 2010: 149-153.
    [5]
    LUXBURG U V. A tutorial on spectral clustering[J]. Statistics and Computing, 2007, 17(4): 395-416.
    [6]
    HAGEN L, KAHNG A. New spectral methods for ratio cut partitioning and clustering[J]. IEEE Transactions on Computer-Aided Design, 2006, 11(9): 1074-1085.
    [7]
    SHI J B, MALIK J. Normalized cuts and image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905.
    [8]
    NG A Y, JORDAN M, WEISS Y. On spectral clustering: analysis and an algorithm[A]. Advances in Neural Information Processing Systems[M]. MIT Press, 2002: 849-856.
    [9]
    徐天顺. 谱聚类算法研究[J]. 电脑知识与技术, 2012, 8(16): 3948-3950.
    [10]
    Bodenhofer U, Kothmeier A, Hochreiter S. APCluster: An R package for affinity propagation clustering[J]. Bioinformatics, 2011, 27(17): 2463-2464.
    [11]
    谢国瑞. 线性代数及应用[M].北京:高等教育出版社,1999.
    [12]
    HUBERT L, ARABIE P. Comparing partitions[J]. Journal of Classification, 1985, 2(1):193-218.
    [13]
    SAPORTA, YOUNESS G. Comparing two partitions: Some proposals and experiments[J]. Compstat. Physica-Verlag HD, 2002: 243-248.
    [14]
    MARTIN L C, GLOOR G B, DUNN S D, et al. Using information theory to search for Co-evolving residues in proteins[J]. Bioinformatics, 2005, 21(22): 4116-24.
    [15]
    MIYAHARA S, KOMAZAKI Y, MIYAMOTO S. An algorithm combining spectral clustering and DBSCAN for core points[J]. Advances in Intelligent Systems & Computing, 2014, 245: 21-28.
    [16]
    MURTAGH F, LEGENDRE P. Ward’s hierarchical agglomerative clustering method: Which algorithms implement ward’s criterion?[J]. Journal of Classification, 2014, 31(3): 274-295.
    [17]
    DETTLING M. BagBoosting for tumor classification with gene expression data[J]. Bioinformatics, 2004, 20(18): 3583-3593.
    [18]
    邓小燕, 甘晓玲, 唐宜. 谱聚类算法在基因表达数据分析中的应用[J]. 现代计算机, 2014, (6): 8-12, 24.
    [19]
    祝国强,杭国明,滕海英,等. 谈谈两总体比较的非参数检验方法[J]. 数理医药杂志, 2011, 24(5): 524-525.
  • 加载中

Catalog

    [1]
    MACQUEEN J. Some methods for classification and analysis of multivariate observations[C]// Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA: AMS Press, 1967: 281-297.
    [2]
    JEFF WU C F. On the convergence properties of the EM algorithm[J]. Annals of Statistics, 1982, 11(1): 95-103.
    [3]
    PARK H S, JUN C H. A simple and fast algorithm for K-medoids clustering[J]. Expert Systems with Applications, 2009, 36(2): 3336-3341.
    [4]
    DING S F, ZHANG L W, ZHANG Y. Research on spectral clustering algorithms and prospects[C]// Proceedings of the 2nd International Conference on Computer Engineering and Technology. ChengDu, China: IEEE Press, 2010: 149-153.
    [5]
    LUXBURG U V. A tutorial on spectral clustering[J]. Statistics and Computing, 2007, 17(4): 395-416.
    [6]
    HAGEN L, KAHNG A. New spectral methods for ratio cut partitioning and clustering[J]. IEEE Transactions on Computer-Aided Design, 2006, 11(9): 1074-1085.
    [7]
    SHI J B, MALIK J. Normalized cuts and image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905.
    [8]
    NG A Y, JORDAN M, WEISS Y. On spectral clustering: analysis and an algorithm[A]. Advances in Neural Information Processing Systems[M]. MIT Press, 2002: 849-856.
    [9]
    徐天顺. 谱聚类算法研究[J]. 电脑知识与技术, 2012, 8(16): 3948-3950.
    [10]
    Bodenhofer U, Kothmeier A, Hochreiter S. APCluster: An R package for affinity propagation clustering[J]. Bioinformatics, 2011, 27(17): 2463-2464.
    [11]
    谢国瑞. 线性代数及应用[M].北京:高等教育出版社,1999.
    [12]
    HUBERT L, ARABIE P. Comparing partitions[J]. Journal of Classification, 1985, 2(1):193-218.
    [13]
    SAPORTA, YOUNESS G. Comparing two partitions: Some proposals and experiments[J]. Compstat. Physica-Verlag HD, 2002: 243-248.
    [14]
    MARTIN L C, GLOOR G B, DUNN S D, et al. Using information theory to search for Co-evolving residues in proteins[J]. Bioinformatics, 2005, 21(22): 4116-24.
    [15]
    MIYAHARA S, KOMAZAKI Y, MIYAMOTO S. An algorithm combining spectral clustering and DBSCAN for core points[J]. Advances in Intelligent Systems & Computing, 2014, 245: 21-28.
    [16]
    MURTAGH F, LEGENDRE P. Ward’s hierarchical agglomerative clustering method: Which algorithms implement ward’s criterion?[J]. Journal of Classification, 2014, 31(3): 274-295.
    [17]
    DETTLING M. BagBoosting for tumor classification with gene expression data[J]. Bioinformatics, 2004, 20(18): 3583-3593.
    [18]
    邓小燕, 甘晓玲, 唐宜. 谱聚类算法在基因表达数据分析中的应用[J]. 现代计算机, 2014, (6): 8-12, 24.
    [19]
    祝国强,杭国明,滕海英,等. 谈谈两总体比较的非参数检验方法[J]. 数理医药杂志, 2011, 24(5): 524-525.

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return