ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

Core-points based spectral clustering for big data analysis

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2016.09.007
  • Received Date: 01 March 2016
  • Accepted Date: 17 September 2016
  • Rev Recd Date: 17 September 2016
  • Publish Date: 30 September 2016
  • Spectral clustering is difficult to apply to big data because of its computational complexity. To address this, a new spectral clustering algorithm for big data is proposed. First, core-points are selected by random sampling and data similarity, and the full data set is grouped around them. Second, spectral clustering is applied to the core-points. Finally, the whole data set is clustered by combining the clustering result of the core-points with the grouping information. The algorithm extends spectral clustering to big data and, through the core-points, reduces the influence of noise and abnormal data. Extensive experiments verify the effectiveness of the proposed method.
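    The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection rule, so the sketch assumes that core-points are a uniform random sample, that "data similarity" means a Gaussian kernel on Euclidean distance, and that each point joins the group of its nearest core-point. The spectral step is the standard normalized-Laplacian variant with a simple k-means. All function names and the parameters `n_core` and `sigma` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its closest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def spectral_clustering(C, k, sigma):
    """Normalized spectral clustering of the rows of C (assumed Gaussian affinity)."""
    sq = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = np.maximum(W.sum(axis=1), 1e-12)  # guard against isolated points
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(len(C)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)


def core_point_spectral_clustering(X, k, n_core, sigma, seed=0):
    """Cluster X through a small set of core-points (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick core-points by random sampling and group the data around them.
    core_idx = rng.choice(len(X), size=n_core, replace=False)
    cores = X[core_idx]
    d = ((X[:, None, :] - cores[None, :, :]) ** 2).sum(axis=2)
    group = d.argmin(axis=1)  # index of the nearest core-point for each point
    # Step 2: spectral clustering on the core-points only.
    core_labels = spectral_clustering(cores, k, sigma)
    # Step 3: each point inherits the label of its group's core-point.
    return core_labels[group]
```

    Under these assumptions the eigendecomposition runs on an `n_core` by `n_core` matrix rather than one over all data points, which is where the cost saving comes from. The dense distance computation in step 1 is kept for brevity; for very large data it would have to be done in chunks.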
  • [1] LIU B. Web Data Mining [M]. Beijing: Tsinghua University Press, 2011. (in Chinese)
    [2] KAUFMAN L, ROUSSEEUW P J. Finding Groups in Data: An Introduction to Cluster Analysis [M]. New York: Wiley, 1990.
    [3] XU R, WUNSCH D. Survey of clustering algorithms [J]. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.
    [4] SHI J B, MALIK J. Normalized cuts and image segmentation [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2000, 22(8): 888-905.
    [5] MACQUEEN J. Some methods for classification and analysis of multivariate observations [C]// Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, 1967: 281-297.
    [6] WILLIAMS P K, SOARES C V, GILBERT J E. A clustering rule based approach for classification problems [J]. International Journal of Data Warehousing and Mining, 2010, 8(1): 1-23.
    [7] FOWLKES C, BELONGIE S, FAN C, et al. Spectral grouping using the Nyström method [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2004, 26(2): 214-225.
    [8] ZHANG K, KWOK J T. Clustered Nyström method for large scale manifold learning and dimension reduction [J]. IEEE Transactions on Neural Networks, 2010, 21(10): 1576-1587.
    [9] DING S F, JIA H J, SHI Z Z. Spectral clustering algorithm based on adaptive Nyström sampling for big data analysis [J]. Journal of Software, 2014, 25(9): 2037-2049.
    [10] CHEN X L, DENG C. Large scale spectral clustering with landmark-based representation [C]// Proceedings of the 25th AAAI Conference on Artificial Intelligence. San Francisco: AAAI Press, 2011: 313-318.
    [11] YAN D H, HUANG L, JORDAN M I. Fast approximate spectral clustering [C]// Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, France: ACM Press, 2009: 907-916.
    [12] SHINNOU H, SASAKI M. Spectral clustering for a large data set by reducing the similarity matrix size [C]// Proceedings of the 6th International Language Resources and Evaluation. 2008.
    [13] VISWANATH P, BABU V S. Rough-DBSCAN: A fast hybrid density based clustering method for large data sets [J]. Pattern Recognition Letters, 2009, 30(16): 1477-1488.
    [14] ZHANG T, RAMAKRISHNAN R, LIVNY M. BIRCH: An efficient data clustering method for very large databases [J]. ACM SIGMOD Record, 1999, 25(2): 103-114.
    [15] MA R N, WANG X L, DING J D. Multilevel core-sets based aggregation clustering algorithm [J]. Journal of Software, 2013, 24(3): 490-506. (in Chinese)

