Core-points based spectral clustering for big data analysis

YANG Yi; MA Runing

doi:10.3969/j.issn.0253-2778.2016.09.007

Open Access JUSTC Original Paper

College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Cite this:

https://doi.org/10.3969/j.issn.0253-2778.2016.09.007

Received Date: 01 March 2016
Accepted Date: 17 September 2016
Revision Received Date: 17 September 2016
Publish Date: 30 September 2016

Abstract

Spectral clustering often cannot be applied to big data because of its computational complexity. To address this, a new spectral clustering algorithm for big data is proposed. First, core points are selected by random sampling and data similarity, and the data are grouped around them. Second, spectral clustering is applied to the core points. Finally, the clustering of the whole data set is completed by combining the clustering result of the core points with the grouping information. The algorithm extends spectral clustering to big data and, through the core points, reduces the influence of noise and abnormal data. Extensive experiments verify the effectiveness of the proposed method.
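The three steps in the abstract can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the selection rule or the similarity measure, so the sketch assumes uniform random sampling for the core points, Euclidean nearest-core assignment for the grouping, a Gaussian kernel with a hypothetical bandwidth `sigma`, and a standard normalized-Laplacian spectral clustering followed by a small k-means. All function and parameter names are hypothetical.

```python
import numpy as np

def select_core_points(X, n_core, rng):
    """Step 1 (assumed rule): sample core points uniformly at random,
    then group every data point with its nearest core point."""
    idx = rng.choice(len(X), size=n_core, replace=False)
    cores = X[idx]
    d2 = ((X[:, None, :] - cores[None, :, :]) ** 2).sum(axis=2)
    group = d2.argmin(axis=1)  # index of the nearest core point per data point
    return cores, group

def kmeans(U, k, rng, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialisation."""
    centers = [U[rng.integers(len(U))]]
    for _ in range(1, k):
        d2 = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def spectral_clustering(P, k, sigma, rng):
    """Step 2: normalized spectral clustering on the core points only."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian affinity (assumed)
    np.fill_diagonal(W, 0.0)
    deg = np.maximum(W.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(P)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    U = vecs[:, :k]  # eigenvectors of the k smallest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k, rng)

def core_point_spectral_clustering(X, n_core, k, sigma=1.0, seed=0):
    """Step 3: every data point inherits the label of its core point."""
    rng = np.random.default_rng(seed)
    cores, group = select_core_points(X, n_core, rng)
    core_labels = spectral_clustering(cores, k, sigma, rng)
    return core_labels[group]
```

In this sketch the eigendecomposition runs on an n_core × n_core matrix instead of an n × n one, which is where the reduction in cost comes from. The noise-handling role of the core points described in the abstract is not modeled here.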

References

[1]	LIU B. Web Data Mining[M]. Beijing: Tsinghua University Press, 2011.
[2]	KAUFMAN L, ROUSSEEUW P J. Finding Groups in Data: An Introduction to Cluster Analysis[M]. New York: Wiley, 1990.
[3]	XU R, WUNSCH D. Survey of clustering algorithms [J]. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.
[4]	SHI J B, MALIK J. Normalized cuts and image segmentation [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2000, 22(8): 888-905.
[5]	MACQUEEN J. Some methods for classification and analysis of multivariate observations[C]// Proceedings of the 5th Berkeley Symposium Mathematical Statistics Probability. Berkeley: 1967: 281-297.
[6]	WILLIAMS P K, SOARES C V, GILBERT J E. A clustering rule based approach for classification problems [J]. International Journal of Data Warehousing and Mining, 2010, 8(1): 1-23.
[7]	FOWLKES C, BELONGIE S, FAN C, et al. Spectral grouping using the Nyström method [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2004, 26(2): 214-225.
[8]	ZHANG K, KWOK J T. Clustered Nyström method for large scale manifold learning and dimension reduction [J]. IEEE Transactions on Neural Networks, 2010, 21(10): 1576-1587.
[9]	DING S F, JIA H J, SHI Z Z. Spectral clustering algorithm based on adaptive Nyström sampling for big data analysis [J]. Journal of Software, 2014, 25(9): 2037-2049.
[10]	CHEN X L, DENG C. Large scale spectral clustering with landmark-based representation[C]// Proceedings of the 25th AAAI Conference on Artificial Intelligence. San Francisco: AAAI Press, 2011: 313-318.
[11]	YAN D H, HUANG L, JORDAN M I. Fast approximate spectral clustering[C]// Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, France: ACM Press, 2009: 907-916.
[12]	SHINNOU H, SASAKI M. Spectral clustering for a large data set by reducing the similarity matrix size[C]// Proceedings of the 6th International Language Resources and Evaluation. 2008.
[13]	VISWANATH P, BABU V S. Rough-DBSCAN: A fast hybrid density based clustering method for large data sets [J]. Pattern Recognition Letters, 2009, 30(16): 1477-1488.
[14]	ZHANG T, RAMAKRISHNAN R, LIVNY M. BIRCH: An efficient data clustering method for very large databases [J]. ACM SIGMOD Record, 1999, 25(2): 103-114.
[15]	MA R N, WANG X L, DING J D. Multilevel core-sets based aggregation clustering algorithm [J]. Journal of Software, 2013, 24(3): 490-506.

Volume 46, Issue 9, pages 757-763
