ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

A novel voting method and parallel implementation for soft clustering

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2016.03.001
  • Received Date: 27 August 2015
  • Accepted Date: 29 September 2015
  • Rev Recd Date: 29 September 2015
  • Publish Date: 30 March 2016
  • As an important tool in data mining, clustering ensembles have been widely recognized and studied. This paper proposes a novel voting method for soft clustering (VMSC). The ensemble process consists of two steps: computing the average membership-degree matrix, which serves as the input to the second step, and iterative optimization. The method suppresses the influence of noise well and is stable. Because the Spark cloud computing platform handles big data efficiently, the VMSC algorithm was parallelized on Spark to make it suitable for big data. In the experiments, VMSC was tested on 12 UCI datasets and compared with four other soft clustering ensemble algorithms: sCSPA, sMCLA, sHGBF and SVCE. The results indicate that VMSC achieves better ensemble performance, and the parallel experiments show that its parallel implementation handles big data efficiently.
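    The first step named in the abstract, averaging the membership-degree matrices produced by the base soft clusterings, might look roughly like the Python sketch below. It is an illustration only and not the authors' implementation: the function names `average_membership` and `harden`, the toy data, and the assumption that column j denotes the same cluster in every base clustering are all hypothetical. The abstract gives no details of the second step (iterative optimization), so it is not sketched here; `harden` is a simple arg-max readout added only to make the averaged matrix easy to inspect.

```python
from typing import List

Matrix = List[List[float]]  # rows = objects, columns = clusters


def average_membership(memberships: List[Matrix]) -> Matrix:
    """Element-wise mean of several n x k membership matrices.

    Assumption (not from the paper): column j refers to the same cluster
    in every base clustering, i.e. labels have already been aligned.
    """
    if not memberships:
        raise ValueError("need at least one membership matrix")
    n, k = len(memberships[0]), len(memberships[0][0])
    for m in memberships:
        if len(m) != n or any(len(row) != k for row in m):
            raise ValueError("all membership matrices must share one shape")
    count = len(memberships)
    return [
        [sum(m[i][j] for m in memberships) / count for j in range(k)]
        for i in range(n)
    ]


def harden(avg: Matrix) -> List[int]:
    """Hypothetical readout: give each object its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg]


# Three toy base clusterings of three objects into two clusters.
# The second one disagrees with the others about the third object.
base = [
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    [[0.7, 0.3], [0.4, 0.6], [0.3, 0.7]],
    [[0.8, 0.2], [0.0, 1.0], [0.7, 0.3]],
]
avg = average_membership(base)
labels = harden(avg)
print(avg)
print(labels)
```

    In this toy input the second base clustering places the third object in the other cluster, but the averaged memberships still favor the majority view. That is the intuition behind a voting-style ensemble damping the effect of a noisy member.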
