ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

Ensembles of classifier chains for multi-label classification based on Spark

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2017.04.010
  • Received Date: 28 August 2016
  • Rev Recd Date: 08 December 2016
  • Publish Date: 30 April 2017
  • With the wide application of data mining technology, multi-label learning has become a hot topic in the data mining domain. Although the ensembles of classifier chains (ECC) algorithm is an effective and accurate multi-label learning method, its time and space complexity is too high for large-scale multi-label classification tasks. A new algorithm, Spark ensembles of classifier chains (S-ECC), is proposed on the Spark platform, with each step of the sequential ECC algorithm implemented in parallel. Test results in stand-alone and cluster environments show that S-ECC adapts well to large-scale data with a high speedup, and that it performs no worse than the traditional sequential program.
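    To illustrate why ECC lends itself to parallelization, the following is a minimal single-machine sketch in Python. It is not the S-ECC implementation described in the paper. It assumes each chain is trained on a bootstrap sample with a random label order, and that predictions are combined by thresholded voting. All names (`CentroidClassifier`, `train_chain`, `train_ecc`, `predict_ecc`) are hypothetical, the nearest-centroid base learner is a toy stand-in for a real binary classifier, and the thread pool merely stands in for a distributed map over chains.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class CentroidClassifier:
    """Toy binary base learner (assumption): predict the class with the nearer centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, t in zip(X, y) if t == cls]
            if rows:  # a bootstrap sample may contain only one class
                self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda cls: dist(self.centroids[cls]))


def train_chain(X, Y, seed):
    """Train one classifier chain on a bootstrap sample with a random label order."""
    rng = random.Random(seed)
    order = list(range(len(Y[0])))
    rng.shuffle(order)
    idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
    models = []
    for pos, label in enumerate(order):
        # Each link sees the features plus the true values of earlier labels in the chain.
        rows = [X[i] + [Y[i][l] for l in order[:pos]] for i in idx]
        target = [Y[i][label] for i in idx]
        models.append(CentroidClassifier().fit(rows, target))
    return order, models


def predict_chain(chain, x):
    """Predict all labels with one chain, feeding each prediction to the next link."""
    order, models = chain
    out = [0] * len(order)
    feats = list(x)
    for label, model in zip(order, models):
        out[label] = model.predict(feats)
        feats.append(out[label])
    return out


def train_ecc(X, Y, n_chains=5, workers=4):
    """Chains do not depend on each other, so they can be trained as separate tasks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda seed: train_chain(X, Y, seed), range(n_chains)))


def predict_ecc(chains, x, threshold=0.5):
    """Average the per-label votes of all chains and apply a threshold."""
    votes = [predict_chain(chain, x) for chain in chains]
    return [int(sum(v) / len(chains) >= threshold) for v in zip(*votes)]
```

    Here `X` is a list of feature lists and `Y` a list of 0/1 label lists of equal length. In standard CPython the threads only show that the chains are independent tasks; an actual speedup would need processes or a cluster framework such as Spark.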