Parallel ISOMAP algorithm based on Spark

SHI Lukui; GUO Linlin; FANG Zizhe; ZHANG Jun

doi:10.3969/j.issn.0253-2778.2019.10.010

PDF( 4217 KB)

Open Access JUSTC Original Paper

Parallel ISOMAP algorithm based on Spark

1.
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
2.
Hebei Province Bigdata Computation Key Laboratory, Tianjin 300401, China

Cite this:

https://doi.org/10.3969/j.issn.0253-2778.2019.10.010

Received Date: 15 May 2019
Accepted Date: 26 August 2019
Rev Recd Date: 26 August 2019
Publish Date: 31 October 2019

Abstract Full text PDF

Abstract

Abstract

To reduce the dimension of the nonlinear high-dimensional data in the big data environment, a parallel ISOMAP algorithm based on Spark is proposed, where a Spark-based parallel block Davidson method is designed and implemented to quickly solve eigenvalues and eigenvectors of the large scale matrices. Simultaneously, a row-block matrix multiplication strategy based on RDD partition is proposed for the difficulty of computation and transmission of the large scale matrices, which converts the matrix rows in each partition into block matrices. The row-block matrices are not restricted by the map operator to RDD calculation one by one, and can treat operations at the matrix level by using linear algebraic Library in Spark. The experimental results show that the row-block matrix multiplication strategy effectively improves the efficiency of matrix operations; the parallel block Davidson method can quickly solve the eigenvalues and eigenvectors of the large scale matrices and effectively improve the performance of parallel ISOMAP algorithm; and the parallel ISOMAP algorithm can adapt to dimensionality reduction in the big data environment.

Abstract

To reduce the dimension of the nonlinear high-dimensional data in the big data environment, a parallel ISOMAP algorithm based on Spark is proposed, where a Spark-based parallel block Davidson method is designed and implemented to quickly solve eigenvalues and eigenvectors of the large scale matrices. Simultaneously, a row-block matrix multiplication strategy based on RDD partition is proposed for the difficulty of computation and transmission of the large scale matrices, which converts the matrix rows in each partition into block matrices. The row-block matrices are not restricted by the map operator to RDD calculation one by one, and can treat operations at the matrix level by using linear algebraic Library in Spark. The experimental results show that the row-block matrix multiplication strategy effectively improves the efficiency of matrix operations; the parallel block Davidson method can quickly solve the eigenvalues and eigenvectors of the large scale matrices and effectively improve the performance of parallel ISOMAP algorithm; and the parallel ISOMAP algorithm can adapt to dimensionality reduction in the big data environment.

FullText(HTML)

References(21)

References

[1]	TENENBAUM J B, DE SILVA V, LANGFORD J C.A global geometric framework for nonlinear dimensionality reduction[J].Science,2000,290(5): 2319–2323.
[2]	PLESS R, SOUVENIR R. A survey of manifold learning for images [J]. IPSJ Transactions on Computer Vision & Applications, 2009, 1(1):83-94.
[3]	SARVENIAZI A. An actual survey of dimensionality reduction [J]. American Journal of Computational Mathematics, 2014, 4: 55-72.
[4]	SINGH K P, BHAI R, MISHRA V, etal.Localization in wireless sensor network using LLE-ISOMAP algorithm[C]// Proceedings of IEEE Region 10 Conference. Penang, Malaysia: IEEE, 2017:393-397.
[5]	RAJAGURU H, KUMAR PRABHAKAR S. Performance analysis of local linear embedding and Hessian LLE with hybrid ABC-PSO for epilepsy classification from EEG signals[C]//Proceedings of International Conference on Inventive Research in Computing Applications. Coimbatore, India: IEEE, 2018:1084-1088.
[6]	YEH T T, CHEN T Y , CHEN Y C , et al. Efficient parallel algorithm for nonlinear dimensionality reduction on GPU[C]// Proceedings of International Conference on Granular Computing. San Jose, USA:IEEE, 2010:592-597.
[7]	LI W, ZHANG L, DU B. GPU parallel implementation of isometric mapping for hyperspectral classification [J]. IEEE Geoscience and Remote Sensing Letters, 2017, 14(9): 1532-1536.
[8]	李毅. 基于Hadoop平台的局部线性嵌入算法研究[D]. 广州，华南理工大学，2011.
[9]	卞云龙. 基于云计算平台的大规模流形学习算法研究[D]. 南京，南京理工大学，2012.
[10]	刘勇. 头部姿态估计的监督流形学习研究及其并行化扩展[D]. 厦门，厦门大学，2013.
[11]	薛永坚, 倪志伟. 基于MapReduce的大规模数据集流形学习降维研究[J]. 系统工程理论与实践, 2014, 34:151-157. XUE Yongjian, NI Zhiwei.Research of large scale manifold learning based on MapReduce[J]. Systems Engineering - Theory & Practice, 2014, 34: 151-157.
[12]	CAMPANA-OLIVO R, MANIAN V B. Parallel implementation of nonlinear dimensionality reduction methods applied in object segmentation using CUDA in GPU[C]// Proceedings of the International Society for Optical Engineering. SPIE, 2011:289-293.
[13]	MOUSTAFA M, EBEID H M, HELMY A, et al. Parallel implementation of super-resolution based neighbor embedding using GPU [C]// Proceedings of the International Conference on Advanced Intelligent Systems and Informatics. Springer, 2016: 628-638.
[14]	SAMUDRALA S K, ZOLA J, ALURU S, et al. Parallel framework for dimensionality reduction of large-scale datasets [J]. Scientific Programming, 2015, No.1: 1-12.
[15]	SCHOENEMAN F, ZOLA J. Scalable manifold learning for big data with Apache Spark [C]// Proceedings of IEEE International Conference on Big Data. Seattle, USA: IEEE, 2018:272-281.
[16]	WANG YH, GAO Y, XU C. Manifold learning method for large scale dataset based on gradient descent [C]//Proceedings of 3rd International Conference on Multimedia Technology.Springer, 2013, 84:1187-1194.
[17]	ZHOU Y. A block Chebyshev-Davidson method with inner-outer restart for large eigenvalue problems[J]. Journal of Computational Physics, 2010, 229(24):9188-9200.
[18]	SADKANE M, SIDJE R B. Implementation of a variable block Davidson method with deflation for solving large sparse eigenproblems[J]. Numerical Algorithms, 1999, 20:217-240
[19]	王顺绪,戴华.求解大型矩阵特征值问题的并行块Davidson方法[J].南京航空航天大学学报，2007，39(6): 814-818. WANG Shunxu, DAI Hua. Parallel Block Davidson Method for Solving Large Scale Eigenvalue Problem[J]. Journal of Nanjing University of Aeronautics & Astronautics,2007，39(6): 814-818.
[20]	DATAR M, IMMORLICA N, INDYK P, et al. Locality-sensitive hashing scheme based on p-stable distributions [C] // Proceedings 20th of Annual symposium on Computational Geometry. New York, USA: ACM, 2004: 253-262.
[21]	SAUL L K, ROWEIS S.An introduction to locally linear embedding [EB/OL].Journal of Machine Learning Research, [2019-8-30] http://www.cs.toronto.edu/~roweis/lle,2001.)

Supplements(0)

Track Citations

Proportional views

Proportional views

Get Citation

PDF

XML

[1]	TENENBAUM J B, DE SILVA V, LANGFORD J C.A global geometric framework for nonlinear dimensionality reduction[J].Science,2000,290(5): 2319–2323.
[2]	PLESS R, SOUVENIR R. A survey of manifold learning for images [J]. IPSJ Transactions on Computer Vision & Applications, 2009, 1(1):83-94.
[3]	SARVENIAZI A. An actual survey of dimensionality reduction [J]. American Journal of Computational Mathematics, 2014, 4: 55-72.
[4]	SINGH K P, BHAI R, MISHRA V, etal.Localization in wireless sensor network using LLE-ISOMAP algorithm[C]// Proceedings of IEEE Region 10 Conference. Penang, Malaysia: IEEE, 2017:393-397.
[5]	RAJAGURU H, KUMAR PRABHAKAR S. Performance analysis of local linear embedding and Hessian LLE with hybrid ABC-PSO for epilepsy classification from EEG signals[C]//Proceedings of International Conference on Inventive Research in Computing Applications. Coimbatore, India: IEEE, 2018:1084-1088.
[6]	YEH T T, CHEN T Y , CHEN Y C , et al. Efficient parallel algorithm for nonlinear dimensionality reduction on GPU[C]// Proceedings of International Conference on Granular Computing. San Jose, USA:IEEE, 2010:592-597.
[7]	LI W, ZHANG L, DU B. GPU parallel implementation of isometric mapping for hyperspectral classification [J]. IEEE Geoscience and Remote Sensing Letters, 2017, 14(9): 1532-1536.
[8]	李毅. 基于Hadoop平台的局部线性嵌入算法研究[D]. 广州，华南理工大学，2011.
[9]	卞云龙. 基于云计算平台的大规模流形学习算法研究[D]. 南京，南京理工大学，2012.
[10]	刘勇. 头部姿态估计的监督流形学习研究及其并行化扩展[D]. 厦门，厦门大学，2013.
[11]	薛永坚, 倪志伟. 基于MapReduce的大规模数据集流形学习降维研究[J]. 系统工程理论与实践, 2014, 34:151-157. XUE Yongjian, NI Zhiwei.Research of large scale manifold learning based on MapReduce[J]. Systems Engineering - Theory & Practice, 2014, 34: 151-157.
[12]	CAMPANA-OLIVO R, MANIAN V B. Parallel implementation of nonlinear dimensionality reduction methods applied in object segmentation using CUDA in GPU[C]// Proceedings of the International Society for Optical Engineering. SPIE, 2011:289-293.
[13]	MOUSTAFA M, EBEID H M, HELMY A, et al. Parallel implementation of super-resolution based neighbor embedding using GPU [C]// Proceedings of the International Conference on Advanced Intelligent Systems and Informatics. Springer, 2016: 628-638.
[14]	SAMUDRALA S K, ZOLA J, ALURU S, et al. Parallel framework for dimensionality reduction of large-scale datasets [J]. Scientific Programming, 2015, No.1: 1-12.
[15]	SCHOENEMAN F, ZOLA J. Scalable manifold learning for big data with Apache Spark [C]// Proceedings of IEEE International Conference on Big Data. Seattle, USA: IEEE, 2018:272-281.
[16]	WANG YH, GAO Y, XU C. Manifold learning method for large scale dataset based on gradient descent [C]//Proceedings of 3rd International Conference on Multimedia Technology.Springer, 2013, 84:1187-1194.
[17]	ZHOU Y. A block Chebyshev-Davidson method with inner-outer restart for large eigenvalue problems[J]. Journal of Computational Physics, 2010, 229(24):9188-9200.
[18]	SADKANE M, SIDJE R B. Implementation of a variable block Davidson method with deflation for solving large sparse eigenproblems[J]. Numerical Algorithms, 1999, 20:217-240
[19]	王顺绪,戴华.求解大型矩阵特征值问题的并行块Davidson方法[J].南京航空航天大学学报，2007，39(6): 814-818. WANG Shunxu, DAI Hua. Parallel Block Davidson Method for Solving Large Scale Eigenvalue Problem[J]. Journal of Nanjing University of Aeronautics & Astronautics,2007，39(6): 814-818.
[20]	DATAR M, IMMORLICA N, INDYK P, et al. Locality-sensitive hashing scheme based on p-stable distributions [C] // Proceedings 20th of Annual symposium on Computational Geometry. New York, USA: ACM, 2004: 253-262.
[21]	SAUL L K, ROWEIS S.An introduction to locally linear embedding [EB/OL].Journal of Machine Learning Research, [2019-8-30] http://www.cs.toronto.edu/~roweis/lle,2001.)

TrendMD

Volume 49 Issue 10 page: 842-850

Cover

Keywords

Article Metrics

Article views (77) PDF downloads(123)

Parallel ISOMAP algorithm based on Spark

Abstract

Abstract

References

Proportional views

Catalog

Recommended articles

TrendMD

Article Metrics

Proportional views

Authors

Browse

Contact Us

About

Parallel ISOMAP algorithm based on Spark

Share

Tools

Abstract

Abstract

References

Proportional views

Catalog

Recommended articles

TrendMD

Article Metrics

Proportional views

Authors

Browse

Contact Us

About

Export File

Citation

Format

Content