Spark/Shark-based OLAP system for smart grid applications

WANG Yaling; LIU Yue; HONG Jianguang; CUI Wei; LI Yanhu; SU Yipeng; HUANG Gaopan; ZHANG Mingming; LIU Wantao

doi:10.3969/j.issn.0253-2778.2016.01.009

Open Access JUSTC Original Paper

1. State Grid Information & Telecommunication Group Co. Ltd., Beijing 100761, China
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3. State Grid Zhejiang Electric Power Company, Hangzhou 310007, China
4. State Grid Jiangsu Electric Power Company Information & Telecommunication Branch, Nanjing 210029, China

Received Date: 27 August 2015
Accepted Date: 29 September 2015
Revision Received Date: 29 September 2015
Publish Date: 30 January 2016

Abstract

OLAP queries on electricity consumption information in the smart grid have several prominent features: very large data volumes, joins across multiple tables, and complex SQL structure. For this kind of application, a traditional RDBMS suffers from poor scalability, low write throughput, and unacceptable query performance. A Spark/Shark-based OLAP system for electricity consumption information in the smart grid is designed. The system uses the distributed file system HDFS for data storage, Shark to parse SQL queries, and Spark to execute them. However, Shark does not support fine-grained indexes, which hinders further improvement of query performance. To overcome this limitation, TrieIndex, a fine-grained index technique based on a trie, and a data re-organization scheme are proposed. Experiments with real electricity consumption data and queries show that the write throughput of the system is 12 times higher than that of the RDBMS, and that its query efficiency is 10 times greater than that of the original Shark.
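
The abstract does not describe the internal layout of TrieIndex, so the following is only a generic sketch of the underlying idea, not the authors' design: if records are first re-organized (sorted) by a string key, a trie over the keys can map any key prefix to one contiguous range of row positions, so a query reads only that range instead of scanning everything. The class name `PrefixTrieIndex`, the `lookup` method, and the sample keys and readings are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Half-open [start, end) range of row positions whose key
    # begins with the prefix spelled out by the path to this node.
    start: int = -1
    end: int = -1


class PrefixTrieIndex:
    """Toy trie over string keys; assumes the keys are already sorted."""

    def __init__(self, sorted_keys: List[str]) -> None:
        self.root = TrieNode(start=0, end=len(sorted_keys))
        for pos, key in enumerate(sorted_keys):
            node = self.root
            for ch in key:
                node = node.children.setdefault(ch, TrieNode())
                if node.start == -1:
                    node.start = pos   # first row carrying this prefix
                node.end = pos + 1     # one past the last row seen so far

    def lookup(self, prefix: str) -> Tuple[int, int]:
        """Return the [start, end) row range for a prefix, or (0, 0) if absent."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return (0, 0)
        return (node.start, node.end)


# Made-up (key, reading) pairs; the key format is an assumption.
records = [
    ("330102-0007", 12.5),
    ("110105-0001", 8.0),
    ("330102-0003", 4.2),
    ("110108-0002", 6.1),
]
records.sort(key=lambda r: r[0])          # data re-organization step
index = PrefixTrieIndex([key for key, _ in records])
lo, hi = index.lookup("3301")
matching = records[lo:hi]                 # only this slice is read
```

The sort matters: the range stored at each node is valid only because all keys sharing a prefix are adjacent after sorting. A real system would store file or block offsets rather than list positions.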

Volume 46, Issue 1, pages 66-75
