Tabular-oriented data model and its query issues

HUANG Dongmei; SUN Le; SHI Shaohua; SU Cheng; ZHAO Danfeng

doi:10.3969/j.issn.0253-2778.2016.01.008

Open Access JUSTC Original Paper

Received Date: 27 August 2015
Accepted Date: 29 September 2015
Revised Date: 29 September 2015
Publish Date: 30 January 2016

Abstract

With the rapid development of information technologies, the storage and representation of data from various sources have become distributed and heterogeneous. These sources include not only traditional structured data, such as relational and object-oriented databases, but also unstructured data such as Excel and CSV documents. Such data is high-volume, continuously updated, and of low usability, and therefore falls within the scope of Big Data. However, organizing and managing Excel and similar data with unstructured and semi-structured methods leads to data structures that are weakly controllable and weakly usable, with poor access efficiency. To address this problem, this paper takes Excel as the data source, proposes a new tabular-oriented relational data model, and discusses the querying and optimization of Tabular data. First, a formal definition of Tabular form data is given. Second, a PartiPath tree is designed to achieve structural transformation through tabular division, together with its relation schema. The data model is then presented. After that, four basic queries are described, along with their optimization by an improved DICE method based on user interest similarity. Finally, experiments are conducted and conclusions are drawn.
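The abstract does not define the improved DICE measure. As a point of reference only, and assuming DICE refers to the classic Dice coefficient from information retrieval, the unmodified coefficient between two sets can be sketched as follows. Modelling a user's interest as the set of attribute names touched by that user's queries is an assumption made for illustration; the function name `dice_similarity` and the sample attribute names are hypothetical and do not come from the paper.

```python
def dice_similarity(a, b):
    """Classic Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).

    This is the textbook form, not the paper's improved variant.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Convention chosen for this sketch: two empty sets are identical.
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical interest profiles: attributes each user has queried.
user_1 = {"station", "date", "temperature"}
user_2 = {"station", "date", "salinity", "depth"}

# Two shared attributes out of 3 + 4 in total, so the score is 4/7.
score = dice_similarity(user_1, user_2)
```

A score near 1 would suggest that two users tend to query the same attributes, which is the kind of signal a similarity-based query optimization could use; how the paper actually weights or extends the measure is not stated in the abstract.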


Volume 46, Issue 1, pages 56-65
