ISSN 0253-2778

CN 34-1054/N

Open AccessOpen Access JUSTC Original Paper

A fast DFA construction algorithm by subset encoding

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2014.01.001
  • Received Date: 26 March 2013
  • Accepted Date: 23 April 2013
  • Rev Recd Date: 23 April 2013
  • Publish Date: 30 January 2014
  • Regular expression matching is the foundation of many network functions such as deep packet inspection, which is performed using either non-deterministic finite automaton (NFA) or deterministic finite automation (DFA). To meet the requirement of high-speed regular expression matching, DFA has been the prevalent choice for its deterministic nature and matching efficiency. However, all these DFA-based approaches need to construct a DFA from an NFA as an intermediate step, thus the DFA construction process can be one of bottlenecks for the system. By exploring the inherent properties of finite automaton — whether the NFA states simultaneously active and how the self-looping NFA states lead to explosion of DFA state space — a state subset encoding and searching scheme was designed, and a new DFA construction algorithm was proposed. Through experiments based on real life pattern sets from the Snort intrusion detection system, the new DFA construction algorithm was demonstrated to reduce the running time by 8833%~9357% compared with that of the standard subset construction algorithm.
    Regular expression matching is the foundation of many network functions such as deep packet inspection, which is performed using either non-deterministic finite automaton (NFA) or deterministic finite automation (DFA). To meet the requirement of high-speed regular expression matching, DFA has been the prevalent choice for its deterministic nature and matching efficiency. However, all these DFA-based approaches need to construct a DFA from an NFA as an intermediate step, thus the DFA construction process can be one of bottlenecks for the system. By exploring the inherent properties of finite automaton — whether the NFA states simultaneously active and how the self-looping NFA states lead to explosion of DFA state space — a state subset encoding and searching scheme was designed, and a new DFA construction algorithm was proposed. Through experiments based on real life pattern sets from the Snort intrusion detection system, the new DFA construction algorithm was demonstrated to reduce the running time by 8833%~9357% compared with that of the standard subset construction algorithm.
  • loading
  • [1]
    Hopcroft J E, Motwani R, Ullman J D. Introduction to Automata Theory, Languages and Computation [M]. 3ed, Addison-Wesley, 2006.
    [2]
    Aha A V, Corasick M J. Efficient String matching: An aid to bibliographic search[J]. Communications of the ACM, 1975, 18(6): 333-340.
    [3]
    Boyer R S, Moore J S. A fast string searching algorithm[J]. Communications of the ACM, 1977, 20(10): 762-772.
    [4]
    Alicherry M, Muthuprasanna M, Kumar V. High speed pattern matching for network IDS/IPS [C]// Proceedings of the 14th International Conference on Network Protocols. Santa Barbara, USA: IEEE Press, 2006: 187-196.
    [5]
    Yu F, Katz R H, Lakshman T V. Gigabit rate packet pattern-matching using TCAM[C]// Proceedings of the 12th International Conference on Network Protocols. Berlin, Germany: IEEE Press, 2004: 174-183.
    [6]
    Sung J S, Kang S M, Lee Y, et al. A multi-gigabit rate deep packet inspection algorithm using TCAM [C]// Proceeding of IEEE Gliobal Telicommunications Conference. St. Louis, USA: IEEE Press, 2005: 1-5.
    [7]
    Zu Y, Yang M, Xu Z H, et al. GPU-based NFA implementation for memory efficient high speed regular expression matching [C]// Proceedings of 17th ACM SIGPLAN on Principles and Practice of Parallel Programming. New Orleans, USA: ACM Press, 2012: 129-140.
    [8]
    Peng K Y, Dong Q F. TCAM-based NFA implementation [C]// Proceedings of the 12th ACM SIGMETRICS/Performance joint International Conference on Measurement and Modeling of Computer Systems. London, UK: ACM Press, 2012: 379-380.
    [9]
    Yu F, Chen Z F, Diao Y L, et al. Fast and memory efficient regular expression matching for deep packet inspection [C]// Proceedings of the ACM/IEEE Symposium on Architecture for Networking and Communications Systems. San Jose, USA: ACM Press, 2006: 93-102.
    [10]
    Kumar S, Dharmapurikar D, Yu F, et al. Algorithms to accelerate multiple regular expressions matching for deep packet inspection[C]// Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. Pisa, Italy: ACM Press, 2006: 339-350.
    [11]
    Becchi M, Crowley P. An improved algorithm to accelerate regular expression evaluation [C]// Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems. Orlando, USA: ACM Press, 2007: 145-154.
    [12]
    Chen M. TCAM-based high speed regular expression matching[D]. University of Science and Technology of China, Hefei, China, 2010.
    [13]
    Meiners C R, Patel J, Norige E, et al. Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems[C]// Proceedings of the 19th USENIX conference on Security. Washington, USA: USENIX Association, 2010: 8.
    [14]
    Peng K Y, Dong Q F, Chen M. TCAM-based DFA deflation: A novel approach to fast and scalable regular expression matching [C]// Proceedings of IEEE International Workshop on Quality of Service. San Jose, USA: IEEE Press, 2011: 1-3.
    [15]
    Peng K Y, Tang S, Chen M, etal. Chain-based DFA deflation for fast and scalable regular expression matching using TCAM[C]// Proceedings of the 7th ACM/IEEE Symposium on Architectures for Networking and Communications Systems. Brooklyn, USA: ACM Press, 2011: 24-35.
    [16]
    Leslie T. Efficient approaches to subset construction [D]. University of Waterloo, Ontario, Canada, 1995.
    [17]
    Becchi M. Regular expression processor[EB/OL]. http://regex.wustl.edu/.
    [18]
    Becchi M and Crowley P. Efficient regular expression evaluation: Theory to practice[C]// Proceedings of the ACM/IEEE Symposium on Architecture for Networking and Communications Systems. San Jose, USA: ACM Press, 2008: 50-59.
    [19]
    Becchi M, Franklin M A, Crowley P. A workload for evaluating deep packet inspection architectures [C]// Proceedings of International Symposium on Workload Characterization. Seattle, USA: IEEE Press, 2008: 79-89.
    [20]
    Smith R, Estan C, Jha S. XFA: Faster signature matching with extended automata [C]// Proceedings of the IEEE Symposium on Security and Privacy. Oakland, USA: IEEE Press, 2008:187-201.
    [21]
    Smith R, Estan C, Jha S, et al. Deflating the big bang: Fast and scalable deep packet inspection with extended finite automata [C]// Proceedings of the ACM SIGCOMM Conference on Data Communications. Seattle, USA: ACM Press, 2008: 207-218.
    [22]
    Snort: A free and open source network intrusion detection system (NIDS) and network intrusion prevention system (NIPS) [EB/OL]. http://www.snort.org/.
  • 加载中

Catalog

    [1]
    Hopcroft J E, Motwani R, Ullman J D. Introduction to Automata Theory, Languages and Computation [M]. 3ed, Addison-Wesley, 2006.
    [2]
    Aha A V, Corasick M J. Efficient String matching: An aid to bibliographic search[J]. Communications of the ACM, 1975, 18(6): 333-340.
    [3]
    Boyer R S, Moore J S. A fast string searching algorithm[J]. Communications of the ACM, 1977, 20(10): 762-772.
    [4]
    Alicherry M, Muthuprasanna M, Kumar V. High speed pattern matching for network IDS/IPS [C]// Proceedings of the 14th International Conference on Network Protocols. Santa Barbara, USA: IEEE Press, 2006: 187-196.
    [5]
    Yu F, Katz R H, Lakshman T V. Gigabit rate packet pattern-matching using TCAM[C]// Proceedings of the 12th International Conference on Network Protocols. Berlin, Germany: IEEE Press, 2004: 174-183.
    [6]
    Sung J S, Kang S M, Lee Y, et al. A multi-gigabit rate deep packet inspection algorithm using TCAM [C]// Proceeding of IEEE Gliobal Telicommunications Conference. St. Louis, USA: IEEE Press, 2005: 1-5.
    [7]
    Zu Y, Yang M, Xu Z H, et al. GPU-based NFA implementation for memory efficient high speed regular expression matching [C]// Proceedings of 17th ACM SIGPLAN on Principles and Practice of Parallel Programming. New Orleans, USA: ACM Press, 2012: 129-140.
    [8]
    Peng K Y, Dong Q F. TCAM-based NFA implementation [C]// Proceedings of the 12th ACM SIGMETRICS/Performance joint International Conference on Measurement and Modeling of Computer Systems. London, UK: ACM Press, 2012: 379-380.
    [9]
    Yu F, Chen Z F, Diao Y L, et al. Fast and memory efficient regular expression matching for deep packet inspection [C]// Proceedings of the ACM/IEEE Symposium on Architecture for Networking and Communications Systems. San Jose, USA: ACM Press, 2006: 93-102.
    [10]
    Kumar S, Dharmapurikar D, Yu F, et al. Algorithms to accelerate multiple regular expressions matching for deep packet inspection[C]// Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. Pisa, Italy: ACM Press, 2006: 339-350.
    [11]
    Becchi M, Crowley P. An improved algorithm to accelerate regular expression evaluation [C]// Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems. Orlando, USA: ACM Press, 2007: 145-154.
    [12]
    Chen M. TCAM-based high speed regular expression matching[D]. University of Science and Technology of China, Hefei, China, 2010.
    [13]
    Meiners C R, Patel J, Norige E, et al. Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems[C]// Proceedings of the 19th USENIX conference on Security. Washington, USA: USENIX Association, 2010: 8.
    [14]
    Peng K Y, Dong Q F, Chen M. TCAM-based DFA deflation: A novel approach to fast and scalable regular expression matching [C]// Proceedings of IEEE International Workshop on Quality of Service. San Jose, USA: IEEE Press, 2011: 1-3.
    [15]
    Peng K Y, Tang S, Chen M, etal. Chain-based DFA deflation for fast and scalable regular expression matching using TCAM[C]// Proceedings of the 7th ACM/IEEE Symposium on Architectures for Networking and Communications Systems. Brooklyn, USA: ACM Press, 2011: 24-35.
    [16]
    Leslie T. Efficient approaches to subset construction [D]. University of Waterloo, Ontario, Canada, 1995.
    [17]
    Becchi M. Regular expression processor[EB/OL]. http://regex.wustl.edu/.
    [18]
    Becchi M and Crowley P. Efficient regular expression evaluation: Theory to practice[C]// Proceedings of the ACM/IEEE Symposium on Architecture for Networking and Communications Systems. San Jose, USA: ACM Press, 2008: 50-59.
    [19]
    Becchi M, Franklin M A, Crowley P. A workload for evaluating deep packet inspection architectures [C]// Proceedings of International Symposium on Workload Characterization. Seattle, USA: IEEE Press, 2008: 79-89.
    [20]
    Smith R, Estan C, Jha S. XFA: Faster signature matching with extended automata [C]// Proceedings of the IEEE Symposium on Security and Privacy. Oakland, USA: IEEE Press, 2008:187-201.
    [21]
    Smith R, Estan C, Jha S, et al. Deflating the big bang: Fast and scalable deep packet inspection with extended finite automata [C]// Proceedings of the ACM SIGCOMM Conference on Data Communications. Seattle, USA: ACM Press, 2008: 207-218.
    [22]
    Snort: A free and open source network intrusion detection system (NIDS) and network intrusion prevention system (NIPS) [EB/OL]. http://www.snort.org/.

    Article Metrics

    Article views (49) PDF downloads(69)
    Proportional views

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return