ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

A malicious domain name detection method based on CNN

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2020.07.020
  • Received Date: 03 June 2020
  • Accepted Date: 21 June 2020
  • Revision Received Date: 21 June 2020
  • Publish Date: 31 July 2020
  • In recent years, botnet-based cyber attacks have become one of the major cyber security threats. Many malware families use a domain generation algorithm (DGA) to automatically generate large numbers of pseudo-random domain names for connecting to command and control servers. This paper focuses on detecting and classifying pseudo-random domain names with a convolutional neural network (CNN). It first briefly introduces the hazards and basic principles of botnets and the role of fake domain names in them. After analyzing the principle of DGAs and the defects of traditional DGA domain name recognition algorithms, it concentrates on a fake domain name recognition method based on a CNN. The basic concepts of CNNs are illustrated with simple neural network training experiments, and the model's classification performance is compared under different hyperparameters and different activation functions. The analysis of the results reports the accuracy and loss of the CNN model on domain name identification, together with the evaluation indexes of accuracy, recall, F1 score, and ROC curves. All indicators show that the model classifies well, and the paper concludes that CNN-based recognition of counterfeit domain names is a reliable method.
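The evaluation indexes named in the abstract can be illustrated with a minimal sketch. It assumes a binary labeling (1 = DGA-generated, 0 = benign), which the abstract does not spell out. The function name `binary_metrics` and the sample labels are hypothetical and are not taken from the paper. Precision is included because F1 is defined from precision and recall.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: label 1 means "DGA-generated domain", label 0 means "benign".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = len(y_true)
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and classifier output for eight domain names.
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(example_true, example_pred))
```

An ROC curve would additionally require the classifier's continuous scores rather than hard 0/1 predictions, so it is left out of this sketch.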