A malicious domain name detection method based on CNN

DU Shuying; DU Peng; DING Shifei

doi:10.3969/j.issn.0253-2778.2020.07.020


Open Access JUSTC Original Paper


1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
2. School of Information Management, Xuzhou Vocational College of Bioengineering, Xuzhou 221000, China

Cite this: https://doi.org/10.3969/j.issn.0253-2778.2020.07.020

Received Date: 03 June 2020
Accepted Date: 21 June 2020
Revision Received Date: 21 June 2020
Publish Date: 31 July 2020

Abstract

In recent years, botnet-based cyber attacks have become one of the major cyber security threats. Many malware families use a domain generation algorithm (DGA) to automatically generate large numbers of pseudo-random domain names for connecting to command and control servers. This paper focuses on detecting and classifying pseudo-random domain names with a convolutional neural network (CNN). It briefly introduces the hazards and basic principles of botnets and the role of fake domain names in them. After analyzing the principle of DGAs and the shortcomings of traditional DGA domain name recognition algorithms, it concentrates on a fake domain name recognition method based on a CNN. The basic concepts of CNNs are illustrated through simple neural network training experiments, and the model's classification performance is compared under different hyperparameters and activation functions. The analysis of the results reports the accuracy and loss of the CNN model for domain name identification, together with recall, F1 score and ROC curves. All indicators show that the model classifies well, and it is concluded that CNN-based counterfeit domain name recognition is a reliable method.
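As a rough illustration of two steps the abstract mentions, the sketch below shows one common way to turn a domain name into a fixed-length integer sequence that a character-level CNN could consume, and how accuracy, precision, recall and F1 can be computed from binary predictions. It is not the authors' implementation: the vocabulary `ALPHABET`, the padding index 0, the default `max_len`, and the function names `encode_domain` and `binary_metrics` are all assumptions made for this example.

```python
import string

# Hypothetical vocabulary (an assumption, not taken from the paper).
# Index 0 is reserved for padding and for characters outside the vocabulary.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_domain(domain, max_len=16):
    """Map a domain name to a fixed-length list of integer character indices."""
    # Lowercase, truncate to max_len, then look up each character.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    # Right-pad with zeros so every domain has the same length.
    return indices + [0] * (max_len - len(indices))


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1, with label 1 (DGA) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Labels: 1 = DGA-generated, 0 = benign (illustrative values only).
    print(encode_domain("ab.c", max_len=6))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

In a real pipeline the encoded sequences would typically pass through an embedding layer and one-dimensional convolutions; a ROC curve would additionally require the model's predicted scores rather than hard labels.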





Volume 50 Issue 7 page: 1019-1025
