Service identification of WeChat traffic based on fuzziness and semi-supervised self-paced co-training

LIU Weikang; QIN Xiaowei; WEI Guo

doi:10.3969/j.issn.0253-2778.2020.01.004

Open Access JUSTC Research Article

CAS Key Laboratory of Wireless-Optical Communications, University of Science and Technology of China, Hefei 230026, China

https://doi.org/10.3969/j.issn.0253-2778.2020.01.004

Received Date: 28 March 2019
Revision Received Date: 17 July 2019
Publish Date: 31 January 2020

Abstract

Accurate service identification of network data streams is a prerequisite for providing differentiated services. Supervised learning, the commonly used approach, is difficult to apply in practice because constructing a training set requires a large amount of manual annotation. Semi-supervised learning, which relies on only a small amount of annotated data, has therefore become an active research topic. The semi-supervised self-paced co-training framework handles unlabeled data collaboratively across multiple views, processing the easier samples first. However, this method uses confidence as its only criterion when selecting samples for pseudo-labeling. As a result, the differences between the views tend to shrink gradually during training, which reduces the gain from collaboration and limits model performance. To address this for the recognition of WeChat data streams, a fuzziness-based self-paced co-training model (FBSpaCo) is proposed, which introduces a fuzziness evaluation mechanism into the pseudo-labeling step. Experiments show that the model effectively prevents the difference between the two views from declining during training and that its recognition accuracy is substantially higher than that of existing methods.
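
The idea of using fuzziness alongside confidence when choosing pseudo labels can be illustrated with a minimal Python sketch. The abstract does not give the paper's fuzziness formula or selection rule, so everything below is an assumption made for illustration: `fuzziness` uses the classical De Luca-Termini measure over a sample's class-membership vector, and `select_pseudo_labels` (a hypothetical helper) simply discards samples whose fuzziness exceeds a hypothetical threshold `max_fuzziness` before ranking the rest by confidence. It is not the FBSpaCo procedure itself.

```python
import numpy as np


def fuzziness(probs):
    """De Luca-Termini fuzziness for each row of class-membership values.

    Assumption: this classical measure stands in for the paper's
    fuzziness evaluation, which the abstract does not specify.
    """
    # Clip to avoid log(0); rows are samples, columns are classes.
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1 - 1e-12)
    terms = p * np.log(p) + (1 - p) * np.log(1 - p)
    return -terms.mean(axis=1)


def select_pseudo_labels(probs, n_select, max_fuzziness):
    """Choose up to n_select samples for pseudo-labeling.

    Illustrative rule (an assumption, not the paper's): keep samples
    whose fuzziness is at most max_fuzziness, then take the most
    confident ones among them.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)
    candidates = np.flatnonzero(fuzziness(probs) <= max_fuzziness)
    # Sort the surviving candidates by descending confidence.
    order = candidates[np.argsort(-confidence[candidates], kind="stable")]
    chosen = order[:n_select]
    return chosen, probs[chosen].argmax(axis=1)


# Hypothetical class-probability outputs from one view's classifier.
example_probs = [
    [0.98, 0.01, 0.01],
    [0.50, 0.50, 0.00],
    [0.60, 0.30, 0.10],
    [0.05, 0.90, 0.05],
]
chosen, labels = select_pseudo_labels(example_probs, n_select=2, max_fuzziness=0.3)
```

In a co-training loop, each view would run such a selection on its own predictions and pass the chosen samples and labels to the other view; the fuzziness filter adds a second criterion beyond raw confidence, which is the role the abstract assigns to the fuzziness evaluation mechanism.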
