ISSN 0253-2778

CN 34-1054/N

2018 Vol. 48, No. 6

Original Paper
Research on a new annealing algorithm for mixed model of directed networks
WANG Jinghong, CAI Bianfang, LI Bi
2018, 48(6): 433-439. doi: 10.3969/j.issn.0253-2778.2018.06.001
Although the traditional expectation-maximization (EM) algorithm for the mixed model can effectively explore the structural regularity of a network, it often gets stuck in a local maximum. A deterministic annealing expectation-maximization (NMEM) algorithm is proposed to solve this problem; it not only avoids local optima but also improves convergence speed, and is therefore used to estimate the parameters of the mixed model. The algorithm's initial parameter β0 is usually set by experience: if β0 is too small, the results are meaningless, while if β0 is too large, the algorithm converges to a local maximum more frequently. Furthermore, a new mixed model for directed networks and a selection method for β0 were designed.
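The core idea of deterministic annealing EM can be illustrated with a tempered E-step: responsibilities are raised to the power β (the inverse temperature), so a small β keeps the posterior nearly uniform and prevents early commitment to a local maximum, while β = 1 recovers standard EM. This is a minimal sketch of that mechanism only; the paper's full NMEM algorithm and its β0 selection rule are not reproduced here.

```python
def annealed_responsibilities(weights, likelihoods, beta):
    """Annealed E-step: r_k ∝ (pi_k * f_k)^beta.

    beta → 0 flattens the responsibilities (high temperature);
    beta = 1 recovers the ordinary EM E-step.
    """
    scores = [(w * f) ** beta for w, f in zip(weights, likelihoods)]
    total = sum(scores)
    return [s / total for s in scores]

# At low beta the responsibilities stay close to uniform, so the
# algorithm does not lock onto one component too early.
r_cold = annealed_responsibilities([0.5, 0.5], [0.9, 0.1], beta=0.05)
r_hot = annealed_responsibilities([0.5, 0.5], [0.9, 0.1], beta=1.0)
```

Raising β gradually toward 1 during the iterations sharpens the responsibilities step by step, which is what gives annealing its robustness to poor initialization.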
Link prediction in dynamical networks based on mutual information
QI Fangpeng, WANG Tong, ZHOU Mingyang, FU Zhongqian
2018, 48(6): 440-446. doi: 10.3969/j.issn.0253-2778.2018.06.002
The critical issue in link prediction is to measure the existence probability of new links between nodes from known node properties and observed structural features. Previous research on complex networks has been conducted under the oversimplified assumption that the network structure is static, while the temporal dimension has great impact on the structure and dynamics of real systems, which limits the performance of state-of-the-art methods. Inspired by this, we propose a moving average mutual information (MA_MI) algorithm that combines mutual information with a moving average model. This method not only considers the information of the common neighbors of nodes, but also describes the evolution pattern of the network with historical information. Experimental results on four real-world networks show that MA_MI outperforms traditional methods and achieves higher precision.
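The combination of a per-snapshot structural score with a moving average over history can be sketched as follows. For brevity this sketch scores a node pair by its raw common-neighbor count rather than the paper's mutual-information measure, and blends snapshots with an exponential moving average; the snapshot dictionaries and the smoothing factor `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
def common_neighbor_score(adj, u, v):
    """Structural similarity in one snapshot: number of common neighbors.

    adj maps each node to its set of neighbors. (The paper's MA_MI
    method uses a mutual-information score here instead.)
    """
    return len(adj.get(u, set()) & adj.get(v, set()))

def moving_average_score(snapshots, u, v, alpha=0.5):
    """Blend the newest snapshot's score with historical scores via an
    exponential moving average, so network evolution is not ignored."""
    score = 0.0
    for adj in snapshots:  # snapshots ordered oldest -> newest
        score = alpha * common_neighbor_score(adj, u, v) + (1 - alpha) * score
    return score

# Two hypothetical snapshots of a growing network.
s1 = {'u': {'a'}, 'v': {'a'}, 'a': {'u', 'v'}}
s2 = {'u': {'a', 'b'}, 'v': {'a', 'b'}, 'a': {'u', 'v'}, 'b': {'u', 'v'}}
score = moving_average_score([s1, s2], 'u', 'v')
```

Recent snapshots dominate the score while older structure still contributes, which is the evolution-aware behavior the abstract describes.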
Simulated annealing based semi-supervised support vector machine for credit prediction
2018, 48(6): 447-457. doi: 10.3969/j.issn.0253-2778.2018.06.003
In the mid-1990s, financial institutions began to combine consumer and business information to create scores for business credit. Enterprises in China, especially small and micro enterprises, have little credit information, resulting in a situation where only a small number of enterprises have credit information while a large number have none. Semi-supervised support vector machines (S3VM), however, can learn from both labeled and unlabeled data and address the problems of imbalanced credit data categories and insufficient sample information. The parameters of S3VM have a great influence on the performance of the algorithm, yet in practice they are often selected by experience. An SAS3VM algorithm is proposed to optimize the parameters of the deterministic annealing based semi-supervised support vector machine (DAS3VM) with simulated annealing. Starting from the small amount of labeled credit data, the algorithm takes advantage of the unlabeled credit data to aid learning and uses simulated annealing to find the optimal parameters. Experiments were conducted on two categories of enterprise credit data and three categories of personal credit data. The results show that semi-supervised learning (DAS3VM and SAS3VM) performs better than supervised learning, and the maximum accuracy of SAS3VM is 13.108% higher than that of DAS3VM.
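A generic simulated-annealing loop of the kind used for parameter search accepts worse candidates with probability exp(-Δ/T), letting the search escape local optima before the temperature cools. The sketch below is a sketch of that loop under stated assumptions: the quadratic `err` function is a hypothetical stand-in for the S3VM validation error as a function of its penalty parameter, not the paper's actual objective, which would retrain DAS3VM on credit data.

```python
import math
import random

def simulated_annealing(objective, start, neighbor,
                        t0=1.0, cooling=0.95, steps=200, seed=0):
    """Minimize objective(x) by simulated annealing.

    Worse candidates are accepted with probability exp(-delta / T),
    so early iterations explore and late iterations refine.
    """
    rng = random.Random(seed)
    current = best = start
    f_cur = f_best = objective(start)
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        f_cand = objective(cand)
        delta = f_cand - f_cur
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = current, f_cur
        t *= cooling  # geometric cooling schedule
    return best, f_best

# Hypothetical stand-in for validation error as a function of one
# S3VM parameter; minimum 0.1 at c = 2.
err = lambda c: (c - 2.0) ** 2 + 0.1
best_c, best_err = simulated_annealing(
    err, start=8.0, neighbor=lambda c, rng: c + rng.uniform(-1, 1))
```

The same loop applies unchanged to any parameter vector once `neighbor` proposes perturbed parameter sets.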
A linear programming-based model for multi-object and multi-resource allocation in emergency rescue
WANG Yuechen, SU Xing, JIA Xibin, GUO Limin, DING Zhiming
2018, 48(6): 458-466. doi: 10.3969/j.issn.0253-2778.2018.06.004
In an emergency rescue, situations vary, task deadlines are tight, and available resources are limited, which makes the problem difficult for single-object, one-to-one resource allocation approaches to handle. To this end, an innovative model is proposed for multi-object and multi-resource allocation in emergency rescues. By combining resources, the time consumed in task execution is reduced and the capability of the resources is enhanced. In addition, by adjusting the weights of the multiple objects, linear programming is employed to generate a resource allocation plan that can satisfy different resource allocation requirements in an emergency rescue. Finally, by employing the idea of multi-stage resource allocation from dynamic programming, the model can handle the dynamics of tasks and resources in an emergency rescue. Experimental results show that the model adapts well to multi-resource allocation across different rescue tasks and objects, and that its multi-stage characteristic suits the dynamics of tasks and resources in emergency rescues.
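The role of the object weights can be illustrated with a toy allocator: each candidate plan is scored by a weighted sum of completion time and cost, and changing the weights changes which plan wins. This brute-force enumeration is only a stand-in for the paper's linear program (and assumes at most one task per resource); the task and resource fields are illustrative.

```python
from itertools import product

def allocate(tasks, resources, w_time=0.5, w_cost=0.5):
    """Pick the assignment of resources to tasks that minimizes a
    weighted sum of makespan and total cost.

    tasks: list of {'work': amount}; resources: list of
    {'speed': rate, 'cost': price}. Requires len(resources) >= len(tasks).
    """
    best_plan, best_score = None, float('inf')
    for plan in product(range(len(resources)), repeat=len(tasks)):
        if len(set(plan)) < len(plan):  # each resource serves one task here
            continue
        time = max(tasks[t]['work'] / resources[r]['speed']
                   for t, r in enumerate(plan))
        cost = sum(resources[r]['cost'] for r in plan)
        score = w_time * time + w_cost * cost
        if score < best_score:
            best_plan, best_score = plan, score
    return best_plan, best_score

tasks = [{'work': 10}, {'work': 10}]
resources = [{'speed': 5, 'cost': 1},
             {'speed': 1, 'cost': 0.1},
             {'speed': 10, 'cost': 2}]
# Weight time only: the two fastest resources are chosen.
plan, score = allocate(tasks, resources, w_time=1.0, w_cost=0.0)
```

A real instance would replace the enumeration with an LP solver and re-run the allocation at each stage as tasks and resources change, as the abstract's multi-stage scheme describes.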
Comprehensive evaluation of small and medium enterprises
DU Xiaoping, ZHAO Kaiqi, YUAN Wei, YANG Qinghong, ZHANG Jianwei
2018, 48(6): 467-476. doi: 10.3969/j.issn.0253-2778.2018.06.005
In order to solve the problems of inconsistent evaluation results and incomplete evaluation aspects in previous enterprise evaluation models, this paper constructs a comprehensive evaluation model (CEM), which exploits the advantages of each algorithm and improves the accuracy of the evaluation results. First, the model establishes an evaluation index system covering seven aspects (growth force, competitive force, financing force, team force, opinion force, external force and innovation force) and filters the indicators from three aspects. Second, it uses three independent methods, combining subjectivity and objectivity, to evaluate enterprises, and checks the consistency of the three results with Kendall's W coefficient of concordance. Third, two algorithms, arithmetic averaging and factor analysis, are used to combine the three results. Fourth, the correlation between the independent evaluation results and the combined evaluation results is computed, and the more accurate model is selected by Spearman's rank correlation coefficient. Finally, the model evaluates 3430 small and medium enterprises from the new over-the-counter market of China, based on their three-year data (from 2013 to 2015), and the validity of the result is verified from two aspects: performance (net profit growth rate, main business margin, main business growth rate) and stock market value.
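The consistency check the abstract mentions is the standard Kendall's W statistic: with m raters ranking n objects, W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of the per-object rank totals from their mean; W = 1 means perfect agreement and W = 0 means no agreement. A minimal sketch (untied ranks assumed):

```python
def kendall_w(rankings):
    """Kendall's coefficient of concordance.

    rankings: m lists, each a permutation of the ranks 1..n assigned
    by one evaluation method to the same n objects (no ties).
    Returns W in [0, 1]; 1 means the methods rank identically.
    """
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean = sum(totals) / n
    s = sum((t - mean) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

In the paper's setting the three rank lists would come from the three independent evaluation methods; a high W justifies combining them.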
Split and merge algorithm for Gaussian mixture model based on KS test
JIANG Shuoran, CHEN Yarui, QIN Zhifei, YANG Jucheng
2018, 48(6): 477-485. doi: 10.3969/j.issn.0253-2778.2018.06.006
A Gaussian mixture model is a linear combination of a finite number of independent Gaussian models, and estimating the number of its components is an important research area. One class of algorithms, based on the minimum description length, determines the number of components by splitting and merging components during the iterations. Traditional algorithms use entropy ratio, KL divergence and model similarity as split and merge criteria. However, entropy ratio and KL divergence may cause excessive splitting because of their oversensitivity to sparse or concave models, and model similarity may cause excessive merging because it cannot assess how well the merged model fits a Gaussian. During the iterations, these excessive split and merge operations may cause oscillations. To address these problems, a split and merge algorithm for the Gaussian mixture model based on the KS test is proposed, with entropy ratio and the KS test used as split criteria, and model similarity and the KS test as merge criteria. The algorithm is capable of preventing excessive splitting and merging, as validated by experiments conducted on seven datasets.
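The KS criterion rests on the one-sample Kolmogorov-Smirnov statistic: the largest gap between a sample's empirical CDF and a reference Gaussian CDF. A large statistic signals that the points assigned to a component do not fit one Gaussian (a split candidate), while a small statistic for a tentatively merged pair supports the merge. The sketch below computes the statistic against a standard normal; how the paper thresholds it is not reproduced here.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the normal distribution via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(sample, cdf=normal_cdf):
    """One-sample KS statistic: sup |empirical CDF - reference CDF|.

    Checks both sides of the empirical-CDF step at each sorted point.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# A roughly standard-normal sample fits well; a shifted sample does not.
d_fit = ks_statistic([-1.5, -0.5, 0.0, 0.5, 1.5])
d_misfit = ks_statistic([2.0, 2.5, 3.0, 3.5, 4.0])
```

In the split-merge loop the reference CDF would use each component's own fitted mean and variance rather than the standard normal assumed here.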
Research on the auction strategies and pricing of big data
CHEN Zhizhu, WANG Hongzhi, XIONG Feng, ZHANG Yice, GAO Hong, LI Jianzhong
2018, 48(6): 486-494. doi: 10.3969/j.issn.0253-2778.2018.06.007
It is difficult to price big data due to its diversity and sparsity. Against the background of big data auctions, and considering that big data is replicable and that traditional auction models cannot determine how many copies of a data item sellers should auction to maximize their profits, a discussion is given on what auction strategies big data sellers should adopt under various circumstances to maximize their profits, and on the appropriate number of winning bidders. The traditional Vickrey auction model and the sequential auction model are modified, and two new auction models suitable for different auction conditions are proposed. Finally, the two auction models are designed and their expected revenues are derived.
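The tension the abstract describes, that selling more copies of a replicable item can lower the clearing price, already appears in the textbook k-unit uniform-price (Vickrey-style) auction for unit-demand bidders: the k highest bidders win and each pays the (k + 1)-th highest bid. This sketch illustrates that baseline trade-off only, not the paper's modified models.

```python
def vickrey_k_units(bids, k):
    """k-unit Vickrey auction with unit-demand bidders.

    The k highest bidders win one copy each and pay the (k+1)-th
    highest bid, which keeps truthful bidding optimal.
    Returns (winner indices, uniform price).
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winners = order[:k]
    price = bids[order[k]] if k < len(bids) else 0
    return winners, price

def seller_revenue(bids, k):
    """Revenue from auctioning k copies: k winners at the uniform price."""
    _, price = vickrey_k_units(bids, k)
    return min(k, len(bids)) * price

# With bids [10, 8, 6, 4]: one copy earns 8, two copies earn 12,
# three copies also earn 12, so more copies need not mean more profit.
```

Choosing the revenue-maximizing k over such schedules is exactly the "appropriate number of winning bidders" question the paper studies.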
A scheduling algorithm towards bandwidth guarantee for virtual cluster in the cloud
XU Hua, LI Jing
2018, 48(6): 495-503. doi: 10.3969/j.issn.0253-2778.2018.06.008
Due to the sharing of network resources in multi-tenant cloud data centers, minimum bandwidth guarantee has become one of the important methods for providing predictable performance to cloud applications. Efficient virtual network allocation helps accommodate a larger number of virtual clusters and improves resource utilization in the cloud. To meet the demand for network bandwidth guarantee, this paper proposes a novel backtracking algorithm for scheduling virtual clusters in the cloud. For the typical tree network topology of a data center, the algorithm first judges whether a valid solution exists inside each sub-tree of the network topology. After determining the sub-tree for the virtual cluster, it recursively searches for a detailed solution inside that sub-tree by backtracking, thus avoiding the false allocation or mistaken rejection of requests found in related work. The experimental results show that the exact search based on backtracking can increase the acceptance ratio of virtual cluster requests. Compared with existing algorithms, it reduces the rejection ratio by 10% on average, which contributes to improving resource utilization in the cloud.
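Under the common hose model, a virtual cluster of N VMs with per-VM bandwidth B placed so that m VMs sit below a link requires min(m, N − m) × B on that link. The backtracking idea can be sketched on a one-level tree, where each leaf (e.g., a rack) has free VM slots and an uplink capacity; this toy flattens the paper's recursive sub-tree search into a single level and the leaf tuples are illustrative.

```python
def place_cluster(n_vms, bandwidth, leaves):
    """Backtracking placement of an <n_vms, bandwidth> virtual cluster.

    leaves: list of (free_slots, uplink_bw). A leaf hosting m VMs needs
    min(m, n_vms - m) * bandwidth on its uplink (hose model).
    Returns per-leaf VM counts, or None if no valid placement exists.
    """
    def feasible(leaf, m):
        slots, uplink = leaves[leaf]
        return m <= slots and min(m, n_vms - m) * bandwidth <= uplink

    def search(leaf, remaining, plan):
        if remaining == 0:
            return plan
        if leaf == len(leaves):
            return None  # dead end: undo the last choice and backtrack
        # Try packing as many VMs as possible first, then fewer.
        for m in range(min(remaining, leaves[leaf][0]), -1, -1):
            if feasible(leaf, m):
                found = search(leaf + 1, remaining - m, plan + [m])
                if found is not None:
                    return found
        return None

    result = search(0, n_vms, [])
    return result + [0] * (len(leaves) - len(result)) if result else None
```

Because the search is exact, it reports failure only when no placement satisfies both slot and bandwidth constraints, which is precisely how backtracking avoids mistaken rejections.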
An improved box-counting method for calculating image fractal dimension
XUE Song, JIANG Xinsheng, DUAN Jimiao, ZHANG Peili
2018, 48(6): 504-511. doi: 10.3969/j.issn.0253-2778.2018.06.009
The fractal dimension is a useful feature parameter for texture analysis, segmentation and classification in many fields. The differential box-counting method is frequently used to estimate the fractal dimension of an image because of its simplicity; however, it lacks accuracy and stability. A new box-counting method is presented. First, more nodes are inserted into the discrete intensity surface of a digital image to make it a closer approximation of a continuous surface; this step makes it possible to distinguish different images at the smallest scale. Then, the fractal dimension of the digital image is estimated directly from the box count at the smallest scale, without the fitting step. Experimental results show that this method is more accurate and stable than some typical methods. For some special test images, such as pulse images, the proposed method avoids the unreasonable estimates produced by typical methods. In addition, since the box counts at other scales need not be calculated, the computational complexity of the method is lower.
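For reference, the classical differential box-counting baseline that the paper improves on works as follows: at each scale s the image is tiled into s × s blocks, each block contributes (max-level − min-level + 1) boxes of height h = s·G/M (G gray levels, M image size), and the dimension is the least-squares slope of log N(s) versus log(1/s). This sketch assumes a square image whose side is divisible by every scale.

```python
import math

def box_counting_dimension(image, scales=(1, 2, 4)):
    """Classical differential box-counting fractal dimension estimate.

    image: square grid of gray levels in [0, 255], side divisible by
    each scale. Returns the slope of log N(s) vs log(1/s).
    """
    m = len(image)
    g = 256  # number of gray levels
    logs, logn = [], []
    for s in scales:
        h = s * g / m  # box height at this scale
        n = 0
        for bi in range(0, m, s):
            for bj in range(0, m, s):
                block = [image[i][j]
                         for i in range(bi, bi + s)
                         for j in range(bj, bj + s)]
                n += int(max(block) // h) - int(min(block) // h) + 1
        logs.append(math.log(1 / s))
        logn.append(math.log(n))
    # Least-squares slope of log N against log(1/s).
    k = len(scales)
    sx, sy = sum(logs), sum(logn)
    sxx = sum(x * x for x in logs)
    sxy = sum(x * y for x, y in zip(logs, logn))
    return (k * sxy - sx * sy) / (k * sxx - sx * sx)

# Sanity check: a perfectly flat image is a smooth surface, dimension 2.
flat = [[100] * 4 for _ in range(4)]
```

The paper's method replaces the multi-scale fit above with a direct estimate from the smallest-scale count over an interpolated surface, which is where its accuracy and speed gains come from.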
A privacy preserving measurement for query under k-anonymity mechanism
CHEN Jiaming, WANG Li, XIAO Yafei, FANG Xianjing
2018, 48(6): 512-518. doi: 10.3969/j.issn.0253-2778.2018.06.010
A query privacy measurement under the k-anonymity mechanism is proposed, based on information entropy and a logarithmic function. First, a framework for query privacy under the k-anonymity mechanism is established, which contains four roles and four operations and provides a formal description for privacy measurement. Then, two methods for quantifying background knowledge are introduced. In the second method, discretized user attribute values are used to calculate a probability expression of the background knowledge, and the discretization affects the accuracy of this expression. To avoid this, the discretized value of each user attribute is used as the index of an array when calculating the relevant quantities, the array being generated from the relevance between the particular query and the user's attributes; the probability that each user issues the particular query is then obtained, so the quantification results are no longer influenced by the discretized attribute values. Finally, a query privacy measurement is proposed. The experimental results show that the method can effectively measure the level of protection offered by query privacy protection algorithms under the k-anonymity mechanism.
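The entropy part of such a measurement is standard: given an attacker's probability estimate that each user in the anonymity set issued the query, the privacy level is the Shannon entropy of that distribution, maximized at log2(k) when all k users are equally suspect. A minimal sketch (the probability vectors are illustrative; the paper derives them from the array-indexed background-knowledge model):

```python
import math

def query_privacy(probabilities):
    """Entropy-based privacy of an anonymity set, in bits.

    probabilities[i] is the attacker's estimate that user i issued the
    query. The maximum, log2(k), is reached when all k users are
    equally likely; background knowledge lowers the entropy.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Ideal k-anonymity with k = 8: every user equally suspect (3 bits).
uniform = [1 / 8] * 8
# Background knowledge concentrates suspicion on one user.
skewed = [0.65] + [0.05] * 7
```

Comparing the measured entropy against the ideal log2(k) quantifies how much protection a given k-anonymity algorithm actually delivers, which is the role of the measurement in the paper's experiments.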