ISSN 0253-2778

CN 34-1054/N

2018 Vol. 48, No. 4

Deducing for dynamic time warping distance
2018, 48(4): 261-274. doi: 10.3969/j.issn.0253-2778.2018.04.001
Current research shows that dynamic time warping (DTW) is the best measure in most areas of time series similarity measurement. However, the high time complexity of computing the DTW distance directly, together with the fact that DTW does not satisfy the triangle inequality, makes it impossible to deduce the DTW distance quickly. Existing DTW optimization methods are mainly devoted to designing DTW lower-bound distances with low time complexity to accelerate time series comparison. Unfortunately, these lower-bound distances cannot be deduced either, so time series must be compared one by one to compute similarity, which incurs a high I/O cost. A novel deducible DTW lower-bound distance is therefore proposed, along with a corresponding index building method and a similar time series query algorithm. This is the first study of the DTW deduction problem. Extensive experimental results show that, compared with current technologies, the proposed method is efficient in both time complexity and I/O cost.
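As context for readers unfamiliar with the measure, a minimal sketch of the classic quadratic-time DTW computation follows (this is the standard dynamic program, not the paper's deducible lower-bound distance, which the abstract does not specify):

```python
def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance
    between two numeric sequences, with absolute-difference cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = minimal warping cost aligning a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # stretch a
                                  dp[i][j - 1],      # stretch b
                                  dp[i - 1][j - 1])  # advance both
    return dp[n][m]
```

The quadratic cost per pair is exactly what motivates lower-bound distances: a cheap bound lets most candidate series be pruned before this dynamic program ever runs.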
Original Paper
A parallel algorithm for constructing concept lattice based on hierarchical concept under MapReduce
CAI Yong, CHEN Hongmei
2018, 48(4): 275-283. doi: 10.3969/j.issn.0253-2778.2018.04.002
Concept lattice is the core data structure of formal concept analysis. A parallel algorithm for constructing concept lattices under the MapReduce framework is studied, which uses divide-and-conquer based on partitioning and layer-wise constraints to construct the concept lattice efficiently. Firstly, sub-formal contexts are formed by partitioning the formal context by objects, and the concepts in each sub-formal context are computed. Then the global concept set is formed by merging the concepts from different nodes. Next, the global concept set is partitioned into layers of concepts. Finally, constraints between layers are used to narrow the search scope, and the concept lattice is constructed by searching for and linking parent-child nodes across the concept layers. The proposed algorithm is implemented in the MapReduce framework. Extensive experiments on public datasets verify the effectiveness of the layer-based parallel algorithm in handling large-scale formal contexts.
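The sequential core of the first step, computing the formal concepts of a (small) context, can be sketched by brute force; the partition-and-merge machinery of the MapReduce version is omitted here:

```python
from itertools import combinations

def formal_concepts(context):
    """Brute-force enumeration of the formal concepts (extent, intent)
    of a small formal context, given as a dict object -> set of
    attributes.  A sequential baseline: the parallel algorithm instead
    partitions the context by objects and merges per-node concepts."""
    objects = list(context)
    all_atts = set().union(*context.values()) if context else set()

    def intent(ext):  # attributes shared by every object in ext
        return set(all_atts) if not ext else set.intersection(
            *(context[o] for o in ext))

    def extent(itt):  # objects possessing every attribute in itt
        return frozenset(o for o in objects if itt <= context[o])

    concepts = set()
    for r in range(len(objects) + 1):
        for ext in combinations(objects, r):
            itt = frozenset(intent(set(ext)))
            concepts.add((extent(itt), itt))  # closure yields a concept
    return concepts
```

For the toy context {1: {a, b}, 2: {b, c}} this yields four concepts, including the top concept ({1, 2}, {b}).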
An accelerator for kernel ridge regression algorithms based on data partition
LIU Enjiang, SONG Yunsheng, LIANG Jiye
2018, 48(4): 284-289. doi: 10.3969/j.issn.0253-2778.2018.04.003
Kernel ridge regression (KRR) is an important regression algorithm widely used in pattern recognition and data mining for its interpretability and strong generalization capability. However, it suffers from low training efficiency when faced with large-scale data. To address this problem, an accelerating algorithm for kernel ridge regression based on data partition (PP-KRR) is proposed, which uses the idea of divide-and-conquer. Firstly, the training data space is divided into m mutually disjoint regions by a set of parallel hyperplanes. Secondly, a KRR model is trained on each region. Finally, each unlabeled instance is predicted by the KRR model of the region it falls in. Comparisons with three traditional algorithms on real datasets show that the proposed algorithm obtains similar prediction accuracy with less training time.
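A dependency-free sketch of the divide-and-conquer idea follows, with the kernel ridge model simplified to a one-feature linear ridge and the parallel hyperplanes reduced to a single split point (both simplifications are ours, not the paper's):

```python
def fit_ridge_1d(xs, ys, lam=0.1):
    # Closed-form ridge for one feature without intercept:
    # w = sum(x*y) / (sum(x*x) + lam)
    return sum(x * y for x, y in zip(xs, ys)) / (
        sum(x * x for x in xs) + lam)

def pp_krr_sketch(xs, ys, split, lam=0.1):
    """Split the training data at `split` (a single 'hyperplane'),
    fit an independent ridge model per region, and predict each
    query with the model of the region it falls in."""
    left = [(x, y) for x, y in zip(xs, ys) if x < split]
    right = [(x, y) for x, y in zip(xs, ys) if x >= split]
    w_left = fit_ridge_1d(*zip(*left), lam) if left else 0.0
    w_right = fit_ridge_1d(*zip(*right), lam) if right else 0.0
    return lambda x: (w_left if x < split else w_right) * x
```

Because each region's model sees only its own instances, the per-region training cost shrinks, which is the source of the speedup the abstract reports.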
Unsupervised feature selection method based on adaptive locality preserving projection
YAN Fei, WANG Xiaodong
2018, 48(4): 290-297. doi: 10.3969/j.issn.0253-2778.2018.04.004
Unsupervised feature selection methods based on spectral graphs construct the graph in the original high-dimensional data space, which is easily disturbed by noise or redundant features. To overcome these deficiencies, an unsupervised feature selection method based on adaptive locality preserving projection is proposed. A global linear regression function is utilized to construct the feature selection model, and adaptive locality preserving projection is adopted to improve model accuracy. An l2,1-norm constraint is then added to improve the distinguishability of different features and suppress noise interference. Comparisons with several state-of-the-art feature selection methods demonstrate the effectiveness of the proposed method.
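For reference, the l2,1 norm used in the constraint is simply the sum of the Euclidean norms of the matrix rows; ranking features by row norm, as sketched below, is the standard way such a penalty is read off (an illustration of the norm, not the paper's full model):

```python
import math

def l21_norm(W):
    """l2,1 norm of a matrix: the sum of the Euclidean norms of its
    rows.  Penalizing it drives whole rows of W toward zero."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)

def rank_features(W):
    """Order features by decreasing row norm of the projection matrix
    W: under an l2,1 penalty, rows driven to zero mark features that
    can be discarded."""
    score = [math.sqrt(sum(v * v for v in row)) for row in W]
    return sorted(range(len(W)), key=lambda i: -score[i])
```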
Multifeature hyperspectral image classification based on adaptive kernel joint sparse representation
ZHANG Huimin, YANG Ming, LV Jing
2018, 48(4): 298-306. doi: 10.3969/j.issn.0253-2778.2018.04.005
Sparse representation has proved to be a powerful tool in hyperspectral image (HSI) classification, and the advantages of joint classification using multifeature information have also attracted attention in the HSI classification field. However, the sparsity strategy for multifeature data and the non-linearity in the data are two difficult problems. A kernel adaptive sparse model is proposed to classify hyperspectral images. For several complementary features (gradient, texture and shape), the proposed model simultaneously obtains the representation vector for each feature, and utilizes the adaptive sparse strategy ladaptive,0 to make effective use of the multifeature information. The adaptive sparse strategy not only constrains the representation of pixels in different feature spaces to atoms from a particular class, but also allows the selected atoms of these pixels to differ, thus providing a better representation. In addition, the proposed kernel joint sparse representation model is used to deal with the non-linearity of the data: the kernel model projects data into a high-dimensional space to improve separability and achieves better performance than linear models. Experimental results on the Indian Pines and University of Pavia datasets show that the proposed algorithm achieves higher classification accuracy.
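The kernel trick the model relies on can be illustrated with the Gaussian (RBF) kernel, a common choice for HSI data (the abstract does not specify the paper's exact kernel):

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2).
    Evaluating kernels between pixels implicitly projects the data
    into a high-dimensional space, letting a linear sparse model
    capture non-linear structure without an explicit mapping."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```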
Transaction certification model of distributed energy based on consortium blockchain
SHE Wei, YANG Xiaoyu, HU Yue, LIU Qi, LIU Wei
2018, 48(4): 307-313. doi: 10.3969/j.issn.0253-2778.2018.04.006
Targeting the data security issues of distributed energy in transaction certification, a transaction certification model of distributed energy based on consortium blockchain is proposed. By means of proof of stake, data encryption, timestamps and distributed consensus, the traditional mode of energy transaction is optimized. The model solves the problem of highly centralized transaction data through a distributed shared ledger, which protects user privacy and improves information transparency and the level of automatic certification. Simulation results verify the effectiveness of the model.
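The timestamped, tamper-evident bookkeeping such a ledger rests on can be sketched with a toy hash chain; proof of stake, encryption and distributed consensus are deliberately omitted from this sketch:

```python
import hashlib
import json

def append_block(chain, record, timestamp):
    """Append a timestamped, hash-chained entry to a toy ledger
    (a list of dicts).  Each block commits to its predecessor's
    hash, so altering any past record invalidates all later links."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"record": record, "prev": prev, "ts": timestamp}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(block)
    return block
```

Any party holding the chain can recompute each block's hash from its fields and compare, which is the basis of the tamper evidence.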
Visual analysis of semantic measuring of online social relationships
WANG Yuan, SUN Yingjun, WANG Bo, YANG Chaozhou, YANG Liang
2018, 48(4): 314-321. doi: 10.3969/j.issn.0253-2778.2018.04.007
In traditional social relationship analysis, the attributes of a social relation are regarded as objective and independent of the subjective cognition of the participants. However, in social computing studies of subjective behavior, subjective features are often more important than objective ones. Here, the semantics of social relationships is visualized through the interactive language between individuals. Based on the key features of interactive language in sociolinguistic theory, four language features describing the semantics of social relationships are computed: frequency, length, fluency and sentiment polarity. By measuring and distinguishing personal language habits, the semantic measurement of social relationships is made more appropriate. To make the semantic measurement easier to understand, a visual analysis system for online social relationships is implemented using Email data as a case study, and the factors related to the semantic features of online social relationships are preliminarily analyzed.
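Toy versions of some of the named language features, computed over one sender-receiver message stream, could look as follows (the paper's actual feature definitions and its fluency measure are not reproduced; sentiment here is a naive word-list count):

```python
def relation_features(messages, pos_words, neg_words):
    """Compute toy interaction-language features for one message
    stream: frequency (message count), mean length (words per
    message) and sentiment polarity ((pos - neg) / (pos + neg)
    over word-list hits)."""
    tokens = [m.lower().split() for m in messages]
    n = len(messages)
    mean_len = sum(len(t) for t in tokens) / n if n else 0.0
    pos = sum(1 for t in tokens for w in t if w in pos_words)
    neg = sum(1 for t in tokens for w in t if w in neg_words)
    polarity = (pos - neg) / max(pos + neg, 1)
    return {"frequency": n, "mean_length": mean_len,
            "polarity": polarity}
```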
Cross-media semantic retrieval with deep canonical correlation analysis
WANG Shu, SHI Zhongzhi
2018, 48(4): 322-330. doi: 10.3969/j.issn.0253-2778.2018.04.008
Cross-media retrieval with canonical correlation analysis (CCA) maps different media features into a maximally correlated isomorphic subspace and compares the similarity between cross-media data in that subspace. However, CCA is a linear model and cannot adequately exploit the complex correlations between cross-media data. The structure of traditional deep canonical correlation analysis (DCCA) is improved, and latent Dirichlet allocation (LDA) is used to discover the semantic information in the text data and to learn the semantic mapping. Cross-media correlation learning with deep canonical correlation analysis (CMC-DCCA) and cross-media semantic correlation retrieval (CMSCR) are proposed. Experiments on the Wikipedia text-image dataset show that the CMC-DCCA model can better mine the complex correlations between cross-media data, and that CMSCR achieves better performance in cross-media retrieval.
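Once both modalities are projected into the shared subspace, retrieval reduces to ranking by similarity; a sketch using cosine similarity and assuming the subspace projections are already given (learning those projections is exactly what CCA/DCCA does and is not reproduced here):

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 for a zero vector."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def retrieve(query, candidates, top=3):
    """Rank candidate items of the other modality by cosine
    similarity to the query in the shared subspace, returning the
    indices of the `top` best matches."""
    order = sorted(range(len(candidates)),
                   key=lambda i: -cosine(query, candidates[i]))
    return order[:top]
```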
Mechanism analysis of the accelerator for k-nearest neighbor algorithm based on data partition
SONG Yunsheng, WANG Jie, LIANG Jiye
2018, 48(4): 331-340. doi: 10.3969/j.issn.0253-2778.2018.04.009
Due to its absence of hypotheses about the underlying data distribution, simple execution and strong generalization ability, the k-nearest neighbor classification algorithm (kNN) is widely used in face recognition, text classification, sentiment analysis and other fields. kNN requires no training process: it simply stores the training instances until an unlabeled instance appears, and only then executes the prediction process. However, kNN has to compute the similarity between the unlabeled instance and all the training instances, so it is difficult to apply to large-scale data. To overcome this difficulty, the process of computing the nearest neighbors is converted into a constrained optimization problem, and an estimate is given of the difference in the objective function value under the optimal solution with and without data partition. Theoretical analysis of this estimate indicates that partitioning the data by clustering can reduce the difference, so the clustering-based k-nearest neighbor algorithm retains a strong generalization ability. Experimental results on public datasets show that the clustering-based k-nearest neighbor algorithm largely obtains the same nearest neighbors as raw kNN, thus obtaining higher classification accuracy.
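The accelerator's query path can be sketched as follows: training instances are pre-grouped around cluster centers, and a query searches only its own cluster rather than the full training set (a 1-D toy version; the paper's optimization analysis is not reproduced):

```python
from collections import Counter

def cluster_knn(train, centers, query, k):
    """Clustering-based kNN sketch for scalar features.
    train: list of (x, label) pairs; centers: cluster centers.
    Instances are bucketed by nearest center, and the query is
    matched only against its own bucket, then classified by
    majority vote over its k nearest candidates."""
    dist = lambda a, b: abs(a - b)
    buckets = {i: [] for i in range(len(centers))}
    for x, label in train:
        nearest_c = min(range(len(centers)),
                        key=lambda i: dist(x, centers[i]))
        buckets[nearest_c].append((x, label))
    c = min(range(len(centers)), key=lambda i: dist(query, centers[i]))
    nearest = sorted(buckets[c], key=lambda p: dist(query, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

When clusters are well separated, the bucket search returns the same neighbors as a full scan at a fraction of the cost, which matches the behavior the abstract reports.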
Research on flow-limiting facility optimization in rail transit stations based on optical feature descriptor
WANG Zesheng, DONG Baotian, LUO Wenhui
2018, 48(4): 341-346. doi: 10.3969/j.issn.0253-2778.2018.04.010
To address the low intelligence and flexibility of existing flow-limiting facilities, a new optimization method for flow-limiting facilities in rail transit stations based on optical feature descriptors is proposed. First, the region of interest (ROI) is set according to the scene characteristics of rail transit stations to reduce the computation of subsequent operations. Then, the features of the image sequence are analyzed by establishing optical feature descriptors. Finally, a one-class SVM is adapted to the crowding features of pedestrians to make overload condition detection possible. Experimental results demonstrate that the proposed method can detect the overload status accurately, effectively improve the automation level of flow-limiting facilities, and provide data support and a theoretical basis for pedestrian organization and management in rail transit stations.
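The ROI step amounts to cropping each frame before any descriptor is computed; a minimal sketch with a frame represented as a 2-D list of pixel values:

```python
def crop_roi(frame, top, left, height, width):
    """Crop a region of interest from a frame stored as a 2-D list,
    so that descriptors are computed only inside the ROI and the
    rest of the frame is never processed."""
    return [row[left:left + width] for row in frame[top:top + height]]
```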