Information Science and Technology
Exploring millimeter wave radar data as a complement to RGB images for improving 3D object detection has become an emerging trend for autonomous driving systems. However, existing radar-camera fusion methods are highly dependent on prior camera detection results, rendering the overall performance unsatisfactory. In this paper, we propose a bidirectional fusion scheme in the bird's-eye view (BEV-radar), which is independent of prior camera detection results. Leveraging features from both modalities, our method designs a bidirectional attention-based fusion strategy. Specifically, following BEV-based 3D detection methods, our method engages a bidirectional transformer to embed information from both modalities and enforces the local spatial relationship through subsequent convolution blocks. After embedding the features, the BEV features are decoded in the 3D object prediction head. We evaluate our method on the nuScenes dataset, achieving 48.2 mAP and 57.6 NDS. The result shows considerable improvements compared to the camera-only baseline, especially in terms of velocity prediction. The code is available at
Question generation aims to generate meaningful and fluent questions, which can address the lack of a question-answer type annotated corpus by augmenting the available data. Taking unannotated text with optional answers as input, question generation can be divided into two types based on whether answers are provided: answer-aware and answer-agnostic. While generating questions when answers are provided is challenging, generating high-quality questions without answers is even more difficult for both humans and machines. To address this issue, we propose a novel end-to-end model called question generation with answer extractor (QGAE), which transforms answer-agnostic question generation into answer-aware question generation by directly extracting candidate answers. This approach effectively utilizes unlabeled data for generating high-quality question-answer pairs, and its end-to-end design makes it more convenient than a multi-stage method that requires at least two pre-trained models. Moreover, our model achieves better average scores and greater diversity. Our experiments show that QGAE achieves significant improvements in generating question-answer pairs, making it a promising approach for question generation.
Due to the complexity and diversity of production environments, it is essential to understand the robustness of unsupervised anomaly detection models to common corruptions. To explore this issue systematically, we propose a dataset named MVTec-C to evaluate the robustness of unsupervised anomaly detection models. Based on this dataset, we explore the robustness of approaches in five paradigms, namely, reconstruction-based, representation similarity-based, normalizing flow-based, self-supervised representation learning-based, and knowledge distillation-based paradigms. Furthermore, we explore the impact of different modules within two optimal methods on robustness and accuracy. This includes the multi-scale features, the neighborhood size, and the sampling ratio in the PatchCore method, as well as the multi-scale features, the MMF module, the OCE module, and the multi-scale distillation in the Reverse Distillation method. Finally, we propose a feature alignment module (FAM) to reduce the feature drift caused by corruptions and combine PatchCore and the FAM to obtain a model with both high performance and high accuracy. We hope this work will serve as an evaluation method and provide experience in building robust anomaly detection models in the future.
Federated learning allows multiple mobile participants to jointly train a global model without revealing their local private data. Communication-computation cost and privacy preservation are fundamental issues in federated learning. Existing secret sharing-based secure aggregation mechanisms for federated learning still suffer from significant additional costs, insufficient privacy preservation, and vulnerability to participant dropouts. In this paper, we aim to solve these issues by introducing flexible and effective secret sharing mechanisms into federated learning. We propose two novel privacy-preserving federated learning schemes: federated learning based on one-way secret sharing (FLOSS) and federated learning based on multishot secret sharing (FLMSS). Compared with state-of-the-art works, FLOSS enables high privacy preservation while significantly reducing the communication cost by dynamically designing secretly shared content and objects. Meanwhile, FLMSS further reduces the additional cost and can efficiently enhance the robustness of federated learning to participant dropouts. Most importantly, FLMSS achieves a satisfactory tradeoff between privacy preservation and communication-computation cost. Security analysis and performance evaluations on real datasets demonstrate the superiority of our proposed schemes in terms of model accuracy, privacy preservation, and cost reduction.
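To make the idea of secret sharing-based secure aggregation concrete, the following is a minimal sketch of plain additive secret sharing, the generic building block behind such mechanisms. It is not FLOSS or FLMSS, whose constructions the abstract does not specify. The modulus `PRIME`, the function names, and the integer "updates" are assumptions chosen for illustration; a real system would quantize model parameters and handle dropouts.

```python
import secrets

# Assumed field modulus; any prime larger than the true aggregate works.
PRIME = 2**61 - 1

def make_shares(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_values):
    """Aggregate private integers so that the server only sees partial sums."""
    n = len(private_values)
    # Row i holds the shares that party i sends out (share j goes to party j).
    sent = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received and reports only that partial sum.
    partial_sums = [sum(sent[i][j] for i in range(n)) % PRIME for j in range(n)]
    # The server adds the partial sums to obtain the aggregate.
    return sum(partial_sums) % PRIME

updates = [12, 7, 30]  # hypothetical quantized local updates
total = secure_sum(updates)
print(total)
```

Each individual share is uniformly random, so no single message reveals a participant's value, yet the shares cancel to the exact sum. The cost of this naive version is quadratic in the number of participants, which is the kind of overhead the abstract says the proposed schemes aim to reduce.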
Stroke can lead to impaired motor function in patients’ lower limbs and hemiplegia. Accurate assessment of lower limb motor ability is important for diagnosis and rehabilitation. To digitalize such assessments so that each test can be traced back at any time and subjectivity can be avoided, we test how dual-modality smart shoes equipped with pressure-sensitive insoles and inertial measurement units can be used for this purpose. A 5 m walking test protocol, including left and right turns, is designed. The data are collected from 23 patients and 17 healthy subjects. For the lower limbs’ motor ability, the tests are performed by two physicians and assessed using the five-grade Medical Research Council scale for muscle examination. The average of the two physicians’ scores for the same patient is used as the ground truth. Using the feature set we developed, 100% accuracy is achieved in classifying the patients and healthy subjects. For patients’ muscle strength, a mean absolute error of 0.143 and a maximum error of 0.395 are achieved using our feature set and the regression method; these values are closer to the ground truth than the scores from each physician (mean absolute error: 0.217, maximum error: 0.5). We thus validate the possibility of using such smart shoes to objectively and accurately evaluate the muscle strength of the lower limbs of stroke patients.
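The evaluation described above (ground truth as the average of two raters, then mean absolute error and maximum error against it) can be sketched as follows. All scores below are invented for illustration and are not the study's data; the function names are likewise assumptions.

```python
def mean_scores(rater_a, rater_b):
    """Ground truth: per-patient average of the two raters' scores."""
    return [(a + b) / 2 for a, b in zip(rater_a, rater_b)]

def mean_absolute_error(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def max_error(pred, truth):
    return max(abs(p - t) for p, t in zip(pred, truth))

# Hypothetical muscle-strength grades for four patients.
physician_a = [3, 4, 2, 5]
physician_b = [4, 4, 3, 5]
model_pred = [3.4, 4.1, 2.8, 4.9]  # hypothetical regression outputs

truth = mean_scores(physician_a, physician_b)
print(round(mean_absolute_error(model_pred, truth), 3),
      round(max_error(model_pred, truth), 3))
print(mean_absolute_error(physician_a, truth), max_error(physician_a, truth))
```

Because the ground truth is the mean of the two raters, each rater's error on a patient is exactly half the disagreement between the raters, so a one-grade disagreement shows up as an error of 0.5 for both physicians.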
Semi-supervised learning (SSL) has been applied to many practical applications over the past few years. Recently, distributed graph-based semi-supervised learning (DGSSL) has been shown to have good performance. Current DGSSL algorithms usually have the problems of inefficient graph construction and the straggler effect. This paper proposes a novel coded DGSSL (CDGSSL) to solve these problems. We first provide a novel parallel and distributed solution of matrix completion for efficient graph construction. Then, we develop the CDGSSL algorithm based on coding theory. Specifically, the proposed algorithm consists of two parts separately designed based on the maximum distance separable (MDS) code. In general, the proposed coded distributed algorithm is efficient and straggler tolerant. Moreover, we provide an optimal parameter design for the proposed algorithm. The results of the experiments on the Alibaba Cloud elastic compute service (ECS) demonstrate the superiority of the proposed algorithm.
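As an illustration of how an MDS code gives straggler tolerance, here is a minimal sketch of a (3, 2) coded matrix-vector product: the matrix is split into two row blocks, a third worker receives their sum, and the result from any two of the three workers suffices. This is the textbook coded-computation idea, not the CDGSSL algorithm itself, whose details the abstract does not give; the function names and the toy matrix are assumptions.

```python
def matvec(M, x):
    """Plain matrix-vector product on lists of lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def encode(A):
    """Split A into two row blocks and add a parity block A1 + A2."""
    half = len(A) // 2  # assumes an even number of rows
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]

def decode(results):
    """Recover A @ x from the outputs of any two of the workers 0, 1, 2."""
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
blocks = encode(A)
# Suppose worker 0 is a straggler: only workers 1 and 2 respond.
received = {w: matvec(blocks[w], x) for w in (1, 2)}
print(decode(received))
```

Each worker handles half of the rows, so the scheme spends 50% more total computation than the uncoded split in exchange for never waiting on the slowest of the three workers.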
Noise reduction (NR) is a necessary front-end in many audio applications for improving signal quality. It has been shown that sparsity-promoting sensor selection can trade off energy consumption against NR performance, which is important for large-scale wireless acoustic sensor networks (WASNs), where many sensors contribute negligibly to NR but energy consumption affects the lifetime of WASNs. This paper presents a sensor selection approach for beamforming-based NR by minimizing the total energy consumption and constraining the output noise variance. Motivated by the optimal semi-definite programming (SDP) solution and the utility-based method, we propose three low-complexity selection metrics: weighted utility, gradient, and weighted input signal-to-noise ratio (SNR). It is shown that the proposed weighted utility and gradient-based methods are near-optimal in performance but much faster than the SDP-based method, and the weighted SNR method has the lowest time complexity with a tiny performance sacrifice. Numerical results using a simulated WASN validate the superiority of the proposed approaches over conventional methods.
Regression problems among multiple responses and predictors have been widely employed in many applications, such as biomedical sciences and economics. In this paper, we focus on statistical inference for the unknown coefficient matrix in high-dimensional multi-task learning problems. The new statistic is constructed in a row-wise manner based on a two-step projection technique, which improves the inference efficiency by removing the impacts of important signals. Based on the established asymptotic normality for the proposed two-step projection estimator (TPE), we generate corresponding confidence intervals for all components of the unknown coefficient matrix. The performance of the proposed method is presented through simulation studies and a real data analysis.
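For orientation, the step from asymptotic normality to component-wise confidence intervals takes the standard form sketched below. The notation is assumed for illustration and is not taken from the paper: $b_{jk}$ denotes the $(j,k)$ entry of the coefficient matrix, $\hat{b}_{jk}$ its estimator, $\hat{\sigma}_{jk}$ a consistent estimate of the asymptotic standard deviation, $n$ the sample size, and $z_{1-\alpha/2}$ the standard normal quantile. The exact scaling and variance estimator of the TPE may differ.

```latex
% Assumed generic form; symbols are illustrative, not the paper's notation.
\[
  \frac{\sqrt{n}\,\bigl(\hat{b}_{jk} - b_{jk}\bigr)}{\hat{\sigma}_{jk}}
  \;\xrightarrow{d}\; \mathcal{N}(0, 1)
  \quad\Longrightarrow\quad
  \mathrm{CI}_{jk}(1-\alpha)
  = \Bigl[\, \hat{b}_{jk} - z_{1-\alpha/2}\,\frac{\hat{\sigma}_{jk}}{\sqrt{n}},\;
             \hat{b}_{jk} + z_{1-\alpha/2}\,\frac{\hat{\sigma}_{jk}}{\sqrt{n}} \,\Bigr].
\]
```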
To deal with emergencies and disasters without rescue workers being exposed to dangerous environments, this paper presents a mobile rescue robot, Earthshaker. Combining a tracked chassis and a six-degree-of-freedom robotic arm with miscellaneous sensors and controllers, Earthshaker is capable of traversing diverse terrains and performing dexterous manipulation. Specifically, Earthshaker has a unique swing-arm and dozer-blade structure that can help clear cumbersome obstacles and stabilize the robot on stairs, a multimodal teleoperation system that can adapt to different transmission conditions, a depth camera-aided robotic arm and gripper that can realize semiautonomous manipulation, and a LiDAR-aided base that can achieve autonomous navigation in unknown areas. These special systems enabled Earthshaker to win the first Advanced Technology & Engineering Challenge (A-TEC) championships, standing out among 40 robots from around the world and showing the efficacy of system integration and the advanced control philosophy behind it.
The rapid development of social media has led to the spread of a large amount of false news, which not only affects people’s daily lives but also harms the credibility of social media platforms. Therefore, detecting Chinese fake news is a challenging and meaningful task. However, existing fake news datasets from Chinese social media platforms are relatively small and were collected relatively long ago, and thus cannot meet the requirements of further research. Against this background, we release a new Chinese Weibo Fake News dataset, which contains 26,320 fake news items collected from Weibo. In addition, we propose a fake news detection model based on data augmentation that can effectively address the scarcity of fake news samples, and we improve the generalization ability and robustness of the model. We conduct numerous experiments on our Chinese Weibo Fake News dataset and successfully deploy the model on a web page. The experimental results demonstrate the effectiveness of the proposed end-to-end model for detecting fake news on social media platforms.