SIS: A new multi-scale convolutional operator

周满 (Man Zhou); 傅雪阳 (Xueyang Fu); 刘爱萍 (Aiping Liu)

doi:10.52396/JUSTC-2021-0188

Abstract: Visual features that generalize well are critical for computer vision tasks. Existing approaches based on deep neural networks obtain multi-scale feature maps by stacking features layer by layer, which substantially increases computational cost. To address this issue, we present SIS (scale-in-scale), a compact and efficient convolution operator that embeds a progressive multi-scale architecture into a standard convolution operator. Specifically, SIS uses a channel transform-divide-and-conquer mechanism to optimize conventional channel-wise computation, lowering the computational cost while expanding the receptive field within a single convolution layer. Moreover, the SIS variants combine weight sharing with split-and-interact and recur-and-fuse mechanisms. The resulting SIS series is easily pluggable into convolutional backbones such as the well-known ResNet and Res2Net. We incorporated the SIS operators into 29-layer, 50-layer, and 101-layer ResNet and Res2Net variants and evaluated these modified models on the widely used public benchmark datasets CIFAR, PASCAL VOC, and COCO2017. The modified models consistently outperformed contemporaneous state-of-the-art models on a range of major vision tasks, including image classification, keypoint estimation, semantic segmentation, and object detection.
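The abstract does not give the SIS equations. Purely as an illustration, the sketch below assumes a Res2Net-style hierarchical split (y_1 = x_1, y_2 = K(x_2), y_i = K(x_i + y_{i-1})) in which one kernel K is shared by all channel groups. This is one plausible reading of "progressive multi-scale architecture" plus "weight sharing", not the authors' SIS operator. It works on 1-D signals so it needs only the standard library, and the helper names `conv1d_same`, `progressive_split_conv`, and `param_counts` are hypothetical.

```python
# Hypothetical 1-D sketch of a progressive, weight-shared multi-scale convolution.
# ASSUMPTION: Res2Net-style hierarchical split with one kernel shared by all
# groups. This is an illustration, not the authors' SIS operator.
from typing import List, Tuple

Signal = List[float]


def conv1d_same(x: Signal, kernel: Signal) -> Signal:
    """Zero-padded 'same' 1-D cross-correlation with an odd-length kernel."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def progressive_split_conv(groups: List[Signal], kernel: Signal) -> List[Signal]:
    """Hypothetical scale-in-scale style recursion over channel groups.

    y_1 = x_1 (identity), y_2 = K(x_2), y_i = K(x_i + y_{i-1}) for i > 2,
    where the same kernel K is reused by every group (weight sharing).
    """
    outputs = [list(groups[0])]
    prev = None
    for g in groups[1:]:
        inp = list(g) if prev is None else [a + b for a, b in zip(g, prev)]
        prev = conv1d_same(inp, kernel)
        outputs.append(prev)
    return outputs


def param_counts(channels: int, scales: int, k: int = 3) -> Tuple[int, int, int]:
    """Bias-free weight counts for a k x k conv over `channels` channels.

    Returns (standard, split, shared):
      standard: one full conv, channels -> channels
      split:    `scales` groups, one conv per group except the identity group
      shared:   same as split, but a single kernel reused by all groups
    """
    if channels % scales != 0:
        raise ValueError("channels must be divisible by scales")
    width = channels // scales
    standard = k * k * channels * channels
    split = (scales - 1) * k * k * width * width
    shared = k * k * width * width
    return standard, split, shared


if __name__ == "__main__":
    n, num_groups = 15, 4
    groups = [[0.0] * n for _ in range(num_groups)]
    groups[1][n // 2] = 1.0  # unit impulse in the second group only
    outs = progressive_split_conv(groups, kernel=[1.0, 1.0, 1.0])
    # Number of output positions touched by that single input position.
    print([sum(1 for v in y if v != 0.0) for y in outs])  # [0, 3, 5, 7]
    print(param_counts(64, 4))  # (36864, 6912, 2304)
```

With a kernel of size 3, one input position in the second group reaches 3, 5, and 7 output positions in groups 2, 3, and 4, which illustrates the kind of within-layer receptive-field growth the abstract describes. The parameter counts (biases ignored) compare a full convolution over 64 channels with the split and shared-weight variants under the same assumed scheme.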
