ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

MAEA-DeepLab: A semantic segmentation network with multi-feature attention effective aggregation module

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2020.08.018
  • Received Date: 11 July 2020
  • Accepted Date: 04 August 2020
  • Rev Recd Date: 04 August 2020
  • Publish Date: 31 August 2020
  • To lower the cost of network training, computational complexity should be greatly reduced while high precision is maintained. To this end, MAEA-DeepLab, a semantic segmentation network with a multi-feature attention effective aggregation (MAEA) module, is proposed. The encoder's backbone down-samples to a low-resolution feature map with an output stride of 16 to obtain high-level features. In the decoder, the MAEA module makes full use of the spatial attention mechanism of the features, effectively aggregates multiple features, and obtains high-resolution features with strong semantic representation. This improves the decoder's ability to recover important details and yields high-precision segmentation. MAEA-DeepLab requires 943.02B multiply-adds, only 30.9% of those of the DeepLabV3+ architecture, which greatly reduces the computational complexity. Without pre-training on the COCO dataset and using only two RTX 2080 Ti GPUs, the architecture is benchmarked on the test sets of the PASCAL VOC 2012 and Cityscapes datasets, where its mIoU scores reach 87.5% and 79.9%, respectively. The experimental results show that MAEA-DeepLab achieves good semantic segmentation accuracy at low computational cost.
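    The mIoU (mean intersection-over-union) scores quoted in the abstract follow the standard segmentation metric: per-class IoU is TP / (TP + FP + FN), averaged over classes. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `mean_iou`, the flat label lists, and the ignore label 255 are assumptions made for illustration.

    ```python
    def mean_iou(pred, target, num_classes, ignore_label=255):
        """Mean IoU over classes that appear in the prediction or ground truth.

        pred and target are flat sequences of integer class labels.
        ignore_label (assumed here to be 255) marks pixels excluded from scoring.
        """
        intersection = [0] * num_classes  # true positives per class
        union = [0] * num_classes         # TP + FP + FN per class
        for p, t in zip(pred, target):
            if t == ignore_label:
                continue
            if p == t:
                intersection[t] += 1
                union[t] += 1
            else:
                union[p] += 1  # false positive for the predicted class
                union[t] += 1  # false negative for the true class
        ious = [i / u for i, u in zip(intersection, union) if u > 0]
        return sum(ious) / len(ious) if ious else float("nan")


    # Toy example: 6 pixels, 3 classes, last pixel ignored.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    score = mean_iou(pred, target, num_classes=3)
    print(round(score, 4))
    ```

    In the toy example the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18. Benchmark servers accumulate the counts over the whole test set before dividing, rather than averaging per-image scores.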
