deeplabv3

DeepLabV3 and DeepLabV3+ Based on MindCV Backbones

DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation

DeepLabV3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Introduction

DeepLabV3 is a semantic segmentation architecture that improves on earlier DeepLab versions. It makes two main contributions. 1) It designs modules that employ atrous convolution in cascade or in parallel, using multiple atrous rates to capture multi-scale context and thereby handle objects at multiple scales. 2) It augments the Atrous Spatial Pyramid Pooling (ASPP) module with image-level features that encode global context, which further boosts performance. The improved ASPP applies global average pooling to the last feature map of the model, feeds the resulting image-level features to a 1 × 1 convolution with 256 filters (and batch normalization), and then bilinearly upsamples the result to the desired spatial dimension. The DenseCRF post-processing used in DeepLabV2 is dropped.
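
To make the role of the atrous rate concrete, the following is a minimal pure-Python sketch of a 1-D atrous convolution. The helper names `effective_kernel_size` and `atrous_conv1d` are illustrative and are not part of MindCV; the rates 6, 12 and 18 are the ASPP rates reported in [1] for output_stride=16, not values read from this repository's configs.

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D atrous convolution (cross-correlation, no padding)."""
    span = effective_kernel_size(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Tap i of the kernel reads the input `i * rate` samples after `start`.
        out.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return out


# A 3-tap kernel covers a wider span as the rate grows,
# while the number of weights stays at 3.
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel_size(3, rate))

print(atrous_conv1d(list(range(10)), [1, 0, -1], rate=2))
```

With the same three weights, a larger rate sees a wider context, which is why parallel branches with different rates capture multi-scale information.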

Figure 1. Architecture of DeepLabV3 with output_stride=16 [1]

DeepLabV3+ extends DeepLabV3 by adding a simple yet effective decoder module that refines the segmentation results, especially along object boundaries. It combines the advantages of the spatial pyramid pooling module and the encoder-decoder structure. The last feature map before the logits in the original DeepLabV3 becomes the encoder output. The encoder features are first bilinearly upsampled by a factor of 4 and then concatenated with the low-level features from the network backbone that have the same spatial resolution. Before the concatenation, a 1 × 1 convolution is applied to the low-level features to reduce their number of channels. After the concatenation, a few 3 × 3 convolutions refine the features, followed by another simple bilinear upsampling by a factor of 4.
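
The spatial sizes in the decoder path can be traced with simple arithmetic. The sketch below assumes output_stride=16, an illustrative input size of 512, and low-level features taken at stride 4 (which is what the two ×4 upsampling steps imply); `decoder_shapes` is a hypothetical helper, not a function from this example.

```python
def decoder_shapes(input_size):
    """Trace the spatial size through the decoder for output_stride=16."""
    output_stride = 16
    if input_size % output_stride != 0:
        raise ValueError("this sketch assumes input_size is a multiple of 16")
    encoder = input_size // output_stride  # encoder output
    upsampled = encoder * 4                # first bilinear upsampling (x4)
    low_level = input_size // 4            # assumed backbone features at stride 4
    output = upsampled * 4                 # final bilinear upsampling (x4)
    return {
        "encoder": encoder,
        "upsampled": upsampled,
        "low_level": low_level,
        "output": output,
    }


print(decoder_shapes(512))
```

The upsampled encoder features and the low-level features end up with the same size, so they can be concatenated along the channel axis, and the final upsampling restores the input resolution.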

Figure 2. DeepLabV3+ extends DeepLabV3 by employing an encoder-decoder structure [2]

This example provides implementations of DeepLabV3 and DeepLabV3+ using backbones from MindCV. More details about feature extraction with MindCV are in this tutorial. Because the ResNet in DeepLab contains atrous convolutions with different rates, dilated_resnet.py is provided as a modification of the MindCV ResNet, with atrous convolutions in blocks 3-4.

Quick Start

Preparation

  1. Clone the MindCV repository and enter the mindcv directory. All commands below assume this project root as the working directory.

    git clone https://github.com/mindspore-lab/mindcv.git
    cd mindcv
  2. Install the dependencies as shown here, and also install cv2 (opencv-python) and addict.

    pip install opencv-python
    pip install addict
  3. Prepare dataset

    • Download the Pascal VOC 2012 dataset (VOC2012) and the Semantic Boundaries Dataset (SBD).

    • Prepare training and test data list files that contain the paths of image and annotation pairs. You can simply run python examples/seg/deeplabv3/preprocess/get_data_list.py --data_root=/path/to/data to generate them. This command produces 5 data list files. Each line of a list file should look like this:

      /path/to/data/JPEGImages/2007_000032.jpg /path/to/data/SegmentationClassGray/2007_000032.png
      /path/to/data/JPEGImages/2007_000039.jpg /path/to/data/SegmentationClassGray/2007_000039.png
      /path/to/data/JPEGImages/2007_000063.jpg /path/to/data/SegmentationClassGray/2007_000063.png
      ......
      
    • Convert the training dataset to mindrecords by running the build_seg_data.py script. Following the paper, we train on the trainaug dataset (VOC train + SBD). To train on another dataset, set the keyword data_list to the path of the data list of your target training set.

      python examples/seg/deeplabv3/preprocess/build_seg_data.py \
      		--data_root=[root path of training data] \
      		--data_list=[path of data list file prepared above] \
      		--dst_path=[path to save mindrecords] \
      		--num_shards=8
    • Note: the training steps use datasets in mindrecord format, while the evaluation steps directly use the data list files.

  4. Backbone: download the pre-trained backbone from MindCV. Here we use ResNet101.
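
The data list format from step 3 (one image path and one annotation path per line, separated by whitespace) can be read with a few lines of Python. This is a minimal sketch, not the loader used by this example; `parse_data_list` is a hypothetical helper, and it assumes that paths contain no spaces.

```python
def parse_data_list(lines):
    """Turn data list lines into (image_path, annotation_path) pairs."""
    pairs = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 paths, got {len(parts)}")
        pairs.append((parts[0], parts[1]))
    return pairs


sample = [
    "/path/to/data/JPEGImages/2007_000032.jpg /path/to/data/SegmentationClassGray/2007_000032.png",
    "/path/to/data/JPEGImages/2007_000039.jpg /path/to/data/SegmentationClassGray/2007_000039.png",
]
for image_path, annotation_path in parse_data_list(sample):
    print(image_path, "->", annotation_path)
```

To read a list file from disk, pass the open file object directly, for example `with open(list_path) as f: pairs = parse_data_list(f)`.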

Train

Specify deeplabv3 or deeplabv3plus with the keyword model in the config file.

It is highly recommended to use distributed training for this DeepLabV3 and DeepLabV3+ implementation.

For distributed training using OpenMPI's mpirun, simply run

mpirun -n [# of devices] python examples/seg/deeplabv3/train.py --config [the path to the config file]

For distributed training with Ascend rank table, configure ascend8p.sh as follows

#!/bin/bash
export DEVICE_NUM=8
export RANK_SIZE=8
export RANK_TABLE_FILE="./hccl_8p_01234567_127.0.0.1.json"

for ((i = 0; i < ${DEVICE_NUM}; i++)); do
   export DEVICE_ID=$i
   export RANK_ID=$i
   python -u examples/seg/deeplabv3/train.py --config [the path to the config file]  &> ./train_$i.log &
done

and start training by running:

bash ascend8p.sh
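
For readers who prefer to launch the processes from Python, the per-rank environment that the script above sets up can be sketched as follows. `build_rank_envs` is a hypothetical helper that only mirrors the variables exported in ascend8p.sh; it does not start any process itself.

```python
import os


def build_rank_envs(device_num, rank_table_file, base_env=None):
    """Return one environment dict per device, mirroring ascend8p.sh."""
    base = dict(os.environ) if base_env is None else dict(base_env)
    envs = []
    for i in range(device_num):
        env = dict(base)
        env.update(
            DEVICE_NUM=str(device_num),
            RANK_SIZE=str(device_num),
            RANK_TABLE_FILE=rank_table_file,
            DEVICE_ID=str(i),  # one device per process
            RANK_ID=str(i),    # rank equals device index, as in the script
        )
        envs.append(env)
    return envs


envs = build_rank_envs(8, "./hccl_8p_01234567_127.0.0.1.json", base_env={})
print(envs[3]["DEVICE_ID"], envs[3]["RANK_ID"])
# Each dict could then be passed as `env=` to subprocess.Popen
# together with the train.py command shown in the script.
```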

For single-device training, simply set the keyword distributed to False in the config file and run:

python examples/seg/deeplabv3/train.py --config [the path to the config file]

Taking the mpirun command as an example, the training steps are as follows:

  • Step 1: Use output_stride=16 and fine-tune the pretrained ResNet101 on the trainaug dataset. In the config file, specify the path of the pretrained backbone checkpoint with the keyword backbone_ckpt_path and set output_stride to 16.

    # for deeplabv3
    mpirun -n 8 python examples/seg/deeplabv3/train.py --config examples/seg/deeplabv3/config/deeplabv3_s16_dilated_resnet101.yaml
    
    # for deeplabv3+
    mpirun -n 8 python examples/seg/deeplabv3/train.py --config examples/seg/deeplabv3/config/deeplabv3plus_s16_dilated_resnet101.yaml
  • Step 2: Use output_stride=8 and fine-tune the model from step 1 on the trainaug dataset with a smaller base learning rate. In the config file, specify the path of the checkpoint from the previous step in ckpt_path, set ckpt_pre_trained to True, and set output_stride to 8.

    # for deeplabv3
    mpirun -n 8 python examples/seg/deeplabv3/train.py --config examples/seg/deeplabv3/config/deeplabv3_s8_dilated_resnet101.yaml
    
    # for deeplabv3+
    mpirun -n 8 python examples/seg/deeplabv3/train.py --config examples/seg/deeplabv3/config/deeplabv3plus_s8_dilated_resnet101.yaml
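
The settings that differ between the two steps can be summarized as a small sanity check. This is only a sketch: `check_step_config` is a hypothetical helper, the checkpoint paths are placeholders, and it assumes the keywords named above sit at the top level of the config, which may not match the layout of the actual YAML files.

```python
def check_step_config(cfg, step):
    """Return a list of problems for the given training step (1 or 2)."""
    problems = []
    if cfg.get("model") not in ("deeplabv3", "deeplabv3plus"):
        problems.append("model must be 'deeplabv3' or 'deeplabv3plus'")
    if step == 1:
        if cfg.get("output_stride") != 16:
            problems.append("step 1 expects output_stride=16")
        if not cfg.get("backbone_ckpt_path"):
            problems.append("step 1 needs backbone_ckpt_path")
    elif step == 2:
        if cfg.get("output_stride") != 8:
            problems.append("step 2 expects output_stride=8")
        if not cfg.get("ckpt_path"):
            problems.append("step 2 needs ckpt_path from step 1")
        if cfg.get("ckpt_pre_trained") is not True:
            problems.append("step 2 expects ckpt_pre_trained=True")
    else:
        raise ValueError("step must be 1 or 2")
    return problems


# Placeholder values for illustration only.
step1 = {"model": "deeplabv3", "output_stride": 16,
         "backbone_ckpt_path": "/path/to/resnet101.ckpt"}
step2 = {"model": "deeplabv3", "output_stride": 8,
         "ckpt_path": "/path/to/step1.ckpt", "ckpt_pre_trained": True}
print(check_step_config(step1, 1), check_step_config(step2, 2))
```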

Test

To test the trained model, first specify the path to the model checkpoint with the keyword ckpt_path in the config file. You can also modify output_stride, flip, and scales in the config file for inference.

For example, after replacing ckpt_path in the config file with the checkpoint from the two-step training of DeepLabV3, the command below evaluates with OS=8 and without left-right flipped or multi-scale inputs.

python examples/seg/deeplabv3/eval.py --config examples/seg/deeplabv3/config/deeplabv3_s8_dilated_resnet101.yaml

Results

Config

Model        OS=16 config   OS=8 config   Download
DeepLabV3    yaml           yaml          weights
DeepLabV3+   yaml           yaml          weights

Model results

Model        Infer OS   MS   FLIP   mIoU
DeepLabV3    16                     77.33
DeepLabV3    8                      79.16
DeepLabV3    8                      79.93
DeepLabV3    8                      80.14
DeepLabV3+   16                     78.99
DeepLabV3+   8                      80.31
DeepLabV3+   8                      80.99
DeepLabV3+   8                      81.10

Note: OS: output stride. MS: multi-scale inputs during test. FLIP: adding left-right flipped inputs during test. Weights are checkpoint files saved after the two-step training.
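
For reference, mIoU is the per-class intersection-over-union averaged over classes. The sketch below computes it from flat lists of predicted and ground-truth labels. It is not the code in eval.py; `mean_iou` is a hypothetical helper, and it assumes that pixels labelled 255 are ignored, following the Pascal VOC convention for void pixels.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over the classes that occur in pred or target."""
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # assumed void label, excluded from the metric
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both pred and target
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)


print(mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2))
```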

As illustrated in [1], adding left-right flipped inputs or multi-scale inputs during test can improve performance. Also, once the model is trained, using output_stride=8 during inference brings an improvement over using output_stride=16.
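
The flip part of this test-time augmentation can be sketched in pure Python on a 2-D grid of scores. This is an illustration under stated assumptions, not the logic of eval.py: `hflip` and `predict_with_flip` are hypothetical helpers, `score_fn` stands in for the network, and simple averaging of the two predictions is assumed.

```python
def hflip(grid):
    """Flip a 2-D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]


def predict_with_flip(score_fn, image):
    """Average the prediction on the image and on its mirrored copy."""
    direct = score_fn(image)
    # Predict on the mirrored image, then mirror the result back so that
    # both score maps are aligned pixel by pixel before averaging.
    mirrored = hflip(score_fn(hflip(image)))
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(direct, mirrored)
    ]


def toy_score_fn(image):
    """Toy stand-in for a model: adds the column index to each pixel."""
    return [[value + col for col, value in enumerate(row)] for row in image]


print(predict_with_flip(toy_score_fn, [[1, 2, 3]]))
```

Multi-scale testing follows the same pattern: predictions for resized copies of the input are resized back to a common resolution before they are combined.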

References

[1] Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).

[2] Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.