Docker Image
- tensorflow/tensorflow:2.4.0-gpu-jupyter
Library
- PyTorch : Stable (1.7.1) - Linux - Python - CUDA (11.0)
- Uses a single GPU (not tested on CPU only)
- model.py : VGG-16 Large FOV, DenseCRF, DeepLab v1
- train.py : train VGG-16 Large FOV only (grid search on model.py)
- utils.py : calculate mIoU
- VGG-16 Large FOV was trained with settings similar to those in the paper
- input : (3, 224, 224)
- batch size : 30
- learning rate : 0.001
- momentum : 0.9
- weight decay : 0.0005
- no learning rate scheduler, for convenience
- the mIoU score may differ noticeably from the paper's because no learning rate scheduler is used
2. Brief Summary of 'Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs'
- Improve performance of semantic segmentation
- Atrous algorithm : reduces the signal downsampling of the original VGG-16
- Fully connected pairwise CRF : preserve fine edge details
- Mean intersection over union
- DCNN : modified VGG-16
- change fully connected layers to convolution layers
- skip subsampling in 2 max-pooling layers
- atrous algorithm in last 3 convolution layers (2x)
- atrous algorithm in first fully connected layer (4x) and change kernel size to 3x3
- change channel size of fully connected layers (4096 -> 1024)
- change channel size of final fully connected layer (1000 -> 21)
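
The effect of these modifications on resolution can be checked with a little arithmetic. The sketch below assumes that each of VGG-16's five max-pooling layers halves the resolution when it subsamples, and that an atrous rate r inserts r - 1 zeros between kernel taps; the helper names are made up for this example.

```python
def output_stride(num_pool_layers, num_without_subsampling):
    """Total downsampling factor when each subsampling pool halves the resolution."""
    return 2 ** (num_pool_layers - num_without_subsampling)


def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel after inserting (rate - 1) zeros between its taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


original_stride = output_stride(5, 0)   # plain VGG-16
modified_stride = output_stride(5, 2)   # subsampling skipped in 2 max-pooling layers

print(original_stride)                  # 32
print(modified_stride)                  # 8
print(224 // modified_stride)           # 28: score map side for a 224 x 224 input
print(effective_kernel_size(3, 2))      # 5: 3x3 kernel at rate 2
print(effective_kernel_size(3, 4))      # 9: 3x3 kernel at rate 4
```

The atrous rates compensate for the skipped subsampling: the filters keep seeing the same extent of the image even though the feature maps are denser.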
- Fully connected pairwise CRF : follows the paper 'Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials'
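
To make the pairwise term concrete, the sketch below evaluates the two-Gaussian kernel used in that formulation for a single pair of pixels: an appearance kernel that depends on position and colour, plus a smoothness kernel that depends on position only. The function name and all numeric values are illustrative assumptions, not the settings of the paper or of this repository, and a real dense CRF never evaluates pairs one by one like this.

```python
import math


def pairwise_kernel(pos_i, pos_j, rgb_i, rgb_j,
                    w1, w2, sigma_alpha, sigma_beta, sigma_gamma):
    """Gaussian edge potential between two pixels (applied when their labels differ)."""
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))   # squared pixel distance
    d_rgb = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))   # squared colour distance
    appearance = w1 * math.exp(-d_pos / (2 * sigma_alpha ** 2)
                               - d_rgb / (2 * sigma_beta ** 2))
    smoothness = w2 * math.exp(-d_pos / (2 * sigma_gamma ** 2))
    return appearance + smoothness


# Illustrative hyperparameters only (hypothetical values for this example).
params = dict(w1=4.0, w2=3.0, sigma_alpha=50.0, sigma_beta=5.0, sigma_gamma=3.0)

k_similar = pairwise_kernel((10, 10), (10, 12), (200, 30, 30), (198, 32, 29), **params)
k_different = pairwise_kernel((10, 10), (10, 12), (200, 30, 30), (20, 20, 220), **params)
print(k_similar > k_different)
```

Nearby pixels with similar colours get a large kernel value, so giving them different labels is penalised more strongly. This is how the CRF sharpens boundaries along image edges.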
- 2-stage training
- learn DCNN first
- learn CRF next
- Augmentation : use extra data
- Objective : sum of cross-entropy terms for each spatial position in the CNN output map
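
The objective can be sketched as follows: a softmax cross-entropy term is computed at every position of the output map and the terms are summed. The code is a plain-Python illustration with hypothetical names; in PyTorch, `torch.nn.CrossEntropyLoss` with `reduction="sum"` plays the same role.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of class scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def segmentation_loss(score_map, label_map):
    """Sum of cross-entropy terms over all spatial positions.

    score_map[y][x] is a list of per-class scores, label_map[y][x] a class index.
    """
    total = 0.0
    for score_row, label_row in zip(score_map, label_map):
        for scores, label in zip(score_row, label_row):
            total += -log_softmax(scores)[label]
    return total


# 2 x 2 output map with 2 classes and uninformative (all-zero) scores.
scores = [[[0.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [0.0, 0.0]]]
labels = [[0, 1],
          [1, 0]]
loss = segmentation_loss(scores, labels)
print(loss)  # 4 * log(2), since every position contributes log(2)
```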
- Train Details
- minibatch SGD with momentum
- batch size : 20
- learning rate : 0.001 (0.01 for final classifier layer)
- momentum : 0.9
- weight decay : 0.0005
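
For reference, one update step with these hyperparameters could look like the sketch below. It uses the formulation found in `torch.optim.SGD` (weight decay added to the gradient, then a momentum buffer); the original Caffe formulation scales the buffer by the learning rate instead, which is equivalent when the learning rate is constant. The function name is hypothetical.

```python
def sgd_momentum_step(weights, grads, velocity,
                      lr=0.001, momentum=0.9, weight_decay=0.0005):
    """One SGD step with momentum and L2 weight decay on flat lists of floats."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w    # weight decay acts as an extra gradient term
        v = momentum * v + g        # accumulate the momentum buffer
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity


w, v = [1.0], [0.0]
w, v = sgd_momentum_step(w, [0.5], v)
print(w, v)

# The final classifier layer would use the same step with lr=0.01.
w_cls, v_cls = sgd_momentum_step([1.0], [0.5], [0.0], lr=0.01)
print(w_cls)
```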
- Upsampling : bilinearly upsample the CNN output back to the input size (h, w)
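
A minimal sketch of bilinear upsampling on one score channel is given below, in plain Python so that the interpolation weights are visible. It assumes the corner samples of input and output are aligned; libraries differ on this convention, so the values may not match a particular framework exactly. In PyTorch the usual call is `torch.nn.functional.interpolate` with `mode="bilinear"`.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Resize a 2D list of floats to (out_h, out_w) with corner-aligned sampling."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for oy in range(out_h):
        # Fractional source row for this output row.
        y = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for ox in range(out_w):
            # Fractional source column for this output column.
            x = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


small = [[0.0, 2.0],
         [4.0, 6.0]]
big = bilinear_upsample(small, 3, 3)
for r in big:
    print(r)
```

Applied to each of the class score channels, this brings the coarse output map back to the input resolution before the per-pixel argmax.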
- Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs [paper]