PyTorch Implementation of a CIFAR-10 Image Classification Pipeline Using a VGG-like Network

We present our solution to the well-known image classification problem on the CIFAR-10 dataset, which contains 60000 labeled 32x32 pixel images. The aim is to learn to assign one of 10 categories to each image.

Dataset

The CIFAR-10 dataset, as it is provided, consists of 5 batches of training images which sum up to 50000 and a batch of 10000 test images.

The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, so some training batches may contain more images from one class than another. Together, the training batches contain exactly 5000 images from each class.

For training and validation we use only the 50000 images originally meant for training. Stratified K-Folds cross-validation is used to split the data so that the percentage of samples for each class is preserved in every fold. Several other reported implementations use the data as given and validate directly on the provided 10000-sample test set. We instead reserve the 10000-sample test set for evaluating the trained model.
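The stratified split described above can be sketched with the standard library alone. This is only an illustration of the idea, not the code used in this repository: a library routine such as scikit-learn's StratifiedKFold would normally be used, and the function name `stratified_kfold`, the number of folds and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs that preserve class proportions.

    n_splits=5 is an assumed value; the README does not state the fold count.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin across the folds,
    # so every fold receives (almost) the same number of samples per class.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Each fold serves once as the validation set.
    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, val_idx


# Toy stand-in for the CIFAR-10 training labels: 10 classes, 50 samples each.
labels = [c for c in range(10) for _ in range(50)]
splits = list(stratified_kfold(labels, n_splits=5))
train_idx, val_idx = splits[0]
print(len(train_idx), len(val_idx))
```

With 5000 training images per class, the same procedure would give every validation fold an equal share of each class.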

Model

We have made a PyTorch implementation of Sergey Zagoruyko's VGG-like network with batch normalization and dropout for the task.

DataParallel(
  (module): VGGBNDrop(
    (features): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace)
      (3): Dropout(p=0.3)
      (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (5): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (6): ReLU(inplace)
      (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
      (8): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (9): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (10): ReLU(inplace)
      (11): Dropout(p=0.4)
      (12): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (13): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (14): ReLU(inplace)
      (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
      (16): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (17): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (18): ReLU(inplace)
      (19): Dropout(p=0.4)
      (20): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (21): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (22): ReLU(inplace)
      (23): Dropout(p=0.4)
      (24): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (25): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (26): ReLU(inplace)
      (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
      (28): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (29): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (30): ReLU(inplace)
      (31): Dropout(p=0.4)
      (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (33): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (34): ReLU(inplace)
      (35): Dropout(p=0.4)
      (36): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (37): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (38): ReLU(inplace)
      (39): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
      (40): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (42): ReLU(inplace)
      (43): Dropout(p=0.4)
      (44): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (45): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (46): ReLU(inplace)
      (47): Dropout(p=0.4)
      (48): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (49): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (50): ReLU(inplace)
      (51): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    )
    (classifier): Sequential(
      (0): Dropout(p=0.5)
      (1): Linear(in_features=512, out_features=512, bias=True)
      (2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace)
      (4): Dropout(p=0.5)
      (5): Linear(in_features=512, out_features=10, bias=True)
    )
  )
)
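To see why the first Linear layer in the classifier takes 512 input features, the spatial size of a 32x32 input can be traced through the convolution and pooling layers listed above. The sketch below uses plain Python arithmetic; the `CFG` list is read off the printout, while the helper names are hypothetical and not part of the repository.

```python
import math

# Output channels of each conv layer; "M" marks a 2x2 max-pool.
# Read off the layer printout above (13 conv layers, 5 pooling layers).
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]


def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (3x3, stride 1, padding 1 keeps the size)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2, padding=0):
    """Spatial size after max-pooling with ceil_mode=True (rounds up)."""
    return math.ceil((size + 2 * padding - kernel) / stride) + 1


def trace(size=32, channels=3):
    """Return the (channels, height, width) shape after every layer in CFG."""
    shapes = [(channels, size, size)]
    for item in CFG:
        if item == "M":
            size = pool_out(size)
        else:
            channels, size = item, conv_out(size)
        shapes.append((channels, size, size))
    return shapes


shapes = trace()
channels, height, width = shapes[-1]
flat_features = channels * height * width
print(shapes[-1], flat_features)
```

The five pooling layers halve the spatial size from 32 to 16, 8, 4, 2 and finally 1, so the flattened feature vector has 512 x 1 x 1 = 512 entries.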

Data Augmentations

In this implementation, flipping is limited to horizontal flips. Images are padded to 34x34 using reflective padding and then cropped back to 32x32: a random crop is used as an augmentation during training, and a center crop is used in the validation phase. The augmentations are implemented with solt.
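The padding, cropping and flipping described above can be illustrated with NumPy. This is a minimal sketch and does not use solt, which the actual pipeline relies on; the function names and the flip probability of 0.5 are assumptions.

```python
import numpy as np


def pad_reflect(img, pad=1):
    """Reflect-pad an HxWxC image by `pad` pixels per side (32x32 -> 34x34)."""
    return np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")


def crop(img, top, left, size=32):
    """Cut a size x size window whose top-left corner is (top, left)."""
    return img[top:top + size, left:left + size]


def train_transform(img, rng, size=32, pad=1):
    """Training: pad, random horizontal flip, then random crop."""
    padded = pad_reflect(img, pad)
    if rng.random() < 0.5:  # assumed flip probability
        padded = padded[:, ::-1]
    top, left = rng.integers(0, 2 * pad + 1, size=2)
    return crop(padded, top, left, size)


def val_transform(img, size=32, pad=1):
    """Validation: pad, then take the center crop."""
    padded = pad_reflect(img, pad)
    return crop(padded, pad, pad, size)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = train_transform(image, rng)
centered = val_transform(image)
print(augmented.shape, centered.shape)
```

Note that the center crop of the padded image recovers the original image, so validation images pass through unchanged, while training images are shifted by at most one pixel in each direction.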

In their experiments, Sergey Zagoruyko and Nikos Komodakis seem to have used whitened data. We use here the original data.

Sergey Zagoruyko proposed using the YUV color space. We have run our experiments without the RGB to YUV conversion.

Data is normalized in the usual way using the mean and standard deviation calculated across the 50000 training images, which can, for example, speed up training.
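As a rough illustration of this normalization step, the sketch below computes the statistics on a stand-in array and applies them. Computing the statistics per color channel and scaling pixel values to [0, 1] first are assumptions, as are the variable and function names; a small random batch replaces the real 50000-image training set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 50000x32x32x3 training array (kept small for speed).
train_images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Scale to [0, 1] and compute one mean and one standard deviation per channel.
scaled = train_images.astype(np.float32) / 255.0
mean = scaled.mean(axis=(0, 1, 2))
std = scaled.std(axis=(0, 1, 2))


def normalize(images, mean, std):
    """Scale to [0, 1], then subtract the mean and divide by the std per channel."""
    return (images.astype(np.float32) / 255.0 - mean) / std


normalized = normalize(train_images, mean, std)
print(mean.shape, std.shape, normalized.shape)
```

The same `mean` and `std`, computed on the training images only, would then be reused for the validation and test images.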

Setting up the data for training

From the PyCharm terminal:

$ python build_dataset.py --dataset CIFAR10

Training

From the PyCharm terminal:

$ python run_training.py --dataset_name CIFAR10 --num_classes 10 --experiment vggbndrop --bs 128 --optimizer sgd --lr 0.1 --lr_drop "[160, 260]" --n_epochs 300 --wd 5e-4 --learning_rate_decay 0.2 --n_threads 12 --color_space rgb --set_nesterov True
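The learning-rate flags in the command above suggest a step schedule. The sketch below shows one plausible reading, assuming that `--lr_drop` lists the epochs at which the rate is multiplied by `--learning_rate_decay` and that epochs are counted from zero; the function name is hypothetical and the actual behavior is defined by run_training.py.

```python
def learning_rate(epoch, base_lr=0.1, drops=(160, 260), decay=0.2):
    """Step schedule: multiply the rate by `decay` at every epoch in `drops`."""
    n_drops = sum(epoch >= d for d in drops)
    return base_lr * decay ** n_drops


# Print the rate just before and after each assumed drop point.
for epoch in (0, 159, 160, 259, 260, 299):
    print(epoch, round(learning_rate(epoch), 6))
```

Under these assumptions the rate is 0.1 for the first 160 epochs, 0.02 up to epoch 259 and 0.004 for the remaining epochs of the 300-epoch run.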

Results for CIFAR-10

Here we provide the results for the VGGBNDrop model proposed by Sergey Zagoruyko, trained using SGD as the optimizer.

Training and validation

As can be seen from the curves representing loss over time, the model starts to overfit around epoch 164.

From the confusion matrices below related to the validation accuracy curve, we can see how the learning progresses.

Epoch 40:

Epoch 80:

Epoch 120:

Epoch 160:
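For reference, a confusion matrix such as those referred to above can be computed from true and predicted labels as in the following NumPy sketch. The function name and the toy labels are illustrative only and are not results from this experiment; rows are assumed to index the true class and columns the predicted class.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count samples per (true class, predicted class) pair."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


# Toy labels for three classes (not real model output).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)

# Correct predictions lie on the diagonal.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

As training progresses, the counts concentrate on the diagonal and the off-diagonal entries show which classes are still confused with each other.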

Evaluation

Evaluation has been run using the model for which the validation loss was the best (see session for details).

Acknowledgements

Aleksei Tiulpin is acknowledged for kindly providing access to his pipeline scripts and giving his permission to reproduce and modify his pipeline for this task.

The Research Unit of Medical Imaging, Physics and Technology is acknowledged for making it possible to run the experiments.

Authors

Antti Isosalo, University of Oulu, 2018-
