
Superconv


This repository provides an implementation of the 1cycle learning rate policy as originally described in the paper: Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates [1]. In addition, it includes a reproduction of the published results on MNIST and new experiments on CIFAR10.
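The policy itself is simple: the learning rate ramps linearly from a low value to a high value over one step and back down over a second step, while momentum is cycled in the opposite direction. A minimal sketch of such a schedule as a Keras callback is shown below; it is illustrative only (the class name and arguments are hypothetical, not the ones shipped in this repository) and it omits the final low-learning-rate annihilation phase described in the paper.

```python
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import Callback

class OneCycleSketch(Callback):
    """Illustrative 1cycle schedule: LR ramps min -> max -> min over two steps,
    momentum ramps max -> min -> max. Hypothetical, not this repository's class."""

    def __init__(self, min_lr, max_lr, min_mom, max_mom, step_size):
        super().__init__()
        self.min_lr, self.max_lr = min_lr, max_lr
        self.min_mom, self.max_mom = min_mom, max_mom
        self.step_size = step_size  # half-cycle length, in iterations
        self.iteration = 0

    def on_batch_begin(self, batch, logs=None):
        # Fraction of the way towards the peak: rises 0 -> 1, then falls 1 -> 0.
        pos = self.iteration / self.step_size
        frac = pos if pos <= 1.0 else max(0.0, 2.0 - pos)
        K.set_value(self.model.optimizer.lr,
                    self.min_lr + frac * (self.max_lr - self.min_lr))
        K.set_value(self.model.optimizer.momentum,
                    self.max_mom - frac * (self.max_mom - self.min_mom))
        self.iteration += 1
```

Passing something like `OneCycleSketch(0.01, 0.1, 0.8, 0.95, step_size_in_iterations)` to `model.fit(..., callbacks=[...])` would correspond to the 0.01-0.1 / 0.95-0.8 rows in the result tables below.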

What's in the box?

  • An implementation of the 1cycle learning rate policy.
  • A port to Keras of the LeNet model that ships with Caffe (see the sketch after this list).
  • An implementation of a simple 3-layer network.
  • Experiments that reproduce the published MNIST result, plus new experiments on CIFAR10.
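
For orientation, the Caffe MNIST example's LeNet translates to Keras roughly as follows. This is a sketch based on the public Caffe prototxt; the port in this repository may differ in detail.

```python
from tensorflow.keras import layers, models

def lenet_sketch(input_shape=(28, 28, 1), num_classes=10):
    """Rough Keras translation of the Caffe MNIST LeNet (illustrative only).
    Note that Caffe's LeNet applies ReLU only after the first dense layer."""
    return models.Sequential([
        layers.Conv2D(20, kernel_size=5, input_shape=input_shape),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(50, kernel_size=5),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(500, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
```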

The experiments in this repository were conducted on an Ubuntu 18.04 Paperspace instance with an NVIDIA Quadro P4000 GPU (NVIDIA driver 410.48, CUDA 10.0.130-1).

Quickstart

  1. git clone git@github.com:coxy1989/superconv.git

  2. cd superconv

  3. conda env create -f environment.yml

  4. source activate superconv

  5. jupyter notebook

If you'd like to run the CIFAR10 experiments, you can download the TFRecord files used in training from my website by running the get_data.sh script in the /data folder.
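
For reference, a TFRecord input pipeline for CIFAR10 typically looks like the sketch below. The feature keys, file name and decoding steps here are assumptions for illustration; check the repository's data code for the actual schema.

```python
import tensorflow as tf

# Assumed feature schema -- the repository's TFRecords may use different keys.
FEATURES = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_raw(example['image'], tf.uint8)   # raw bytes -> uint8 pixels
    image = tf.reshape(image, (32, 32, 3))                 # CIFAR10 image shape
    image = tf.cast(image, tf.float32) / 255.0             # scale to [0, 1]
    return image, example['label']

dataset = (tf.data.TFRecordDataset('data/cifar10_train.tfrecord')  # illustrative path
           .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .shuffle(10_000)
           .batch(128)      # batch size used in the CIFAR10 experiments below
           .prefetch(1))
```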

Notebooks

Results

MNIST

The results below confirm that super-convergence can be observed with a standard configuration and the simple LeNet architecture.

| LR/SS/PL | CM/SS | Epochs | Accuracy (%) |
|---|---|---|---|
| 0.01/inv | 0.9 | 85 | 98.92 |
| 0.01/rop | 0.9 | 85 | 98.85 |
| 0.01-0.1/5 | 0.95-0.8/5 | 12 | 99.05 |
| 0.01-0.1/12 | 0.95-0.8/12 | 25 | 99.01 |
| 0.01-0.1/23 | 0.95-0.8/23 | 50 | 99.02 |
| 0.02-0.2/40 | 0.95-0.8/40 | 85 | 99.07 |

Table 1: Final accuracy on the MNIST dataset using the LeNet architecture with a weight decay of 0.0005 and a batch size of 512. Reported final accuracy is the average of 5 runs. LR = learning rate, SS = step size in epochs (two steps comprise a cycle), PL = policy, CM = cyclical momentum; 'inv' is the Caffe 'inv' policy, 'rop' is the Keras 'reduce on plateau' policy.
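
For context, the 'inv' baseline in the table decays the learning rate smoothly as an inverse power of the iteration count. A minimal sketch, assuming the gamma and power values from the Caffe MNIST example solver (which may differ from the ones used here):

```python
def inv_policy_lr(base_lr, iteration, gamma=1e-4, power=0.75):
    """Caffe 'inv' policy: lr = base_lr * (1 + gamma * iteration) ** (-power).
    gamma and power default to the Caffe MNIST example values (assumed)."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

# With base_lr = 0.01: ~0.0093 after 1,000 iterations, ~0.0059 after 10,000.
```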

Plot 1: Accuracy vs epoch for the CLR(12), CLR(85), INV and ROP results in the preceding table.

CIFAR10

Results on CIFAR10 were not included in the original paper. The results below show that super-convergence is not observed with a standard configuration and a simple 3-layer network. I suspect that tuning of the other hyperparameters is required, since rapid convergence on this dataset has been demonstrated with the related CLR policy and a similar network architecture. More experimentation is needed here; feel free to send a pull request if you perform further experiments.

| LR/SS/PL | CM/SS | Epochs | Accuracy (%) |
|---|---|---|---|
| 0.01/inv | 0.9 | 85 | 79.00 |
| 0.01/rop | 0.9 | 85 | 80.11 |
| 0.01-0.1/5 | 0.95-0.8/5 | 12 | 78.65 |
| 0.01-0.1/12 | 0.95-0.8/12 | 25 | 78.38 |
| 0.01-0.1/23 | 0.95-0.8/23 | 50 | 78.15 |
| 0.02-0.2/40 | 0.95-0.8/40 | 85 | 78.05 |

Table 2: Final accuracy on the CIFAR10 dataset using a simple 3-layer architecture with a weight decay of 0.003 and a batch size of 128. Reported final accuracy is the average of 5 runs. LR = learning rate, SS = step size in epochs (two steps comprise a cycle), PL = policy, CM = cyclical momentum; 'inv' is the Caffe 'inv' policy, 'rop' is the Keras 'reduce on plateau' policy.

Plot 2: Accuracy vs epoch for the CLR(12), CLR(85), INV and ROP results in the preceding table.

References

[1] Leslie N. Smith. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. arXiv:1708.07120, 2017.
