Skip to content

Pytorch implementation of KFAC and E-KFAC (Natural Gradient).

Notifications You must be signed in to change notification settings

rioyokotalab/KFAC-Pytorch

 
 

Repository files navigation

K-FAC_pytorch

Pytorch implementation of K-FAC and E-KFAC. (Only support single-GPU training, need modifications for multi-GPU.)

Requiresments

pytorch 0.4.0
torchvision
python 3.6.0
tqdm
tensorboardX
tensorflow

How to run

python main.py --dataset cifar10 --optimizer kfac --network vgg16_bn  --epoch 100 --milestone 40,80 --learning_rate 0.01 --damping 0.03 --weight_decay 0.003

Performance

Note: for better hyparameters of K-FAC, please refer to weight_decay repo. (The hyparameters below are not good enough! Especially the weight decay is too small!)

For K-FAC and E-KFAC, the search range of learning rates, weight decay and dampings are:
(1) learning rate = [3e-2, 1e-2, 3e-3]
(2) weight decay = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
(3) damping = [3e-2, 1e-3, 3e-3]

For SGD:
(1) learning rate = [3e-1, 1e-1, 3e-2]
(2) weight decay = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]

CIFAR10

Optimizer Model Acc. learning rate weight decay damping
KFAC VGG16_BN 93.86% 0.01 0.003 0.03
E-KFAC VGG16_BN 94.00% 0.003 0.01 0.03
SGD VGG16_BN 94.03% 0.03 0.001 -
KFAC ResNet110 93.59% 0.01 0.003 0.03
E-KFAC ResNet110 93.37% 0.003 0.01 0.03
SGD ResNet110 94.14% 0.03 0.001 -

CIFAR100

Optimizer Model Acc. learning rate weight decay damping
KFAC VGG16_BN 74.09% 0.003 0.01 0.03
E-KFAC VGG16_BN 73.20% 0.01 0.01 0.03
SGD VGG16_BN 74.56% 0.03 0.003 -
KFAC ResNet110 72.71% 0.003 0.01 0.003
E-KFAC ResNet110 72.32% 0.03 0.001 0.03
SGD ResNet110 72.60% 0.1 0.0003 -

Others

Please consider cite the following papers for K-FAC:

@inproceedings{martens2015optimizing,
  title={Optimizing neural networks with kronecker-factored approximate curvature},
  author={Martens, James and Grosse, Roger},
  booktitle={International conference on machine learning},
  pages={2408--2417},
  year={2015}
}

@inproceedings{grosse2016kronecker,
  title={A kronecker-factored approximate fisher matrix for convolution layers},
  author={Grosse, Roger and Martens, James},
  booktitle={International Conference on Machine Learning},
  pages={573--582},
  year={2016}
}

and for E-KFAC:

@inproceedings{george2018fast,
  title={Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis},
  author={George, Thomas and Laurent, C{\'e}sar and Bouthillier, Xavier and Ballas, Nicolas and Vincent, Pascal},
  booktitle={Advances in Neural Information Processing Systems},
  pages={9550--9560},
  year={2018}
}

If you have any questions or suggestions, please feel free to contact me via alecwangcq at gmail , com!

About

Pytorch implementation of KFAC and E-KFAC (Natural Gradient).

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.8%
  • Shell 1.2%