# Knowledge-Distillation

PyTorch implementation of various Knowledge Distillation (KD) methods.

## Lists

| Name | Method | Paper Link | Code Link |
|------|--------|------------|-----------|
| Baseline | basic model with softmax loss | | code |
| ST | soft target | paper | code |
| AT | attention transfer | paper | code |
| Fitnet | hints for thin deep nets | paper | code |
| NST | neural selective transfer | paper | code |
| FT | factor transfer | paper | code |
| RKD | relational knowledge distillation | paper | code |
- Note: there are some differences between this repository and the original papers (the AT and NST choices are sketched below):
  - For AT: I use the sum of absolute values raised to the power p=2 as the attention map.
  - For Fitnet: the training procedure is a single stage, without the hint-layer pretraining.
  - For NST: I employ a polynomial kernel with d=2 and c=0.
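
For reference, here is a minimal PyTorch sketch of the two non-standard choices noted above: the AT attention map built from the sum of absolute values raised to the power p=2, and the NST polynomial kernel with d=2 and c=0. The function names and tensor shapes are illustrative assumptions and may differ from the implementations in ./kd_losses.

```python
import torch
import torch.nn.functional as F

def attention_map(feat, p=2, eps=1e-6):
    # AT-style spatial attention: sum of absolute values raised to power p
    # over the channel dimension, then flattened and L2-normalized.
    # feat: (N, C, H, W) intermediate feature map.
    am = feat.abs().pow(p).sum(dim=1)      # (N, H, W)
    am = am.view(am.size(0), -1)           # (N, H*W)
    return F.normalize(am, p=2, dim=1, eps=eps)  # p=2 here is the L2 norm, not the attention power

def at_loss(feat_s, feat_t, p=2):
    # L2 distance between normalized student and teacher attention maps.
    return (attention_map(feat_s, p) - attention_map(feat_t, p)).pow(2).mean()

def poly_kernel(a, b, d=2, c=0.0):
    # Polynomial kernel (a·b + c)^d used by NST to compare sets of
    # normalized feature vectors; a, b: (N, M, C).
    return (torch.bmm(a, b.transpose(1, 2)) + c).pow(d)
```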

## Datasets

- CIFAR10
- CIFAR100

## Networks

- ResNet-20
- ResNet-110

## Training

- Create a ./dataset directory and download CIFAR10/CIFAR100 into it.
- You can specify the hyper-parameters listed in train_xxx.py or change them manually.
  - Use train_base.py to train the teacher model for KD and save it.
  - Before training, choose the method you need from the ./kd_losses directory, then run train_kd.py to train the student model (a minimal sketch of this stage follows below).
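
For orientation, the sketch below shows what the second stage looks like with the ST (soft target) loss: the frozen teacher provides softened targets, and the student is trained on a weighted sum of the KD loss and the usual cross-entropy. The temperature T, weight alpha, and function names are illustrative assumptions, not the actual interface of train_kd.py.

```python
import torch
import torch.nn.functional as F

def soft_target_loss(logits_s, logits_t, T=4.0):
    # Classic soft-target distillation loss: KL divergence between
    # temperature-softened teacher and student distributions, scaled by T^2.
    p_t = F.softmax(logits_t / T, dim=1)
    log_p_s = F.log_softmax(logits_s / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * T * T

def train_step(student, teacher, images, labels, optimizer, alpha=0.9, T=4.0):
    # One student update: the teacher runs without gradients, and the KD loss
    # is combined with cross-entropy on the ground-truth labels.
    with torch.no_grad():
        logits_t = teacher(images)
    logits_s = student(images)
    loss = alpha * soft_target_loss(logits_s, logits_t, T) \
        + (1 - alpha) * F.cross_entropy(logits_s, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```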

## Requirements

- Python 3.7
- PyTorch 1.3.1
- torchvision 0.4.2

## Acknowledgements

This repo is partly based on the following repos; many thanks to their authors.

If you employ the listed KD methods in your research, please cite the corresponding papers.