PyTorch implementation of various Knowledge Distillation (KD) methods.
Name | Method | Paper Link | Code Link |
---|---|---|---|
Baseline | basic model with softmax loss | — | code |
ST | soft target | paper | code |
AT | attention transfer | paper | code |
Fitnet | hints for thin deep nets | paper | code |
NST | neuron selectivity transfer | paper | code |
FT | factor transfer | paper | code |
RKD | relational knowledge distillation | paper | code |
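For reference, the soft target (ST) entry in the table is the classic Hinton-style distillation objective. The snippet below is a minimal PyTorch sketch of that loss, not necessarily identical to the implementation in `./kd_losses`; the temperature value `T=4.0` is just an example.

```python
import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, T=4.0):
    # Softened distributions: divide logits by temperature T before softmax.
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    # KL divergence, scaled by T^2 so gradient magnitudes stay comparable
    # across temperatures (Hinton et al., 2015).
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * (T * T)
```

In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth labels, weighted by a hyper-parameter.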
Note that there are some differences between this repository and the original papers:
- For `AT`: I use the sum of absolute values with power p=2 as the attention (see the sketch after this list).
- For `Fitnet`: the training procedure is one stage, without the hint layer.
- For `NST`: I employ a polynomial kernel with d=2 and c=0.
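The sketch below illustrates the two details noted above: the AT attention map as the channel-wise sum of absolute values raised to the power p=2, and the polynomial kernel with d=2 and c=0 used by NST. It is only an illustration under those assumptions; the exact normalization and reduction may differ from the implementations in `./kd_losses`.

```python
import torch
import torch.nn.functional as F

def attention_map(feat, p=2, eps=1e-6):
    # feat: (N, C, H, W) feature map.
    # AT attention: sum over channels of |A|^p, flattened and L2-normalized.
    am = feat.abs().pow(p).sum(dim=1).view(feat.size(0), -1)
    return F.normalize(am, p=2, dim=1, eps=eps)

def at_loss(feat_s, feat_t, p=2):
    # Mean squared difference between student and teacher attention maps.
    return (attention_map(feat_s, p) - attention_map(feat_t, p)).pow(2).mean()

def poly_kernel(x, y, d=2, c=0.0):
    # Polynomial kernel k(x, y) = (x . y + c)^d, as used in NST's MMD objective.
    # x, y: (N, M, C) batches of M C-dimensional feature vectors.
    return (torch.bmm(x, y.transpose(1, 2)) + c).pow(d)
```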
Datasets:
- CIFAR10
- CIFAR100

Networks:
- Resnet-20
- Resnet-110
Training:
- Create a `./dataset` directory and download CIFAR10/CIFAR100 into it (see the torchvision snippet after this list).
- You can simply specify the hyper-parameters listed in `train_xxx.py` or change them manually.
- Use `train_base.py` to train the teacher model used in KD, then save the model.
- Before training, choose the method you need in the `./kd_losses` directory, then run `train_kd.py` to train the student model.
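As an example of the dataset step, the standard torchvision downloaders can populate `./dataset`. The exact transforms expected by the `train_xxx.py` scripts may differ; the normalization constants below are just the commonly used CIFAR-10 statistics.

```python
import torchvision
import torchvision.transforms as transforms

# Commonly used CIFAR-10 per-channel mean/std; adjust if the train scripts
# expect different preprocessing.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# Downloads the data into ./dataset on first use (CIFAR100 works the same way).
train_set = torchvision.datasets.CIFAR10(root='./dataset', train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root='./dataset', train=False,
                                        download=True, transform=transform)
```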
Requirements:
- python 3.7
- pytorch 1.3.1
- torchvision 0.4.2
This repo is partly based on the following repos; many thanks to their authors.
- HobbitLong/RepDistiller
- bhheo/BSS_distillation
- clovaai/overhaul-distillation
- passalis/probabilistic_kt
- lenscloth/RKD
- [AberHu/Knowledge-Distillation-Zoo](https://github.com/AberHu/Knowledge-Distillation-Zoo)
If you employ the listed KD methods in your research, please cite the corresponding papers.