By Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang Pu.
This repository is the official implementation of "Unified Normalization for Accelerating and Stabilizing Transformers".
Clone this repo with:
git clone https://github.com/hikvision-research/Unified-Normalization
We develop our method on IWSLT14De2En using the fairseq framework.
fairseq == 1.0.0
Python >= 3.6.5
PyTorch >= 1.6.0
Follow these steps to install fairseq locally:
cd Unified-Normalization/neural_machine_translation
pip install -r requirements.txt
python setup.py install
Preprocess the data with the prepare_iwslt14_de2en.sh script:
bash data_preprocessing/machine_translation/prepare_iwslt14_de2en.sh
Once all commands have finished successfully, the processed data can be found at ./neural_machine_translation/data/iwslt14de2en/data-bin/iwslt14.tokenized.de-en/
To average the last 10 checkpoints:
save_dir=path/to/saved/ckpt
num=10
python scripts/average_checkpoints.py \
--inputs $save_dir \
--num-epoch-checkpoints $num \
--output $save_dir/checkpoint_avg_$num.pt
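
For readers unfamiliar with checkpoint averaging, the idea behind the command above can be sketched as an element-wise mean over parameters that share the same name. This is a conceptual sketch only, not the code of scripts/average_checkpoints.py: the function name `average_state_dicts` is hypothetical, and the values are assumed to support `+` and `/` (floats here, tensors in practice, where integer tensors would need separate handling).

```python
def average_state_dicts(state_dicts):
    """Return the element-wise mean of several name -> value mappings.

    Hypothetical helper for illustration; values only need to support
    addition and division by an integer (floats, NumPy arrays, tensors).
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = state_dicts[0].keys()
    # Every checkpoint must describe the same set of parameters.
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise KeyError("checkpoints have different parameter names")
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}


if __name__ == "__main__":
    # Toy "checkpoints" with scalar parameters.
    ckpts = [{"w": 1.0, "b": 2.0}, {"w": 3.0, "b": 4.0}]
    print(average_state_dicts(ckpts))
```

Averaging the last few epoch checkpoints is a common way to smooth out noise from the final optimization steps before evaluation.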
Then evaluate the model on a single GPU:
python -W ignore fairseq_cli/generate.py \
data/iwslt14de2en/data-bin/iwslt14.tokenized.de-en/ \
--path $save_dir/checkpoint_avg_$num.pt \
--batch-size 256 \
--beam 5 \
--remove-bpe \
--quiet
The evaluation script is available at run_iwslt14_eval.sh.
We trained our models on a single GPU (V100). Please refer to run_iwslt14_train_un.sh for training a Transformer with UN1d and to run_iwslt14_train.sh for the LayerNorm baseline.
Method | Offline | IWSLT14 (BLEU) |
---|---|---|
LN | | 35.3 |
BN | | 31.1 |
UN (ours) | | 35.4 (checkpoint, log) |
timm==0.3.4
torch>=1.4.0
torchvision>=0.5.0
pyyaml
We conduct image classification on ImageNet/CIFAR10/CIFAR100. Please download these datasets from the corresponding hyperlinks.
For ImageNet, all samples are organized as follows:
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
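
Before launching evaluation or training, it can help to confirm that the dataset directory matches the layout above. The following is a minimal sketch using only the Python standard library; the function name `check_imagenet_layout` is hypothetical and not part of this repo, and it assumes images carry the `.JPEG` extension shown in the tree.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return a list of problems found under root (empty if the layout looks right).

    Hypothetical helper: expects root/train and root/val, each holding one
    folder per class, and each class folder holding at least one .JPEG file.
    """
    root = Path(root)
    problems = []
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in {split_dir}")
        for class_dir in class_dirs:
            if not any(class_dir.glob("*.JPEG")):
                problems.append(f"no .JPEG files in {class_dir}")
    return problems


if __name__ == "__main__":
    # Placeholder path; replace with the real dataset location.
    for problem in check_imagenet_layout("path/to/imagenet"):
        print(problem)
```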
On ImageNet:
cd ./image_classification
ckpt_file=path/to/ckpt
data_dir=path/to/imagenet
CUDA_VISIBLE_DEVICES=0 python main.py \
$data_dir \
--model T2t_vit_14_all_un1d \
--img-size 224 \
-b 64 \
--eval_checkpoint $ckpt_file
On CIFAR10:
cd ./image_classification
ckpt_file=path/to/ckpt
data_dir=path/to/cifar10
CUDA_VISIBLE_DEVICES=0 python transfer_learning.py \
--dataset 'cifar10' \
--data-dir $data_dir \
--b 128 \
--num-classes 10 \
--img-size 224 \
--model T2t_vit_14_all_un1d \
--eval \
--transfer-model $ckpt_file
On CIFAR100:
cd ./image_classification
ckpt_file=path/to/ckpt
data_dir=path/to/cifar100
CUDA_VISIBLE_DEVICES=0 python transfer_learning.py \
--dataset 'cifar100' \
--data-dir $data_dir \
--b 128 \
--num-classes 100 \
--img-size 224 \
--model T2t_vit_14_all_un1d \
--eval \
--transfer-model $ckpt_file
cd ./image_classification
# training on 8 gpus (V100)
save_dir=path/to/saved/ckpt
data_dir=path/to/imagenet
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash distributed_train.sh 8 \
$data_dir \
--model T2t_vit_14_all_un1d \
--batch-size 64 \
--lr 5e-4 \
--weight-decay 0.05 \
--img-size 224 \
--output $save_dir
Please follow run_finetune.sh to finetune models on CIFAR10/100 with a single GPU.
Tip: after loading pretrained weights, DO NOT forget to reset UN's 'iters' parameter (the iteration counter) to zero, so that UN goes through its warm-up phase again during downstream training/finetuning.
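
One way to apply this tip is to zero the counters in the loaded weights before handing them to the model. The sketch below is an illustration under stated assumptions, not this repo's code: it assumes the counter appears in the checkpoint's state dict under keys whose last component is `iters`, and `reset_un_iters` is a hypothetical helper name. Check the actual key names in your checkpoint before relying on it.

```python
def reset_un_iters(state_dict, counter_name="iters"):
    """Return a copy of state_dict with every UN iteration counter set to zero.

    Assumption: counters are stored under keys ending in ".iters" (or a bare
    "iters"). Multiplying by 0 keeps the shape and dtype of tensor values and
    also works for plain Python numbers.
    """
    cleaned = {}
    for name, value in state_dict.items():
        if name.split(".")[-1] == counter_name:
            cleaned[name] = value * 0
        else:
            cleaned[name] = value
    return cleaned


if __name__ == "__main__":
    # Toy state dict with plain numbers standing in for tensors.
    pretrained = {"blocks.0.norm1.iters": 5000, "blocks.0.norm1.weight": 1.5}
    print(reset_un_iters(pretrained))
    # Hypothetical usage with PyTorch (not executed here):
    #   state = torch.load(ckpt_file, map_location="cpu")
    #   model.load_state_dict(reset_un_iters(state))
```

If the checkpoint nests the weights under another key, apply the helper to that inner mapping instead.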
Method | Offline | ImageNet (Top1) | CIFAR10 (Top1) | CIFAR100 (Top1) |
---|---|---|---|---|
LN | | 81.5% | 98.3% | 88.4% |
BN | | 79.8% | 96.6% | 88.2% |
UN (ours) | | 80.9% (checkpoint, log) | 98.3% (checkpoint) | 88.9% (checkpoint) |
If you find this repo useful, please consider citing our paper:
@inproceedings{yang2022unified,
title={Unified Normalization for Accelerating and Stabilizing Transformers},
author={Yang, Qiming and Zhang, Kai and Lan, Chaoxiang and Yang, Zhi and Li, Zheyang and Tan, Wenming and Xiao, Jun and Pu, Shiliang},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
pages={4445--4455},
year={2022}
}
Unified Normalization is released under the Apache 2.0 license. Code taken from other open-source repositories remains under its original license.
We would like to thank the fairseq and T2T-ViT teams for developing these easy-to-use codebases.