This repository contains PyTorch implementation for Sparsifiner (CVPR 2023).
[Project Page] [arXiv (CVPR 2023)]
Requirements:

- torch>=1.8.1
- torchvision>=0.9.1
- timm==0.3.2
- tensorboardX
- six
- fvcore
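The dependencies above can be installed in one step; a sketch (note the repo pins `timm` to exactly 0.3.2, so install that version rather than the latest):

```shell
# Install the dependencies listed above, with the versions as pinned there.
pip install "torch>=1.8.1" "torchvision>=0.9.1" timm==0.3.2 tensorboardX six fvcore
```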
Data preparation: download and extract ImageNet images from http://image-net.org/. The directory structure should be:

```
│ILSVRC2012/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
```
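Before launching training, it can help to sanity-check that the extracted tree matches this layout. Below is a minimal, standard-library sketch (not part of this repo; `check_imagenet_layout` is a hypothetical helper), demonstrated on a toy copy of the structure above:

```python
# Sanity-check an ImageNet-style directory tree (hypothetical helper,
# not part of the Sparsifiner codebase).
from pathlib import Path
import tempfile

def check_imagenet_layout(root: Path) -> bool:
    """True if root contains train/ and val/ splits whose class folders
    (WordNet IDs such as n01440764) each hold at least one .JPEG file."""
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            return False
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        if not class_dirs:
            return False
        if not all(any(d.glob("*.JPEG")) for d in class_dirs):
            return False
    return True

# Demo: build a toy tree mirroring the structure shown above.
tmp = Path(tempfile.mkdtemp())
for split, name in [("train", "n01440764_10026.JPEG"),
                    ("val", "ILSVRC2012_val_00000293.JPEG")]:
    class_dir = tmp / "ILSVRC2012" / split / "n01440764"
    class_dir.mkdir(parents=True)
    (class_dir / name).touch()

ok = check_imagenet_layout(tmp / "ILSVRC2012")
print(ok)  # → True
```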
Model preparation: download pre-trained models if necessary:

| model | url | model | url |
| --- | --- | --- | --- |
| DeiT-Small | link | LVViT-S | link |
| DeiT-Base | link | LVViT-M | link |
To train a Sparsifiner model with the default configuration on ImageNet, run:

Sparsifiner-S, on 8 GPUs:

```shell
bash run_model.sh --IMNET sparsifiner_default 8
```
This project is released under the MIT License.
Our code is based on DynamicViT, pytorch-image-models, DeiT, and LV-ViT.
If you find our work useful in your research, please consider citing:
```bibtex
@InProceedings{Wei_2023_CVPR,
    author    = {Wei, Cong and Duke, Brendan and Jiang, Ruowei and Aarabi, Parham and Taylor, Graham W. and Shkurti, Florian},
    title     = {Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {22680-22689}
}
```