
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference

License: Apache 2.0

Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin

Accepted by CVPR 2023. More info: [ Paper | Slide | YouTube | Poster | GitHub ]


This is an unofficial release of minimal code that reveals the core implementation of our attention block. The final adopted attention block follows a MultiScaleAttention format.

To run the standalone attention example:

```bash
python attention.py
```
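For context, here is a minimal sketch of the linear-angular attention idea behind the block. It is an illustration under stated assumptions, not the released implementation: it keeps only the first-order term of the angular kernel 1 - arccos(q·k)/π, which Taylor-expands to 1/2 + (q·k)/π for unit-norm q and k, so the n×n attention matrix never has to be materialized. The function name, tensor layout, and the `__main__` demo are hypothetical; the actual attention.py / MultiScaleAttention block also handles the higher-order kernel terms and the training-only auxiliary softmax branch.

```python
import math

import torch
import torch.nn.functional as F


def linear_angular_attention(q, k, v):
    """Illustrative sketch (hypothetical name, not the released API).

    q, k, v: (batch, heads, seq_len, head_dim). Only the linear term of
    the angular kernel is kept: sim(q, k) ~= 1/2 + (q . k) / pi.
    """
    # L2-normalize so q @ k^T yields cosine similarities in [-1, 1].
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    n = k.shape[-2]

    # Associativity trick: computing Q (K^T V) costs O(n * d^2),
    # versus O(n^2 * d) for materializing (Q K^T) V.
    kv = torch.einsum('bhnd,bhne->bhde', k, v)    # K^T V: (b, h, d, d)
    qkv = torch.einsum('bhnd,bhde->bhne', q, kv)  # Q (K^T V): (b, h, n, d)
    numerator = 0.5 * v.sum(dim=-2, keepdim=True) + qkv / math.pi

    # Row-wise normalizer matching the kernel: sum_j (1/2 + q . k_j / pi).
    # It stays strictly positive because |q . k_j| <= 1 < pi / 2.
    k_sum = k.sum(dim=-2)                          # (b, h, d)
    denom = 0.5 * n + torch.einsum('bhnd,bhd->bhn', q, k_sum) / math.pi
    return numerator / denom.unsqueeze(-1)


if __name__ == '__main__':
    x = torch.randn(2, 4, 196, 32)  # toy (batch, heads, tokens, head_dim)
    print(linear_angular_attention(x, x, x).shape)  # (2, 4, 196, 32)
```

During training, Castling-ViT pairs this linear branch with an auxiliary softmax-attention branch that is dropped ("castled" away) at inference, which is the "switching" in the title; see attention.py for the full MultiScaleAttention implementation.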

Here are some general guidelines for reproducing the results reported in our paper.

- For the classification task, we build our codebase on top of MobileVision@Meta.

- For the segmentation task, we build our codebase on top of Mask2Former, where the unsupervised pretrained models are trained using the MAE framework.

- For the detection task, we build our codebase on top of PicoDet@PaddleDet and its PyTorch version. The supervised pretrained models are trained using the LeViT framework.

To facilitate usage in our research community, I am working on translating some of the tightly coupled code into standalone versions. The detection codebase is expected to follow; stay tuned.


Citation

If you find this codebase useful for your research, please cite:

```bibtex
@inproceedings{you2023castling,
  title={Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference},
  author={You, Haoran and Xiong, Yunyang and Dai, Xiaoliang and Wu, Bichen and Zhang, Peizhao and Fan, Haoqi and Vajda, Peter and Lin, Yingyan},
  booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)},
  year={2023}
}
```