This project provides an implementation of the Vision Transformer (ViT) model using PyTorch. The Vision Transformer is a novel architecture that applies transformer models, traditionally used in NLP, to image classification tasks.
This project is licensed under the MIT License. See the LICENSE file for details.