
This repository contains a PyTorch implementation of the Vision Transformer (ViT), inspired by the seminal paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". The project builds a Vision Transformer model from scratch, processes images into patches, and trains the model on standard image datasets.
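The patching step mentioned above splits each image into fixed-size squares that the transformer treats like tokens. A minimal sketch of that idea, using the 224×224 image and 16×16 patch sizes from the paper's title (this repository's actual defaults may differ):

```python
import torch

# One RGB image; sizes follow the ViT paper's title, chosen here for
# illustration only.
img = torch.randn(1, 3, 224, 224)
P = 16  # patch side length

# unfold carves the image into non-overlapping P x P patches along
# height and width: (1, 3, 14, 14, 16, 16)
patches = img.unfold(2, P, P).unfold(3, P, P)
# gather the 14*14 = 196 patches into one sequence dimension
patches = patches.reshape(1, 3, -1, P, P).permute(0, 2, 1, 3, 4)
# flatten each patch into a vector: (1, 196, 768) -- 196 "words"
patches = patches.flatten(2)
print(patches.shape)  # torch.Size([1, 196, 768])
```

Each of the 196 flattened patches is then linearly projected to the model's embedding dimension before entering the transformer.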


anupj/Vision-Transformer-Implementation-in-PyTorch


Vision Transformer Implementation in PyTorch

This project provides an implementation of the Vision Transformer (ViT) model using PyTorch. The Vision Transformer is a novel architecture that applies transformer models, traditionally used in NLP, to image classification tasks.
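The overall architecture can be sketched end to end in a few lines of PyTorch. This is an illustrative sketch, not the repository's actual module: the class name, hyperparameters (embedding size 192, 4 layers, 3 heads, 10 classes), and use of `nn.TransformerEncoder` are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class ViT(nn.Module):
    """Minimal Vision Transformer sketch; hyperparameters are illustrative."""
    def __init__(self, img_size=224, patch_size=16, in_ch=3,
                 embed_dim=192, depth=4, heads=3, num_classes=10):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # A strided convolution both splits the image into non-overlapping
        # patches and linearly projects each patch to embed_dim.
        self.patch_embed = nn.Conv2d(in_ch, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        B = x.shape[0]
        x = self.patch_embed(x)               # (B, embed_dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)      # (B, num_patches, embed_dim)
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])             # classify from the [CLS] token

model = ViT()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```

Using a convolution for patch embedding is equivalent to flattening each patch and applying a shared linear layer, and is the common idiom in PyTorch ViT implementations.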

License

This project is licensed under the MIT License. See the LICENSE file for details.

