An application for fine-tuning Microsoft's BEiT model (Bidirectional Encoder representation from Image Transformers) on an image classification dataset (Food 101).
BEiT is a family of Image Transformers. During pre-training, it first tokenizes images into discrete visual tokens, then randomly masks blocks of image patches (blockwise masking), and learns to predict the visual tokens of the masked patches. In this sense, it follows a 'BERT-like' approach to pre-training.
Figure: BEiT pre-training procedure (image taken from the original paper).
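As a rough illustration of this objective, the transformers library exposes the pre-training head via `BeitForMaskedImageModeling`. The sketch below is not part of this repository: it feeds a random tensor instead of a real image and uses uniform random masking instead of the paper's blockwise masking, purely to show the moving parts.

```python
import torch
from transformers import BeitForMaskedImageModeling

# Self-supervised checkpoint without a classification head.
model = BeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k")

# A 224x224 image is split into 14x14 = 196 patches of 16x16 pixels.
num_patches = (224 // 16) ** 2

# Mask ~40% of the patch positions (uniform here; the paper masks blockwise).
bool_masked_pos = torch.zeros(1, num_patches, dtype=torch.bool)
bool_masked_pos[0, torch.randperm(num_patches)[: int(0.4 * num_patches)]] = True

pixel_values = torch.rand(1, 3, 224, 224)  # stand-in for a real image
outputs = model(pixel_values=pixel_values, bool_masked_pos=bool_masked_pos)

# One score per patch over the discrete visual-token vocabulary.
print(outputs.logits.shape)
```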
Three variants of the model are fine-tuned and evaluated:
| model_id | pre-trained on | further fine-tuned on |
|---|---|---|
| microsoft/beit-base-patch16-224-pt22k | ImageNet-21k (14M images, 21k classes, at resolution 224x224) | - |
| microsoft/beit-base-patch16-224 | ImageNet-21k (14M images, 21k classes, at resolution 224x224) | ImageNet 2012 (1M images, 1k classes, at resolution 224x224) |
| microsoft/beit-base-patch16-224-pt22k-ft22k | ImageNet-21k (14M images, 21k classes, at resolution 224x224) | ImageNet-21k (14M images, 21k classes, at resolution 224x224) |
In all pre-trainings, images are fed to the model as sequences of 16x16 patches. Each model has about 86M parameters.
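To fine-tune any of these checkpoints on Food 101, the classification head has to be replaced with a fresh 101-way layer. A minimal sketch with the transformers API (the exact training script in this repository may differ):

```python
from transformers import BeitForImageClassification, BeitImageProcessor

model_id = "microsoft/beit-base-patch16-224-pt22k-ft22k"  # any of the three variants
processor = BeitImageProcessor.from_pretrained(model_id)

# num_labels creates a new 101-way head; ignore_mismatched_sizes drops the
# checkpoint's original head instead of raising a shape error.
model = BeitForImageClassification.from_pretrained(
    model_id,
    num_labels=101,
    ignore_mismatched_sizes=True,
)

print(sum(p.numel() for p in model.parameters()))  # roughly 86M parameters
```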
The Food 101 corpus contains 101,000 images covering 101 types of food, originally posted on the foodspotting.com platform. Some example classes are listed below.
- Churros
- Falafel
- Sushi
- Lasagna
The corpus was split into an 80% train, 10% validation, and 10% test set. Each model variant was fine-tuned on the training data for three epochs.
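A condensed sketch of how such a fine-tuning run could look with datasets and the Trainer API. This is an assumption-laden illustration, not the repository's actual script: it pulls the Hub copy of Food 101, re-creates an 80/10/10 split with a fixed seed, and uses an assumed batch size and output directory.

```python
import torch
from datasets import load_dataset
from transformers import (
    BeitForImageClassification,
    BeitImageProcessor,
    Trainer,
    TrainingArguments,
)

model_id = "microsoft/beit-base-patch16-224"
processor = BeitImageProcessor.from_pretrained(model_id)

# 80/10/10 split over the full Hub copy of Food 101.
full = load_dataset("food101", split="train+validation")
step1 = full.train_test_split(test_size=0.2, seed=42)
step2 = step1["test"].train_test_split(test_size=0.5, seed=42)
train_ds, val_ds, test_ds = step1["train"], step2["train"], step2["test"]

def transform(batch):
    inputs = processor([img.convert("RGB") for img in batch["image"]], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

train_ds, val_ds = train_ds.with_transform(transform), val_ds.with_transform(transform)

def collate_fn(batch):
    return {
        "pixel_values": torch.stack([x["pixel_values"] for x in batch]),
        "labels": torch.tensor([x["labels"] for x in batch]),
    }

model = BeitForImageClassification.from_pretrained(
    model_id, num_labels=101, ignore_mismatched_sizes=True
)

args = TrainingArguments(
    output_dir="beit-food101",       # hypothetical output directory
    num_train_epochs=3,              # three epochs, as used for the results below
    per_device_train_batch_size=16,  # assumed batch size
    eval_strategy="epoch",
    remove_unused_columns=False,     # keep the raw "image" column for with_transform
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collate_fn,
    train_dataset=train_ds,
    eval_dataset=val_ds,
)
trainer.train()
```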
| Model | Accuracy |
|---|---|
| beit-base-patch16-224-pt22k | 0.629 |
| beit-base-patch16-224 | 0.825 |
| beit-base-patch16-224-pt22k-ft22k | 0.811 |
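Once fine-tuned, a model can be used for single-image prediction. A self-contained sketch; the checkpoint directory and the image file name are hypothetical:

```python
import torch
from PIL import Image
from transformers import BeitForImageClassification, BeitImageProcessor

# "beit-food101" stands in for the fine-tuned checkpoint directory.
model = BeitForImageClassification.from_pretrained("beit-food101")
processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")

image = Image.open("some_food_photo.jpg").convert("RGB")  # hypothetical test image
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])  # readable name if id2label was set during training
```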
Dependencies:

```
pytorch==2.4.0
cudatoolkit=12.1
transformers
datasets
openpyxl
scikit-learn
```
The dataset image files are not included; they can be downloaded from this Kaggle URL. The trained model files are also omitted from this repository.
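For loading the Kaggle download locally, the generic "imagefolder" loader in datasets is one option, assuming the archive is extracted so that each class has its own subfolder (the path below is hypothetical):

```python
from datasets import load_dataset

# The extracted archive contains images/<class name>/<image id>.jpg;
# "imagefolder" derives the labels from the subfolder names.
dataset = load_dataset("imagefolder", data_dir="food-101/images")
print(dataset["train"].features["label"].num_classes)  # expected: 101
```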