
Image captioning model using CNN-LSTM, CNN-Transformer, and Vision Transformer (ViT) architecture


kennykguo/image-captioning


Image Captioning

This repository contains my implementation of an image captioning model. The model takes an image as input and generates a descriptive English caption.
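At inference time, encoder-decoder captioning models (CNN-LSTM and CNN-Transformer alike) usually build the caption one token at a time, feeding each predicted word back in as input for the next step. The README does not say which decoding strategy this repository uses (greedy search, beam search, or sampling), so the following is only a minimal sketch of greedy decoding. The `step` function, the `<start>`/`<end>` tokens, and the toy bigram table are hypothetical stand-ins for a trained decoder and its vocabulary.

```python
from typing import Callable, List, Sequence

# Hypothetical special tokens; the repository's actual vocabulary is not shown here.
START, END = "<start>", "<end>"


def greedy_decode(
    image_features: Sequence[float],
    step: Callable[[Sequence[float], List[str]], dict],
    max_len: int = 20,
) -> List[str]:
    """Generate a caption token by token, always taking the most likely next token.

    `step` stands in for a trained decoder (LSTM or Transformer): given the image
    features and the tokens generated so far, it returns {token: probability}.
    """
    tokens = [START]
    for _ in range(max_len):
        probs = step(image_features, tokens)
        next_token = max(probs, key=probs.get)  # greedy choice
        if next_token == END:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> token


# Toy stand-in for a trained decoder: a fixed bigram table that ignores the image.
TOY_TABLE = {
    START: {"polar": 0.9, "a": 0.1},
    "polar": {"bear": 0.95, END: 0.05},
    "bear": {"swimming": 0.8, END: 0.2},
    "swimming": {END: 0.7, "by": 0.3},
}


def toy_step(image_features, tokens):
    # Unknown previous tokens end the caption.
    return TOY_TABLE.get(tokens[-1], {END: 1.0})


print(" ".join(greedy_decode([0.0] * 4, toy_step)))  # polar bear swimming
```

Greedy decoding is the simplest choice; beam search keeps several partial captions at each step and often produces more fluent output, at extra cost.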

Project Overview

  • I experimented with several model architectures, including CNN-LSTM, CNN-Transformer, and Vision Transformer (ViT) models.
  • The project uses the MSCOCO 2017 dataset. I initially used Flickr30k, but my captioning results were much better on MSCOCO 2017, most likely because it contains more training data.
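In a CNN-Transformer captioner, the decoder typically combines two kinds of attention: masked (causal) self-attention over the caption tokens, so a position cannot look at words that come after it, and cross-attention from the tokens to the CNN image feature vectors. The snippet below is a minimal pure-Python sketch of scaled dot-product attention covering both cases; it illustrates the general mechanism rather than code from this repository, and the toy embeddings and feature sizes are arbitrary assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = False) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.

    With causal=True, query position i may only attend to key positions j <= i,
    the look-ahead mask used by a caption decoder's self-attention.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if causal and j > i:
                scores.append(float("-inf"))  # hide future caption tokens
            else:
                scores.append(sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d))
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output column at a time.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Toy example: 3 caption-token embeddings and 4 CNN region features (dimension 2).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0], [2.0, 0.0]]

self_att = attention(tokens, tokens, tokens, causal=True)  # tokens see only the past
cross_att = attention(self_att, image, image)  # every token sees every image region
print(cross_att)
```

A real decoder would also apply learned query/key/value projections, multiple heads, residual connections, and feed-forward layers; those are omitted here to keep the masking logic visible.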

Results

The best model achieved a BLEU-4 score of 11.0.
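BLEU-4 is the geometric mean of the modified (clipped) 1- to 4-gram precisions of the generated captions against the reference captions, multiplied by a brevity penalty when the generated text is shorter than the references. The README does not say which implementation produced the 11.0 figure (libraries such as NLTK add smoothing options that change the result), so the following is only a minimal from-scratch sketch of unsmoothed corpus-level BLEU. It returns a value in [0, 1]; scores like 11.0 are on the common 0 to 100 scale.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-token tuple in `tokens`."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(
    hypotheses: List[List[str]],
    references: List[List[List[str]]],
    max_n: int = 4,
) -> float:
    """Unsmoothed corpus-level BLEU with uniform n-gram weights."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the hypothesis.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref_counts = Counter()
            for r in refs:
                max_ref_counts |= ngrams(r, n)  # element-wise max over references
            # Clip each hypothesis n-gram count by its largest count in any reference.
            matches[n - 1] += sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # any zero n-gram precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "polar bear swimming in the water by wall".split()
generated = "polar bear swimming by large wave".split()
print(corpus_bleu([generated], [[reference]]))  # 0.0: no shared 4-gram
```

Because the n-gram counts are pooled over the whole test set, one caption without a matching 4-gram does not zero out the corpus score. Scored on its own, though, the polar bear example in this README gets 0.0: the generated caption shares four unigrams, two bigrams, and one trigram ("polar bear swimming") with the original, but no 4-gram.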


Original Caption: polar bear swimming in the water by wall

Generated Caption: polar bear swimming by large wave

Inspiration

This project was inspired by the following papers:
