Image Captioning

Introduction

This project builds a model that generates captions for images: given an image, the model describes in English what the image contains. The model consists of an encoder, which is a CNN, and a decoder, which is an RNN. The encoder is a CNN of the kind used for image classification; it takes an image as input, and its output is fed into the RNN decoder, which produces an English sentence.
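As a rough illustration of the encoder-decoder wiring described above, here is a minimal PyTorch sketch. It is not the code in model.py: the class names, the layer sizes, and the tiny convolutional stack (a stand-in for a pretrained classification network) are assumptions made for this example. Feeding the image feature as the first LSTM input is one common wiring, in the spirit of Show and Tell.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Toy CNN encoder: maps an image to a fixed-size feature vector."""

    def __init__(self, embed_size):
        super().__init__()
        # Tiny stand-in for a pretrained classification CNN (assumption).
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.embed = nn.Linear(16, embed_size)

    def forward(self, images):
        x = self.features(images).flatten(1)  # (batch, 16)
        return self.embed(x)                  # (batch, embed_size)


class DecoderRNN(nn.Module):
    """LSTM decoder: scores the next word at every time step."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # The image feature is the first input step; the embedded caption
        # tokens (all but the last) follow it.
        words = self.word_embed(captions[:, :-1])                   # (batch, T-1, embed)
        inputs = torch.cat([features.unsqueeze(1), words], dim=1)   # (batch, T, embed)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                                      # (batch, T, vocab)


def caption_scores(images, captions, embed_size=32, hidden_size=64, vocab_size=100):
    """Run an untrained encoder and decoder end to end (shape check only)."""
    encoder = EncoderCNN(embed_size)
    decoder = DecoderRNN(embed_size, hidden_size, vocab_size)
    return decoder(encoder(images), captions)
```

Calling caption_scores with a batch of images of shape (batch, 3, H, W) and integer captions of shape (batch, T) returns word scores of shape (batch, T, vocab_size), which can be compared against the captions with a cross-entropy loss during training.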

The model and the tuning of its hyperparameters are based on ideas presented in the papers Show and Tell: A Neural Image Caption Generator and Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.

We use the Microsoft Common Objects in COntext (MS COCO) dataset for this project. It is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms. For instructions on downloading the data, see the Data section below.

Code

The code can be categorized into two groups:

Notebooks - The main code for the project is structured as a series of Jupyter notebooks:

0_Dataset.ipynb - Introduces the dataset and plots some sample images.
1_Preliminaries.ipynb - Loads and pre-processes data and experiments with models.
2_Training.ipynb - Trains a CNN-RNN model.
3_Inference.ipynb - Generates captions for test images.

Helper files - Contain helper code for the notebooks:

data_loader.py - Creates the CoCoDataset and a DataLoader for it.
vocabulary.py - Tokenizes captions and adds the tokens to a vocabulary dictionary. The CoCoDataset holds the vocabulary as an instance variable.
model.py - Provides the CNN and RNN models that the notebooks train and test.
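To make the role of the vocabulary concrete, the sketch below shows one way captions could be tokenized and mapped to integer ids. It is an illustration only, not the contents of vocabulary.py: the class name SimpleVocabulary, the min_count parameter, the special tokens, and the whitespace tokenizer (a stand-in for a proper tokenizer such as nltk's) are all assumptions.

```python
from collections import Counter


class SimpleVocabulary:
    """Maps words to integer ids; rare or unseen words map to an unknown token."""

    def __init__(self, captions, min_count=1,
                 start_word="<start>", end_word="<end>", unk_word="<unk>"):
        self.start_word = start_word
        self.end_word = end_word
        self.unk_word = unk_word
        self.word2idx = {}
        self.idx2word = {}
        # Special tokens always get the first ids.
        for special in (start_word, end_word, unk_word):
            self._add(special)
        # Count tokens over all captions and keep the frequent ones.
        counts = Counter(tok for c in captions for tok in self.tokenize(c))
        for word, n in sorted(counts.items()):
            if n >= min_count:
                self._add(word)

    @staticmethod
    def tokenize(caption):
        # Lowercase and split on whitespace (simplification, see lead-in).
        return caption.lower().split()

    def _add(self, word):
        if word not in self.word2idx:
            idx = len(self.word2idx)
            self.word2idx[word] = idx
            self.idx2word[idx] = word

    def encode(self, caption):
        """Return the ids for <start>, the caption tokens, then <end>."""
        unk = self.word2idx[self.unk_word]
        ids = [self.word2idx[self.start_word]]
        ids += [self.word2idx.get(tok, unk) for tok in self.tokenize(caption)]
        ids.append(self.word2idx[self.end_word])
        return ids

    def __len__(self):
        return len(self.word2idx)
```

A dataset class can then hold one such object and call encode on each caption to obtain the integer sequence that the decoder consumes.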

Setup

Clone the COCO API repo into this project's directory:

git clone https://github.com/cocodataset/cocoapi.git

Set up the COCO API (also described in the COCO API readme):

cd cocoapi/PythonAPI
make
cd ..

Install PyTorch (0.4.0 recommended) and torchvision.

Linux or Mac:

conda install pytorch torchvision -c pytorch

Windows:

conda install -c peterjc123 pytorch-cpu
pip install torchvision

Other requirements:

Python 3
pycocotools
nltk
numpy
scikit-image
matplotlib
tqdm
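After installing, a quick check like the following can report which packages are still missing from the current environment. This is a convenience sketch, not part of the project; the function name missing_modules is hypothetical, and the import names are assumed to be the usual ones (for example, scikit-image is imported as skimage).

```python
import importlib.util

# Import names assumed for the requirements listed above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "pycocotools",
    "nltk",
    "numpy",
    "skimage",      # scikit-image
    "matplotlib",
    "tqdm",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```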

Data

Download the following data from the COCO website, and place them, as instructed below, into the cocoapi subdirectory located inside this project's directory (the subdirectory was created when cloning the COCO API repo as shown in the Setup section above):

Under Annotations, download:
- 2014 Train/Val annotations [241MB] (extract captions_train2014.json, captions_val2014.json, instances_train2014.json and instances_val2014.json, and place them in the subdirectory cocoapi/annotations/)
- 2014 Testing Image info [1MB] (extract image_info_test2014.json and place it in the subdirectory cocoapi/annotations/)
Under Images, download:
- 2014 Train images [83K/13GB] (extract the train2014 folder and place it in the subdirectory cocoapi/images/)
- 2014 Val images [41K/6GB] (extract the val2014 folder and place it in the subdirectory cocoapi/images/)
- 2014 Test images [41K/6GB] (extract the test2014 folder and place it in the subdirectory cocoapi/images/)
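The layout described above can be verified with a short script before running the notebooks. This is a sketch under the assumption that the script is run from the project directory, so that cocoapi is a direct subdirectory; the function name missing_coco_paths is hypothetical.

```python
from pathlib import Path

# Files and folders expected inside the cocoapi subdirectory,
# taken from the download instructions above.
EXPECTED_PATHS = [
    "annotations/captions_train2014.json",
    "annotations/captions_val2014.json",
    "annotations/instances_train2014.json",
    "annotations/instances_val2014.json",
    "annotations/image_info_test2014.json",
    "images/train2014",
    "images/val2014",
    "images/test2014",
]


def missing_coco_paths(cocoapi_dir="cocoapi"):
    """Return the expected paths that do not exist under cocoapi_dir."""
    root = Path(cocoapi_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_paths()
    if missing:
        print("Missing from cocoapi/:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected COCO files and folders are in place.")
```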

Run

To run any script file, use:

python <script.py>

To run any IPython Notebook, use:

jupyter notebook <notebook_name.ipynb>
