This repository contains a PyTorch implementation of the PMLR 2015 paper *Show, Attend and Tell: Neural Image Caption Generation with Visual Attention*. For the original implementation of the paper, please refer to the authors' Theano implementation here.
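At the heart of the model is a soft attention module: at every decoding step, the LSTM decoder computes a weighted average of the convolutional feature vectors and conditions the next word on it. The sketch below illustrates this idea in PyTorch; all names and dimensions are illustrative and not necessarily the ones used in this repository.

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Bahdanau-style soft attention over image regions
    (the deterministic variant from Show, Attend and Tell)."""

    def __init__(self, feature_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feature_dim, attn_dim)   # project CNN features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.score = nn.Linear(attn_dim, 1)                 # scalar relevance score

    def forward(self, features, hidden):
        # features: (batch, num_regions, feature_dim), e.g. 196 regions of a 14x14 map
        # hidden:   (batch, hidden_dim), the decoder's previous hidden state
        scores = self.score(torch.tanh(
            self.feat_proj(features) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                      # (batch, num_regions)
        alpha = torch.softmax(scores, dim=1)                # weights sum to 1 per image
        context = (features * alpha.unsqueeze(-1)).sum(1)   # attended context vector
        return context, alpha
```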
For further details on the Data Pre-Processing and Methodology, please refer to Report.pdf
Make sure you have Python 3 installed, then run the following command:
pip install -r requirements.txt
Download the dataset from here and unzip it. This will create a ./data/ directory in the root folder of the repository.
python3 train.py [--base_dir (str)] [--debug (bool)] [--lr (float)] [--alpha_c (float)] [--log_interval (int)] [--epochs (int)] [--batch_size (int)] [--result_dir (str)] [--init_model (str)]
Options:
--base_dir Path to the directory containing the data/ folder.
--debug If set to True, prints debug messages and runs for only one epoch with a batch size of one.
--lr Learning rate for the Adam optimiser.
--alpha_c Attention regularisation constant (see the sketch after this list).
--log_interval Number of batches after which the loss is printed within an epoch.
--epochs Number of epochs to train the model for.
--batch_size Batch size to be used.
--result_dir Path to the directory in which a results/ folder will be created to store the trained models.
--init_model Path to a saved model used to initialise the weights before training.
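The --alpha_c flag presumably weights the doubly stochastic attention regularisation from the paper (the λ term in Eq. 14), which encourages each image region's attention weights to sum to roughly one over the caption. Below is a minimal sketch of how such a penalty is typically added to the loss; the tensor names are assumptions, not this repository's exact code.

```python
import torch

def doubly_stochastic_penalty(alphas: torch.Tensor, alpha_c: float) -> torch.Tensor:
    """Penalty encouraging sum_t alpha_{t,i} ~= 1 for every region i
    (Eq. 14 of Show, Attend and Tell).

    alphas: (batch, caption_length, num_regions) attention weights.
    """
    return alpha_c * ((1.0 - alphas.sum(dim=1)) ** 2).mean()

# During training (illustrative):
# loss = cross_entropy_loss + doubly_stochastic_penalty(alphas, alpha_c)
```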
Example Usage:
python3 train.py --base_dir=/kaggle/input/ass4q1/Data/ --result_dir=/kaggle/working/results --init_model=/kaggle/input/show-33/33.pth
After training the model for 42 epochs, we obtained the following scores:

Data | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR |
---|---|---|---|---|---|
Validation | 0.5560 | 0.3013 | 0.1687 | 0.0912 | 0.2404 |
Test | 0.5757 | 0.3241 | 0.1887 | 0.1092 | 0.2541 |
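For reference, BLEU-n and METEOR scores of this kind can be computed with NLTK. This is a generic sketch, not necessarily the evaluation code that produced the numbers above; recent NLTK versions expect pre-tokenised input, and METEOR additionally requires NLTK's WordNet data.

```python
from nltk.translate.bleu_score import corpus_bleu
from nltk.translate.meteor_score import meteor_score

# One image with one reference caption (toy example); real evaluation
# uses all reference captions available for each image.
references = [[["a", "dog", "runs", "on", "the", "grass"]]]
hypotheses = [["a", "dog", "is", "running", "on", "the", "grass"]]

bleu1 = corpus_bleu(references, hypotheses, weights=(1, 0, 0, 0))
bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25))

# METEOR is computed per caption and averaged over the corpus.
meteor = sum(meteor_score(refs, hyp)
             for refs, hyp in zip(references, hypotheses)) / len(hypotheses)
```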
*Soft attention visualisations for the generated images.*
Download the trained model from here
To test your trained model, run the following command:
python3 inferences.py [--base_dir (str)] [--model (str)] [--result_dir (str)]
Options:
--base_dir Path to the directory containing the data/ folder.
--model Path to the trained model.
--result_dir Path to the directory where the generated captions will be saved.
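Internally, inference is typically a greedy decoding loop: encode the image once, then repeatedly attend to the feature map and emit the most probable next word until an end token appears. The sketch below shows the general pattern; the encoder/decoder interface (init_hidden_state, attention, step, fc) is hypothetical and stands in for whatever inferences.py actually defines.

```python
import torch

@torch.no_grad()
def greedy_caption(encoder, decoder, image, word2idx, idx2word, max_len=20):
    """Generate a caption for one image by greedy decoding.

    `encoder` and `decoder` are assumed to expose the hypothetical methods
    used below; adapt the names to the models defined in this repository.
    """
    features = encoder(image.unsqueeze(0))            # (1, num_regions, feature_dim)
    hidden, cell = decoder.init_hidden_state(features)
    word = torch.tensor([word2idx["<start>"]])
    caption = []
    for _ in range(max_len):
        context, alpha = decoder.attention(features, hidden)  # soft attention step
        hidden, cell = decoder.step(word, context, hidden, cell)
        word = decoder.fc(hidden).argmax(dim=1)       # most probable next word
        token = idx2word[word.item()]
        if token == "<end>":
            break
        caption.append(token)
    return " ".join(caption)
```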
Copyright (c) 2021 Paras Mehan
For license information, see LICENSE or http://mit-license.org
Done by Pragyan Mehrotra, Vrinda Narayan and Paras Mehan
This code was written as part of a course group assignment in Deep Learning with Dr. Md. Shad Akhtar at IIIT Delhi during the Winter 2021 semester.
For bugs in the code, please write to paras18062 [at] iiitd [dot] ac [dot] in or create an issue in the repository.