Use hierarchical attention over different feature layers to generate image captions
This is the image captioning code for the final project of 10-807 at CMU.
You need to have TensorFlow installed.
(1) Download the MS COCO data and the pretrained VGG16 model [https://www.cs.toronto.edu/~frossard/post/vgg16/]
(2) Resize images to 224 x 224 [resize.py] (see the resize sketch after this list)
(3) Preprocess annotations and extract features [preprocess.py] (a feature-extraction sketch also follows the list)
(4) Train your model [train.py]
(5) Test your model [test.py]
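A minimal sketch of the resize step (2); the repo's resize.py may differ in details, and the ./data directories used here are assumptions:

```python
import os
from PIL import Image

SRC_DIR = './data/train2014'           # hypothetical folder with raw MS COCO images
DST_DIR = './data/resized/train2014'   # hypothetical output folder
os.makedirs(DST_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    if not name.lower().endswith('.jpg'):
        continue
    with Image.open(os.path.join(SRC_DIR, name)) as img:
        # VGG16 expects 224 x 224 RGB inputs.
        img.convert('RGB').resize((224, 224), Image.LANCZOS).save(os.path.join(DST_DIR, name))
```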
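And a minimal sketch of extracting feature maps from several VGG16 layers for the hierarchical attention (step 3). It uses tf.keras.applications.VGG16 as a stand-in; the actual preprocess.py presumably loads the downloaded VGG16 weights instead, and the choice of layers here is an assumption:

```python
import tensorflow as tf

# Standard Keras VGG16 layer names at three different depths (assumed choice).
LAYERS = ['block3_conv3', 'block4_conv3', 'block5_conv3']

vgg = tf.keras.applications.VGG16(weights='imagenet', include_top=False)
extractor = tf.keras.Model(inputs=vgg.input,
                           outputs=[vgg.get_layer(n).output for n in LAYERS])

def extract_features(images):
    """images: float32 array of shape (N, 224, 224, 3), RGB in [0, 255]."""
    x = tf.keras.applications.vgg16.preprocess_input(images)
    # Returns one feature map per chosen layer, e.g. (N, 14, 14, 512) for block5_conv3.
    return extractor(x, training=False)
```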
For evaluation, use pycocoevalcap [git clone https://github.com/tylin/coco-caption.git]
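A minimal evaluation sketch, assuming pycocoevalcap (from the coco-caption repo) and pycocotools are importable, with the standard captions_val2014.json annotations and a results.json of generated captions in the COCO results format (both paths are assumptions):

```python
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO('annotations/captions_val2014.json')      # ground-truth captions (assumed path)
coco_res = coco.loadRes('results.json')               # [{"image_id": ..., "caption": ...}, ...]

coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params['image_id'] = coco_res.getImgIds()   # score only the images we captioned
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():
    print(f'{metric}: {score:.3f}')                   # BLEU-1..4, METEOR, ROUGE_L, CIDEr
```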