Image Captioning in Chinese

A course project of Pattern Recognition at Tsinghua University in the spring semester of 2017.

Implemented two RNN-based image captioning models from two corresponding papers:

  • "Show and Tell", simple LSTM RNN: Vinyals, Oriol, et al. "Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge."
  • "Show, Attend and Tell", LSTM RNN with attention: Xu, K., et al. "Show, attend and tell: Neural image caption generation with visual attention."

Dependencies

  • tensorflow 1.1
  • tensorlayer 1.4.3
  • jieba 0.38
  • h5py 2.7.0

Dataset

Images are from MS COCO. To avoid the cost of running a large CNN during training, they are provided as feature vectors extracted by a pre-trained CNN. To prevent cheating (solving the task by hand), only a small fraction of the original images are provided.
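
Since h5py is listed as a dependency, the feature vectors are presumably distributed as HDF5 files. A minimal loading sketch follows; the file name and dataset key are hypothetical, so inspect the downloaded archive for the actual layout.

```python
import h5py

# "train_features.h5" and the "features" key are hypothetical names;
# check the downloaded files for the real ones.
with h5py.File("train_features.h5", "r") as f:
    print(list(f.keys()))           # list the datasets stored in the file
    features = f["features"][:]     # e.g. an array of shape (num_images, feature_dim)
print(features.shape)
```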

Captions were written by students in the course, so they may not be of high quality.

The dataset can be downloaded from Google Drive or Baidu Netdisk (百度网盘).

Usage

Download METEOR, put it in the meteor-1.5 directory, and run make_val_meteor.py to produce METEOR-compatible validation data.

lstm.py implements the "Show and Tell" model, and lstm_attention.py implements the "Show, Attend and Tell" model. Both models have many configurable hyperparameters; run them with the --help argument to learn more.
