Generates a reasonable caption for an input video using an attention-based seq2seq model with LSTM cells.
Model Prediction: "a small dog is playing with a ball"
python3
tensorflow 1.0
Download the hw2 data from Kaggle and the 300-dimensional GloVe vectors, and place them at:
./Caption-Generation/MLDS_hw2_data/*
./Caption-Generation/MLDS_hw2_data/glove/glove.6B.300d.txt
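As a rough illustration of what the GloVe file contains, here is a minimal sketch of parsing it. Each line of `glove.6B.300d.txt` holds a word followed by 300 space-separated floats. The helper name `load_glove` is hypothetical and is not taken from `caption_gen.py`; the repository's own preprocessing may read the file differently.

```python
def load_glove(lines, dim=300):
    """Parse GloVe text lines into a {word: [float, ...]} dict.

    `lines` is any iterable of strings (for example an open file).
    Hypothetical helper for illustration only.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not have exactly `dim` values
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


# Usage with the path above (not executed here):
# with open("./Caption-Generation/MLDS_hw2_data/glove/glove.6B.300d.txt",
#           encoding="utf-8") as f:
#     glove = load_glove(f)
```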
On first use, run the preprocessing step:
$ python3 caption_gen.py --prepro 1
If you have already done the preprocessing:
$ python3 caption_gen.py --prepro 0
There are three different models available:
- CaptionGeneratorBasic
  - greedy inference
- CaptionGeneratorMyBasic
  - beam search
  - greedy inference
- CaptionGeneratorSS
  - scheduled sampling
  - beam search
  - greedy inference
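For readers unfamiliar with scheduled sampling, the idea can be sketched as follows: during training, the decoder input at each step is either the ground-truth token (teacher forcing) or the model's own previous prediction, chosen at random. This is a simplified, framework-free sketch; the function name `choose_decoder_inputs` and the parameter `sampling_prob` are assumptions for illustration and do not come from CaptionGeneratorSS.

```python
import random


def choose_decoder_inputs(ground_truth, predictions, sampling_prob, rng=random):
    """Pick, per step, which token is fed to the decoder at the next step.

    With probability `sampling_prob` the model's own prediction is used;
    otherwise the ground-truth token is used (teacher forcing).
    Hypothetical helper for illustration only.
    """
    inputs = []
    for truth_tok, pred_tok in zip(ground_truth, predictions):
        use_model = rng.random() < sampling_prob
        inputs.append(pred_tok if use_model else truth_tok)
    return inputs
```

In practice `sampling_prob` usually starts near 0 and is raised as training progresses, so the model gradually learns to recover from its own mistakes.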
Set model_type to select a different model, e.g.
$ python3 caption_gen.py --prepro [1/0] --model_type=CaptionGeneratorSS
This code provides two inference methods: greedy search and beam search.
Beam search inference is not available in the CaptionGeneratorBasic model.
(The default beam width k is 5.)
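The difference between the two inference methods can be sketched with a toy example. This is a generic beam search over a hypothetical `step_fn` that returns log-probabilities for the next token given the tokens so far; it is not the implementation in this repository. Greedy search is the special case `beam_width=1`.

```python
import math


def beam_search(step_fn, bos, eos, beam_width=5, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(seq) must return a non-empty {token: log_prob} dict.
    """
    beams = [(0.0, [bos])]  # (cumulative log-prob, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, logp in step_fn(seq).items():
                candidates.append((score + logp, seq + [token]))
        # Keep only the best `beam_width` expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[0])[1]


def toy_step(seq):
    """Toy next-token distribution where the greedy choice is suboptimal."""
    table = {
        ("<s>",): {"a": 0.6, "the": 0.4},
        ("<s>", "a"): {"dog": 0.5, "cat": 0.5},
        ("<s>", "the"): {"dog": 0.9, "cat": 0.1},
    }
    probs = table.get(tuple(seq), {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
beam = beam_search(toy_step, "<s>", "</s>", beam_width=5)
print(greedy)  # ['<s>', 'a', 'dog', '</s>']   (probability 0.30)
print(beam)    # ['<s>', 'the', 'dog', '</s>'] (probability 0.36)
```

Greedy search commits to "a" because it is the most likely first word, while beam search keeps "the" alive and finds the more probable full caption.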