After pre-processing, data is organized as follows in output_dir:
sentences.txt # raw captions
sentences.token # tokenized captions # normalized captions
sentences.chk # chunk tags (output from SENNA)
sentences.chksz # reformatted chunk tags (one caption per line)
sentences.chkszfinal # normalized chunk tags
nb.txt # number of captions per image
id.txt # image ids
NP.txt # noun phrase vocabulary
VP.txt # verbal phrase vocabulary
PP.txt # prepositional phrase vocabulary
image_NP_ge10.txt # noun phrases from image captions
image_NP_ge10.index # same as above, but with indices (for training purposes)
image_VP_ge10.txt # verbal phrases from image captions
image_VP_ge10.index # same as above, but with indices (for training purposes)
image_PP_ge10.txt # prepositional phrases from image captions
image_PP_ge10.index # same as above, but with indices (for training purposes)
image_START-NP_ge10.txt # starting noun phrases from image captions
image_NP-PP_ge10.txt # noun-prepositional phrase pairs from image captions
image_NP-VP_ge10.txt # noun-verbal phrase pairs from image captions
image_PP-NP_ge10.txt # prepositional-noun phrase pairs from image captions
image_VP-NP_ge10.txt # verbal-noun phrase pairs from image captions
image_NP-PERIOD_ge10.txt # ending noun phrases from image captions
NP_given_START_ge10.txt # noun phrase probabilities at the start of a caption
PP_given_NP_ge10.txt # prepositional phrase probabilities given a noun phrase
VP_given_NP_ge10.txt # verbal phrase probabilities given a noun phrase
NP_given_VP_ge10.txt # noun phrase probabilities given a verbal phrase
NP_given_PP_ge10.txt # noun phrase probabilities given a prepositional phrase
CHK_given_NP_ge10.txt # phrase type probabilities given noun phrase type
NP_given_NP-PP_ge10.txt # noun phrase probabilities given a noun-prepositional phrase pair
NP_given_NP-VP_ge10.txt # noun phrase probabilities given a noun-verbal phrase pair
VP_given_PP-NP_ge10.txt # verbal phrase probabilities given a prepositional-noun phrase pair
VP_given_VP-NP_ge10.txt # verbal phrase probabilities given a verbal-noun phrase pair
PP_given_PP-NP_ge10.txt # prepositional phrase probabilities given a prepositional-noun phrase pair
PP_given_VP-NP_ge10.txt # prepositional phrase probabilities given a verbal-noun phrase pair
features.bin # 2D torch.FloatTensor object with image features
id.txt # image ids
NP_400d_ge10.bin # 400d noun phrase embeddings (2D torch.FloatTensor)
VP_400d_ge10.bin # 400d verbal phrase embeddings (2D torch.FloatTensor)
PP_400d_ge10.bin # 400d prepositional phrase embeddings (2D torch.FloatTensor)
In this example, we consider only phrases that appear at least 10 times in the training dataset.
This step assumes that each input line contains the image id in the first column, followed by the image captions in the remaining columns. Columns are separated by a tab character.
lua chunking/parse.lua "$input_file" "$output_dir"
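To make the expected input layout concrete, here is a minimal Python sketch that splits one such line into an image id and its captions. It is only an illustration of the format described above, not the logic of `chunking/parse.lua`; the function name `parse_caption_line` and the sample line are hypothetical.

```python
def parse_caption_line(line):
    """Split one tab-separated input line into (image_id, captions)."""
    fields = line.rstrip("\n").split("\t")
    image_id = fields[0]
    # Keep only non-empty caption columns.
    captions = [c for c in fields[1:] if c.strip()]
    if not image_id or not captions:
        raise ValueError("expected an image id followed by at least one caption")
    return image_id, captions


# Hypothetical sample line: one image id, then two captions.
sample = "img_001\ta dog runs on the grass\ta brown dog is playing outside\n"
image_id, captions = parse_caption_line(sample)
print(image_id, len(captions))
```

A check like this can be run over the input file before pre-processing to catch lines that use spaces instead of tabs.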
With Stanford tokenizer.
java -cp third_party/stanford-parser.jar \
edu.stanford.nlp.process.PTBTokenizer \
-preserveLines "$output_dir/sentences.txt" \
> "$output_dir/sentences.token"
Preparing text data for chunking with SENNA.
lua chunking/normalize_token.lua "$output_dir"
cd third_party/senna
senna \
-usrtokens \
-chk \
-notokentags \
-brackettags \
< "$output_dir/" \
> "$output_dir/sentences.chk"
cd ../..
lua chunking/chunksize.lua "$output_dir"
Merging some phrases into longer verbal phrases or prepositional phrases.
lua chunking/normalize_phrases.lua "$output_dir"
mkdir vocab
lua phrases/get_phrases.lua "NP" "$output_dir"
lua phrases/get_phrases.lua "VP" "$output_dir"
lua phrases/get_phrases.lua "PP" "$output_dir"
Here a threshold can be set to remove rare phrases, and reduce the vocabulary size at the same time.
lua phrases/image_phrases.lua "NP" "$output_dir" 10
lua phrases/image_phrases.lua "VP" "$output_dir" 10
lua phrases/image_phrases.lua "PP" "$output_dir" 10
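The effect of the threshold can be sketched in a few lines of Python: count how often each phrase occurs and keep those seen at least `threshold` times. This is an illustration under the assumption that the cut-off is inclusive (as the `ge10` suffix suggests); `frequent_phrases` and the toy data are hypothetical and do not reproduce `image_phrases.lua`.

```python
from collections import Counter


def frequent_phrases(phrases, threshold=10):
    """Return {phrase: count} for phrases occurring at least `threshold` times."""
    counts = Counter(phrases)
    return {p: n for p, n in counts.items() if n >= threshold}


# Toy data: two phrases reach the threshold, one does not.
phrases = ["a dog"] * 12 + ["the grass"] * 10 + ["a red frisbee"] * 3
vocab = frequent_phrases(phrases, threshold=10)
print(sorted(vocab))
```

Raising the threshold shrinks the vocabulary, at the cost of dropping rarer phrases from the training data.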
cd lm
lua phrases_pairs.lua "$output_dir" 10
cd ..
cd lm
th transition-proba.lua "$output_dir" 10
cd ..
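As an illustration of what a transition probability such as `VP_given_NP` represents, the sketch below estimates P(next | previous) by relative frequency from a list of consecutive phrase pairs. The estimator actually used by `transition-proba.lua` is not described here, so plain relative-frequency counting is an assumption, and the function name and toy pairs are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(pairs):
    """Estimate P(next | prev) from a list of (prev, next) pairs."""
    pair_counts = Counter(pairs)
    prev_totals = Counter(prev for prev, _ in pairs)
    probs = defaultdict(dict)
    for (prev, nxt), n in pair_counts.items():
        # Relative frequency of `nxt` among everything that followed `prev`.
        probs[prev][nxt] = n / prev_totals[prev]
    return dict(probs)


# Toy (noun phrase, verbal phrase) pairs.
pairs = [("a dog", "runs"), ("a dog", "runs"), ("a dog", "sits"), ("a man", "sits")]
probs = transition_probabilities(pairs)
print(probs["a dog"])
```

For each previous phrase, the estimated probabilities of its successors sum to one.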
cd lm
lua vocab-biphrase.lua "$output_dir" 10
cd ..
cd lm
th transition-proba-biphrase.lua "$output_dir" 10 NP-PP NP
th transition-proba-biphrase.lua "$output_dir" 10 NP-VP NP
th transition-proba-biphrase.lua "$output_dir" 10 PP-NP VP
th transition-proba-biphrase.lua "$output_dir" 10 VP-NP VP
th transition-proba-biphrase.lua "$output_dir" 10 PP-NP PP
th transition-proba-biphrase.lua "$output_dir" 10 VP-NP PP
cd ..
This step assumes that word embeddings have been computed beforehand. Word embeddings are stored in a binary file containing a 2D torch.FloatTensor. A plain text file contains the word vocabulary (one word per line), in the same order as the rows of the tensor.
for p in NP VP PP; do
th phrases/embeddings.lua \
-emb emb_file.bin \
-vocab word_file.txt \
-pfsz 400 \
-t 10 \
-phr "$p"
done
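One common way to derive a phrase embedding from word embeddings is to average the vectors of the words in the phrase. Whether `phrases/embeddings.lua` does exactly this is not stated here, so the Python sketch below should be read as an assumption; `average_embedding` and the two-dimensional toy vectors are hypothetical.

```python
def average_embedding(phrase, word_vectors):
    """Average the vectors of the in-vocabulary words of a phrase.

    Returns None when no word of the phrase is in the vocabulary.
    """
    vectors = [word_vectors[w] for w in phrase.split() if w in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 2-dimensional word vectors (real ones would have 400 dimensions here).
word_vectors = {"a": [0.0, 2.0], "dog": [4.0, 0.0]}
emb = average_embedding("a dog", word_vectors)
print(emb)
```

With averaging, the phrase embedding has the same dimensionality as the word embeddings, which is consistent with the 400d files listed above.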
Training with negative sampling.
cd bilinear
th train.lua \
-data "$output_dir" \
-out "$exp_dir" \
-pfsz 400 \
-neg 15 \
-lr 0.025 \
-nbiter 100 \
-epoch 100000
cd ..
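For readers unfamiliar with negative sampling, the sketch below computes the usual negative-sampling loss for one positive phrase and a handful of sampled negatives. It assumes image and phrase vectors already live in the same space and are scored by a dot product, and it reads `-neg 15` as the number of negatives per positive; both are assumptions about `train.lua`, not facts taken from it. All names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_sampling_loss(image_vec, positive, negatives):
    """Push the positive score up and the scores of sampled negatives down."""
    loss = -log_sigmoid(dot(image_vec, positive))
    for neg in negatives:
        loss -= log_sigmoid(-dot(image_vec, neg))
    return loss


# Toy vectors: one image, one matching phrase, two sampled negatives.
image_vec = [1.0, 0.0]
positive = [2.0, 0.0]
negatives = [[-2.0, 0.0], [0.0, 1.0]]
loss = negative_sampling_loss(image_vec, positive, negatives)
print(round(loss, 4))
```

The loss shrinks as the positive phrase scores higher and the negatives score lower, which is what each training update aims for.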
Given a directory containing images, this step predicts the top phrases for each image. To display images, users need to install the following packages: qtlua, image.
cd bilinear
qlua infer.lua \
-model "$exp_dir" \
-img "$img_dir"
cd ..
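Predicting the top phrases amounts to ranking candidate phrases by their score for an image and keeping the best few. The Python sketch below shows that selection step only; the scores are made up, and `top_phrases` is a hypothetical helper rather than part of `infer.lua`.

```python
import heapq


def top_phrases(scores, k=5):
    """Return the k (phrase, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Made-up scores for one image.
scores = {"a dog": 2.5, "the grass": 1.0, "a frisbee": 1.75, "a car": -0.5}
best = top_phrases(scores, k=2)
print(best)
```

Using `heapq.nlargest` avoids sorting the whole vocabulary when only a few phrases are needed per image.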