PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"
[Link]
The authors present a novel approach for incorporating language information into visual feature extraction by conditioning the Batch Normalization parameters on the language input. They apply Conditional Batch Normalization (CBN) to a pre-trained ResNet and show that this significantly improves performance on visual question answering tasks.
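The core idea of CBN can be illustrated with a minimal sketch. It is written in plain Python (not PyTorch) so it runs without dependencies, and it is not the code used in this repository. It assumes the language model has already produced per-sample offsets `delta_gamma` and `delta_beta`, which are added to the pre-trained scale `gamma` and shift `beta`. The function name and the nested-list layout are illustrative choices.

```python
import math


def conditional_batch_norm(x, gamma, beta, delta_gamma, delta_beta, eps=1e-5):
    """Illustrative CBN forward pass on nested lists.

    x:            [batch][channel][positions] feature values
    gamma, beta:  [channel] pre-trained BN scale and shift
    delta_gamma,
    delta_beta:   [batch][channel] offsets predicted from the language input
    """
    batch = len(x)
    channels = len(gamma)
    out = [[None] * channels for _ in range(batch)]
    for c in range(channels):
        # Batch statistics for channel c, pooled over samples and positions.
        values = [v for n in range(batch) for v in x[n][c]]
        mean = sum(values) / float(len(values))
        var = sum((v - mean) ** 2 for v in values) / float(len(values))
        inv_std = 1.0 / math.sqrt(var + eps)
        for n in range(batch):
            # Each sample gets its own language-modulated scale and shift.
            g = gamma[c] + delta_gamma[n][c]
            b = beta[c] + delta_beta[n][c]
            out[n][c] = [g * (v - mean) * inv_std + b for v in x[n][c]]
    return out


# Two samples, one channel, two spatial positions.
x = [[[1.0, 3.0]], [[5.0, 7.0]]]
no_offset = [[0.0], [0.0]]
plain = conditional_batch_norm(x, [1.0], [0.0], no_offset, no_offset)
modulated = conditional_batch_norm(x, [1.0], [0.0], [[0.0], [1.0]], [[0.0], [0.5]])
```

With zero offsets the function reduces to ordinary batch normalization. In the second call only the second sample is rescaled and shifted, which is how a question can modulate the visual features of its own image.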
This repository is compatible with Python 2.
- Follow the instructions on the PyTorch homepage to install PyTorch (Python 2).
- The required Python packages are
nltk
tqdm
which can be installed using pip.
To download the VQA dataset please use the script 'scripts/vqa_download.sh':
scripts/vqa_download.sh `pwd`/data
Detailed instructions for processing data are provided by GuessWhatGame/vqa.
To create the VQA dictionary, use the script preprocess_data/create_dictionary.py.
python preprocess_data/create_dictionary.py --data_dir data --year 2014 --dict_file dict.json
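For orientation, a word dictionary of this kind can be built roughly as follows. This is a hedged sketch, not the contents of the script: the function name `build_word_dict`, the `min_occ` threshold, the special tokens and the regex tokenizer (standing in for nltk) are all assumptions, and the actual layout of dict.json may differ.

```python
import collections
import json
import re


def build_word_dict(questions, min_occ=1):
    """Map each sufficiently frequent word to an integer index."""
    counts = collections.Counter()
    for question in questions:
        # Simple regex tokenizer used as a stand-in for nltk tokenization.
        counts.update(re.findall(r"[a-z0-9']+", question.lower()))
    # Hypothetical special tokens for padding and out-of-vocabulary words.
    word2i = {"<padding>": 0, "<unk>": 1}
    for word in sorted(counts):
        if counts[word] >= min_occ:
            word2i[word] = len(word2i)
    return word2i


questions = ["What color is the cat?", "Is the cat asleep?"]
word2i = build_word_dict(questions, min_occ=2)
serialized = json.dumps(word2i, sort_keys=True)
```

Here only "cat", "is" and "the" occur at least twice, so they are the only words added after the two special tokens.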
To create the GloVe dictionary, download the original GloVe file and run the script preprocess_data/create_gloves.py.
wget http://nlp.stanford.edu/data/glove.42B.300d.zip -P data/
unzip data/glove.42B.300d.zip -d data/
python preprocess_data/create_gloves.py --data_dir data --glove_in data/glove.42B.300d.txt --glove_out data/glove_dict.pkl --year 2014
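The GloVe text file stores one token per line followed by its space-separated vector components. The sketch below shows one way such a file could be reduced to a pickled dictionary. It assumes the script keeps only words from the VQA vocabulary, which is a guess. The function name `load_glove_subset` and the in-memory sample are illustrative.

```python
import io
import pickle


def load_glove_subset(lines, vocab):
    """Parse GloVe-formatted lines, keeping only words found in vocab."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word = parts[0]
        if word in vocab:
            vectors[word] = [float(v) for v in parts[1:]]
    return vectors


# A tiny in-memory stand-in for data/glove.42B.300d.txt (3 dimensions, not 300).
sample = io.StringIO(u"cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
glove = load_glove_subset(sample, vocab={"cat"})
payload = pickle.dumps(glove)  # the real script writes a .pkl file instead
```

Filtering while streaming the file keeps memory use low, since the full 42B-token embedding file is several gigabytes.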
To train the network, set the required parameters in config.json and run the script main.py.
python main.py --gpu gpu_id --data_dir data --img_dir images --config config.json --exp_dir exp --year 2014
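The flags in the command above could be parsed along these lines. This is only a sketch of a typical argparse setup, not the actual contents of main.py: the flag names come from the command, while the types, defaults, help strings and the `load_config` helper are assumptions.

```python
import argparse
import json


def build_parser():
    """Build a parser mirroring the flags of the training command."""
    parser = argparse.ArgumentParser(description="Illustrative training entry point")
    parser.add_argument("--gpu", type=int, default=0, help="GPU id (type assumed)")
    parser.add_argument("--data_dir", type=str, help="directory with the VQA data")
    parser.add_argument("--img_dir", type=str, help="directory with the images")
    parser.add_argument("--config", type=str, help="path to the JSON config file")
    parser.add_argument("--exp_dir", type=str, help="directory for experiment output")
    parser.add_argument("--year", type=str, help="dataset year, e.g. 2014")
    return parser


def load_config(path):
    """Read the training parameters from a JSON file (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


# Parse an explicit argument list equivalent to the command line above.
args = build_parser().parse_args([
    "--gpu", "0", "--data_dir", "data", "--img_dir", "images",
    "--config", "config.json", "--exp_dir", "exp", "--year", "2014",
])
```

In the command, gpu_id is a placeholder for the index of the GPU to use, such as 0.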
If you find this code useful, please consider citing the original work by the authors:
@inproceedings{de2017modulating,
author = {Harm de Vries and Florian Strub and J\'er\'emie Mary and Hugo Larochelle and Olivier Pietquin and Aaron C. Courville},
title = {Modulating early visual processing by language},
booktitle = {Advances in Neural Information Processing Systems 30},
year = {2017},
url = {https://arxiv.org/abs/1707.00683}
}