RL-NLP

Code for the paper "Learning Natural Language Generation with Truncated Reinforcement Learning", accepted at NAACL-HLT 2022 (Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies).

This repo uses the CLEVR and VQAv2 datasets.

Get data

  • Download the prepared data archive:
    gdown --id 1AVZXRzmKBxVH6Ul9ZviSWPVdj_kwU3yX --output data/data.zip
    unzip data/data.zip -d data/
    rm data/data.zip
    rm data/vqa-v2/cache/vocab.json
    
    
  • If you want the full CLEVR dataset:
    rm -r data/CLEVR_v1.0
    wget https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip -O data/CLEVR_v1.0.zip
    unzip data/CLEVR_v1.0.zip -d data
    rm data/CLEVR_v1.0.zip
    
    
  • If you want the full VQA-v2 dataset:
    wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets/coco/features_100/COCO_trainval_resnext152_faster_rcnn_genome.lmdb/data.mdb
    mv data.mdb data/vqa-v2/coco_trainval.lmdb/
    
    
  • To download the CLEVR-Dialog data on 20,000 images, used to train the external language model:
    gdown --id 1BSqXY6KV4wOxo6tdjP7xej54gMvqk7k1 --output data/clevr_ext/clevr_dialog_train_raw.json
    
    
  • To get the necessary models:
    chmod +x sh/files/get_models.sh
    sh/files/get_models.sh
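
After these downloads, you can sanity-check that everything landed where the later steps expect it (a sketch; adjust the list to the subsets you actually downloaded):

for p in data/CLEVR_v1.0 \
         data/vqa-v2/coco_trainval.lmdb/data.mdb \
         data/clevr_ext/clevr_dialog_train_raw.json; do
  [ -e "$p" ] && echo "ok      $p" || echo "MISSING $p"
done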
    
    

Requirements

  • You can create a conda environment called rl-nlp: conda create -n rl-nlp
  • And activate it: conda activate rl-nlp
  • The required libraries can be installed from the file requirements.txt: pip install -r requirements.txt
  • The code relies on the CLOSURE repo; install it with: python -m pip install git+https://github.com/gqkc/CLOSURE.git --upgrade
  • It also relies on the vilbert-multi-task repo: python -m pip install git+https://github.com/gqkc/vilbert-multi-task.git --upgrade
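
For convenience, the full setup as one block (a sketch gathering the steps above; run it from the repo root):

conda create -n rl-nlp
conda activate rl-nlp
pip install -r requirements.txt
python -m pip install git+https://github.com/gqkc/CLOSURE.git --upgrade
python -m pip install git+https://github.com/gqkc/vilbert-multi-task.git --upgrade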

File architecture

RL-NLP
├── config                                   # configuration files to create/train models
├── output                                   # experiment outputs (pre-trained models, logs, ...)
│   ├── lm_model/model.pt                    # pre-trained language model (.pt) on the CLEVR dataset
│   ├── SL_LSTM_32_64/model.pt               # pre-trained policy (.pt) on the CLEVR dataset
│   ├── SL_LSTM_32_64_VQA/model.pt           # pre-trained policy conditioned on the answer (for the VQA reward) on the CLEVR dataset
│   ├── vqa_model_film/model.pt              # pre-trained oracle model for the "vqa" reward on the CLEVR dataset
│   ├── lm_model_vqa/model.pt                # pre-trained language model for VQAv2
│   ├── vqa_policy_512_1024_answer/model.pt
│   └── vilbert_vqav2
│       ├── model.bin                        # ViLBERT oracle fine-tuned on the VQAv2 task
│       └── bert_base_6layer_6conect.json    # config file for the ViLBERT oracle
├── data
│   ├── CLEVR_v1.0                           # root folder for the CLEVR dataset
│   ├── vqa-v2                               # root folder for the VQA-v2 dataset
│   │   ├── coco_trainval.lmdb               # lmdb folder for the image features (reduced on a local machine, complete on a VM)
│   │   └── cache
│   │       └── vocab.json                   # vocab file for VQAv2
│   ├── closure_vocab.json                   # vocab for the CLOSURE dataset (used for the "vqa" reward on CLEVR)
│   ├── vocab.json                           # vocab for the CLEVR dataset
│   ├── train_questions.h5                   # h5 file of training questions
│   ├── val_questions.h5                     # h5 file of validation questions
│   ├── test_questions.h5                    # h5 file of test questions
│   ├── train_features.h5                    # h5 file of training image features
│   ├── val_features.h5                      # h5 file of validation image features
│   └── test_features.h5                     # h5 file of test image features
└── src                                      # source files

Data preprocessing

CLEVR

  • To run all the scripts from the repo root (RL-NLP), first run: export PYTHONPATH=src:${PYTHONPATH}

Preprocessing the dataset questions

To preprocess the questions of the three splits, run the script src/sh/preprocess_questions or the three commands below, in this order (a loop version follows the list):

  • python src/preprocessing/preprocess_questions.py -data_path "data/CLEVR_v1.0/questions/CLEVR_train_questions.json" -out_vocab_path "data/vocab.json" -out_h5_path "data/train_questions.h5" -min_token_count 1

  • python src/preprocessing/preprocess_questions.py -data_path "data/CLEVR_v1.0/questions/CLEVR_val_questions.json" -out_vocab_path "data/vocab.json" -out_h5_path "data/val_questions.h5" -min_token_count 1

  • python src/preprocessing/preprocess_questions.py -data_path "data/CLEVR_v1.0/questions/CLEVR_test_questions.json" -out_vocab_path "data/vocab.json" -out_h5_path "data/test_questions.h5" -min_token_count 1
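
Equivalently, as a single loop (the train split runs first, presumably because that run writes data/vocab.json for the val and test runs to reuse):

for split in train val test; do
  python src/preprocessing/preprocess_questions.py \
    -data_path "data/CLEVR_v1.0/questions/CLEVR_${split}_questions.json" \
    -out_vocab_path "data/vocab.json" \
    -out_h5_path "data/${split}_questions.h5" \
    -min_token_count 1
done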

Extracting the image features

To extract the image features, run the script src/sh/extract_features.py or the three commands below (tune --batch_size to the available memory); a loop version follows:

  • python src/preprocessing/extract_features.py --input_image_dir data/CLEVR_v1.0/images/train --output_h5_file data/train_features.h5 --batch_size 128

  • python src/preprocessing/extract_features.py --input_image_dir data/CLEVR_v1.0/images/val --output_h5_file data/val_features.h5 --batch_size 128

  • python src/preprocessing/extract_features.py --input_image_dir data/CLEVR_v1.0/images/test --output_h5_file data/test_features.h5 --batch_size 128
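
The same three extractions as a loop; lower --batch_size if memory is tight:

for split in train val test; do
  python src/preprocessing/extract_features.py \
    --input_image_dir "data/CLEVR_v1.0/images/${split}" \
    --output_h5_file "data/${split}_features.h5" \
    --batch_size 128
done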

VQA

First, extract the vocab:

Extracting the full vocab
  • python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "none" -min_split 0

This creates a file "vocab.json" with the vocab.

Extracting a reduced vocab (from smaller train and val datasets)
  • python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "none" -min_split 1

This creates a file "vocab_min.json" with the vocab.

Second, build the preprocessed .pkl file for each split.

Full datasets (with vocab.json)
  • python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab.json" -split "train" -min_split 0 -test 1

This will create a file "train_entries.pkl"

  • python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab.json" -split "val" -min_split 0 -test 1

This will create a file "val_entries.pkl"

  • python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab.json" -split "train" -min_split 1 -test 0

This will create a file "mintrain_entries.pkl"

  • python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab.json" -split "val" -min_split 1 -test 0

This will create a file "minval_entries.pkl"

Reduced datasets with the reduced vocab (train dataset = 20,000 questions, val dataset = 5,000 questions)
  • python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab_min.json" -split "train" -min_split 1 -test 0

This will create a file "mintrain_minvocab_entries.pkl".

  • python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab_min.json" -split "val" -min_split 1 -test 0

This will create a file "minval_minvocab_entries.pkl".

If you want to use ViLBERT:

cd ..
git clone https://github.com/gqkc/vilbert-multi-task.git
cd vilbert-multi-task/tools/
git clone -b python3 https://github.com/lichengunc/refer.git
cd refer
make
# if make fails, run this instead:
python setup.py install
cd ../../    # back to vilbert-multi-task/
python -m pip install -e .

# if the install fails, upgrading Cython may help:
python -m pip install --upgrade cython
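
A quick check that the editable install worked (assuming the package imports as vilbert, the top-level package in that repo):

python -c "import vilbert; print('vilbert import ok')"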

Training the models

Links to the pre-trained models

CLEVR

  1. Language Model .pt file here.
  2. Levenshtein task:
  • Pretrained policy .pt file (word_emb_size = 32, hidden_size = 64) here.
  3. VQA task:
  • Pretrained VQA model (FiLM version) here.
  • Pretrained policy here.

VQAV2

  1. VQA task:
  • Pretrained VQA VILBERT model here.
  • Pretrained VQA VILBERT config file here.

Training the Language Model on the Dataset of Questions

CLEVR

python src/train/launch_train.py -task "lm" -dataset "clevr" -model "lstm" -num_layers 1 -emb_size 512 -hidden_size 512 -p_drop 0.1 -lr 0.001 -data_path "data" -out_path "output" -bs 512 -ep 20 -num_workers 6

VQA

python src/train/launch_train.py -task "lm" -dataset "vqa" -model "lstm" -num_layers 1 -emb_size 512 -hidden_size 512 -p_drop 0.1 -lr 0.001 -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -out_path "output" -bs 512 -ep 50 -num_workers 6

Pre-training of the Policy with Supervised Learning

CLEVR

No answer conditioning

python src/train/launch_train.py -task "policy" -dataset "clevr" -data_path "data" -out_path "output/policy_pre_training" -emb_size 32 -hidden_size 64 -bs 512 -ep 50 -num_workers 0 -max_samples 21 -fusion "cat"
N.B.: when training on CPU only, the -max_samples argument is required to train on a subset of the dataset.

w/ answer conditioning

python src/train/launch_train.py -task "policy" -dataset "clevr" -data_path "data" -out_path "output/policy_pre_training" -emb_size 32 -hidden_size 64 -bs 512 -ep 50 -num_workers 0 -max_samples 21 -fusion "cat" -condition_answer "after_fusion"

VQA

No answer conditioning

python src/train/launch_train.py -task "policy" -dataset "vqa" -data_path "data" -out_path "output/policy_pre_training" -emb_size 32 -hidden_size 64 -bs 512 -ep 50 -num_workers 0 -fusion "average"

w/ answer conditioning

python src/train/launch_train.py -task "policy" -dataset "vqa" -data_path "data" -out_path "output/policy_pre_training" -emb_size 32 -hidden_size 64 -bs 512 -ep 50 -num_workers 0 -fusion "average" -condition_answer "after_fusion"

Training the RL Agent

  • See examples in src/scripts/sh.
  • The "debug" folder allows running small experiments with each algorithm on the two CLEVR tasks (Levenshtein and VQA rewards).

Logging on TensorBoard to display results:

  • cd output/2000_img_len_20
  • tensorboard --logdir=experiments/train

With a GCP VM (instance name here: alice_martindonati@pytorch-3-vm), forward the port from the local machine, then open http://localhost:6006:

  • gcloud compute ssh alice_martindonati@pytorch-3-vm -- -NfL 6006:localhost:6006
