Visual Question Answering in PyTorch with various Attention Models

SatyamGaba/visual_question_answering

visual_question_answering

PyTorch implementation of the following papers

Directory and File Structure

.
+-- COCO-2015/
|   +-- images/ (link of /dataset/COCO2015 from server (using ln -s))
|       +-- train2014/
|       +-- ...
|   +-- resized_images/
|       +-- train2014/
|       +-- ...
|   +-- Questions/
|   +-- Annotations/
|   +-- train.npy
|   +-- ...
|   +-- vocab_questions.txt
|   +-- vocab_answers.txt
|   +-- <questions>.json
|   +-- <annotations>.json
+-- vqa
|   +-- .git
|   +-- README.md

Usage

1. Clone the repository.

$ git clone https://github.com/SatyamGaba/visual_question_answering.git

2. Download and unzip the dataset from the official VQA website: https://visualqa.org/download.html.

This project uses the VQA v2 dataset.

$ cd visual_question_answering/utils
$ chmod +x download_and_unzip_datasets.csh
$ ./download_and_unzip_datasets.csh

3. Preprocess the input data (images, questions, and answers).

$ python resize_images.py --input_dir='../COCO-2015/Images' --output_dir='../COCO-2015/Resized_Images'  
$ python make_vacabs_for_questions_answers.py --input_dir='../COCO-2015'
$ python build_vqa_inputs.py --input_dir='../COCO-2015' --output_dir='../COCO-2015'
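
For orientation, a vocabulary step of this kind usually counts the tokens across all questions and writes them out in frequency order. The sketch below is an illustration only and is not the code in make_vacabs_for_questions_answers.py. The tokenizer regex, the `<pad>` and `<unk>` special tokens, and the function names `tokenize`, `build_vocab`, and `encode` are all assumptions.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase, then keep runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(question):
    """Split a question into lowercase word tokens."""
    return TOKEN_RE.findall(question.lower())


def build_vocab(questions, specials=("<pad>", "<unk>")):
    """Return the special tokens followed by words sorted by frequency, then alphabetically."""
    counts = Counter(tok for q in questions for tok in tokenize(q))
    words = sorted(counts, key=lambda w: (-counts[w], w))
    return list(specials) + words


def encode(question, vocab):
    """Map a question to word indices, falling back to <unk> for unseen words."""
    index = {w: i for i, w in enumerate(vocab)}
    unk = index["<unk>"]
    return [index.get(tok, unk) for tok in tokenize(question)]


if __name__ == "__main__":
    qs = ["What color is the cat?", "Is the cat sleeping?"]
    vocab = build_vocab(qs)
    print(vocab)
    print(encode("Is the dog sleeping?", vocab))
```

Sorting by frequency and then alphabetically keeps the word-to-index mapping deterministic across runs, which matters when a saved model is later reloaded with the same vocabulary file.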

4. Train the model for the VQA task.

$ cd ..
$ python train.py --model_name="<name to save logs>" --resume_epoch="<epoch number to resume from>" --saved_model="<saved model if resume training>"
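
The three flags above suggest a resume-from-checkpoint workflow. As a rough sketch of how such flags might be parsed and turned into an epoch range: the types, the defaults, the rule that `--saved_model` is required when resuming, and the names `parse_train_args` and `epochs_to_run` are assumptions, not taken from train.py.

```python
import argparse


def parse_train_args(argv=None):
    """Parse the three flags shown in the usage line (hypothetical types and defaults)."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument parsing")
    parser.add_argument("--model_name", type=str, required=True,
                        help="name used for saved logs")
    parser.add_argument("--resume_epoch", type=int, default=0,
                        help="epoch to resume from (assumed: 0 means a fresh start)")
    parser.add_argument("--saved_model", type=str, default=None,
                        help="checkpoint to load when resuming")
    args = parser.parse_args(argv)
    # Assumed consistency check: resuming without a checkpoint makes no sense.
    if args.resume_epoch > 0 and args.saved_model is None:
        parser.error("--saved_model is required when --resume_epoch > 0")
    return args


def epochs_to_run(args, num_epochs):
    """Epoch indices still to be trained, starting at the resume point."""
    return range(args.resume_epoch, num_epochs)


if __name__ == "__main__":
    args = parse_train_args(["--model_name=demo", "--resume_epoch=2", "--saved_model=ckpt.pth"])
    print(list(epochs_to_run(args, 5)))
```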

5. Plotting.

Set the model_name variable in plotter.py to the name of the model you trained, then run:

$ python plotter.py

6. Run inference with the trained model on an image.

$ python test.py --saved_model="<path to model>" --image_path="<path to image>" --question="<ask question here>"
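
Free-form questions often contain spaces, quotes, or question marks, so building the command as an argument list avoids shell-quoting problems when asking several questions from a script. The helper below is a hypothetical convenience wrapper, not part of the repository. It relies only on the three flags shown above, and the name `build_test_command` is an assumption.

```python
import shlex
import sys


def build_test_command(saved_model, image_path, question, python=sys.executable):
    """Build the argument list for test.py; passing a list avoids shell-quoting issues."""
    return [
        python, "test.py",
        f"--saved_model={saved_model}",
        f"--image_path={image_path}",
        f"--question={question}",
    ]


if __name__ == "__main__":
    cmd = build_test_command("<path to model>", "<path to image>", "What is on the table?")
    # shlex.join produces a shell-safe string for logging or copy-paste.
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    #   import subprocess; subprocess.run(cmd, check=True)
```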

References

  • Paper implementation

    • Keywords: Visual Question Answering; Simple Attention; Stacked Attention; Top-Down Attention
  • Baseline Model
