CNN+LSTM, Attention based, and MUTAN-based models for Visual Question Answering
Updated Jan 19, 2020 - Python
Implementation of the paper "Stacked Attention Networks for Image Question Answering" in TensorFlow
PyTorch implementation of VQA using Stacked Attention Networks: a multimodal architecture that takes an image and a question as input, encodes them with a CNN and an LSTM, and applies a stacked attention layer for improved accuracy (54.82%). Includes visualization of the attention layers. Uses the VQA v2.0 dataset. Contributions welcome.
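For readers unfamiliar with the stacked attention idea, the following is a minimal NumPy sketch of the attention hops described in the Stacked Attention Networks paper. It is not the code of any repository listed here. The function names, the dimensions (196 regions, 64-dimensional features, 32 hidden units), and the random weights are assumptions made for illustration; in a real model the image features would come from a CNN, the question vector from an LSTM, and the weights would be learned.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_layer(v_img, u, W_i, W_q, b_a, w_p, b_p):
    """One attention hop.

    v_img: (m, d) image region features (assumed to come from a CNN)
    u:     (d,)   query vector (the question vector on the first hop)
    Returns the refined query (d,) and the attention map (m,).
    """
    # Combine each region with the query; the query term broadcasts over regions.
    h = np.tanh(v_img @ W_i + (u @ W_q + b_a))  # (m, k)
    # One attention weight per image region.
    p = softmax(h @ w_p + b_p)                  # (m,)
    # Attention-weighted sum of region features.
    v_att = p @ v_img                           # (d,)
    # The refined query is the attended image vector plus the previous query.
    return v_att + u, p


def stacked_attention(v_img, v_q, layers):
    """Apply several attention hops, feeding each refined query to the next."""
    u = v_q
    attention_maps = []
    for params in layers:
        u, p = attention_layer(v_img, u, *params)
        attention_maps.append(p)
    return u, attention_maps


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
m, d, k = 196, 64, 32  # regions, feature size, attention hidden size


def init_layer():
    return (
        rng.normal(scale=0.1, size=(d, k)),  # W_i
        rng.normal(scale=0.1, size=(d, k)),  # W_q
        np.zeros(k),                         # b_a
        rng.normal(scale=0.1, size=k),       # w_p
        0.0,                                 # b_p
    )


layers = [init_layer() for _ in range(2)]  # two stacked hops
v_img = rng.normal(size=(m, d))            # stand-in for CNN region features
v_q = rng.normal(size=d)                   # stand-in for the LSTM question vector

u, attention_maps = stacked_attention(v_img, v_q, layers)
print(u.shape, [a.shape for a in attention_maps])
```

The per-hop attention maps are what an attention visualization would typically display: each one can be reshaped to the spatial grid of regions (14 x 14 under the sizes assumed above) and overlaid on the image.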
Final project of the Deep Learning course.