Awesome Visual Question Answering

A curated list of papers and code for Visual Question Answering (VQA), grouped by venue and year.

ICCV 2015

[1] Antol S, Agrawal A, Lu J, et al. Vqa: Visual question answering [paper] [project]

NIPS 2015

[1] Ren M, Kiros R, Zemel R. Exploring models and data for image question answering [paper] [code]

CVPR 2016

[1] Yang Z, He X, Gao J, et al. Stacked attention networks for image question answering [paper] [code]

[2] Andreas J, Rohrbach M, Darrell T, et al. Neural module networks [paper] [code]

NIPS 2016

[1] Lu J, Yang J, Batra D, et al. Hierarchical question-image co-attention for visual question answering [paper] [code]

EMNLP 2016

[1] Fukui A, Park D H, Yang D, et al. Multimodal compact bilinear pooling for visual question answering and visual grounding [paper] [code]

ECCV 2016

[1] Jabri A, Joulin A, Van Der Maaten L. Revisiting visual question answering baselines [paper]

CVIU (Computer Vision and Image Understanding) 2016

[1] Wu Q, Teney D, Wang P, et al. Visual question answering: A survey of methods and datasets [paper]

ArXiv 2016

[1] Kim J H, On K W, Lim W, et al. Hadamard product for low-rank bilinear pooling [paper] [code]

CVPR 2017

[1] Goyal Y, Khot T, Summers-Stay D, et al. Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering [paper]

[2] Johnson J, Hariharan B, van der Maaten L, et al. Clevr: A diagnostic dataset for compositional language and elementary visual reasoning [paper]

[3] Ganju S, Russakovsky O, Gupta A. What's in a question: Using visual questions as a form of supervision [paper] [code]

[4] Nam H, Ha J W, Kim J. Dual attention networks for multimodal reasoning and matching [paper] [code]

IbPRIA 2017 (Iberian Conference on Pattern Recognition and Image Analysis)

[1] Bolaños M, Peris Á, Casacuberta F, et al. VIBIKNet: Visual bidirectional kernelized network for visual question answering [paper] [code]

ICCV 2017

[1] Ben-Younes H, Cadene R, Cord M, et al. Mutan: Multimodal tucker fusion for visual question answering [paper] [code]

[2] Zhu C, Zhao Y, Huang S, et al. Structured attentions for visual question answering [paper] [code]

[3] Hu R, Andreas J, Rohrbach M, et al. Learning to reason: End-to-end module networks for visual question answering [paper]

[4] Yu Z, Yu J, Fan J, et al. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering [paper] [code]

[5] Gan C, Li Y, Li H, et al. Vqs: Linking segmentations to questions and answers for supervised attention in vqa and question-focused semantic segmentation [paper]

NIPS 2017

[1] Schwartz I, Schwing A, Hazan T. High-order attention models for visual question answering [paper] [code]

[2] Ilievski I, Feng J. Multimodal learning and reasoning for visual question answering [paper]

EMNLP 2017

[1] Mahendru A, Prabhu V, Mohapatra A, et al. The promise of premise: Harnessing question premises in visual question answering [paper] [code]

CVPR 2018

[1] Agrawal A, Batra D, Parikh D, et al. Don't just assume; look and answer: Overcoming priors for visual question answering [paper] [code]

[2] Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering [paper] [code]

[3] Teney D, Anderson P, He X, et al. Tips and tricks for visual question answering: Learnings from the 2017 challenge [paper]

[4] Gordon D, Kembhavi A, Rastegari M, et al. Iqa: Visual question answering in interactive environments [paper] [code]

[5] Nguyen D K, Okatani T. Improved fusion of visual and language representations by dense symmetric co-attention for visual question answering [paper] [code]

[6] Liang J, Jiang L, Cao L, et al. Focal visual-text attention for visual question answering [paper] [code]

[7] Park D H, Hendricks L A, Akata Z, et al. Multimodal explanations: Justifying decisions and pointing to the evidence [paper]

[8] Gurari D, Li Q, Stangl A J, et al. Vizwiz grand challenge: Answering visual questions from blind people [paper]

[9] Mascharka D, Tran P, Soklaski R, et al. Transparency by design: Closing the gap between performance and interpretability in visual reasoning [paper]

[10] Cao Q, Liang X, Li B, et al. Visual question reasoning on general dependency tree [paper]

[11] Patro B, Namboodiri V P. Differential attention for visual question answering [paper]

[12] Su Z, Zhu C, Dong Y, et al. Learning visual knowledge memory networks for visual question answering [paper]

[13] Fan H, Zhou J. Stacked latent attention for multimodal reasoning [paper]

[14] Hu H, Chao W L, Sha F. Learning answer embeddings for visual question answering [paper]

ICLR 2018

[1] Zhang Y, Hare J, Prügel-Bennett A. Learning to count objects in natural images for visual question answering [paper] [code]

AAAI 2018

[1] Lu P, Li H, Zhang W, et al. Co-attending free-form regions and detections with multi-modal multiplicative feature embedding for visual question answering [paper] [code]

ACL 2018

[1] Mudrakarta P K, Taly A, Sundararajan M, et al. Did the model understand the question? [paper]

IEEE Transactions on Neural Networks and Learning Systems 2018

[1] Yu Z, Yu J, Xiang C, et al. Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering [paper] [code]

ECCV 2018

[1] Bai Y, Fu J, Zhao T, et al. Deep attention neural tensor network for visual question answering [paper]

[2] Yang G R, Ganichev I, Wang X J, et al. A dataset and architecture for visual reasoning with a working memory [paper]

[3] Li Q, Tao Q, Joty S, et al. Vqa-e: Explaining, elaborating, and enhancing your answers for visual questions [paper]

[4] Shi Y, Furlanello T, Zha S, et al. Question type guided attention in visual question answering [paper]

[5] Malinowski M, Doersch C, Santoro A, et al. Learning visual question answering by bootstrapping hard attention [paper]

[6] Yu Y, Kim J, Kim G. A joint sequence fusion model for video question answering and retrieval [paper]

[7] Gao P, Li H, Li S, et al. Question-guided hybrid convolution for visual question answering [paper]

[8] Narasimhan M, Schwing A G. Straight to the facts: Learning knowledge base retrieval for factual visual question answering [paper]

[9] Li W, Yuan Z, Fang X, et al. Knowing Where to Look? Analysis on Attention of Visual Question Answering System [paper]

NIPS 2018

[1] Kim J H, Jun J, Zhang B T. Bilinear attention networks [paper] [code]

[2] Norcliffe-Brown W, Vafeias S, Parisot S. Learning conditioned graph structures for interpretable visual question answering [paper] [code]

[3] Deng Y, Kim Y, Chiu J, et al. Latent alignment and variational attention [paper] [code]

[4] Yi K, Wu J, Gan C, et al. Neural-symbolic vqa: Disentangling reasoning from vision and language understanding [paper] [code]

[5] Narasimhan M, Lazebnik S, Schwing A. Out of the box: Reasoning with graph convolution nets for factual visual question answering [paper]

[6] Wu C, Liu J, Wang X, et al. Chain of reasoning for visual question answering [paper]

ArXiv 2018

[1] Jiang Y, Natarajan V, Chen X, et al. Pythia v0.1: the winning entry to the vqa challenge 2018 [paper] [code]

CVPR 2019

[1] Cadene R, Ben-Younes H, Cord M, et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering [paper] [code]

[2] Gao P, Jiang Z, You H, et al. Dynamic Fusion with Intra- and Inter-Modality Attention Flow for Visual Question Answering [paper] [code]

[3] Shah M, Chen X, Rohrbach M, et al. Cycle-Consistency for Robust Visual Question Answering [paper]

[4] Marino K, Rastegari M, Farhadi A, et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge [paper]

[5] Li H, Wang P, Shen C, et al. Visual Question Answering as Reading Comprehension [paper]

[6] Kim J, Ma M, Kim K, et al. Progressive Attention Memory Network for Movie Story Question Answering [paper]

[7] Manjunatha V, Saini N, Davis L S. Explicit Bias Discovery in Visual Question Answering Models [paper]

[8] Shrestha R, Kafle K, Kanan C. Answer them all! toward universal visual question answering models [paper]

[9] Singh A, Natarajan V, Shah M, et al. Towards vqa models that can read [paper]

[10] Fan C, Zhang X, Zhang S, et al. Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering [paper]

[11] Fukui H, Hirakawa T, Yamashita T, et al. Attention branch network: Learning of attention mechanism for visual explanation [paper]

[12] Xiong P, Zhan H, Wang X, et al. Visual Query Answering by Entity-Attribute Graph Matching and Reasoning [paper]

[13] Noh H, Kim T, Mun J, et al. Transfer Learning via Unsupervised Task Discovery for Visual Question Answering [paper] [code]

[14] Tang K, Zhang H, Wu B, et al. Learning to compose dynamic tree structures for visual contexts [paper]

[15] Yu Z, Yu J, Cui Y, et al. Deep Modular Co-Attention Networks for Visual Question Answering [paper] [code]

[16] Shi J, Zhang H, Li J. Explainable and explicit visual reasoning over scene graphs [paper] [code]

ICLR 2019

[1] Zhang Y, Hare J, Prügel-Bennett A. Learning Representations of Sets through Optimized Permutations [paper] [code]

TPAMI 2019

[1] Liang J, Jiang L, Cao L, et al. Focal visual-text attention for memex question answering [paper] [code]

AAAI 2019

[1] Ben-Younes H, Cadene R, Thome N, et al. BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection [paper] [code]

ArXiv 2019

[1] Wu J, Mooney R J. Self-Critical Reasoning for Robust Visual Question Answering [paper] [code]

[2] Cadene R, Dancette C, Ben-Younes H, et al. RUBi: Reducing Unimodal Biases in Visual Question Answering [paper] [code]

[3] Li L, Gan Z, Cheng Y, et al. Relation-aware Graph Attention Network for Visual Question Answering [paper]

[4] Wu Y, Sun Q, Ma J, et al. Question Guided Modular Routing Networks for Visual Question Answering [paper]

Based on the GQA dataset

CVPR 2019

[1] Hudson D A, Manning C D. Gqa: A new dataset for real-world visual reasoning and compositional question answering [paper] [code]