Retrieval-based metrics on VisDial, i.e., NDCG, MRR, Mean (mean rank), and R@{1,5,10}, are not good performance indicators in the generative setting. We aim to build a network model that generates accurate, high-quality responses for a Visual Dialog agent. However, we found that high NDCG and the quality of the generated responses pull in opposite directions.
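To make concrete what these numbers measure, here is a minimal sketch of how they can be computed from a model's scores over the fixed list of candidate answers (100 per question in VisDial v1.0). It follows our reading of the evaluation in the starter code listed below but is not that code. The function names are ours, ties are broken in favour of the ground truth (an assumption), and `ndcg` assumes dense relevance scores in [0, 1], cut off at K = the number of candidates with non-zero relevance.

```python
import math
from typing import Dict, Sequence


def rank_of_ground_truth(scores: Sequence[float], gt_index: int) -> int:
    """1-based rank of the ground-truth answer (higher score = better).
    Assumption: ties are resolved in favour of the ground truth."""
    gt_score = scores[gt_index]
    return 1 + sum(1 for s in scores if s > gt_score)


def retrieval_metrics(ranks: Sequence[int]) -> Dict[str, float]:
    """MRR, mean rank ("Mean") and R@{1,5,10}, averaged over all questions."""
    n = len(ranks)
    metrics = {
        "MRR": sum(1.0 / r for r in ranks) / n,
        "Mean": sum(ranks) / n,
    }
    for k in (1, 5, 10):
        metrics[f"R@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


def ndcg(scores: Sequence[float], relevance: Sequence[float]) -> float:
    """NDCG@K for one question, with K = number of candidates with relevance > 0."""
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0  # no relevant candidate: treated as 0 here (our choice)
    # Candidate indices sorted by predicted score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(relevance[i] / math.log2(pos + 2) for pos, i in enumerate(order[:k]))
    # Ideal DCG: the same K positions filled with the highest relevance scores.
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg
```

Every metric here depends only on how the model orders the given candidates; none of them looks at the free-form answer the model actually generates.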
Please see all_generated_dialogs_2064_visdialv10val.txt for all generated dialogs; one example follows:
image_id: 82004
caption: a blue sign is set up at gas station that reads `` rules of the road ''
-------
round_id:1
ques: can you read what the sign says? (ground-truth question at round 1)
ground-truth: yes it is a list of rules (ground-truth answer)
gen_answ: yes (generated by our new model)
baseline_answ: no (compared method)
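The dump format shown above is simple enough to load programmatically. The sketch below is a hypothetical helper (`parse_dialog_dump` is our name, not part of any released code) and assumes every record follows the layout of this single example: `image_id` and `caption` lines, a dashed separator, then `round_id`, `ques`, `ground-truth`, `gen_answ`, and `baseline_answ` lines for each round, without the inline annotations added here for explanation.

```python
from typing import Dict, List


def parse_dialog_dump(text: str) -> List[Dict]:
    """Hypothetical parser: one dict per image, each holding a list of round dicts.
    Assumes the 'key: value' layout of the example shown in this README."""
    dialogs: List[Dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and '-------' separators
        key, _, value = line.partition(":")  # split at the first colon only
        key, value = key.strip(), value.strip()
        if key == "image_id":
            dialogs.append({"image_id": int(value), "rounds": []})
        elif key == "caption":
            dialogs[-1]["caption"] = value
        elif key == "round_id":
            dialogs[-1]["rounds"].append({"round_id": int(value)})
        else:
            # ques / ground-truth / gen_answ / baseline_answ for the current round
            dialogs[-1]["rounds"][-1][key] = value
    return dialogs
```

For instance, `sum(r["gen_answ"] == r["ground-truth"] for d in dialogs for r in d["rounds"])` would count exact matches between generated and ground-truth answers.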
Visual Dialog requires an AI agent to hold a conversation with humans, in natural language, about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, a VisDial agent must answer the question in free-form natural language.
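For reference, generative models are conventionally evaluated on the retrieval metrics by ranking the candidate answers by their log-likelihood under the decoder, as in Das et al. (CVPR 2017). The sketch below shows only that scoring step; `toy_prob` is a hypothetical stand-in for a real decoder, and its probabilities are invented purely for illustration.

```python
import math
from typing import Callable, List, Sequence

# P(token | previous answer tokens); a real decoder would also condition on the
# encoded image, dialog history and question.
TokenProb = Callable[[Sequence[str], str], float]


def sequence_log_likelihood(answer: str, token_prob: TokenProb) -> float:
    """Sum of the log-probabilities of the answer's tokens."""
    tokens = answer.split()
    return sum(math.log(token_prob(tokens[:i], tok)) for i, tok in enumerate(tokens))


def rank_candidates(candidates: List[str], token_prob: TokenProb) -> List[int]:
    """Candidate indices ordered from most to least likely."""
    scores = [sequence_log_likelihood(c, token_prob) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def toy_prob(prefix: Sequence[str], token: str) -> float:
    """Hypothetical unigram 'decoder' used only for this illustration."""
    return {"yes": 0.5, "no": 0.3}.get(token, 0.05)
```

With these made-up probabilities, ranking `["yes it is a list of rules", "yes", "no"]` puts the bare "yes" first, because summing log-probabilities penalises every extra token. This is only a toy illustration of a well-known length bias, not a claim about what drives the results described here.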
Our new paper is coming soon.
- https://visualdialog.org/
- Abhishek Das et al. Visual Dialog. CVPR 2017
- Abhishek Das et al. Visual Dialog: Supplementary Document. CVPR 2017
- https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline. ECCV 2020
- Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs. ECCV 2020
- Modality-Balanced Models for Visual Dialogue. AAAI 2020
- DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue. AAAI 2020
- DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog. AAAI 2020
- UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog. CVPR 2022
- Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning. ICASSP 2022
- VU-BERT: A Unified Framework for Visual Dialog. ICASSP 2022
- Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog. ACM MM 2022
- Context-Aware Graph Inference With Knowledge Distillation for Visual Dialog. TPAMI 2022
- Heterogeneous Knowledge Network for Visual Dialog. TCSVT 2022
- SKANET: Structured Knowledge-Aware Network for Visual Dialog. ICME 2021
- Multi-View Attention Network for Visual Dialog. Applied Sciences 2021