A novel framework for Visual Dialog (VisDial) 2022.11

Retrieval-based metrics on VisDial, i.e., NDCG, MRR, Mean rank, and R@{1,5,10}, are not good performance indicators in the generative setting. We aim to build a network model that generates accurate, high-quality responses for a Visual Dialog agent. However, we found that high NDCG and the quality of generated responses pull in opposite directions.
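The retrieval metrics listed above score a model by where it ranks the ground-truth answer within a fixed list of candidate answers (100 per round in VisDial v1.0). They never look at the text the model generates. The sketch below computes them from per-round candidate scores. It is a minimal illustration assuming that protocol, not the official evaluation code. For NDCG it assumes the convention we understand the VisDial starter code to use: truncate at k = the number of candidates with nonzero relevance. The helper names `gt_rank`, `retrieval_metrics`, and `ndcg` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def gt_rank(scores: Sequence[float], gt_index: int) -> int:
    """1-based rank of the ground-truth candidate under descending scores.

    Ties are resolved in favour of the ground truth (only strictly higher
    scores push it down); the official code may break ties differently.
    """
    return 1 + sum(1 for s in scores if s > scores[gt_index])


def retrieval_metrics(ranks: List[int]) -> Dict[str, float]:
    """R@{1,5,10}, MRR and Mean rank over the ground-truth ranks of all rounds."""
    n = len(ranks)
    return {
        "R@1": sum(r <= 1 for r in ranks) / n,
        "R@5": sum(r <= 5 for r in ranks) / n,
        "R@10": sum(r <= 10 for r in ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
        "Mean": sum(ranks) / n,  # mean rank: lower is better
    }


def ndcg(scores: Sequence[float], relevance: Sequence[float]) -> float:
    """NDCG@k for one round, with k = number of candidates whose relevance > 0."""
    k = sum(1 for rel in relevance if rel > 0)
    if k == 0:
        return 0.0
    # Candidate indices ordered by the model's scores, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ideal = sorted(relevance, reverse=True)
    # Position p (0-based) is discounted by log2(p + 2).
    dcg = sum(relevance[i] / math.log2(p + 2) for p, i in enumerate(order[:k]))
    idcg = sum(rel / math.log2(p + 2) for p, rel in enumerate(ideal[:k]))
    return dcg / idcg
```

Every function depends only on how candidates are scored. A generative model therefore has to be evaluated by scoring the candidate list (for example by likelihood), and the free-form answer it actually produces never enters these numbers.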

Comparison of generated examples on the VisDial v1.0 val split (2064 generated dialogs)

See all_generated_dialogs_2064_visdialv10val.txt for all dialogs. One example follows:

image_id: 82004
caption: a blue sign is set up at gas station that reads `` rules of the road ''
-------
round_id:1
ques: can you read what the sign says? 【ground-truth question at round 1】
ground-truth: yes it is a list of rules 【ground-truth answer】
gen_answ: yes 【generated by our new model】
baseline_answ: no 【generated by the compared method】
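To compare answers across all 2064 dialogs, the file has to be parsed. The sketch below parses one dialog block into a dictionary. It assumes, without having checked the full file, that every block follows the layout of the example above: `key: value` lines, a dashed separator, each round starting with `round_id`, and optional trailing 【…】 annotations, which are stripped. Splitting the file into per-image blocks is left out, and `parse_dialog_block` is a hypothetical name.

```python
import re
from typing import Dict, List

# A trailing annotation in full-width brackets, e.g. "【ground-truth answer】".
ANNOTATION = re.compile(r"\s*【[^】]*】\s*$")


def parse_dialog_block(text: str) -> Dict[str, object]:
    """Parse one dialog block into header fields plus a list of rounds."""
    header: Dict[str, str] = {}
    rounds: List[Dict[str, str]] = []
    current = header  # lines before the first round_id belong to the header
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:  # skip blanks and "-------" separators
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        key = key.strip()
        value = ANNOTATION.sub("", value).strip()
        if key == "round_id":
            current = {"round_id": value}  # start a new round
            rounds.append(current)
        else:
            current[key] = value
    return {**header, "rounds": rounds}
```

Splitting only on the first colon keeps captions and answers that themselves contain colons intact.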


Visual Dialog (VisDial) task

Visual Dialog requires an AI agent to hold a conversation with humans, in natural language, about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, a VisDial agent must answer the question in free-form natural language.
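In the standard VisDial setup, the dialog history at round t is the image caption followed by the t-1 preceding question-answer pairs. The sketch below expands one annotated dialog into per-round inputs under that assumption. The class `TurnInput` and function `expand_dialog` are hypothetical names. The image is represented only by its `image_id`, with feature extraction left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

QA = Tuple[str, str]  # (question, answer)


@dataclass
class TurnInput:
    """What the agent sees at one round (field names are illustrative)."""
    image_id: int
    history: List[str]  # caption, then earlier "question answer" turns
    question: str       # follow-up question to answer at this round
    round_id: int       # 1-based round index


def expand_dialog(image_id: int, caption: str, dialog: List[QA]) -> List[TurnInput]:
    """Build one input per round; round t sees the caption plus rounds 1..t-1."""
    turns: List[TurnInput] = []
    for t, (question, _answer) in enumerate(dialog, start=1):
        history = [caption] + [f"{q} {a}" for q, a in dialog[: t - 1]]
        turns.append(TurnInput(image_id, history, question, t))
    return turns
```

The answer at round t is withheld from that round's input. It appears only in the histories of later rounds, which is why round 1 sees the caption alone.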

Paper

Our new paper is coming soon.

References

  1. https://visualdialog.org/
  2. Abhishek Das, et al. Visual Dialog. CVPR 2017
  3. Abhishek Das, et al. Visual Dialog: Supplementary Document. CVPR 2017
  4. https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch
  5. Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline. ECCV 2020
  6. Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs. ECCV 2020
  7. Modality-Balanced Models for Visual Dialogue. AAAI 2020
  8. DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue. AAAI 2020
  9. DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog. AAAI 2020
  10. UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog. CVPR 2022
  11. Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning. ICASSP 2022
  12. VU-BERT: A Unified Framework for Visual Dialog. ICASSP 2022
  13. Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog. ACM MM 2022
  14. Context-Aware Graph Inference With Knowledge Distillation for Visual Dialog. TPAMI 2022
  15. Heterogeneous Knowledge Network for Visual Dialog. TCSVT 2022
  16. SKANET: Structured Knowledge-Aware Network for Visual Dialog. ICME 2021
  17. Multi-View Attention Network for Visual Dialog. Applied Sciences 2021