
Towards a Performance Analysis on Pre-trained Visual Question Answering Models for Autonomous Driving

Welcome to the GitHub repository for the paper titled "Towards a Performance Analysis on Pre-trained Visual Question Answering Models for Autonomous Driving."

This repository provides supplementary materials related to the paper, including resources for researchers and practitioners interested in the evaluation of Visual Question Answering (VQA) models in autonomous driving contexts.

Repository Contents

The repository includes the following key components:

  1. Comprehensive Corpus of Research Papers
    A collection of 78 research papers focused on Visual Question Answering (VQA) models, curated to support those studying the application of VQA models in various domains, especially autonomous driving.

  2. Test Dataset
    The dataset used for evaluating three pre-trained VQA models—ViLBERT, ViLT, and LXMERT—on questions related to autonomous driving. This dataset can be used for benchmarking these models or other VQA models.

  3. Evaluation and Analysis
    A detailed analysis of the results obtained from evaluating ViLBERT, ViLT, and LXMERT in the context of autonomous driving. The analysis shows how well these models answer questions about driving scenarios, measured against reference answers provided by domain experts.

Abstract

This short paper presents a preliminary analysis of three popular Visual Question Answering (VQA) models, namely ViLBERT, ViLT, and LXMERT, in the context of answering questions relating to driving scenarios. The performance of these models is evaluated by comparing the similarity of their responses to reference answers provided by computer vision experts. Model selection is based on the analysis of transformer utilization in multimodal architectures. The results indicate that models incorporating cross-modal attention and late fusion techniques exhibit promising potential for generating improved answers within a driving perspective.

This initial analysis serves as a launchpad for a forthcoming comprehensive comparative study involving nine VQA models and sets the scene for further investigations into the effectiveness of VQA model queries in self-driving scenarios.

How to Use This Repository

  1. Corpus of Research Papers
    The collection of papers is included to assist with understanding the landscape of VQA research. These papers may serve as background material for further studies or benchmarking efforts.

  2. Test Dataset
    The dataset can be downloaded and used to test the three VQA models discussed in the paper (ViLBERT, ViLT, and LXMERT) or to replicate the reported results. Researchers can also use it to evaluate other VQA models.

  3. Analysis and Results
    The analysis provided in the repository can be used to understand the performance of these models and gain insights into how cross-modal attention mechanisms and late fusion architectures affect the effectiveness of VQA models in answering driving-related questions. A minimal code sketch illustrating this evaluation workflow follows this list.
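
For illustration, the snippet below is a minimal sketch of the workflow described above: it queries one of the three models (ViLT, via its publicly available Hugging Face checkpoint) about a driving scene and scores the predicted answer against an expert reference answer with a simple string-similarity measure. The checkpoint name, file paths, example question, reference answer, and the difflib-based similarity are illustrative assumptions; they are not the exact setup or metric used in the paper.

```python
# Minimal sketch: query a pre-trained VQA model about a driving scene and
# compare its answer to an expert reference answer.
# Assumptions (not taken from the paper): the ViLT checkpoint, the image
# path, the example question/reference, and the similarity measure.

import difflib

import torch
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# Publicly available ViLT checkpoint fine-tuned for VQA, used here as a
# stand-in for the pre-trained models evaluated in the paper.
CHECKPOINT = "dandelin/vilt-b32-finetuned-vqa"

processor = ViltProcessor.from_pretrained(CHECKPOINT)
model = ViltForQuestionAnswering.from_pretrained(CHECKPOINT)
model.eval()

# Hypothetical driving-scene example; replace with items from the test dataset.
image = Image.open("driving_scene.jpg").convert("RGB")
question = "Is it safe for the car to change lanes to the left?"
reference_answer = "no"

# Encode the image-question pair and pick the highest-scoring answer label.
inputs = processor(image, question, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_answer = model.config.id2label[logits.argmax(-1).item()]

# Simple character-level similarity between model output and reference;
# the paper's own similarity analysis may use a different measure.
similarity = difflib.SequenceMatcher(
    None, predicted_answer.lower(), reference_answer.lower()
).ratio()

print(f"Model answer: {predicted_answer}")
print(f"Reference answer: {reference_answer}")
print(f"Similarity: {similarity:.2f}")
```

Evaluating ViLBERT or LXMERT would follow the same pattern with their respective checkpoints and preprocessing pipelines.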

How to Cite

If you find this work helpful and wish to cite the paper, please use one of the following formats:

arXiv Citation

Rekanar, Kaavya, Ciarán Eising, Ganesh Sistu, and Martin Hayes. 
"Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving." 
arXiv e-prints (2023): arXiv-2307.

IMVIP 2023 Citation

The paper is also available in the official proceedings of IMVIP 2023, where it was first presented:

Rekanar, Kaavya, Ciarán Eising, Ganesh Sistu, and Martin Hayes. 
"Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving." 
Proceedings of the Irish Machine Vision and Image Processing Conference (IMVIP 2023), pp. 346-349.

The full proceedings can be accessed here.

Poster Presentation

The poster presented at IMVIP 2023 can be found in 'poster.pdf'. It provides a visual summary of the research and key findings from the paper.

Future Work

This repository serves as a foundation for a larger-scale comparative study involving nine VQA models, which will be included in future updates. The work presented here is part of ongoing research into the use of VQA models in autonomous driving, with the goal of improving their ability to respond accurately to domain-specific queries.

Contact

For questions or feedback regarding the paper or the repository, please feel free to contact the authors.


We hope this repository proves useful for your research or practical applications!
