
Towards a Performance Analysis on Pre-trained Visual Question Answering Models for Autonomous Driving

Welcome to the GitHub repository for the paper titled "Towards a Performance Analysis on Pre-trained Visual Question Answering Models for Autonomous Driving."

This repository provides supplementary materials related to the paper, including resources for researchers and practitioners interested in the evaluation of Visual Question Answering (VQA) models in autonomous driving contexts.

Repository Contents

The repository includes the following key components:

  1. Comprehensive Corpus of Research Papers
    A collection of 78 research papers focused on Visual Question Answering (VQA) models, curated to support those studying the application of VQA models in various domains, especially autonomous driving.

  2. Test Dataset
    The dataset used for evaluating three pre-trained VQA models—ViLBERT, ViLT, and LXMERT—on questions related to autonomous driving. This dataset can be used for benchmarking these models or other VQA models.

  3. Evaluation and Analysis
    A detailed analysis of the results obtained from evaluating ViLBERT, ViLT, and LXMERT in the context of autonomous driving. The analysis shows how well these models answer questions about driving scenarios, measured against reference answers provided by domain experts.

Abstract

This short paper presents a preliminary analysis of three popular Visual Question Answering (VQA) models, namely ViLBERT, ViLT, and LXMERT, in the context of answering questions relating to driving scenarios. The performance of these models is evaluated by comparing the similarity of their responses to reference answers provided by computer vision experts. Model selection is based on the analysis of transformer utilization in multimodal architectures. The results indicate that models incorporating cross-modal attention and late fusion techniques exhibit promising potential for generating improved answers within a driving perspective.

This initial analysis serves as a launchpad for a forthcoming comprehensive comparative study involving nine VQA models and sets the scene for further investigations into the effectiveness of VQA model queries in self-driving scenarios.

How to Use This Repository

  1. Corpus of Research Papers
    The collection of papers is included to assist with understanding the landscape of VQA research. These papers may serve as background material for further studies or benchmarking efforts.

  2. Test Dataset
    The dataset can be downloaded and used to test the three VQA models discussed in the paper (ViLBERT, ViLT, and LXMERT) or to replicate the reported results. Researchers can also use it to evaluate other VQA models.

  3. Analysis and Results
    The analysis provided in the repository can be used to understand the performance of these models and gain insights into how cross-modal attention mechanisms and late fusion architectures affect the effectiveness of VQA models in answering driving-related questions. A minimal code sketch illustrating this evaluation workflow follows this list.
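
For illustration, the snippet below is a minimal sketch of the workflow described above: it queries one of the three models (ViLT, via its publicly available Hugging Face checkpoint) about a driving scene and scores the predicted answer against an expert reference answer with a simple string-similarity measure. The checkpoint name, file paths, example question, reference answer, and the difflib-based similarity are illustrative assumptions; they are not the exact setup or metric used in the paper.

```python
# Minimal sketch: query a pre-trained VQA model about a driving scene and
# compare its answer to an expert reference answer.
# Assumptions (not taken from the paper): the ViLT checkpoint, the image
# path, the example question/reference, and the similarity measure.

import difflib

import torch
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# Publicly available ViLT checkpoint fine-tuned for VQA, used here as a
# stand-in for the pre-trained models evaluated in the paper.
CHECKPOINT = "dandelin/vilt-b32-finetuned-vqa"

processor = ViltProcessor.from_pretrained(CHECKPOINT)
model = ViltForQuestionAnswering.from_pretrained(CHECKPOINT)
model.eval()

# Hypothetical driving-scene example; replace with items from the test dataset.
image = Image.open("driving_scene.jpg").convert("RGB")
question = "Is it safe for the car to change lanes to the left?"
reference_answer = "no"

# Encode the image-question pair and pick the highest-scoring answer label.
inputs = processor(image, question, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_answer = model.config.id2label[logits.argmax(-1).item()]

# Simple character-level similarity between model output and reference;
# the paper's own similarity analysis may use a different measure.
similarity = difflib.SequenceMatcher(
    None, predicted_answer.lower(), reference_answer.lower()
).ratio()

print(f"Model answer: {predicted_answer}")
print(f"Reference answer: {reference_answer}")
print(f"Similarity: {similarity:.2f}")
```

Evaluating ViLBERT or LXMERT would follow the same pattern with their respective checkpoints and preprocessing pipelines.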

How to Cite

If you find this work helpful and wish to cite the paper, please use one of the following formats:

arXiv Citation

Rekanar, Kaavya, Ciarán Eising, Ganesh Sistu, and Martin Hayes. 
"Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving." 
arXiv e-prints (2023): arXiv-2307.

IMVIP 2023 Citation

The paper is also available in the official proceedings of IMVIP 2023, where it was first presented:

Rekanar, Kaavya, Ciarán Eising, Ganesh Sistu, and Martin Hayes. 
"Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving." 
Proceedings of the Irish Machine Vision and Image Processing Conference (IMVIP 2023), pp. 346-349.

The full proceedings can be accessed here.

Poster Presentation

The poster presented at IMVIP 2023 can be found in 'poster.pdf'. It provides a visual summary of the research and key findings from the paper.

Future Work

This repository serves as a foundation for a larger-scale comparative study involving nine VQA models, which will be included in future updates. The work presented here is part of ongoing research into the use of VQA models in autonomous driving, with the goal of improving their ability to respond accurately to domain-specific queries.

Contact

For questions or feedback regarding the paper or the repository, please feel free to contact the authors.


We hope this repository proves useful for your research or practical applications!
