This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
SSG-VQA is a Visual Question Answering (VQA) dataset on laparoscopic videos, providing diverse, geometrically grounded, unbiased, and surgical-action-oriented queries generated using scene graphs.
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
The Visual Question Answering (VQA) project features a model with a simple GUI that handles both images and videos. It uses OpenAI's CLIP for encoding images and questions and GPT-2 for decoding embeddings to answer questions based on the VQA Version 2 dataset, which includes 265,016 images with multiple questions and answers.
Medical Report Generation And VQA (Adapting XrayGPT to Any Modality)
How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?
CLEVR3D Dataset: Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation
Visual Question Answering in the Medical Domain VQA-Med 2019
Part of our final-year project, involving complex NLP tasks and experimentation with various datasets and LLMs
Counterfactual Reasoning VQA Dataset
B.Sc. Final Project: LXMERT Model Compression for Visual Question Answering.
The Easy Visual Question Answering dataset.
SciGraphQA: Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs
A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach for the VizWiz Grand Challenge 2023, built by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP model as the image and text encoder
Visual Question Answering (VQA) software! Powered by Flask, this project combines images and questions to generate accurate responses. Explore interactive visual understanding with ease.
Multi-page document understanding and VQA using an OCR-free method
MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering (VQA).
CloudCV Visual Question Answering Demo
Streamlit app for demonstrating multi-modal(vision+language) modelling in Pytorch.
VQA-Med 2021