C-VQA: Counterfactual Reasoning VQA Dataset

This repository contains the code and data for the paper "What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models".

Dataset

The dataset is in the C-VQA directory. The questions are stored as .csv files.
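The question files can be read with Python's standard csv module. This is a minimal sketch, assuming the files are UTF-8 encoded and start with a header row. The C-VQA column names are not documented in this README, so the function keeps every column as-is. load_questions is a hypothetical helper, not part of the repository.

```python
import csv
from pathlib import Path


def load_questions(dataset_dir):
    """Read every .csv file under dataset_dir (recursively).

    Returns a dict mapping each CSV path, relative to dataset_dir, to a list
    of rows; each row is a dict keyed by that file's header. No particular
    column names are assumed.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {root}")
    questions = {}
    for csv_path in sorted(root.rglob("*.csv")):
        # newline="" is what the csv module expects for correct parsing.
        with csv_path.open(newline="", encoding="utf-8") as f:
            questions[str(csv_path.relative_to(root))] = list(csv.DictReader(f))
    return questions
```

For example, calling load_questions("C-VQA") from the repository root returns one entry per question file with its rows.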

Download Images

After cloning the repository, run:

pip install gdown
bash download_images.sh
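download_images.sh is not reproduced here, so where it places the images and which file types it produces are assumptions. The hypothetical count_images helper below is a quick sanity check that the download actually produced image files; pass it whatever directory the script creates.

```python
from pathlib import Path

# Extensions treated as images; an assumption, adjust if the dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(image_dir):
    """Return the number of image files found (recursively) under image_dir."""
    root = Path(image_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"image directory not found: {root}")
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

A result of zero usually means the download failed or the images landed in a different directory.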

Scripts

The scripts directory contains all required scripts for running models in the paper.

run_eval_cogvlm.py: CogVLM.
run_eval_lavis.py: InstructBLIP and BLIP (in LAVIS).
run_eval_minigpt4.py: MiniGPT-v2.
run_eval_llava.py: LLaVA.
run_eval_qwen.py: Qwen-VL.
run_eval_codellama.py: ViperGPT with CodeLlama.
run_eval_visprog.py: VisProg.
run_eval_wizard.py: ViperGPT with WizardCoder.
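If you drive several evaluations from one place, the list above can be kept as a lookup table. This is a minimal sketch; EVAL_SCRIPTS and script_for are hypothetical names, and the model labels are simply the ones used in this README.

```python
# Model name -> evaluation script, as listed in the Scripts section.
EVAL_SCRIPTS = {
    "CogVLM": "run_eval_cogvlm.py",
    "InstructBLIP": "run_eval_lavis.py",
    "BLIP": "run_eval_lavis.py",
    "MiniGPT-v2": "run_eval_minigpt4.py",
    "LLaVA": "run_eval_llava.py",
    "Qwen-VL": "run_eval_qwen.py",
    "ViperGPT (CodeLlama)": "run_eval_codellama.py",
    "VisProg": "run_eval_visprog.py",
    "ViperGPT (WizardCoder)": "run_eval_wizard.py",
}


def script_for(model_name):
    """Look up the evaluation script for a model, ignoring case."""
    for name, script in EVAL_SCRIPTS.items():
        if name.lower() == model_name.lower():
            return script
    raise KeyError(f"no evaluation script listed for {model_name!r}")
```

InstructBLIP and BLIP share run_eval_lavis.py; they are distinguished by the --model-name and --model-type arguments rather than by the script.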

Before running a script, install the corresponding model and download its weights. Then copy the script into the model's root directory.

In each script, change PATH_TO_IMAGES to the directory that contains the downloaded images.

In the ViperGPT scripts (with the CodeLlama and WizardCoder code generators), also change PATH_TO_MODEL to the directory of the downloaded model.
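Copying a script and filling in its paths can be scripted. This is a minimal sketch, assuming PATH_TO_IMAGES and PATH_TO_MODEL appear literally in the script text as placeholders to be replaced wholesale (for example inside a string); if either is instead a variable name, edit its assigned value by hand. install_script is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def install_script(script_path, model_root, replacements):
    """Write a copy of script_path into model_root with placeholders filled in.

    replacements maps placeholder text (e.g. "PATH_TO_IMAGES") to the real
    value. Every placeholder is checked before anything is written, so a
    missing placeholder never leaves a half-edited copy behind.
    """
    src = Path(script_path)
    text = src.read_text(encoding="utf-8")
    for placeholder, value in replacements.items():
        if placeholder not in text:
            raise ValueError(f"{placeholder} not found in {src.name}")
        text = text.replace(placeholder, value)
    dest = Path(model_root) / src.name
    dest.write_text(text, encoding="utf-8")
    return dest
```

For example, install_script("scripts/run_eval_llava.py", "/path/to/LLaVA", {"PATH_TO_IMAGES": "/data/c-vqa/images"}). Pass PATH_TO_MODEL only for the ViperGPT scripts, since the helper raises ValueError when a placeholder is absent.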

For example, to run BLIP on C-VQA, run this command in the root directory of LAVIS:

python run_eval_lavis.py --model-name blip2_t5 --model-type pretrain_flant5xxl --query PATH_TO_CSV_FILE
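The same command can be launched from Python, for instance to loop over several CSV files. This is a minimal sketch using subprocess; it reproduces only the three flags shown in the command above and assumes run_eval_lavis.py has already been copied into the LAVIS root. lavis_command and run_lavis are hypothetical helpers.

```python
import subprocess
import sys


def lavis_command(model_name, model_type, query_csv, python=sys.executable):
    """Build the argument list for run_eval_lavis.py with the documented flags."""
    return [
        python, "run_eval_lavis.py",
        "--model-name", model_name,
        "--model-type", model_type,
        "--query", query_csv,
    ]


def run_lavis(model_name, model_type, query_csv, lavis_root):
    """Run the evaluation with lavis_root as the working directory."""
    cmd = lavis_command(model_name, model_type, query_csv)
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, cwd=lavis_root, check=True)
```

Passing an argument list instead of a shell string avoids quoting problems with paths that contain spaces.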

You can find more commands in scripts/README.

After you get the results, run format_response.py to convert the raw responses into formatted responses (a single number, or a single yes or no). Then run calc_acc.py to compute quantitative results (accuracy) on the formatted responses. Remember to fill in the file names in both scripts.
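format_response.py and calc_acc.py are not reproduced in this README. The sketch below illustrates the same two steps under stated assumptions and is not the authors' implementation: normalize_response keeps the first yes/no word, or the first number (digits, or a spelled-out number up to ten), and accuracy counts exact matches against gold answers that are already in the same canonical form. The kind labels "yes_no" and "number" are hypothetical.

```python
import re

# Spelled-out numbers mapped to values; assumes answers stay small.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def _canonical_number(value):
    """Render 3.0 as '3' and 2.5 as '2.5' so equal numbers compare equal."""
    value = float(value)
    return str(int(value)) if value.is_integer() else str(value)


def normalize_response(raw, kind):
    """Reduce a free-form answer to 'yes'/'no' or a number string, else None."""
    if kind not in ("yes_no", "number"):
        raise ValueError(f"unknown kind: {kind!r}")
    # Tokens are numbers (with an optional decimal part) or lowercase words.
    for token in re.findall(r"\d+(?:\.\d+)?|[a-z]+", raw.lower()):
        if kind == "yes_no" and token in ("yes", "no"):
            return token
        if kind == "number":
            if token[0].isdigit():
                return _canonical_number(token)
            if token in NUMBER_WORDS:
                return _canonical_number(NUMBER_WORDS[token])
    return None


def accuracy(predictions, answers):
    """Fraction of predictions equal to the gold answers; None counts as wrong."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p is not None and p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

A first-match heuristic is simple but can misread answers such as "the one on the left", so it is worth inspecting unparsed (None) and suspicious cases before reporting numbers.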

Download Code Generator Models

Change YOUR_HUGGINGFACE_TOKEN in download_model.py to your Hugging Face access token. Then run:

pip install huggingface_hub
python download_model.py

You can add more code generators by adding entries to repo_ids and local_dirs in download_model.py.
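download_model.py is not reproduced here. This is a minimal sketch of the same idea, assuming repo_ids and local_dirs are parallel lists (entry i of one matches entry i of the other) and using snapshot_download from huggingface_hub. The example repository id is an assumption for illustration and not necessarily a model used in the paper; plan_downloads and download_all are hypothetical helpers.

```python
from pathlib import Path

# Example entries (assumptions); add one repo id and one directory per model.
repo_ids = ["codellama/CodeLlama-7b-Python-hf"]
local_dirs = ["models/CodeLlama-7b-Python-hf"]


def plan_downloads(repo_ids, local_dirs):
    """Pair each Hugging Face repo id with its target directory."""
    if len(repo_ids) != len(local_dirs):
        raise ValueError("repo_ids and local_dirs must have the same length")
    return list(zip(repo_ids, (Path(d) for d in local_dirs)))


def download_all(repo_ids, local_dirs, token):
    """Download a full snapshot of every repo into its local directory."""
    # Imported here so plan_downloads works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in plan_downloads(repo_ids, local_dirs):
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir), token=token)
```

Instead of writing the token into the file, you can read it from an environment variable, for example download_all(repo_ids, local_dirs, token=os.environ["HF_TOKEN"]).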

Citation

If this code is useful for your research, please consider citing our work.

@InProceedings{zhang2023cvqa,
    author    = {Zhang, Letian and Zhai, Xiaotong and Zhao, Zhongkai and Wen, Xin and Zhao, Bingchen},
    title     = {What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    year      = {2023}
}
