This is the code and data for the paper What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models.
The dataset directory is C-VQA. You can find the questions in .csv files.
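For a first look at the question files, a minimal sketch like the following lists every .csv file under the dataset directory with its header and row count. The function name summarize_csv_files is hypothetical, and nothing is assumed about the column names or the folder layout: the sketch searches recursively and reports whatever header each file contains.

```python
import csv
from pathlib import Path


def summarize_csv_files(dataset_dir):
    """Return (relative path, header, number of data rows) for each .csv file."""
    summaries = []
    # Search recursively, since the layout inside the directory is not assumed.
    for csv_path in sorted(Path(dataset_dir).rglob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        header = rows[0] if rows else []
        n_data_rows = max(len(rows) - 1, 0)  # exclude the header row
        summaries.append((str(csv_path.relative_to(dataset_dir)), header, n_data_rows))
    return summaries


if __name__ == "__main__":
    # "C-VQA" is the dataset directory named above; adjust the path if needed.
    for name, header, n_rows in summarize_csv_files("C-VQA"):
        print(f"{name}: {n_rows} rows, columns = {header}")
```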
After cloning:
pip install gdown
bash download_images.sh
The scripts directory contains all required scripts for running models in the paper.
- run_eval_cogvlm.py: CogVLM.
- run_eval_lavis.py: InstructBLIP and BLIP (in LAVIS).
- run_eval_minigpt4.py: MiniGPT-v2.
- run_eval_llava.py: LLaVA.
- run_eval_qwen.py: Qwen-VL.
- run_eval_visprog.py: VisProg.
- run_eval_wizard.py: ViperGPT with WizardCoder.
Before you run a script, install the corresponding model and get the weights. Then put the script in the root directory of the model.
Please change PATH_TO_IMAGES in the scripts to the actual directory of images.
Please change PATH_TO_MODEL in the scripts for ViperGPT with different code generators to the actual directory of models.
For example, to run BLIP on C-VQA, run this command in the root directory of LAVIS:
python run_eval_lavis.py --model-name blip2_t5 --model-type pretrain_flant5xxl --query PATH_TO_CSV_FILE
You can find more commands in scripts/README.
After you get the results, run format_response.py to convert raw responses to formatted responses (a single number or a single yes or no). Then run calc_acc.py to get quantitative results of the formatted responses. Remember to fill in the file names in these two scripts.
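The two post-processing steps can be pictured with a small sketch. It is not the logic of format_response.py or calc_acc.py: the helper names format_response and accuracy, and the extraction rules (first standalone yes/no, otherwise the first integer), are assumptions chosen for illustration only.

```python
import re


def format_response(raw):
    """Reduce a raw model response to 'yes', 'no', a number string, or None."""
    text = raw.strip().lower()
    # Assumed rule 1: a standalone "yes" or "no" wins.
    match = re.search(r"\b(yes|no)\b", text)
    if match:
        return match.group(1)
    # Assumed rule 2: otherwise take the first integer in the text.
    match = re.search(r"\d+", text)
    if match:
        return match.group(0)
    return None  # nothing usable found


def accuracy(formatted, answers):
    """Fraction of formatted responses that exactly match the reference answers."""
    if len(formatted) != len(answers):
        raise ValueError("responses and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(
        pred is not None and pred == str(ans).strip().lower()
        for pred, ans in zip(formatted, answers)
    )
    return correct / len(answers)


if __name__ == "__main__":
    # Made-up responses and answers, for illustration only.
    raw = ["Yes, the TV is on.", "There are 3 cats.", "I am not sure."]
    gold = ["yes", "3", "no"]
    preds = [format_response(r) for r in raw]
    print(preds, accuracy(preds, gold))
```

Unparseable responses are kept as None and counted as wrong, so the accuracy is always computed over the full set of questions.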
Change YOUR_HUGGINGFACE_TOKEN in download_model.py to your Hugging Face token. Then run:
pip install huggingface_hub
python download_model.py
You can add more code generators in download_model.py by adding models in repo_ids and local_dirs.
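A script following this pattern might look like the sketch below. It is an assumption about the structure, not the contents of download_model.py: each entry of repo_ids is paired with the entry of local_dirs at the same index and passed to snapshot_download from huggingface_hub. The helper names, the repository ID, and the target directory in the commented example are placeholders.

```python
def build_download_plan(repo_ids, local_dirs):
    """Pair each repository ID with its target directory, index by index."""
    if len(repo_ids) != len(local_dirs):
        raise ValueError("repo_ids and local_dirs must have the same length")
    return list(zip(repo_ids, local_dirs))


def download_all(repo_ids, local_dirs, token):
    """Download every repository in the plan into its local directory."""
    # Imported here so build_download_plan works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in build_download_plan(repo_ids, local_dirs):
        snapshot_download(repo_id=repo_id, local_dir=local_dir, token=token)


# Example call (placeholder values; adding a code generator means appending
# one entry to each list):
#
# download_all(
#     repo_ids=["WizardLM/WizardCoder-15B-V1.0"],
#     local_dirs=["models/WizardCoder-15B-V1.0"],
#     token="YOUR_HUGGINGFACE_TOKEN",
# )
```

Keeping the two lists the same length matters here, which is why the sketch checks it before starting any download.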
If this code is useful for your research, please consider citing our work.
@InProceedings{zhang2023cvqa,
author = {Zhang, Letian and Zhai, Xiaotong and Zhao, Zhongkai and Wen, Xin and Zhao, Bingchen},
title = {What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
year = {2023}
}