- Authors: Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
- Paper: "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models"
- Project Page
We propose ECoFLaP, a two-stage coarse-to-fine weight pruning approach for Large Vision-Language Models (LVLMs). We first determine the sparsity ratios of different layers or blocks by leveraging a global importance score, which is efficiently computed from a zeroth-order approximation of the global model gradients. We then perform local layer-wise unstructured weight pruning on the multimodal model based on these ratios.
We validate our proposed method across various multimodal and unimodal models and datasets, demonstrating significant performance improvements over prevalent pruning techniques in the high-sparsity regime.
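For intuition, the zeroth-order gradients behind the global scores can be obtained with a standard two-point (SPSA-style) estimator; the formula below is a generic sketch of that technique, and the exact estimator, per-layer importance score, and sparsity-allocation rule are specified in the paper:

$$
\hat{g} \;=\; \frac{\mathcal{L}(\theta + \epsilon z) - \mathcal{L}(\theta - \epsilon z)}{2\epsilon}\, z, \qquad z \sim \mathcal{N}(0, I).
$$

This requires only forward passes rather than backpropagation through the full LVLM; layers whose weights contribute more to the loss (e.g., larger $|\hat{g} \odot \theta|$) are then assigned smaller sparsity ratios before the local layer-wise pruning step.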
- [Feb 2024] Checkpoints are added
Sparsities are all 0.5
| Wanda | ECoFLaP first-order | ECoFLaP zeroth-order |
|---|---|---|
| Ckpt | Ckpt | Ckpt |

| Wanda | ECoFLaP first-order | ECoFLaP zeroth-order |
|---|---|---|
| Ckpt | Ckpt | Ckpt |

| Wanda | ECoFLaP first-order | ECoFLaP zeroth-order |
|---|---|---|
| Ckpt | Ckpt | Ckpt |
Sparsities are all 0.4
| Wanda | SparseGPT | ECoFLaP w/ Wanda | ECoFLaP w/ SparseGPT |
|---|---|---|---|
| Ckpt | Ckpt | Ckpt | Ckpt |
Sparsities are all 0.5
| Dataset | Wanda | ECoFLaP | ECoFLaP w/ fine-tuning |
|---|---|---|---|
| VQA | Ckpt | Ckpt | Ckpt |
| NLVR2 | Ckpt | Ckpt | Ckpt |
| Flickr | Ckpt | Ckpt | Ckpt |
| COCO Caption | Ckpt | Ckpt | Ckpt |
- Additional results on BLIP models
| Methods | VQA (test-dev) | Flickr30k (TR@1/IR@1) | NLVR2 (val/test) | COCO Cap. (CIDEr/SPICE) |
|---|---|---|---|---|
| Full model | 77.4 | 96.8/86.9 | 82.3/83.6 | 133.3/23.8 |
| Wanda (w/o fine-tuning) | 71.9 | 85.3/72.3 | 78.3/78.1 | 97.1/18.4 |
| ECoFLaP (w/o fine-tuning) | 73.6 | 90.2/79.5 | 79.1/79.2 | 111.0/20.3 |
| UPop (w/ fine-tuning) | 76.3 | 94.0/82.0 | 80.3/81.1 | 128.9/23.3 |
| ECoFLaP (w/ fine-tuning) | 76.7 | 96.8/85.6 | 81.8/82.5 | 132.3/23.8 |
The main code for this part is in `LAVIS/`. Please run everything from inside `LAVIS/` (`cd LAVIS/` first).
pip install -e .
Follow the scripts in `lavis/datasets/download_scripts/` to download the datasets.
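For example, assuming the standard LAVIS download scripts are present (the script name below comes from the LAVIS repository, not this README), COCO can be fetched with:

```bash
# download the COCO images/annotations used by the captioning and VQA configs
python lavis/datasets/download_scripts/download_coco.py
```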
## BLIP-2 experiments
# ECoFLaP - zeroth order
python scripts/blip2/ecoflap_zeroth.py 0 12341
# ECoFLaP - first order
python scripts/blip2/ecoflap_first.py 0 12341
# Wanda
python scripts/blip2/wanda.py 0 12341
# SparseGPT
python scripts/blip2/sparsegpt.py 0 12341
## EVA-CLIP experiments

# ECoFLaP - zeroth order
python scripts/eva_clip/ecoflap.py 0 12341
# Wanda
python scripts/eva_clip/wanda.py 0 12341
## FlanT5 experiments

### Generate the pruned checkpoint
# ECoFLaP - zeroth order
python scripts/t5/ecoflap.py 0 12341
### Do the five-shot evaluation
# go to the mmlu_eval folder
cd ../mmlu_eval
# Make sure to assign pruned_checkpoint to the checkpoint generated in the previous step
bash test.sh
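As a concrete example (the path below is a placeholder, and we assume `pruned_checkpoint` is the variable set inside `test.sh`, as the comment above suggests):

```bash
# in mmlu_eval/test.sh: point pruned_checkpoint at the checkpoint produced by scripts/t5/ecoflap.py
pruned_checkpoint=/path/to/ecoflap_pruned_flant5.pth   # placeholder path
```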
## CLIP experiments (CoOp)

The main code for this part is in `CoOp/`. Please run everything from inside `CoOp/` (`cd CoOp/` first).
pip install -r requirements.txt
Follow the scripts in `DATASETS.md` to download the datasets.
# Wanda and ECoFLaP (w/ Wanda)
bash scripts/coop/ecoflap_wanda.sh
# SparseGPT and ECoFLaP (w/ SparseGPT)
bash scripts/coop/ecoflap_sparsegpt.sh
## BLIP experiments (UPop)

The main code for this part is in `UPop/`. Please run everything from inside `UPop/` (`cd UPop/` first).
pip install -r requirements.txt
Follow the scripts in `README.md` to download the datasets.
### Set `task` to one of coco, flickr, nlvr2, or vqa
# Wanda
bash ecoflap_scripts/${task}/wanda.sh
# ECoFLaP
bash ecoflap_scripts/${task}/ecoflap.sh
# Fine-tune the pruned checkpoint obtained by ECoFLaP
bash ecoflap_scripts/${task}/ecoflap_finetuning.sh
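To sweep all four tasks in one go, a simple loop over the scripts listed above works:

```bash
# run Wanda, ECoFLaP, and ECoFLaP fine-tuning for every task
for task in coco flickr nlvr2 vqa; do
    bash ecoflap_scripts/${task}/wanda.sh
    bash ecoflap_scripts/${task}/ecoflap.sh
    bash ecoflap_scripts/${task}/ecoflap_finetuning.sh
done
```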
## LLaMA experiments

The main code for this part is in `LLaMA/`. Please run everything from inside `LLaMA/` (`cd LLaMA/` first).
Follow the scripts in `Install.md`.
We removed `--cache_dir`, so the program reads the cache stored in `$HF_HOME` (if specified) or in the default Hugging Face cache directory.
# ECoFLaP
bash scripts/ecoflap_zero.sh 0
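If your Hugging Face cache lives in a non-default location, set `HF_HOME` before launching (a standard Hugging Face environment variable; the path below is a placeholder):

```bash
# optional: use a custom Hugging Face cache location
export HF_HOME=/path/to/hf_cache   # placeholder path
bash scripts/ecoflap_zero.sh 0
```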
@inproceedings{Sung2024ECoFLaP,
  author    = {Yi-Lin Sung and Jaehong Yoon and Mohit Bansal},
  title     = {ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024},
}