The PyTorch implementation for the TIP 2023 paper "Plug-and-Play Regulators for Image-Text Matching".
It is built on top of SGRAF, GPO, and Awesome_Matching.
If you have any questions, please contact me at r1228240468@gmail.com. (diaohw@mail.dlut.edu.cn is deprecated)
The framework of RCAR:
The reported results (one can import GloVe embeddings or BERT for better results):
| Dataset | Module | Sentence retrieval R@1 / R@5 / R@10 | Image retrieval R@1 / R@5 / R@10 |
| :-- | :-- | :-- | :-- |
| Flickr30K | T2I | 79.7 / 95.0 / 97.4 | 60.9 / 84.4 / 90.1 |
| | I2T | 76.9 / 95.5 / 98.0 | 58.8 / 83.9 / 89.3 |
| | ALL | 82.3 / 96.0 / 98.4 | 62.6 / 85.8 / 91.1 |
| MSCOCO1K | T2I | 79.1 / 96.5 / 98.8 | 63.9 / 90.7 / 95.9 |
| | I2T | 79.3 / 96.5 / 98.8 | 63.8 / 90.4 / 95.8 |
| | ALL | 80.9 / 96.9 / 98.9 | 65.7 / 91.4 / 96.4 |
| MSCOCO5K | T2I | 59.1 / 84.8 / 91.8 | 42.8 / 71.5 / 81.9 |
| | I2T | 58.4 / 84.6 / 91.9 | 41.7 / 71.4 / 81.7 |
| | ALL | 61.3 / 86.1 / 92.6 | 44.3 / 73.2 / 83.2 |
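For reference, R@K in the table above is the standard Recall@K: the percentage of queries whose ground-truth match ranks within the top K candidates by similarity. A minimal sketch with a toy similarity matrix (the numbers below are illustrative, not from the paper):

```python
def recall_at_k(sims, k):
    """sims[i][j]: similarity of query i to candidate j; ground truth is j == i.
    Returns the percentage of queries whose ground truth is in the top k."""
    hits = 0
    for i, row in enumerate(sims):
        # rank candidate indices by descending similarity, keep the top k
        top_k = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        if i in top_k:
            hits += 1
    return 100.0 * hits / len(sims)

# toy 3x3 similarity matrix (rows: queries, columns: candidates)
sims = [
    [0.9, 0.2, 0.1],  # query 0: ground truth ranked 1st
    [0.3, 0.6, 0.8],  # query 1: ground truth ranked 2nd
    [0.2, 0.1, 0.7],  # query 2: ground truth ranked 1st
]
print(round(recall_at_k(sims, 1), 1))  # 66.7 (2 of 3 queries)
print(round(recall_at_k(sims, 2), 1))  # 100.0
```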
Run `pip install -r requirements.txt` to install the following dependencies:
- Python 3.7.11
- PyTorch 1.7.1
- NumPy 1.21.5
- Punkt Sentence Tokenizer:

```python
import nltk
nltk.download('punkt')
```
We follow SCAN to obtain image features and vocabularies, which can be downloaded from:
https://www.kaggle.com/datasets/kuanghueilee/scan-features
An alternative download link is available here:
https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC
```
data
├── coco
│   ├── precomp            # pre-computed BUTD region features for COCO, provided by SCAN
│   │   ├── train_ids.txt
│   │   ├── train_caps.txt
│   │   ├── ......
│   │
│   └── id_mapping.json    # mapping from coco-id to image's file name
│
├── f30k
│   ├── precomp            # pre-computed BUTD region features for Flickr30K, provided by SCAN
│   │   ├── train_ids.txt
│   │   ├── train_caps.txt
│   │   ├── ......
│   │
│   └── id_mapping.json    # mapping from f30k index to image's file name
│
└── vocab                  # vocab files provided by SCAN (only used when the text backbone is BiGRU)
```
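As a rough sketch of how a split in this layout can be read: `<split>_caps.txt` holds one caption per line and `<split>_ids.txt` one image id per line (file names follow SCAN's convention; the helper and mock files below are hypothetical, for illustration only):

```python
import os
import tempfile

def load_split(root, split):
    """Hypothetical helper: read '<split>_caps.txt' (one caption per line)
    and '<split>_ids.txt' (one image id per line) from a precomp directory."""
    with open(os.path.join(root, f"{split}_caps.txt")) as f:
        captions = [line.strip() for line in f]
    with open(os.path.join(root, f"{split}_ids.txt")) as f:
        ids = [line.strip() for line in f]
    return captions, ids

# demo on tiny mock files standing in for data/f30k/precomp
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "train_caps.txt"), "w") as f:
    f.write("a dog runs\na cat sleeps\n")
with open(os.path.join(tmp, "train_ids.txt"), "w") as f:
    f.write("101\n102\n")

caps, ids = load_split(tmp, "train")
print(len(caps), ids[0])  # 2 101
```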
Modify `model_path`, `split`, and `fold5` in the `eval.py` file. Note that `fold5=True` is only for evaluation on MSCOCO1K (averaged over 5 folds), while `fold5=False` is for MSCOCO5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_RCAR and MSCOCO_RCAR.
Then run `python eval.py` in the terminal.
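The `fold5=True` protocol evaluates on five 1000-image folds of the MSCOCO test set and reports the mean of each metric. A minimal sketch of that averaging (the per-fold scores below are made up):

```python
# Hypothetical per-fold R@1 scores from five 1000-image MSCOCO folds.
fold_r1 = [79.8, 81.2, 80.5, 81.5, 81.5]

# The reported MSCOCO1K number is the mean over the five folds.
avg_r1 = sum(fold_r1) / len(fold_r1)
print(round(avg_r1, 1))  # 80.9
```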
Uncomment the required parts (BASELINE, RAR, RCR, RCAR) in the `train_xxx_xxx.sh` file.
Then run `./train_xxx_xxx.sh` in the terminal.
If RCAR is useful for your research, please cite the following paper:
@article{Diao2023RCAR,
author={Diao, Haiwen and Zhang, Ying and Liu, Wei and Ruan, Xiang and Lu, Huchuan},
journal={IEEE Transactions on Image Processing},
title={Plug-and-Play Regulators for Image-Text Matching},
year={2023},
volume={32},
pages={2322-2334}
}