[toc]
This repository provides the code for our paper at AAAI 2022 (Oral):
Contrastive Quantization with Code Memory for Unsupervised Image Retrieval. Jinpeng Wang, Ziyun Zeng, Bin Chen, Tao Dai, Shu-Tao Xia. [arXiv].
We propose MeCoQ, an unsupervised deep quantization method for image retrieval. Unlike reconstruction-based methods that preserve pairwise similarity information in continuous embeddings, MeCoQ learns quantized representations via contrastive learning. To boost contrastive learning, MeCoQ leverages a memory of quantization codes during training. Experiments on CIFAR-10 (under two evaluation protocols), Flickr-25K, and NUS-WIDE demonstrate the effectiveness of MeCoQ.
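To convey the core idea, below is a minimal, hypothetical sketch (not the authors' implementation): features are softly assigned to learnable codewords so that quantization stays differentiable, and an InfoNCE-style contrastive loss is computed on the quantized representations. The helper names (`soft_quantize`, `info_nce`) and the sizes `M`, `K`, `D` are illustrative assumptions; see `loss.py` and `main.py` for the actual method, including the debiasing and code-memory components.

```python
# Illustrative sketch only -- NOT the authors' exact objective.
import torch
import torch.nn.functional as F

M, K, D = 2, 256, 32   # assumed sizes: M codebooks, K codewords each, D-dim features
codebooks = torch.randn(M, K, D // M, requires_grad=True)

def soft_quantize(z, tau=1.0):
    """Differentiable quantization: softly assign each sub-vector to codewords."""
    sub = z.view(z.size(0), M, -1)                        # (B, M, D/M)
    logits = torch.einsum('bmd,mkd->bmk', sub, codebooks) / tau
    attn = logits.softmax(dim=-1)                         # soft codeword assignments
    return torch.einsum('bmk,mkd->bmd', attn, codebooks).reshape(z.size(0), -1)

def info_nce(q1, q2, T=0.4):
    """InfoNCE loss between two views; matching rows are the positives."""
    q1, q2 = F.normalize(q1, dim=1), F.normalize(q2, dim=1)
    logits = q1 @ q2.t() / T                              # (B, B) cosine similarities
    labels = torch.arange(q1.size(0))                     # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Two augmented views of a batch would be encoded to z1, z2 of shape (B, D); then:
# loss = info_nce(soft_quantize(z1), soft_quantize(z2))
```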
In the following, we will guide you through using this repository step by step. 🤗

First, clone the repository:

```bash
git clone https://github.com/gimpong/AAAI22-MeCoQ.git
cd AAAI22-MeCoQ/
```
Dependencies:

- python 3.7.9
- numpy 1.19.1
- pandas 1.0.5
- pytorch 1.3.1
- torchvision 0.4.2
- pillow 8.0.0
- opencv-python 3.4.2
- tqdm 4.51.0
Before running the code, make sure everything needed is in place. The working directory is expected to be organized as follows:
```
AAAI22-MeCoQ/
  data/
    Flickr25k/
      img.txt
      targets.txt
    Nuswide/
      database.txt
      test.txt
      train.txt
  datasets/
    CIFAR-10/
      cifar-10-batches-py/
        batches.meta
        data_batch_1
        ...
    Flickr25K/
      mirflickr/
        im1.jpg
        im2.jpg
        ...
    NUS-WIDE/
      Flickr/
        actor/
          0001_2124494179.jpg
          0002_174174086.jpg
          ...
        administrative_assistant/
          ...
        ...
  scripts/
    run0001.sh
    run0002.sh
    ...
  main.py
  engine.py
  data.py
  utils.py
  loss.py
```
- The `data/` folder holds the data splits for the Flickr25K and NUS-WIDE datasets. The raw images of Flickr25K and NUS-WIDE should be downloaded additionally and arranged in `datasets/Flickr25K/` and `datasets/NUS-WIDE/`, respectively. We provide copies of these image datasets, which you can download via Google Drive or Baidu Wangpan (Web Drive, password: n307). A hypothetical loading sketch for the split files follows this list.
- For experiments on the CIFAR-10 dataset, you can use the option `--download_cifar10` when running `main.py`.
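As a reference for the expected split-file layout, here is a hypothetical PyTorch dataset for Flickr25K. It assumes `img.txt` lists one relative image path per line and `targets.txt` stores one space-separated label vector per line; check `data.py` for the authoritative parsing logic.

```python
# Hypothetical loader; the file format is an assumption to verify against data.py.
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class Flickr25K(Dataset):
    def __init__(self, root='datasets/Flickr25K/mirflickr',
                 img_list='data/Flickr25k/img.txt',
                 target_list='data/Flickr25k/targets.txt', transform=None):
        self.root = Path(root)
        self.paths = Path(img_list).read_text().split()   # one image path per line
        self.targets = np.loadtxt(target_list, dtype=np.int64)  # one label row per image
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.root / self.paths[idx]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, self.targets[idx]
```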
To facilitate reproducibility, we provide scripts with the configuration for each experiment under the `scripts/` folder. For example, to train and evaluate a 16-bit MeCoQ model on the Flickr25K dataset, run:

```bash
cd scripts/
# '0' is the id of the GPU
bash run0001.sh 0
```
The script `run0001.sh` contains the following commands:

```bash
#!/bin/bash
cd ..
python main.py \
    --notes Flickr16bits \
    --device cuda:$1 \
    --dataset Flickr25K \
    --trainable_layer_num 0 \
    --M 2 \
    --feat_dim 32 \
    --T 0.4 \
    --hp_beta 1e-1 \
    --hp_lambda 0.5 \
    --mode debias --pos_prior 0.15 \
    --queue_begin_epoch 5 \
    --topK 5000
cd -
```
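As a sanity check on the configuration, the code length follows the standard product-quantization arithmetic: `M` codebooks with `K` codewords each yield `M * log2(K)` bits per image. `K` is set inside the code; `K = 256` is an assumption here that is consistent with this 16-bit script.

```python
# Hypothetical arithmetic check; K = 256 is an assumed value, not a documented one.
import math

M, K, feat_dim = 2, 256, 32      # M and feat_dim from run0001.sh; K assumed
bits = M * math.log2(K)          # 2 codebooks x 8 bits each = 16-bit codes
sub_dim = feat_dim // M          # each codebook quantizes a 16-dim sub-vector
print(int(bits), sub_dim)        # -> 16 16
```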
After running a script, a series of folders and files will be saved under `logs/` and `checkpoints/`, whose file identifiers are consistent with the `--notes` argument in `run0001.sh` (e.g., `Flickr16bits`).

Under `logs/`, there will be a log file (e.g., `Flickr16bits.log`) and a folder of TensorBoard files (e.g., `Flickr16bits`).

Under `checkpoints/`, there will be a folder (e.g., `Flickr16bits/`) with the information for the final checkpoint, including the quantization codes (`db_codes.npy`) and labels (`db_targets.npy`) for the database set, the model checkpoint (`model.cpt`), and the performance records (`P_at_topK_curve.txt` and `PR_curve.txt`).
⚠️ Warning: the difficulty of reproducing exactly the same results on different software and hardware architectures 🤔
Initially, we tuned different experiments (e.g., different datasets and different quantization code lengths) separately on different servers in the authors' lab. These servers are equipped with three kinds of GPUs: NVIDIA® GeForce® GTX 1080 Ti (11 GB), NVIDIA® GeForce® RTX 2080 Ti (11 GB), and NVIDIA® Tesla® V100 (32 GB).
During our preparation for the code release, we accidentally found that even with the same code and the same hyper-parameter configuration (including fixed random seeds), executing experiments on different servers can still yield different results. Such results may be influenced by various factors, e.g., the versions of drivers and libraries and the hardware architecture.
Unfortunately, we were not aware of this phenomenon during our paper submission, and the reported results were based on mixed architectures. 😩
Here we report the results of running the scripts on the three kinds of GPUs in the following table. We have also uploaded the logs and checkpoint information for reference, which can be downloaded from Baidu Wangpan (Web Drive, password: ncw0).
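For single-machine repeatability, a typical PyTorch seeding recipe looks like the sketch below. Note that it cannot eliminate the cross-hardware differences described above, and the seed value here is arbitrary, not the one used in our experiments.

```python
# Common PyTorch determinism knobs (a sketch; cross-hardware variance remains).
import random

import numpy as np
import torch

def set_seed(seed=2021):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True   # pick deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # disable nondeterministic autotuning
```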
If you find this code useful or use the toolkit in your work, please consider citing:
```bibtex
@inproceedings{wang22mecoq,
  author={Wang, Jinpeng and Zeng, Ziyun and Chen, Bin and Dai, Tao and Xia, Shu-Tao},
  title={Contrastive Quantization with Code Memory for Unsupervised Image Retrieval},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2022}
}
```
Our code is based on the implementations of PyTorch SimCLR, MoCo, DCL, Deep-Unsupervised-Image-Hashing, and CIBHash.
If you have any questions, you can raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply as soon as possible.