Chau Pham, Truong Vu, Khoi Nguyen
VinAI Research, Vietnam
Abstract: This paper addresses the challenging problem of open-vocabulary object detection (OVOD) where an object detector must identify both seen and unseen classes in test images without labeled examples of the unseen classes in training. A typical approach for OVOD is to use joint text-image embeddings of CLIP to assign box proposals to their closest text label. However, this method has a critical issue: many low-quality boxes, such as over- and under-covered-object boxes, have the same similarity score as high-quality boxes since CLIP is not trained on exact object location information. To address this issue, we propose a novel method, LP-OVOD, that discards low-quality boxes by training a sigmoid linear classifier on pseudo labels retrieved from the top relevant region proposals to the novel text. Experimental results on COCO affirm the superior performance of our approach over the state of the art, achieving 40.5 in $AP_{novel}$ using ResNet50 as the backbone and without external datasets or knowing novel classes during training.
Details of the model architecture and experimental results can be found in our paper, cited below.
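The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the repository's implementation: the function names `cosine` and `retrieve_pseudo_labels`, the toy 2-D embeddings, and the value of `k` are assumptions made for illustration. The sketch ranks region-proposal embeddings by cosine similarity to a novel-class text embedding and keeps the top `k` as pseudo-positive examples for that class.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_pseudo_labels(
    text_emb: Sequence[float],
    proposal_embs: List[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k proposals most similar to the text embedding."""
    scores = [cosine(text_emb, p) for p in proposal_embs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): one novel-class text embedding and four proposals.
text = [1.0, 0.0]
proposals = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [1.0, 0.0]]
top = retrieve_pseudo_labels(text, proposals, k=2)
print(top)
```

In the method, the retrieved proposals serve as pseudo labels for training a sigmoid linear classifier; that training step is not shown here.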
Please CITE our paper whenever this repository is used to help produce published results or incorporated into other software.
@inproceedings{pham2024lp,
title={LP-OVOD: Open-Vocabulary Object Detection by Linear Probing},
author={Pham, Chau and Vu, Truong and Nguyen, Khoi},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={779--788},
year={2024}
}
- python3.8
- pytorch 1.7.0
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements/build.txt
pip install -e .
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/lvis-dataset/lvis-api.git
pip install mmcv-full==1.2.5 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
pip install yapf==0.40.2
conda install -c pytorch faiss-gpu
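After installing, it can help to confirm that the pinned versions are the ones actually present. The following is a small sketch, not part of the repository: the helper names `installed_version` and `check_pins` are hypothetical, and the pins are copied from the `pip install` commands above. Local build tags such as `+cu110` are stripped before comparison.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Pins copied from the install commands above.
PINS = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "torchaudio": "0.7.2",
    "mmcv-full": "1.2.5",
    "yapf": "0.40.2",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Return {package: problem} for each missing or mismatched package."""
    problems = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems[name] = "not installed"
        elif found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for pkg, msg in check_pins(PINS).items():
        print(f"{pkg}: {msg}")
```

An empty output means every pinned package was found at the expected version.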
Download the COCO dataset and the open-vocabulary COCO split from this link.
All models use the backbone pretrained with SoCo. Download the pretrained backbone and save it to the folder weights. Also save the pretrained CLIP model to weights.
Download the COCO proposals from this link and put them under the folder proposals. The repository should then be organized as follows:
├── configs
├── mmdet
├── weights
│ ├── current_mmdetection_Head.pth
│ ├── ViT-B-32.pt
├── ovd_coco_text_embedding.pth
├── tools
├── prepare
├── proposals
│ ├── train_coco_id_map.json
│ ├── train_coco_proposals.pkl
│ ├── val_coco_id_map.json
│ ├── val_coco_proposals.pkl
├── retrieval
├── scripts
├── data
│ ├── coco
│ │ ├── annotations
│ │ │ ├── ovd_ins_{train,val}2017_{all,b,t}.json
│ │ │ ├── instances_{train,val}2017.json
│ │ ├── train2017
│ │ ├── val2017
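Before running the preparation scripts, the layout above can be checked with a short script. This is a hedged sketch rather than a tool shipped with the repository: the function name `missing_paths` is hypothetical, and the path list is transcribed from the tree above, with the `{train,val}` and `{all,b,t}` patterns expanded.

```python
from pathlib import Path
from typing import Iterable, List

# Paths transcribed from the tree above, relative to the repository root.
EXPECTED = [
    "weights/current_mmdetection_Head.pth",
    "weights/ViT-B-32.pt",
    "proposals/train_coco_id_map.json",
    "proposals/train_coco_proposals.pkl",
    "proposals/val_coco_id_map.json",
    "proposals/val_coco_proposals.pkl",
    "data/coco/train2017",
    "data/coco/val2017",
]
# Expand the brace patterns used in the annotations folder.
for split in ("train", "val"):
    EXPECTED.append(f"data/coco/annotations/instances_{split}2017.json")
    for subset in ("all", "b", "t"):
        EXPECTED.append(f"data/coco/annotations/ovd_ins_{split}2017_{subset}.json")


def missing_paths(root: str, expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    root_path = Path(root)
    return [rel for rel in expected if not (root_path / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

Run it from the repository root; any line it prints names a file or folder that still needs to be downloaded or moved.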
python ./prepare/clip_utils.py
A file ovd_coco_text_embedding.pth will be created (we have already extracted this for you).
These embeddings will be used for computing the Knowledge Distillation loss and for retrieving novel proposals.
python -m torch.distributed.launch --nproc_per_node=4 prepare/extract_coco_embeddings_clip.py \
--data_root=path_to_data_root \
--clip_root=weights \
--proposal_file=path_to_oln_proposals \
--num_worker=48 \
--batch_size=128 \
--split=train \
--save_path=coco_clip_emb_train.pth
Change num_worker and batch_size according to your machine.
A file coco_clip_emb_train.pth (which is over 100GB) will be created, so please check that you have enough disk space before extracting.
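The disk-space check can be done programmatically before launching the extraction. This is a minimal sketch using only the Python standard library: the helper name `has_free_space` is hypothetical, and the 100 GiB threshold simply mirrors the size note above, so add headroom as needed.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_free_space(path: str, required_gib: float) -> bool:
    """True if the filesystem holding `path` has at least `required_gib` GiB free."""
    return shutil.disk_usage(path).free >= required_gib * GIB


if __name__ == "__main__":
    # "." should be replaced with the directory that will hold the output file.
    if not has_free_space(".", required_gib=100):
        print("Not enough free disk space for coco_clip_emb_train.pth")
```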
bash ./scripts/vild_sigmoid.sh
We provide the pretraining checkpoint at this link.
bash ./scripts/vild_sigmoid_ft.sh /path/to/pretraining_ckpt
bash ./scripts/vild_sigmoid_test.sh /path/to/ft_ckpt
Change the checkpoint path in each script to match the path on your machine.
Novel AP | Base AP | Overall AP | download |
---|---|---|---|
40.5 | 60.5 | 55.2 | model |
If you have any questions about this project, please contact truongvu0911nd@gmail.com or open an issue in this repository.