PyTorch implementation of LP-OVOD: Open-Vocabulary Object Detection by Linear Probing (WACV 2024)

Chau Pham, Truong Vu, Khoi Nguyen
VinAI Research, Vietnam

Abstract: This paper addresses the challenging problem of open-vocabulary object detection (OVOD) where an object detector must identify both seen and unseen classes in test images without labeled examples of the unseen classes in training. A typical approach for OVOD is to use joint text-image embeddings of CLIP to assign box proposals to their closest text label. However, this method has a critical issue: many low-quality boxes, such as over- and under-covered-object boxes, have the same similarity score as high-quality boxes since CLIP is not trained on exact object location information. To address this issue, we propose a novel method, LP-OVOD, that discards low-quality boxes by training a sigmoid linear classifier on pseudo labels retrieved from the top relevant region proposals to the novel text. Experimental results on COCO affirm the superior performance of our approach over the state of the art, achieving 40.5 in $AP_{novel}$ using ResNet50 as the backbone and without external datasets or knowing novel classes during training.
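The core idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes CLIP embeddings for the region proposals and for the class names are already available. It retrieves the top-k proposals per class by cosine similarity as pseudo-positive labels, then fits one independent sigmoid per class with binary cross-entropy. The function names, the full-batch gradient-descent loop, and the hyperparameters (k, lr, epochs) are hypothetical. This is not the repository's implementation, and for simplicity the probe here is trained on the same CLIP embeddings used for retrieval. A sigmoid head scores each class independently, so a poorly localized box can get a low score for every class. A softmax over classes, by contrast, always sums to one.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row of x to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve_pseudo_labels(proposal_emb, text_emb, k):
    """Mark the k proposals most similar to each class's text embedding.

    proposal_emb: (N, D) CLIP visual embeddings of region proposals.
    text_emb:     (C, D) CLIP text embeddings of the class names.
    Returns an (N, C) matrix with 1.0 for retrieved (pseudo-positive) pairs.
    """
    sim = l2_normalize(proposal_emb) @ l2_normalize(text_emb).T  # cosine similarity, (N, C)
    topk = np.argsort(-sim, axis=0)[:k]  # (k, C): indices of the k best proposals per class
    labels = np.zeros_like(sim)
    labels[topk, np.arange(sim.shape[1])] = 1.0
    return labels


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_sigmoid_linear_probe(features, labels, lr=1.0, epochs=500):
    """Fit one independent sigmoid per class by full-batch gradient descent on BCE."""
    n, d = features.shape
    W = np.zeros((d, labels.shape[1]))
    b = np.zeros(labels.shape[1])
    for _ in range(epochs):
        grad = (sigmoid(features @ W + b) - labels) / n  # d(mean BCE)/d(logits)
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b
```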

Details of the model architecture and experimental results can be found in the following paper.
Please CITE our paper whenever this repository is used to help produce published results or incorporated into other software.

@inproceedings{pham2024lp,
  title={LP-OVOD: Open-Vocabulary Object Detection by Linear Probing},
  author={Pham, Chau and Vu, Truong and Nguyen, Khoi},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={779--788},
  year={2024}
}

Requirements

Python 3.8
PyTorch 1.7.x (the command below installs 1.7.1 built for CUDA 11.0)

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

pip install -r requirements/build.txt
pip install -e .
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/lvis-dataset/lvis-api.git
pip install mmcv-full==1.2.5 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
pip install yapf==0.40.2
conda install -c pytorch faiss-gpu
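After installation, a quick check can confirm that the main packages are importable. This is a hypothetical helper, not part of the repository. The module names are the import names we assume the commands above provide (mmdet presumably comes from the `pip install -e .` step). Modules that cannot be found are reported as missing; modules without a `__version__` attribute are reported as "unknown".

```python
import importlib
import importlib.util

# Import names assumed for the install commands above; pip package names can
# differ (e.g. mmcv-full is imported as mmcv, faiss-gpu as faiss).
REQUIRED_MODULES = ("torch", "torchvision", "mmcv", "mmdet", "clip", "lvis", "faiss")


def check_environment(modules=REQUIRED_MODULES):
    """Map each module name to its __version__, "unknown", or None if it cannot be found."""
    report = {}
    for name in modules:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        module = importlib.import_module(name)
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name:12s} {version or 'MISSING'}")
```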

Preparation

Data

Download the COCO dataset and the open-vocabulary COCO split from this link.

All models use a backbone pretrained with SoCo. Download the pretrained backbone and save it to the weights folder. Also save the pretrained CLIP model (ViT-B-32.pt) to weights.

Download COCO proposals

Download the COCO proposals from this link and put them in the proposals folder.

Code structure

├── configs
├── mmdet
├── weights
│   ├── current_mmdetection_Head.pth
│   ├── ViT-B-32.pt
├── ovd_coco_text_embedding.pth
├── tools
├── prepare
├── proposals
│   ├── train_coco_id_map.json
│   ├── train_coco_proposals.pkl
│   ├── val_coco_id_map.json
│   ├── val_coco_proposals.pkl
├── retrieval
├── scripts
├── data
│   ├── coco
│   │   ├── annotations
│   │   │   ├── ovd_ins_{train,val}2017_{all,b,t}.json
│   │   │   ├── instances_{train,val}2017.json
│   │   ├── train2017
│   │   ├── val2017
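To catch missing files before training, the tree above can be turned into a checklist. This is a hypothetical helper (missing_paths is not in the repository). It lists only the files and folders named in the tree, expands the {train,val} and {all,b,t} patterns literally, and checks paths relative to the repository root.

```python
from pathlib import Path

# Paths taken from the "Code structure" tree; the {all,b,t} suffixes are
# expanded literally, without assuming what each split contains.
EXPECTED = [
    "weights/current_mmdetection_Head.pth",
    "weights/ViT-B-32.pt",
    "ovd_coco_text_embedding.pth",
    "proposals/train_coco_id_map.json",
    "proposals/train_coco_proposals.pkl",
    "proposals/val_coco_id_map.json",
    "proposals/val_coco_proposals.pkl",
    "data/coco/train2017",
    "data/coco/val2017",
]
EXPECTED += [f"data/coco/annotations/instances_{s}2017.json" for s in ("train", "val")]
EXPECTED += [
    f"data/coco/annotations/ovd_ins_{s}2017_{v}.json"
    for s in ("train", "val")
    for v in ("all", "b", "t")
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```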

Extract the CLIP text embeddings for COCO classes (Optional)

python ./prepare/clip_utils.py

A file ovd_coco_text_embedding.pth will be created (we have already extracted this for you).
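For readers curious what such a file typically contains, here is a hedged NumPy sketch of one common recipe. It encodes several prompt templates per class name, L2-normalizes each embedding, averages them, and re-normalizes the result. The templates and the build_class_embeddings helper are hypothetical; prepare/clip_utils.py may use different prompts or no prompt ensembling at all. The text encoder is passed in as a callable so the sketch runs without CLIP.

```python
import numpy as np

# Hypothetical prompt templates; the ones used by prepare/clip_utils.py may differ.
TEMPLATES = ("a photo of a {}.", "a photo of the {}.", "there is a {} in the scene.")


def build_class_embeddings(class_names, encode_text, templates=TEMPLATES):
    """Return a (num_classes, dim) array of unit-norm text embeddings.

    encode_text: callable mapping a list of strings to an (n, dim) array.
    """
    rows = []
    for name in class_names:
        prompts = [t.format(name) for t in templates]
        emb = np.asarray(encode_text(prompts), dtype=np.float64)
        emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # normalize each prompt
        mean = emb.mean(axis=0)                                 # ensemble over prompts
        rows.append(mean / np.linalg.norm(mean))                # re-normalize
    return np.stack(rows)
```

With the OpenAI CLIP package, encode_text could wrap `model.encode_text(clip.tokenize(texts).to(device))` for a model loaded with `clip.load("weights/ViT-B-32.pt", device=device)`, run under `torch.no_grad()` and converted to a NumPy array.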

Extract the CLIP visual embeddings on pre-computed proposals

These embeddings are used to compute the knowledge distillation loss and to retrieve novel proposals.

python -m torch.distributed.launch --nproc_per_node=4 prepare/extract_coco_embeddings_clip.py \
    --data_root=path_to_data_root \
    --clip_root=weights \
    --proposal_file=path_to_oln_proposals \
    --num_worker=48 \
    --batch_size=128 \
    --split=train \
    --save_path=coco_clip_emb_train.pth

Adjust --num_worker and --batch_size to suit your machine. The output file coco_clip_emb_train.pth is over 100 GB, so make sure you have enough disk space before extracting.
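This README does not spell out the knowledge distillation loss. Since the training scripts are named after ViLD, one plausible form is ViLD's L1 loss between the detector's region embeddings and the precomputed CLIP embeddings of the same proposals, with both L2-normalized. The sketch below assumes that form. The function name and the mean-over-proposals reduction are our own choices, and it is written in NumPy for readability; a training implementation would use the equivalent torch operations.

```python
import numpy as np


def l1_distillation_loss(region_emb, clip_emb):
    """Assumed ViLD-style distillation: mean L1 distance between unit-norm embeddings.

    region_emb: (M, D) embeddings predicted by the detector for M proposals.
    clip_emb:   (M, D) precomputed CLIP visual embeddings of the same proposals.
    """
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    c = clip_emb / np.linalg.norm(clip_emb, axis=1, keepdims=True)
    return np.abs(r - c).sum(axis=1).mean()  # L1 per proposal, averaged over proposals
```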

Training and Testing

Pretraining for Base Classes

bash ./scripts/vild_sigmoid.sh

We provide the pretraining checkpoint at this link

Few-shot Fine-tuning for Novel Classes

bash ./scripts/vild_sigmoid_ft.sh /path/to/pretraining_ckpt

Test the model on Both Base and Novel Classes

bash ./scripts/vild_sigmoid_test.sh /path/to/ft_ckpt

Update the checkpoint path in each script to match its location on your machine.

Evaluation with pre-trained models

Novel AP   Base AP   Overall AP   Download
40.5       60.5      55.2         model
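As a consistency check on the table: COCO AP is averaged over categories, so Overall AP should be the class-count-weighted mean of Base AP and Novel AP. The sketch assumes the commonly used open-vocabulary COCO split of 48 base and 17 novel classes, which this README does not state explicitly.

```python
NUM_BASE, NUM_NOVEL = 48, 17  # assumed standard OV-COCO split (65 classes)


def overall_ap(base_ap, novel_ap, n_base=NUM_BASE, n_novel=NUM_NOVEL):
    """Class-weighted mean AP; valid when AP is averaged per category."""
    return (n_base * base_ap + n_novel * novel_ap) / (n_base + n_novel)


print(round(overall_ap(60.5, 40.5), 2))  # prints 55.27
```

Because Base AP and Novel AP are each reported to one decimal place (about ±0.05), the weighted mean can lie anywhere from roughly 55.22 to 55.32. This range is consistent with the reported 55.2.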

Contacts

If you have any questions about this project, contact truongvu0911nd@gmail.com or open an issue in this repository.
