By Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue.
This repo is the official implementation of BigDetection. It is based on mmdetection and CBNetV2.
We construct a new large-scale benchmark termed BigDetection. Our goal is simply to leverage the training data from existing datasets (LVIS, OpenImages and Objects365) with carefully designed principles, and curate a larger dataset for improved detector pre-training. The BigDetection dataset has 600 object categories and contains 3.4M training images with 36M object bounding boxes. We show some important statistics of BigDetection in the following figure.
Left: Number of images per category of BigDetection. Right: Number of instances in different object sizes.
We show the evaluation results on BigDetection Validation. We hope BigDetection could serve as a new challenging benchmark for evaluating next-level object detection methods.
Method | mAP (bigdet val) | Links |
---|---|---|
YOLOv3 | 9.7 | model/config |
Deformable DETR | 13.1 | model/config |
Faster R-CNN (C4)* | 18.9 | model |
Faster R-CNN (FPN)* | 19.4 | model |
CenterNet2* | 23.1 | model |
Cascade R-CNN* | 24.1 | model |
CBNetV2-Swin-Base | 35.1 | model/config |
We show the finetuning performance on COCO minival/test-dev. Results show that BigDetection pre-training provides significant benefits for different detector architectures. We achieve 59.8 mAP on COCO test-dev with a single model.
Method | mAP (coco minival/test-dev) | Links |
---|---|---|
YOLOv3 | 30.5/- | config |
Deformable DETR | 39.9/- | model/config |
Faster R-CNN (C4)* | 38.8/- | model |
Faster R-CNN (FPN)* | 40.5/- | model |
CenterNet2* | 45.3/- | model |
Cascade R-CNN* | 45.1/- | model |
CBNetV2-Swin-Base | 59.1/59.5 | model/config |
CBNetV2-Swin-Base (TTA) | 59.5/59.8 | config |
Following STAC and SoftTeacher, we evaluate on COCO under different partial-annotation settings (1%, 2%, 5% and 10% of the labels).
Method | mAP (1%) | mAP (2%) | mAP (5%) | mAP (10%) |
---|---|---|---|---|
Baseline | 9.8 | 14.3 | 21.2 | 26.2 |
STAC | 14.0 | 18.3 | 24.4 | 28.6 |
SoftTeacher (ICCV 21) | 20.5 | 26.5 | 30.7 | 34.0 |
Ours | 25.3 | 28.1 | 31.9 | 34.1 |
Links (Ours) | model | model | model | model |
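The margin over prior methods is easier to read as per-setting differences. The sketch below recomputes those differences from the numbers in the table above; the names `RESULTS` and `gains` are illustrative and not part of this repo.

```python
# mAP values copied from the partial-annotation table above.
SETTINGS = ["1%", "2%", "5%", "10%"]
RESULTS = {
    "Baseline": [9.8, 14.3, 21.2, 26.2],
    "STAC": [14.0, 18.3, 24.4, 28.6],
    "SoftTeacher": [20.5, 26.5, 30.7, 34.0],
    "Ours": [25.3, 28.1, 31.9, 34.1],
}


def gains(method, reference, results=RESULTS):
    """Return per-setting mAP differences (method minus reference), rounded to 0.1."""
    return [round(a - b, 1) for a, b in zip(results[method], results[reference])]


if __name__ == "__main__":
    for ref in ("Baseline", "STAC", "SoftTeacher"):
        print(f"Ours vs {ref}:", dict(zip(SETTINGS, gains("Ours", ref))))
```

Against SoftTeacher the difference shrinks from 4.8 mAP at 1% labels to 0.1 mAP at 10% labels, so the benefit of BigDetection pre-training in this table is concentrated in the low-label settings.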
- The models marked with * are implemented on another detection codebase, Detectron2. Here we provide the pretrained checkpoints. The results can be reproduced by following the installation instructions of the CenterNet2 codebase.
- Most models are trained with an 8X schedule on BigDetection.
- Most pretrained models are finetuned with a 1X schedule on COCO.
- TTA denotes test-time augmentation.
- Pre-trained models of Swin Transformer can be downloaded from Swin Transformer for ImageNet Classification.
Ubuntu 16.04
CUDA 10.2
# Create conda environment
conda create -n bigdet python=3.7 -y
conda activate bigdet
# Install Pytorch
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=10.2 -c pytorch
# Install mmcv
pip install mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
# Clone and install
git clone https://github.com/amazon-research/bigdetection.git
cd bigdetection
pip install -r requirements/build.txt
pip install -v -e .
# Install Apex (optional)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
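After installation, it can help to confirm that the pinned versions are the ones actually present. The following is a minimal sketch, not part of this repo: it assumes the packages register under the distribution names `torch`, `torchvision` and `mmcv-full`, and the helper names `installed_version` and `check_environment` are hypothetical.

```python
# Versions pinned by the installation commands above.
# Assumption: these are the distribution names the packages register under.
EXPECTED = {"torch": "1.8.0", "torchvision": "0.9.0", "mmcv-full": "1.3.9"}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.7

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Map each package to (expected, found, ok); local tags such as '+cu102' are ignored."""
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        ok = found is not None and found.split("+")[0] == want
        report[name] = (want, found, ok)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found} -> {status}")
```

The check reads package metadata only, so it does not import PyTorch or require a GPU to run.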
BigDetection combines three datasets, whose train/val data can be downloaded from their official websites (Objects365, OpenImages v6, LVIS v1.0). All datasets should be placed under $bigdetection/data/ as shown below. The synsets (600 class names in total) of the BigDetection dataset can be downloaded here: bigdetection_synsets. Contact us at lkcai20@fudan.edu.cn to get access to our pre-processed annotation files.
bigdetection/data
└── BigDetection
├── annotations
│ ├── bigdet_obj_train.json
│ ├── bigdet_oid_train.json
│ ├── bigdet_lvis_train.json
│ ├── bigdet_val.json
│ └── cas_weights.json
├── train
│ ├── Objects365
│ ├── OpenImages
│ └── LVIS
└── val
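Before launching a long training run, the layout above can be checked with a few lines of Python. This is a minimal sketch under the assumption that it is run from the repository root, so the dataset root is `data/BigDetection`; the function name `missing_entries` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_FILES = [
    "annotations/bigdet_obj_train.json",
    "annotations/bigdet_oid_train.json",
    "annotations/bigdet_lvis_train.json",
    "annotations/bigdet_val.json",
    "annotations/cas_weights.json",
]
EXPECTED_DIRS = ["train/Objects365", "train/OpenImages", "train/LVIS", "val"]


def missing_entries(root):
    """Return the expected files and directories that are absent under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    # Assumption: the script is started from the repository root.
    problems = missing_entries("data/BigDetection")
    print("layout OK" if not problems else "missing: " + ", ".join(problems))
```

The check only verifies that the files and folders exist; it does not validate the contents of the annotation files.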
To train a detector with pre-trained models, run:
# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options load_from=<PRETRAIN_MODEL>
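The `--cfg-options` flag overrides entries of the config file with `key=value` pairs, where dots in the key address nested fields. The snippet below is a simplified pure-Python illustration of that idea, not the mmcv implementation: the function `apply_overrides`, the sample config and the checkpoint path are hypothetical, and values are kept as strings here instead of being parsed into numbers or booleans.

```python
def apply_overrides(cfg, pairs):
    """Apply 'a.b.c=value' overrides to a nested dict in place and return it."""
    for pair in pairs:
        dotted_key, value = pair.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            # Walk down the nesting, creating intermediate dicts when needed.
            node = node.setdefault(name, {})
        node[leaf] = value  # values stay strings in this sketch
    return cfg


if __name__ == "__main__":
    # Hypothetical config and checkpoint path, for illustration only.
    cfg = {"load_from": None, "optimizer": {"type": "AdamW", "lr": "0.0001"}}
    apply_overrides(cfg, ["load_from=checkpoints/pretrain.pth", "optimizer.lr=0.00005"])
    print(cfg)
```

In the training command above, `load_from=<PRETRAIN_MODEL>` plays the role of the first override: it replaces the top-level `load_from` entry of the chosen config with the path of the pre-trained checkpoint.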
Pre-training
To pre-train a CBNetV2 with a Swin-Base backbone on BigDetection using 8 GPUs, run the following (PRETRAIN_MODEL should be the pre-trained checkpoint of Base-Swin-Transformer: model):
tools/dist_train.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_bigdet.py 8 \
--cfg-options load_from=<PRETRAIN_MODEL>
To pre-train a Deformable-DETR with a ResNet-50 backbone on BigDetection, run:
tools/dist_train.sh configs/BigDetection/deformable_detr/deformable_detr_r50_16x2_8x_bigdet.py 8
Fine-tuning
To fine-tune a BigDetection pre-trained CBNetV2 (with a Swin-Base backbone) on COCO, run the following (PRETRAIN_MODEL should be the BigDetection pre-trained checkpoint of CBNetV2: model):
tools/dist_train.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py 8 \
--cfg-options load_from=<PRETRAIN_MODEL>
To evaluate a detector with pre-trained checkpoints, run:
tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT> <GPU_NUM> --eval bbox
BigDetection evaluation
To evaluate pre-trained CBNetV2 on BigDetection validation, run:
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_bigdet.py \
<BIGDET_PRETRAIN_CHECKPOINT> 8 --eval bbox
COCO evaluation
To evaluate COCO-finetuned CBNetV2 on COCO validation, run:
# without test-time-augmentation
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py \
<COCO_FINETUNE_CHECKPOINT> 8 --eval bbox mask
# with test-time-augmentation
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco_tta.py \
<COCO_FINETUNE_CHECKPOINT> 8 --eval bbox mask
Other configurations based on Detectron2 can be found at detectron2-probject.
If you use our dataset or pretrained models in your research, please consider citing the following paper.
@article{bigdetection2022,
title={BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training},
author={Likun Cai and Zhi Zhang and Yi Zhu and Li Zhang and Mu Li and Xiangyang Xue},
journal={arXiv preprint arXiv:2203.13249},
year={2022}
}
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.
We thank the authors of mmdetection and CBNetV2 for releasing their code to the object detection research community.