CBNet: A Composite Backbone Network Architecture for Object Detection

By Tingting Liang*, Xiaojie Chu*, Yudong Liu*, Yongtao Wang, Zhi Tang, Wei Chu, Jingdong Chen, Haibin Ling.

This repo is the official implementation of CBNetV2. It is based on mmdetection and Swin Transformer for Object Detection.

Contact us at tingtingliang@pku.edu.cn, chuxiaojie@stu.pku.edu.cn, wyt@pku.edu.cn.

Update

2024/10/21: Update code for CB-EVA-02-L. We achieve new SOTA instance segmentation results (55.5 -> 56.1 mask AP) on COCO!

Introduction

CBNetV2 achieves strong single-model performance on COCO object detection (60.1 box AP and 52.3 mask AP on test-dev) without extra training data.

Partial Results and Models

More results and models can be found in the model zoo.

Faster R-CNN

Backbone	Lr Schd	box mAP (minival)	#params	FLOPs	config	log	model
DB-ResNet50	1x	40.8	69M	284G	config	github	github

Mask R-CNN

Backbone	Lr Schd	box mAP (minival)	mask mAP (minival)	#params	FLOPs	config	log	model
DB-Swin-T	3x	50.2	44.5	76M	357G	config	github	github

Cascade Mask R-CNN (1600x1400)

Backbone	Lr Schd	box mAP (minival/test-dev)	mask mAP (minival/test-dev)	#params	FLOPs	config	model
DB-Swin-S	3x	56.3/56.9	48.6/49.1	156M	1016G	config	github

Improved HTC (1600x1400)

We use ImageNet-22k pretrained checkpoints of Swin-B and Swin-L. Compared to regular HTC, our HTC uses 4conv1fc in bbox head.
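
As a rough illustration of what "4conv1fc" means, the fragment below sketches a bbox head with four shared convolution layers followed by one shared fully connected layer, written in the MMDetection 2.x config style. It assumes MMDetection's `ConvFCBBoxHead` and its `num_shared_convs` / `num_shared_fcs` arguments; the channel sizes and class count are illustrative values, not copied from this repo's configs.

```python
# Illustrative config fragment (values are assumptions, not this repo's settings).
bbox_head = dict(
    type="ConvFCBBoxHead",
    num_shared_convs=4,      # the "4conv" part: four shared conv layers
    num_shared_fcs=1,        # the "1fc" part: one shared fully connected layer
    in_channels=256,         # assumed FPN output width
    conv_out_channels=256,
    fc_out_channels=1024,
    roi_feat_size=7,
    num_classes=80,          # COCO has 80 object categories
)
```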

Backbone	Lr Schd	box mAP (minival/test-dev)	mask mAP (minival/test-dev)	#params	FLOPs	config	model
DB-Swin-B	20e	58.4/58.7	50.7/51.1	235M	1348G	config	github
DB-Swin-L	1x	59.1/59.4	51.0/51.6	453M	2162G	config (test only)	github
DB-Swin-L (TTA)	1x	59.6/60.1	51.8/52.3	453M	-	config (test only)	github

TTA denotes test time augmentation.
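
One common form of test time augmentation is to run the detector on both the original image and a horizontally flipped copy, then pool the predictions. The sketch below shows only the coordinate bookkeeping for that case; the function names are hypothetical, and the augmentations actually used for the TTA rows above are defined in the corresponding configs, not here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def unflip_boxes(boxes: List[Box], image_width: float) -> List[Box]:
    """Map boxes predicted on a horizontally flipped image back to the original frame."""
    # A flip mirrors x coordinates, so the left and right edges swap roles.
    return [(image_width - x2, y1, image_width - x1, y2) for x1, y1, x2, y2 in boxes]


def pool_tta_boxes(original: List[Box], flipped: List[Box], image_width: float) -> List[Box]:
    """Collect candidates from both views in the original image's coordinates."""
    return list(original) + unflip_boxes(flipped, image_width)


# A box seen at x in [10, 30] on the flipped view of a 100-pixel-wide image
# corresponds to x in [70, 90] on the original image.
print(unflip_boxes([(10.0, 20.0, 30.0, 40.0)], image_width=100.0))
```

In practice the pooled candidates would still need to be merged, for example with non-maximum suppression, before evaluation.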

EVA02 (1536x1536)

Backbone	Lr Schd	mask mAP (test-dev)	#params	config	model
DB-EVA02-L	1x	56.1	674M	config	HF

Notes:

Pre-trained models of Swin Transformer can be downloaded from Swin Transformer for ImageNet Classification.
Pre-trained models of EVA02 can be downloaded from EVA02 pretrain.

Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

# single-gpu testing (w/o segm result)
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox 

# multi-gpu testing (w/ segm result)
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm

Training

To train a detector with pre-trained models, run:

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM>

For example, to train a Faster R-CNN model with a Dual-ResNet50 backbone on 8 GPUs, run:

# path of pre-training model (resnet50) is already in config
tools/dist_train.sh configs/cbnet/faster_rcnn_cbv2d1_r50_fpn_1x_coco.py 8

As another example, to train a Mask R-CNN model with a Dual-Swin-T backbone on 8 GPUs, run:

tools/dist_train.sh configs/cbnet/mask_rcnn_cbv2_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL>
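
The `--cfg-options model.pretrained=<PRETRAIN_MODEL>` argument overrides a nested config entry addressed by a dotted key. The sketch below illustrates that idea on a plain dictionary; `apply_override` and the sample config are hypothetical stand-ins for illustration and are not the code MMDetection uses internally.

```python
from typing import Any, Dict


def apply_override(config: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Set config['a']['b']['c'] = value for dotted_key 'a.b.c', creating levels as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Descend one level, creating an empty dict if the level is missing.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical stand-in for a loaded detector config.
config = {"model": {"type": "MaskRCNN", "pretrained": None}}
apply_override(config, "model.pretrained", "path/to/pretrained.pth")
print(config["model"]["pretrained"])
```

Only the addressed entry changes; sibling keys such as `model.type` keep the values from the config file.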

Apex (optional):

Following Swin Transformer for Object Detection, we use apex for mixed precision training by default. To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Documents and Tutorials

We list some documents and tutorials from MMDetection, which may be helpful to you.

Citation

If you use our code or models, please consider citing our paper CBNet: A Composite Backbone Network Architecture for Object Detection.

@ARTICLE{9932281,
  author={Liang, Tingting and Chu, Xiaojie and Liu, Yudong and Wang, Yongtao and Tang, Zhi and Chu, Wei and Chen, Jingdong and Ling, Haibin},
  journal={IEEE Transactions on Image Processing}, 
  title={CBNet: A Composite Backbone Network Architecture for Object Detection}, 
  year={2022},
  volume={31},
  pages={6893-6906},
  doi={10.1109/TIP.2022.3216771}}

License

This project is free for academic research purposes only; commercial use requires authorization. For commercial licensing, please contact wyt@pku.edu.cn.
