By Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl
This repository is an official implementation of the paper NMS Strikes Back.
TL;DR. Detection Transformers with Assignment (DETA) re-introduces IoU-based assignment and NMS for transformer-based detectors. DETA trains and tests comparably fast to Deformable-DETR and converges much faster (50.2 mAP in 12 epochs on COCO).
*Figure: DETR's one-to-one bipartite matching vs. our many-to-one IoU-based assignment.*
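To illustrate the contrast, here is a minimal NumPy sketch of a many-to-one IoU-based assignment: every proposal whose best IoU with a ground-truth box clears a threshold becomes a positive for that box, so one object can supervise several queries (unlike bipartite matching, which allows exactly one). The helper names and the 0.6 threshold are illustrative assumptions, not this repository's API or settings.

```python
import numpy as np

def box_iou(boxes1, boxes2):
    """Pairwise IoU between two sets of [x1, y1, x2, y2] boxes."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left
    rb = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

def iou_assign(proposals, gt_boxes, pos_thresh=0.6):
    """Assign each proposal to its highest-IoU ground-truth box.

    Returns an array of gt indices, -1 for background. Several proposals
    may map to the same gt box -- a many-to-one assignment, in contrast
    to DETR's one-to-one bipartite matching.
    """
    iou = box_iou(proposals, gt_boxes)   # (num_proposals, num_gt)
    best_gt = iou.argmax(axis=1)
    best_iou = iou.max(axis=1)
    return np.where(best_iou >= pos_thresh, best_gt, -1)
```

With two near-identical proposals on one object, both are assigned to it, which is exactly the duplication that NMS later removes at inference.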
Method | Epochs | COCO val AP | Total Train Time (8-GPU hours) | Batch Infer Speed (FPS) | URL
---|---|---|---|---|---
Two-stage Deformable DETR | 50 | 46.9 | 42.5 | - | see DeformDETR
Improved Deformable DETR | 50 | 49.6 | 66.6 | 13.4 | config log model
DETA | 12 | 50.1 | 16.3 | 12.7 | config log model
DETA | 24 | 51.1 | 32.5 | 12.7 | config log model
DETA (Swin-L) | 24 | 62.9 | 100 | 4.2 | config-O365 model-O365 config model
Note:
- Unless otherwise specified, the model uses a ResNet-50 backbone and training (ResNet-50) is done on 8 Nvidia Quadro RTX 6000 GPUs.
- Inference speed is measured on an Nvidia Tesla V100 GPU.
- "Batch Infer Speed" refers to inference with batch size = 4 to maximize GPU utilization.
- Improved Deformable DETR implements two-stage Deformable DETR with improved hyperparameters (e.g. more queries, more feature levels; see the full list here).
- DETA with a Swin-L backbone is pretrained on Object365 and fine-tuned on COCO. This model attains 63.5 AP on COCO test-dev. Times refer to fine-tuning (Object365 pre-training takes 14000 GPU hours). We additionally provide the pre-trained Object365 config and model prior to fine-tuning.
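Since DETA re-introduces NMS at inference to remove the duplicate detections a many-to-one assignment produces, here is a minimal NumPy sketch of greedy NMS. The function name and the 0.5 IoU threshold are illustrative assumptions, not this repository's implementation (the codebase uses standard library NMS ops in practice).

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression on [x1, y1, x2, y2] boxes.

    Returns indices of kept boxes, highest score first.
    """
    order = scores.argsort()[::-1]   # candidates sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        # IoU of the kept box against the remaining candidates
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop candidates that overlap the kept box too heavily
        order = order[1:][iou <= iou_thresh]
    return keep
```

In a detector this would typically be applied per class on the final scored boxes.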
Please follow the instructions from Deformable-DETR for installation, data preparation, and additional usage examples. Tested with torch 1.8.0 + CUDA 10.1, torch 1.6.0 + CUDA 9.2, and torch 1.11.0 + CUDA 11.3.
You can evaluate our pretrained DETA models from the above table on COCO 2017 validation set:
```shell
./configs/deta.sh --eval --coco_path ./data/coco --resume <path_to_model>
```
You can also run distributed evaluation:
```shell
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta.sh \
    --eval --coco_path ./data/coco --resume <path_to_model>
```
You can also run distributed evaluation on our Swin-L model:
```shell
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta_swin_ft.sh \
    --eval --coco_path ./data/coco --resume <path_to_model>
```
Training DETA on 8 GPUs:
```shell
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta.sh --coco_path ./data/coco
```
If you are using a Slurm cluster, you can simply run the following command to train on 1 node with 8 GPUs:
```shell
GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deta 8 configs/deta.sh \
    --coco_path ./data/coco
```
Fine-tune DETA with Swin-L on 2 nodes with 8 GPUs each:
```shell
GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deta 16 configs/deta_swin_ft.sh \
    --coco_path ./data/coco --finetune <path_to_o365_model>
```
This project builds heavily on Deformable-DETR and Detectron2. Please refer to their original licenses for more details. If you are using the Swin-L backbone, please see the original Swin license.
If you find DETA useful in your research, please consider citing:
```
@article{ouyangzhang2022nms,
  title={NMS Strikes Back},
  author={Ouyang-Zhang, Jeffrey and Cho, Jang Hyun and Zhou, Xingyi and Kr{\"a}henb{\"u}hl, Philipp},
  journal={arXiv preprint arXiv:2212.06137},
  year={2022}
}
```