Detecting unseen instances based on multi-view templates is a challenging problem due to its open-world nature.
- Open-sourced everything about VoxDet-official
- Open-sourced the dataset toolkit
- Open-sourced the ROS interface of VoxDet
- A small demo is also included here :)
- Open-sourced the raw results of all the other baselines (will be made public after the conference)
- Here
- You can use the raw results to verify the numbers in Tables 1 and 2 of the paper.
- Demonstrations of how to evaluate these results are provided below.
This repo is tested under Python 3.7, PyTorch 1.7.1, CUDA 11.0, and mmcv==1.7.1.
This repo is built on top of mmdetection.
For evaluation, you also need the modified bop_toolkit.
You can use the following commands to create a conda environment with the required dependencies.
conda create -n voxdet python=3.7 -y
conda activate voxdet
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
export CUDA_HOME=/usr/local/cuda
pip install mmcv-full==1.7.1
git clone https://github.com/Jaraxxus-Me/VoxDet.git
cd VoxDet
pip install -r requirements.txt
pip install -v -e .
cd ..
git clone https://github.com/Jaraxxus-Me/bop_toolkit.git
cd bop_toolkit
pip install -e .
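If you want to double-check that the environment matches the tested versions listed above, a one-liner like the following can help (nothing VoxDet-specific, just standard version queries):
# Optional sanity check: print PyTorch / CUDA / mmcv versions and GPU visibility
python3 -c "import torch, mmcv; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), mmcv.__version__)"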
docker pull bowenli1024/voxdet:ros-v1
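To start a container from the pulled image with GPU access, something along these lines should work (the --gpus flag requires the NVIDIA Container Toolkit, and the /workspace mount point is only an illustrative choice):
# Hypothetical launch command: mount the current directory and open a shell.
# Requires the NVIDIA Container Toolkit for --gpus; /workspace is illustrative.
docker run --gpus all -it -v "$(pwd)":/workspace bowenli1024/voxdet:ros-v1 bash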
Inside the docker image:
git clone https://github.com/Jaraxxus-Me/VoxDet.git
cd VoxDet
pip install -v -e .
cd ..
git clone https://github.com/Jaraxxus-Me/bop_toolkit.git
cd bop_toolkit
pip install -e .
We provide the processed OWID, LM-O, YCB-V, and RoboTools datasets to reproduce the evaluation.
You can download them and create a data structure like this (a quick sanity check is sketched after the tree):
VoxDet
├── mmdet
├── tools
├── configs
├── data
│ ├── BOP
│ │ ├── lmo
│ │ │ ├── test
│ │ │ ├── test_video
│ │ ├── ycbv
│ │ ├── RoboTools
│ ├── OWID
│ │ ├── P1
│ │ ├── P2
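After downloading, you can quickly verify the layout from the VoxDet root, e.g.:
# Quick sanity check that the expected folders exist (paths follow the tree
# above; adjust if your data lives elsewhere).
for d in data/BOP/lmo/test data/BOP/lmo/test_video data/BOP/ycbv data/BOP/RoboTools data/OWID/P1 data/OWID/P2; do
    [ -d "$d" ] && echo "found   $d" || echo "missing $d"
done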
You can also build your own custom instance detection dataset using this toolkit; it is very useful :)
Our training set OWID has been released; we provide the training code and scripts here:
# Single-GPU training for the reconstruction stage
bash tools/train.sh
# Multi-GPU training for the base detection stage; this should already produce results close to Table 1
bash tools/train_dist.sh
# Optional: use ground-truth rotation for supervision for (slightly) better results, see Table 4 for details
bash tools/train_dist_2.sh
Note: train_dist.sh may consume a lot of CPU memory (~150 GB); make sure you have enough RAM to avoid OOM problems.
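If the provided scripts do not match your setup, note that this is a standard mmdetection-based repo, so a multi-GPU run typically boils down to the sketch below; the config path and GPU count here are assumptions, so check tools/train_dist.sh for the exact arguments VoxDet uses.
# Hypothetical mmdetection-style distributed launch; the config path and GPU
# count are assumptions -- see tools/train_dist.sh for the exact arguments.
CONFIG=configs/voxdet/VoxDet_p2.py   # assumed config name
GPUS=4                               # adjust to your machine
python3 -m torch.distributed.launch --nproc_per_node=$GPUS \
    tools/train.py $CONFIG --launcher pytorch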
Our trained models and raw results for all the stages are available for download.
Place them under outputs/ and run the following commands to test VoxDet on the LM-O and YCB-V datasets.
bash tools/test.sh
By default, the script only calculates results from the raw .pkl files; to actually run VoxDet inference, you need to change the output file name in the command, for example:
# lmo
python3 tools/test.py --config configs/voxdet/${CONFIG}.py --out outputs/$OUT_DIR/lmo1.pkl \
--checkpoint outputs/VoxDet_p2_2/iter_100.pth >> outputs/$OUT_DIR/lmo1.txt
The results will be shown in the .txt file.
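If you want a quick look at a raw result file before evaluation, something like the following works; the path is illustrative, and the content layout (a list of per-image detections, mmdetection-style) is an assumption.
# Hypothetical quick look at a raw result file; the path is illustrative and
# the per-image layout is assumed to follow mmdetection conventions.
python3 -c "import mmcv; r = mmcv.load('outputs/VoxDet_p2_2/lmo1.pkl'); print(type(r), len(r))"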
With the raw results ([Method].pkl), you can directly evaluate them and get the numbers in Tables 1 and 2 without running inference again.
# change line 171 in VoxDet_test.py for other datasets
# For example, output the evaluation results for OLN_Corr.
python3 tools/eva_only.py --config configs/voxdet/VoxDet_test.py --out baselines/lmo/oln_corr.pkl
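To evaluate several baselines in a row, a simple loop over the released .pkl files can be used, assuming they all live under baselines/lmo/ as in the example above:
# Evaluate every released baseline result for LM-O in one go; remember to
# adjust line 171 of VoxDet_test.py when switching datasets.
for f in baselines/lmo/*.pkl; do
    python3 tools/eva_only.py --config configs/voxdet/VoxDet_test.py --out "$f"
done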
If our work inspires your research, please cite us as:
@INPROCEEDINGS{Li2023vox,
  author={Li, Bowen and Wang, Jiashun and Hu, Yaoyu and Wang, Chen and Scherer, Sebastian},
  booktitle={Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)},
  title={{VoxDet: Voxel Learning for Novel Instance Detection}},
  year={2023}
}