We made many updates in 2021.2. Please pull the latest commits to get the newest code and data.
- Pedestrians and cyclists are supported.
- New pseudo-GT data is released.
  - Pseudo-GT for pedestrians and cyclists is released.
  - Pseudo-GT for cars is updated. We add background disparity using depth completion to make iDispNet training more robust, especially near the edges of the segmentation masks.
- New trained models are released to produce better results.
- Easy download scripts are provided using gdown.
- Set up the KITTI object dataset in the following folder structure:
disprcnn/ # project root
├─ data
│  ├─ kitti
│  │  ├─ object
│  │  │  ├─ split_set
│  │  │  │  ├─ train_set.txt & val_set.txt & test_set.txt & training_set.txt(same as train.txt) # todo
│  │  │  ├─ training
│  │  │  │  ├─ calib & label_2 & label_3 & image_2 & image_3
│  │  │  ├─ testing
│  │  │  │  ├─ calib & image_2 & image_3
- Download label_3, the segmentation masks and the disparity pseudo-ground-truth. As described in the paper, we generate two types of pseudo-ground-truth: with and without the LiDAR points. These two variants of pseudo-GT correspond to vob and pob in the following experiments, respectively.
# download label_3
sh scripts/download/data/label_3.sh
# download pseudo-GT
sh scripts/download/data/pseudo_gt.sh
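Once the dataset is arranged and label_3 is downloaded, the layout can be sanity-checked before training. The following is a minimal sketch, not part of the repository: the helper name `missing_kitti_entries` is hypothetical, and it assumes that the split files live inside `split_set` and that the script is run from the project root.

```python
from pathlib import Path

# Expected entries, transcribed from the folder structure described above.
# Assumption: the *.txt split files sit inside split_set.
EXPECTED = {
    "split_set": ["train_set.txt", "val_set.txt", "test_set.txt", "training_set.txt"],
    "training": ["calib", "label_2", "label_3", "image_2", "image_3"],
    "testing": ["calib", "image_2", "image_3"],
}


def missing_kitti_entries(project_root):
    """Return the expected paths that do not exist under data/kitti/object."""
    root = Path(project_root)
    object_dir = root / "data" / "kitti" / "object"
    missing = []
    for subdir, names in EXPECTED.items():
        for name in names:
            path = object_dir / subdir / name
            if not path.exists():
                # Report paths relative to the project root for readability.
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    for entry in missing_kitti_entries("."):
        print("missing:", entry)
```

An empty result means every file and folder listed in the structure is present; it does not check the contents of those folders.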
As an example, we describe steps to run the vob version for the car category here.
Other categories and versions are similar.
Note that the current implementation does not exit automatically; see Notes for details.
- Define the number of GPUs.
export NGPUS=8
- 2D detector.
For the car category, download pretrained Stereo R-CNN in Mask R-CNN format.
sh scripts/download/model/srcnn_pretrained_2d_mrcnn_format.sh
Then train the Stereo Mask R-CNN with the following command.
sh scripts/car/vob/train_smrcnn.sh # This step takes ~1.5 hours using 4 GPUs.
For the pedestrian and cyclist categories, we provide 2D predictions; you can download them instead of training the detector yourself.
sh scripts/download/model/pedestrian_2d.sh
sh scripts/download/model/cyclist_2d.sh
- Train iDispNet.
Download pretrained_model_KITTI2015.tar.
sh scripts/download/model/pretrained_psmnet.sh
We use the fast.ai framework to train iDispNet, as in train_idispnet_fa.py.
sh scripts/car/vob/train_idispnet.sh # This step takes ~8 hours using 8 GPUs.
- Train RPN.
sh scripts/car/vob/train_rpn.sh # This step takes ~5 hours using 8 GPUs.
- Train RCNN.
sh scripts/car/vob/train_rcnn.sh # This step takes ~13 hours using 8 GPUs.
- Evaluate RCNN.
sh scripts/eval_rcnn.sh # This step takes ~2 minutes using 8 GPUs.
The released models are trained on the training split of the KITTI object training set.
As an example, we describe steps to run the vob version for the car category here. Other categories are similar.
If you have multiple GPUs with more than 12 GB of memory each (e.g. RTX TITAN/V100), run the first command to perform distributed inference on multiple GPUs. Otherwise, we recommend using only one GPU.
# download pretrained model.
sh scripts/download/model/pretrained_car_vob.sh
# distributed inference with multiple GPUs.
export NGPUS=8
sh scripts/car/vob/eval_with_trained_model.sh
# inference with one GPU.
export NGPUS=1
sh scripts/car/vob/eval_with_trained_model.sh
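The choice between the two commands above can be expressed as a small rule. This is only an illustrative sketch: `choose_ngpus` is a hypothetical helper, not something the repository provides, and the per-GPU memory sizes are assumed to be supplied by hand (for example, read off `nvidia-smi`).

```python
def choose_ngpus(gpu_memory_gb, min_memory_gb=12.0):
    """Suggest a value for NGPUS from a list of per-GPU memory sizes in GB.

    Distributed inference is suggested only when there are several GPUs
    and every one of them has more than min_memory_gb; otherwise fall
    back to a single GPU, as recommended above.
    """
    if len(gpu_memory_gb) > 1 and all(m > min_memory_gb for m in gpu_memory_gb):
        return len(gpu_memory_gb)
    return 1


if __name__ == "__main__":
    # Eight 24 GB cards: distributed inference on all of them.
    print(f"export NGPUS={choose_ngpus([24.0] * 8)}")
    # Two 11 GB cards: fall back to a single GPU.
    print(f"export NGPUS={choose_ngpus([11.0, 11.0])}")
```

The sketch is deliberately conservative: a machine with mixed cards is treated as single-GPU, because selecting only the larger cards would also require restricting which devices are visible to the launcher.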
For pedestrians and cyclists, we provide 2D predictions instead of a trained 2D detector. Download them using the following commands.
sh scripts/download/model/pedestrian_2d.sh
sh scripts/download/model/cyclist_2d.sh
We provide predictions.pth for the car category in case anyone cannot perform inference. Download them using the following commands.
sh scripts/download/predictions/car_vob.sh
sh scripts/download/predictions/car_pob.sh
Evaluate with the provided predictions.pth using a command like this:
python tools/test_net.py --config-file xxxx/rcnn.yaml --no_force_recompute
The '--no_force_recompute' flag makes the engine load predictions.pth instead of performing inference from scratch.
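The behavior of such a flag can be pictured as a load-or-compute pattern. The sketch below is not the code in tools/test_net.py: `load_or_compute` and `run_inference` are hypothetical names, `pickle` stands in for however the engine actually serializes predictions.pth, and recomputing by default is an assumption inferred from the flag's name.

```python
import os
import pickle


def load_or_compute(cache_path, run_inference, force_recompute=True):
    """Return cached predictions unless recomputation is forced or no cache exists."""
    if not force_recompute and os.path.exists(cache_path):
        # Reuse the stored predictions and skip inference entirely.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Run inference from scratch and refresh the cache for later runs.
    predictions = run_inference()
    with open(cache_path, "wb") as f:
        pickle.dump(predictions, f)
    return predictions
```

With `force_recompute=False`, a downloaded predictions file would be picked up directly, which is why evaluation is possible without a GPU capable of running inference.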
You can also use visualize.ipynb to visualize the 3D bounding boxes and instance disparities.
- Setting num_workers>0 can speed up training and inference by a large margin. However, due to some bugs in PyTorch, the training process hangs after finishing nearly all the iterations. When training gets stuck at the end, interrupt the program manually with "Ctrl+C". You can verify that training has finished by checking the ETA (usually less than one minute) or iter (close to max_iter in the configs). This imperfection could potentially be fixed by switching the distributed launcher to mp.spawn. If the GPU memory is not released after the manual interrupt, use the following command to release it.
pkill -e -9 python -u $USER
If you encounter an EOFError or a share_memory error during training with num_workers>0, just relaunch the training script with "SOLVER.LOAD_OPTIMIZER True SOLVER.LOAD_SCHEDULER True".
- We suggest using 8 GPUs with more than 12 GB of memory each. If you don't have enough GPUs or your GPU memory is less than 12 GB, there are some alternatives. We provide a script to run inference with one GPU. For training, you can decrease the batch size and learning rate and increase the maximum number of iterations, following the scheduling rules from Detectron.
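The Detectron scheduling rules referred to above are commonly stated as the linear scaling rule: when the total batch size shrinks by a factor k, divide the learning rate by k and multiply the iteration counts by k. The sketch below illustrates that arithmetic; `rescale_schedule` is a hypothetical helper, and the numbers in the usage example are illustrative placeholders, not values taken from this repository's configs.

```python
def rescale_schedule(base_lr, max_iter, steps, base_batch_size, new_batch_size):
    """Apply the linear scaling rule when changing the total batch size.

    The learning rate scales with the batch size, while the maximum
    iteration and the LR decay steps scale inversely, so the number of
    training samples seen stays roughly constant.
    """
    scale = new_batch_size / base_batch_size
    new_lr = base_lr * scale
    new_max_iter = int(round(max_iter / scale))
    new_steps = tuple(int(round(s / scale)) for s in steps)
    return new_lr, new_max_iter, new_steps


if __name__ == "__main__":
    # Illustrative values only: halving the batch size from 16 to 8.
    lr, max_iter, steps = rescale_schedule(
        base_lr=0.02, max_iter=90000, steps=(60000, 80000),
        base_batch_size=16, new_batch_size=8,
    )
    print(lr, max_iter, steps)
```

The resulting values would then be placed in the corresponding solver options of the config; the exact option names depend on the config files and are not assumed here.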