Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

Ke Fan*, Jingshi Lei*, Xuelin Qian†, Miaopeng Yu, Tianjun Xiao†, Tong He, Zheng Zhang, Yanwei Fu

Figure: Illustration of the difference between the view prior, the shape prior, and our model. Unlike previous methods, which merely merge these two priors, EoRaS exploits the view prior through object-centric learning and further introduces the BEV space, where occlusion does not exist, enabling EoRaS to easily handle complex scenarios.
conda create --name EoRaS python=3.9
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
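As a quick sanity check (a minimal sketch, not part of the repository), the snippet below verifies that the pinned PyTorch build and its CUDA runtime are visible before launching distributed training:

```python
# Environment sanity check: confirm installed versions and CUDA visibility.
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 1.13.1
print("torchvision:", torchvision.__version__)  # expected: 0.14.1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```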
Dataset | mIoU (full) | mIoU (occluded) | Checkpoint |
---|---|---|---|
MOVI-B | 79.22 | 47.89 | Link |
MOVI-D | 69.44 | 36.96 | Link |
KITTI | 87.07 | 52.00 | Link |
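To inspect a downloaded checkpoint before evaluation, a minimal sketch is shown below. It assumes the `.pth.gz` file is a (possibly gzip-compressed) torch checkpoint and that `path` points to your local copy; adjust it to how the repository actually saves checkpoints.

```python
# Minimal checkpoint-inspection sketch (assumption: the .pth.gz file is a
# torch checkpoint, possibly gzip-compressed). Not part of the repository.
import gzip
import io
import torch

path = "checkpoint-0050.pth.gz"  # hypothetical local path to a downloaded checkpoint
try:
    ckpt = torch.load(path, map_location="cpu")
except Exception:
    # Fall back to explicit decompression if the file is gzip-compressed.
    with gzip.open(path, "rb") as f:
        ckpt = torch.load(io.BytesIO(f.read()), map_location="cpu")

print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # e.g. model / optimizer / epoch, depending on how it was saved
```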
You can download the datasets here: MOVI-B, MOVI-D, KITTI, and KITTI's amodal annotations.
Training and evaluation on MOVI-B:

savedir=/home/ubuntu/GitLab/Movi-clean/experiments
name=wd_5e-4_full_wobevgt_bid_vislam=1_occlam=0_bevlambda=0_movib_slot8_possincos
python -m torch.distributed.launch --nproc_per_node 4 main_movi.py \
--savedir ${savedir} --name $name \
--vis_lambda 1.0 --occ_lambda 0.0 --bev_lambda 0.0 --wd 5e-4 --epochs 50 --seq_len 12 --lr 1e-5 \
--num_slot 8 --enlarge_coef 2 \
--decoder fm+vm --pos_type sincos --dataset movib_bev
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch test_movi.py \
--savedir ${savedir} --name $name \
--param ${savedir}/$name/checkpoint-0050.pth.gz \
--num_slot 8 --enlarge_coef 2 \
--decoder fm+vm --pos_type sincos --dataset movib_bev
Training and evaluation on MOVI-D:

savedir=/home/ubuntu/GitLab/Movi-clean/experiments
name=wd_5e-4_full_wobevgt_bid_vislam=1_occlam=0_bevlambda=0_movid_slot8_possincos
python -m torch.distributed.launch --nproc_per_node 4 main_movi.py \
--savedir ${savedir} --name $name \
--vis_lambda 1.0 --occ_lambda 0.0 --bev_lambda 0.0 --wd 5e-4 --epochs 50 --seq_len 12 --lr 1e-5 \
--num_slot 8 --enlarge_coef 2 \
--decoder fm+vm --pos_type sincos --dataset movid_bev
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 12345 test_movi.py \
--savedir ${savedir} --name $name \
--param ${savedir}/$name/checkpoint-0050.pth.gz \
--num_slot 8 --enlarge_coef 2 \
--decoder fm+vm --pos_type sincos --dataset movid_bev
Training and evaluation on KITTI:

savedir=/home/ubuntu/GitLab/Movi-clean/experiments
name=wd_5e-4_full_wobevgt_bid_vislam=1_occlam=0_bevlambda=0_kitti_slot8_possincos
python -m torch.distributed.launch --nproc_per_node 4 main_kitti.py \
--savedir ${savedir} --name $name \
--vis_lambda 1.0 --wd 5e-4 --epochs 51 --seq_len 12 --lr 1e-4 \
--num_slot 8 --enlarge_coef 2 --save_interval 5 \
--decoder fm+vm --pos_type sincos
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch test_kitti.py \
--savedir ${savedir} --name $name \
--param ${savedir}/$name/checkpoint-0050.pth.gz \
--num_slot 8 --enlarge_coef 2 \
--decoder fm+vm --pos_type sincos
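For reference, the numbers in the table above are mean IoU between predicted and ground-truth amodal masks, with the occluded variant restricted to the invisible part of each object. The sketch below illustrates one common way to compute both; the function names and the exact definition of the occluded region are assumptions for illustration, and the repository's evaluation scripts remain authoritative.

```python
# Illustrative sketch of full / occluded mIoU over amodal masks (assumed
# definitions; the repository's own evaluation code may differ in details).
import numpy as np

def binary_iou(pred, gt):
    """IoU between two boolean masks; returns 1.0 if both are empty."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(inter) / float(union)

def miou_full_and_occluded(preds, amodal_gts, visible_gts):
    """Each argument is a list of boolean HxW masks, one entry per object."""
    full_scores, occ_scores = [], []
    for pred, amodal, visible in zip(preds, amodal_gts, visible_gts):
        full_scores.append(binary_iou(pred, amodal))
        # Occluded region: the amodal mask minus the visible mask.
        gt_occ = np.logical_and(amodal, np.logical_not(visible))
        pred_occ = np.logical_and(pred, np.logical_not(visible))
        occ_scores.append(binary_iou(pred_occ, gt_occ))
    return float(np.mean(full_scores)), float(np.mean(occ_scores))
```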
If you find our paper useful for your research and applications, please cite using this BibTeX:
@InProceedings{Fan_2023_ICCV,
author = {Fan, Ke and Lei, Jingshi and Qian, Xuelin and Yu, Miaopeng and Xiao, Tianjun and He, Tong and Zhang, Zheng and Fu, Yanwei},
title = {Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {1272--1281}
}
Part of the code is adapted from DETR. The MOVi-B and MOVi-D datasets are re-generated with Kubric: starting from the original code, we modify the camera motion and save the amodal information during data generation. The two datasets share the same license as the original MOVi-B and MOVi-D.
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.