Beyond MOT: Semantic Multi-Object Tracking
Yunhao Li, Qin Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan*, Libo Zhang*
European Conference on Computer Vision (ECCV), 2024. (*equal advising and co-last author)
arXiv
Dataset
Figure: Illustration of the proposed semantic multi-object tracking (SMOT). Existing multi-object tracking (MOT) focuses on predicting trajectories only (see (a)), while our SMOT aims at estimating trajectories and understanding their semantics (see (b)). Best viewed in color for all figures.
Figure: Illustration of the proposed approach SMOTer, which contains three components: trajectory estimation for tracking, feature fusion, and trajectory-associated semantic understanding.
- Linux or macOS with python >= 3.8
- PyTorch >= 1.10.0: this is the configuration we used on our V100 server; in theory, any PyTorch version higher than 1.8.0 should work.
- Detectron2: follow its official instructions.
conda create -n somter python=3.8.0
conda activate somter
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
# under your working directory
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .
cd ..
# or git clone https://github.com/Nathan-Li123/SMOTer
git clone https://github.com/HengLan/SMOT
# or cd SMOTer
cd SMOT
pip install -r requirements.txt
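After installation, the requirements listed above can be sanity-checked from Python. The snippet below is a minimal sketch, not part of the repository: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and it assumes the packages are registered under the distribution names `torch` and `detectron2`. Versions are compared numerically because a plain string comparison would rank `1.9.1` above `1.10.0`.

```python
import re
import sys
from importlib import metadata


def version_tuple(version):
    """Turn a version string such as '1.10.0+cu111' into (1, 10, 0)."""
    public = version.split("+")[0]  # drop local build tags like '+cu111'
    parts = []
    for piece in public.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Report whether each requirement from the list above appears satisfied."""
    report = {"python": sys.version_info[:2] >= (3, 8)}
    try:
        report["torch"] = meets_minimum(metadata.version("torch"), "1.10.0")
    except metadata.PackageNotFoundError:
        report["torch"] = False
    try:
        metadata.version("detectron2")  # any installed version is accepted
        report["detectron2"] = True
    except metadata.PackageNotFoundError:
        report["detectron2"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

The check only inspects installed package metadata; it does not verify that CUDA is usable.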
- Before starting, please download BenSMOT from here (baidu: yb2d, OneDrive) and place it anywhere you like. For more details about BenSMOT, please refer to BenSMOT.md.
- In the BenSMOT dataset folder, we provide semantic annotation files for each sequence in the dataset, including video captions, trajectory captions, and trajectory interactions. For convenience, we recommend downloading the combined annotation files from here (baidu: 1b2h, OneDrive).
- Symlink the test set of BenSMOT to datasets/bensmot/BenSMOT-val/ and organize the directory as follows.
datasets
├── bensmot
| └──annotations
| └──seqmaps
| └──BenSMOT-val
| └──instance_captioin.json
| └──video_summary.json
| └──relation.json
- Modify the DATA_PATH in tools/convert_bensmot2coco.py to the BenSMOT root directory you are using.
- Run tools/convert_bensmot2coco.py to create train.json and test.json files in the annotations folder, and create a test.txt file in the seqmaps folder.
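The data preparation steps above can be scripted and verified with a short helper. This is a sketch under stated assumptions rather than a script shipped with the repository: `link_test_set` and `missing_entries` are hypothetical names, and the placement of the three JSON files directly under datasets/bensmot is read off the directory tree and may differ in your copy.

```python
import os
from pathlib import Path

# Entries expected under datasets/bensmot, taken from the directory tree above.
# Assumption: the three JSON files sit directly under bensmot/.
EXPECTED_ENTRIES = [
    "annotations",
    "seqmaps",
    "BenSMOT-val",
    "instance_captioin.json",  # spelled exactly as in the tree above
    "video_summary.json",
    "relation.json",
]

# Files that the text says tools/convert_bensmot2coco.py creates.
CONVERTED_FILES = [
    "annotations/train.json",
    "annotations/test.json",
    "seqmaps/test.txt",
]


def link_test_set(bensmot_test_dir, repo_root="."):
    """Symlink the BenSMOT test split to datasets/bensmot/BenSMOT-val."""
    target = Path(repo_root) / "datasets" / "bensmot" / "BenSMOT-val"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Leave an existing directory or link untouched instead of overwriting it.
    if not target.exists() and not target.is_symlink():
        os.symlink(Path(bensmot_test_dir).resolve(), target,
                   target_is_directory=True)
    return target


def missing_entries(repo_root=".", entries=EXPECTED_ENTRIES):
    """Return the expected entries that are absent under datasets/bensmot."""
    base = Path(repo_root) / "datasets" / "bensmot"
    return [name for name in entries if not (base / name).exists()]
```

Calling `missing_entries(".", CONVERTED_FILES)` after running the conversion script should return an empty list if train.json, test.json, and test.txt were written where the steps above describe.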
Please use the scripts provided in scripts/bensmot.sh for training and evaluation. The weight files used in the process can be downloaded here (OneDrive).
# train
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_net.py --num-gpus 4 --config-file configs/BYTE_BENSMOT_FPN.yaml
# evaluation
CUDA_VISIBLE_DEVICES=0 python train_net.py --num-gpus 1 --config-file configs/BYTE_BENSMOT_FPN.yaml --eval-only path/to/weight
# count metrics
python eval_vu.py
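When the number of GPUs varies between machines, the training and evaluation commands above can be assembled programmatically so that CUDA_VISIBLE_DEVICES and --num-gpus stay in sync. The following is an illustrative sketch only: `build_command` is a hypothetical helper, it assumes GPUs are numbered from 0, and it mirrors the flags exactly as they appear in the commands above without adding any others.

```python
import os
import shlex


def build_command(num_gpus, config_file="configs/BYTE_BENSMOT_FPN.yaml",
                  eval_weights=None):
    """Build the train_net.py argument list and environment for `num_gpus` GPUs.

    If `eval_weights` is given, the evaluation form of the command is built.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Expose GPUs 0..num_gpus-1, matching the commands shown above.
    devices = ",".join(str(i) for i in range(num_gpus))
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    cmd = ["python", "train_net.py",
           "--num-gpus", str(num_gpus),
           "--config-file", config_file]
    if eval_weights is not None:
        cmd += ["--eval-only", eval_weights]
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_command(4)
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(cmd)}")
```

The returned pair can be passed to `subprocess.run(cmd, env=env, check=True)` from the repository root to launch the run.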
Our code repository is built upon xingyizhou/GTR. Thanks for their wonderful work.
If you find this project useful for your research, please use the following BibTeX entry.
@inproceedings{li2024beyond,
title={Beyond MOT: Semantic Multi-Object Tracking},
author={Li, Yunhao and Li, Qin and Wang, Hao and Ma, Xue and Yao, Jiali and Dong, Shaohua and Fan, Heng and Zhang, Libo},
booktitle={ECCV},
year={2024}
}