[CVPRW] 2024 AI City Challenge: Multi-perspective Traffic Video Description Model with Fine-grained Refinement Approach
- Paper: [To be updated]
- Contributors: Tuan-An To, Minh-Nam Tran, Trong-Bao Ho, Thien-Loc Ha, Quang-Tan Nguyen, Hoang-Chau Luong, Thanh-Duy Cao
```bash
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```
- Convert the BDD_PC_5K and WTS datasets to the YouCook format.
Generate YouCook-format annotation files from the competition datasets by running the following commands:
```bash
python bdd5k2youcook_by_phase.py --videos_dir datasets/BDD_PC_5K/videos --caption_dir datasets/BDD_PC_5K/captions --output_dir annotations/BDD_PC_5K --merge_val
python wts2youcook_by_phase.py --videos_dir datasets/WTS/videos --caption_dir datasets/WTS/captions --output_dir annotations/WTS --merge_val
```
where the `--merge_val` flag merges the validation set into the training set, `--videos_dir` is the path to the video directory, `--caption_dir` is the path to the caption directory, and `--output_dir` is the path to the output directory.
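For reference, the YouCook format stores, for each video, its duration, subset, and a list of timestamped caption segments in a single JSON file. Below is a minimal Python sketch of what one generated entry may look like; the exact field set written by the conversion scripts is an assumption.

```python
import json

# Hypothetical YouCook2-style annotation entry; the exact fields written
# by bdd5k2youcook_by_phase.py / wts2youcook_by_phase.py are assumptions.
annotations = {
    "database": {
        "video1": {
            "duration": 40.0,      # video length in seconds
            "subset": "training",  # validation entries fold in here with --merge_val
            "annotations": [
                {
                    "segment": [3.2, 8.5],  # [start, end] of the phase, in seconds
                    "sentence": "The pedestrian stands near the crosswalk.",
                },
                {
                    "segment": [8.5, 12.0],
                    "sentence": "The vehicle slows down as it approaches.",
                },
            ],
        }
    }
}

print(json.dumps(annotations, indent=2))
```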
The structure of the video directory of BDD_PC_5K should be as follows:
```
bdd_pc_5k_video_dir/
├── train/
│   └── vehicle_view/
│       ├── video1.mp4
│       ├── video2.mp4
│       └── ...
└── val/
    └── vehicle_view/
        ├── video1.mp4
        ├── video2.mp4
        └── ...
```
and the structure of the video directory of WTS should be as follows:
```
wts_video_dir/
├── train/
│   ├── video_id_1/
│   │   ├── overhead_view/
│   │   │   ├── abc.mp4
│   │   │   └── ...
│   │   └── vehicle_view/
│   │       ├── abc.mp4
│   │       └── ...
│   ├── video_id_2/
│   │   └── overhead_view/
│   │       ├── abc.mp4
│   │       └── ...
│   └── ...
└── val/
    ├── video_id_1/
    │   ├── overhead_view/
    │   │   ├── abc.mp4
    │   │   └── ...
    │   └── vehicle_view/
    │       ├── abc.mp4
    │       └── ...
    ├── video_id_2/
    └── ...
```
The caption directory structures are similar to the corresponding video directory structures.
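Before converting, it can be worth verifying that every scenario folder contains the expected view subfolders. Here is a minimal sketch, assuming the WTS layout above; the local path is a placeholder you should adjust.

```python
from pathlib import Path

# Hypothetical sanity check for the WTS layout sketched above;
# change wts_video_dir to wherever you extracted the dataset.
wts_video_dir = Path("datasets/WTS/videos")

for split in ("train", "val"):
    for scenario in sorted((wts_video_dir / split).iterdir()):
        views = {v.name for v in scenario.iterdir() if v.is_dir()}
        # Every scenario should contain at least an overhead_view folder.
        if "overhead_view" not in views:
            print(f"missing overhead_view: {scenario}")
        n_videos = len(list(scenario.glob("*/*.mp4")))
        print(f"{split}/{scenario.name}: views={sorted(views)}, videos={n_videos}")
```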
- Construct a CSV file in which each row is a `<video_path>,<feature_path>` pair, for example `data/video1.mp4,data/video1.npy` (a sketch of such a script is shown after the commands below):
```bash
cd vid2seq
python extract/make_csv.py
python extract/extract.py --csv <csv_path> --framerate 30
```
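In case you want to build the CSV by hand rather than with `extract/make_csv.py`, the following sketch pairs every `.mp4` with the `.npy` path where its feature will be stored; the directory names are placeholders, and the exact behavior of `make_csv.py` is an assumption.

```python
import csv
from pathlib import Path

# Hypothetical helper mirroring what extract/make_csv.py likely does:
# one row per video, mapping the .mp4 to its .npy feature path.
video_dir = Path("data")    # placeholder: where the videos live
feature_dir = Path("data")  # placeholder: where features will be written

with open("videos.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for video in sorted(video_dir.glob("**/*.mp4")):
        feature = feature_dir / video.with_suffix(".npy").name
        writer.writerow([str(video), str(feature)])
```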
- Open notebooks/vid2seq_inference.ipynb, set the checkpoint path and the test-set embedded-feature path, then run the notebook.
- Open scripts/get_submission.sh, set the checkpoint path and the test-set embedded-feature path, then run:
```bash
cd vid2seq
bash ../scripts/get_submission.sh
```
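Before uploading the result, a quick sanity check of the generated file can catch empty captions. This is a minimal sketch, assuming the script writes a JSON file named `submission.json` whose top level maps video IDs to lists of caption entries; both the file name and the schema are assumptions, so adjust them to what the script actually emits.

```python
import json

# Hypothetical check: the file name and per-entry schema are assumptions.
with open("submission.json") as f:
    submission = json.load(f)

print(f"{len(submission)} videos in submission")
for video_id, entries in submission.items():
    for entry in entries:
        for key, value in entry.items():
            # Flag empty caption strings before uploading.
            if isinstance(value, str) and not value.strip():
                print(f"empty field '{key}' for video {video_id}")
```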
- Upload the train-set embedded features to Kaggle.
- Training notebook on Kaggle: [TRAINING] SINGLE VIEW MODEL.
Please check the Motion-Blur Link.
Please check the Multiview Link.
We would like to thank the authors of the Vid2Seq repository for their outstanding work on video captioning.