This repository contains the official source code and data for our AFFT paper. If you find our code or paper useful, please consider citing:
Z. Zhong, D. Schneider, M. Voit, R. Stiefelhagen and J. Beyerer. Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation. In WACV, 2023.
@InProceedings{Zhong_2023_WACV,
author = {Zhong, Zeyun and Schneider, David and Voit, Michael and Stiefelhagen, Rainer and Beyerer, J\"urgen},
title = {Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2023},
pages = {6068-6077}
}
First clone the repo and set up the required packages in a conda environment.
$ git clone https://github.com/zeyun-zhong/AFFT.git
$ cd AFFT
$ conda env create -f environment.yaml python=3.7
$ conda activate afft
AFFT works on pre-extracted features, so you will need to download them first. The TSN features for EK100 and for EGTEA Gaze+ can be downloaded from RULSTM. The RGB-Swin features are available here and the audio features are available here.
Please make sure that your data follows the directory structure shown below. Note that dataset_root_dir in config.yaml should be changed to your specific data path.
Dataset root path (e.g., /home/user/datasets)
├── epickitchens100
│   └── features
│       ├── rgb
│       │   └── data.mdb
│       ├── rgb_omnivore
│       │   └── data.mdb
│       ├── obj
│       │   └── data.mdb
│       ├── audio
│       │   └── data.mdb
│       └── flow
│           └── data.mdb
└── egtea
    └── features
        ├── TSN-C_3_egtea_action_CE_s1_rgb_model_best_fcfull_hd
        │   └── data.mdb
        ├── TSN-C_3_egtea_action_CE_s1_flow_model_best_fcfull_hd
        │   └── data.mdb
        ├── TSN-C_3_egtea_action_CE_s2_rgb_model_best_fcfull_hd
        │   └── data.mdb
        ├── TSN-C_3_egtea_action_CE_s2_flow_model_best_fcfull_hd
        │   └── data.mdb
        ├── TSN-C_3_egtea_action_CE_s3_rgb_model_best_fcfull_hd
        │   └── data.mdb
        └── TSN-C_3_egtea_action_CE_s3_flow_model_best_fcfull_hd
            └── data.mdb
If you use a different organization, you will need to edit rulstm_feats_dir in EK100-common and EGTEA-common.
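Before training, it can help to confirm that every feature folder from the layout above is in place. The following is a minimal sketch, not part of the AFFT codebase: the helper name find_missing_features is hypothetical, and the folder names are copied from the tree above, so they need adjusting if you changed rulstm_feats_dir.

```python
from pathlib import Path

# Feature folders copied from the directory layout above.
# Each one is expected to contain a data.mdb file.
EK100_FEATURES = ["rgb", "rgb_omnivore", "obj", "audio", "flow"]
EGTEA_FEATURES = [
    f"TSN-C_3_egtea_action_CE_s{split}_{modality}_model_best_fcfull_hd"
    for split in (1, 2, 3)
    for modality in ("rgb", "flow")
]


def find_missing_features(dataset_root_dir):
    """Return the expected data.mdb paths that do not exist under the root.

    Hypothetical helper for illustration; it only checks that the files
    exist and does not open or validate the databases.
    """
    root = Path(dataset_root_dir)
    expected = [
        root / "epickitchens100" / "features" / name / "data.mdb"
        for name in EK100_FEATURES
    ]
    expected += [
        root / "egtea" / "features" / name / "data.mdb"
        for name in EGTEA_FEATURES
    ]
    return [path for path in expected if not path.is_file()]


# Example usage (the path is the placeholder from the layout above):
# for path in find_missing_features("/home/user/datasets"):
#     print("missing:", path)
```

An empty list means all eleven feature databases were found. If you only work with one dataset or a subset of modalities, the missing entries for the others can be ignored.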
Dataset | Modalities | Performance (Actions) | Config | Model |
---|---|---|---|---|
EK100 | R-Swin, O, AU, F | 18.5 (MT5R) | expts/01_SA-Fuser_ek100_val_Swin.txt | link |
EK100 | R-TSN, O, AU, F | 17.0 (MT5R) | expts/01_SA-Fuser_ek100_val_TSN.txt | link |
EK100 | R-TSN, O, F | 16.4 (MT5R) | expts/01_SA-Fuser_ek100_val_TSN_wo_audio.txt | link |
EGTEA | RGB-TSN, Flow | 42.5 (Top-1) | expts/02_ek100_avt_tsn.txt | link |
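The EK100 rows report MT5R (mean top-5 recall), while the EGTEA row reports Top-1 accuracy. As a rough illustration of the difference, here is a minimal sketch of class-mean top-k recall in plain Python. It is not the evaluation code used in this repository; the function name and the toy inputs are assumptions made for the example.

```python
from collections import defaultdict


def mean_topk_recall(scores, labels, k=5):
    """Class-mean top-k recall.

    scores: list of per-sample score lists (one score per class).
    labels: list of ground-truth class indices.
    A sample counts as a hit if its label is among the k highest scores.
    Recall is computed per class and then averaged over the classes that
    occur in labels, so rare classes weigh as much as frequent ones.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for sample_scores, label in zip(scores, labels):
        ranked = sorted(
            range(len(sample_scores)),
            key=lambda c: sample_scores[c],
            reverse=True,
        )
        counts[label] += 1
        hits[label] += int(label in ranked[:k])
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Toy example with 3 classes and k=2 (illustrative values only).
toy_scores = [[0.9, 0.1, 0.0], [0.8, 0.15, 0.05], [0.1, 0.2, 0.7]]
toy_labels = [0, 2, 2]
print(mean_topk_recall(toy_scores, toy_labels, k=2))  # 0.75
```

In the toy example, class 0 has recall 1.0 and class 2 has recall 0.5, which averages to 0.75. Plain accuracy would instead pool all samples together, which is why the two metrics are not directly comparable across the table rows.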
Recall that dataset_root_dir in config.yaml should be changed to your specific path.
python run.py -c expts/01_SA-Fuser_ek100_train.txt --mode train --nproc_per_node 2
python run.py -c expts/06_SA-Fuser_egtea_train.txt --mode train --nproc_per_node 2
python run.py -c expts/01_SA-Fuser_ek100_val_TSN_wo_audio.txt --mode test --nproc_per_node 1
python run.py -c expts/06_SA-Fuser_egtea_val.txt --mode test --nproc_per_node 1
# save logits
python run.py -c expts/01_SA-Fuser_ek100_test_TSN_wo_audio.txt --mode test --nproc_per_node 1
# generate test / challenge file
python challenge.py --prefix_h5 test --models fusion_ek100_tsn_wo_audio_4h_18s --weights 1.
This codebase is released under the license terms specified in the LICENSE file. Any imported libraries, datasets, or other code follow the license terms set by their respective authors.
Many thanks to Rohit Girdhar and Antonino Furnari for providing their code and data.