In this work, we propose an efficient Frame-Action Cross-attention Temporal modeling (FACT) framework that (i) performs temporal modeling at the frame and action levels in parallel and (ii) leverages this parallelism to achieve iterative bidirectional information transfer between action and frame features and refine them.

We achieve state-of-the-art results on four datasets at a lower computational cost.

Preparation

1. Install the Requirements

pip3 install -r requirements.txt

2. Prepare Codes

mkdir FACT_actseg
cd FACT_actseg
git clone https://github.com/ZijiaLewisLu/CVPR2024-FACT.git
mv CVPR2024-FACT src
mkdir data 

3. Prepare Data

  • Download Breakfast and GTEA data from link1 or link2, and place them in FACT_actseg/data.
  • Download EgoProceL and Epic-Kitchens data from here, and place them in FACT_actseg/data.
  • Features for Epic-Kitchens can be downloaded via this script and extracted with utils/extract_epic_kitchen.py.
  • After this, FACT_actseg/data should contain four folders, one for each dataset.
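A quick way to confirm the layout described above is a small check script. This is only a sketch: the folder names below are hypothetical (the README states only that there is one folder per dataset), so adjust them to whatever the downloaded archives actually extract to.

```python
from pathlib import Path

# Hypothetical folder names: the README only says FACT_actseg/data should
# hold four folders, one per dataset. Rename these to match your download.
EXPECTED_DATASETS = ("breakfast", "gtea", "egoprocel", "epic-kitchens")


def missing_datasets(data_root, expected=EXPECTED_DATASETS):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets("FACT_actseg/data")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All four dataset folders found.")
```

Run it from the directory that contains FACT_actseg before starting training; an empty result means every expected folder is in place.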

Training

Training is configured with YAML files, and all configurations are listed in configs. Use the following commands to run the experiments.

cd FACT_actseg
# breakfast
python3 -m src.train --cfg src/configs/breakfast.yaml --set aux.gpu 0 split "split1"
# gtea
python3 -m src.train --cfg src/configs/gtea.yaml --set aux.gpu 0 split "split1"
# egoprocel
python3 -m src.train --cfg src/configs/egoprocel.yaml --set aux.gpu 0 split "split1"
# epic-kitchens
python3 -m src.train --cfg src/configs/epic-kitchens.yaml --set aux.gpu 0 split "split1"
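The `--set aux.gpu 0 split "split1"` arguments in the commands above read as key/value pairs that override entries of the YAML configuration, with dots addressing nested keys. The sketch below illustrates that reading on a plain dictionary. It is not the repository's own parser; `apply_overrides` and the sample `cfg` are hypothetical names used only for illustration.

```python
import ast


def apply_overrides(cfg, pairs):
    """Apply a flat [key, value, key, value, ...] list to a nested dict in place."""
    if len(pairs) % 2 != 0:
        raise ValueError("overrides must come in key/value pairs")
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        # "aux.gpu" -> parents ["aux"], leaf "gpu"
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node.setdefault(name, {})
        try:
            value = ast.literal_eval(raw)  # "0" becomes the integer 0
        except (ValueError, SyntaxError):
            value = raw  # bare words such as split1 stay strings
        node[leaf] = value
    return cfg


# Stand-in for a configuration loaded from a YAML file (values are made up).
cfg = {"aux": {"gpu": 1}, "split": "split2"}
apply_overrides(cfg, ["aux.gpu", "0", "split", "split1"])
print(cfg)  # {'aux': {'gpu': 0}, 'split': 'split1'}
```

Under this reading, changing the GPU index or the split requires no edit to the YAML file itself; the values passed after `--set` take precedence for that run.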

By default, logs are saved to FACT_actseg/log/<experiment-path>. Evaluation results are saved as Checkpoint objects, defined in utils/evaluate.py. Loss and metrics are also visualized with wandb.

Pre-Trained Models

Pre-trained model weights can be downloaded from here. You can place the files under FACT_actseg/ckpts and test the models with the following command.

python3 -m src.eval

We lost the original data and model weights in a disk failure. These models were reproduced afterward, so their performance varies slightly from the numbers reported in the paper.

  • Breakfast models
  • GTEA models
  • EgoProceL models
  • Epic-Kitchens models

Citation

@inproceedings{
    lu2024fact,
    title={{FACT}: Frame-Action Cross-Attention Temporal Modeling for Efficient Supervised Action Segmentation},
    author={Zijia Lu and Ehsan Elhamifar},
    booktitle={Conference on Computer Vision and Pattern Recognition 2024},
    year={2024},
}