Official TensorFlow implementation of the AAAI 2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction" by Jingwen Wang et al.
@inproceedings{wang2020temporally,
title={Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction},
author={Wang, Jingwen and Ma, Lin and Jiang, Wenhao},
booktitle={AAAI},
year={2020}
}
pip install -r requirements.txt
- Download GloVe word embedding data.
cd download/
sh download_glove.sh
- Download dataset features.
TACoS: BaiduDrive, GoogleDrive
Charades-STA: BaiduDrive, GoogleDrive
ActivityNet-Captions: BaiduDrive, GoogleDrive
Put the feature hdf5 file in the corresponding directory ./datasets/{DATASET}/features/
We decode TACoS and Charades videos at fps=16 and extract C3D (fc6) features for each non-overlapping 16-frame snippet, so each feature corresponds to a 1-second snippet. For ActivityNet, each feature corresponds to a 2-second snippet. To extract the C3D fc6 features, I mainly referred to this code.
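
The snippet timing described above can be sketched as a small index-to-time mapping. This is only an illustration under stated assumptions: the dataset keys and helper names below are hypothetical and are not taken from this repository, and the per-feature durations are simply the 1-second and 2-second values quoted above.

```python
import math

# Seconds of video covered by one C3D feature, as stated in the text.
# The dictionary keys are hypothetical labels, not names used by the repo.
SECONDS_PER_FEATURE = {"tacos": 1.0, "charades": 1.0, "activitynet": 2.0}


def feature_time_span(index, dataset):
    """Return (start_sec, end_sec) covered by the feature at `index`."""
    sec = SECONDS_PER_FEATURE[dataset]
    return index * sec, (index + 1) * sec


def segment_to_feature_range(start_sec, end_sec, dataset):
    """Return (first, last) indices of the features overlapping a segment."""
    sec = SECONDS_PER_FEATURE[dataset]
    first = int(start_sec // sec)
    last = int(math.ceil(end_sec / sec)) - 1
    # Guard against zero-length segments so that last is never below first.
    return first, max(first, last)


if __name__ == "__main__":
    # A 16-frame snippet at 16 fps spans exactly one second.
    print(16 / 16.0)
    print(feature_time_span(3, "tacos"))
    print(segment_to_feature_range(2.5, 7.2, "activitynet"))
```

Keeping the duration per feature in one table makes it easy to convert annotated timestamps into feature indices for any of the three datasets without hard-coding the frame rate.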
- Download trained models.
Download the checkpoints and put them in the corresponding directory ./checkpoints/{DATASET}/.
- Data Preprocessing (Optional)
cd datasets/tacos/
sh prepare_data.sh
Then copy the generated data into ./data/save/.
Use the corresponding scripts to prepare data for the other datasets.
You may skip this step, as the prepared data is already saved in ./datasets/{DATASET}/data/save/.
sh scripts/test_tacos.sh
sh scripts/eval_tacos.sh
Use the corresponding scripts to test or evaluate on the other datasets.
The predicted results are also provided in ./results/{DATASET}/.
CBP | R@1,IoU=0.7 | R@1,IoU=0.5 | R@5,IoU=0.7 | R@5,IoU=0.5 | mIoU |
---|---|---|---|---|---|
TACoS | 18.54 | 23.19 | 24.88 | 35.83 | 20.46 |
Charades | 17.98 | 36.21 | 50.27 | 70.51 | 35.70 |
ActivityNet | 18.74 | 36.83 | 49.84 | 67.78 | 37.98 |
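
For readers unfamiliar with the column names, the following is a minimal sketch of how R@k,IoU=m and mIoU are commonly computed for temporal grounding. It is not the repository's evaluation script: the function names are hypothetical, and the choices of `>=` for the threshold test and of the top-1 prediction for mIoU are assumptions.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start_sec, end_sec) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_preds, gts, k, iou_threshold):
    """Percentage of queries with a top-k prediction reaching the threshold."""
    hits = sum(
        any(temporal_iou(p, gt) >= iou_threshold for p in preds[:k])
        for preds, gt in zip(ranked_preds, gts)
    )
    return 100.0 * hits / len(gts)


def mean_iou(ranked_preds, gts):
    """Mean IoU (in percent) of the top-1 prediction over all queries."""
    total = sum(temporal_iou(preds[0], gt) for preds, gt in zip(ranked_preds, gts))
    return 100.0 * total / len(gts)


if __name__ == "__main__":
    # Two toy queries, each with predictions sorted by confidence.
    preds = [[(0, 10), (5, 15)], [(20, 30), (0, 4)]]
    gts = [(5, 15), (0, 5)]
    print(recall_at_k(preds, gts, k=1, iou_threshold=0.5))
    print(recall_at_k(preds, gts, k=5, iou_threshold=0.7))
    print(mean_iou(preds, gts))
```

In this toy example neither top-1 prediction reaches IoU 0.5, while both queries have a matching segment within the top 5, which is why R@5 is typically much higher than R@1 in the table.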
sh scripts/train_tacos.sh
Use the corresponding scripts to train on the other datasets.
- The checkpoints for the Charades dataset have been re-uploaded.