This repository implements the model proposed in the paper:
Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021
[arXiv paper] [IEEE Xplore paper]
When using this code, kindly reference:
@ARTICLE{Kazakos2021SlowFastAuditory,
title={Slow-Fast Auditory Streams For Audio Recognition},
author={Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima},
journal = {CoRR},
volume = {abs/2103.03516},
year = {2021},
ee = {https://arxiv.org/abs/2103.03516},
}
You can download our pretrained models on VGG-Sound and EPIC-KITCHENS-100:
- Slow-Fast (EPIC-KITCHENS-100) link
- Slow (EPIC-KITCHENS-100) link
- Fast (EPIC-KITCHENS-100) link
- Slow-Fast (VGG-Sound) link
- Slow (VGG-Sound) link
- Fast (VGG-Sound) link
- Requirements:
- Add this repository to $PYTHONPATH.
export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
- VGG-Sound:
- EPIC-KITCHENS:
- From the annotation repository of EPIC-KITCHENS-100 (link), download: EPIC_100_train.pkl, EPIC_100_validation.pkl, and EPIC_100_test_timestamps.pkl. EPIC_100_train.pkl and EPIC_100_validation.pkl will be used for training/validation, while EPIC_100_test_timestamps.pkl can be used to obtain the scores to submit in the AR challenge.
- Download all the videos of EPIC-KITCHENS-100 using the download scripts found here, where you can also find detailed instructions on using the scripts.
- Extract audio from the videos by running:
python audio_extraction/extract_audio.py /path/to/videos /output/path
- Save audio in HDF5 format by running:
python audio_extraction/wav_to_hdf5.py /path/to/audio /output/hdf5/EPIC-KITCHENS-100_audio.hdf5
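Before launching a run, it can help to confirm that the preparation steps produced everything the commands expect. The sketch below is a hypothetical helper, not part of this repository: it only checks that the three annotation files named above and the HDF5 audio file exist on disk. The function name `missing_inputs` and the placeholder paths are assumptions for illustration.

```python
from pathlib import Path

# Annotation file names as listed in the preparation steps.
ANNOTATION_FILES = (
    "EPIC_100_train.pkl",
    "EPIC_100_validation.pkl",
    "EPIC_100_test_timestamps.pkl",
)


def missing_inputs(annotations_dir, audio_hdf5):
    """Return the prepared files that are not present on disk.

    Hypothetical helper, not part of this repository.
    """
    expected = [Path(annotations_dir) / name for name in ANNOTATION_FILES]
    expected.append(Path(audio_hdf5))
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own locations.
    for path in missing_inputs(
        "/path/to/annotations",
        "/path/to/EPIC-KITCHENS-100_audio.hdf5",
    ):
        print("missing:", path)
```

An empty result means all four files were found; it says nothing about whether their contents are valid.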
To train the model on EPIC-KITCHENS-100 by fine-tuning from the VGG-Sound pretrained model, run:
python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model
To train from scratch, remove TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model.
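The training command is a config file followed by a flat list of KEY VALUE pairs. The sketch below assembles that argument list from a dictionary, which makes the difference between fine-tuning and training from scratch explicit: the only change is whether the checkpoint key is present. `build_command` is a hypothetical helper, not part of this repository, and `NUM_GPUS` is set to 1 purely as a placeholder for num_gpus.

```python
import shlex


def build_command(cfg_path, overrides):
    """Assemble the run_net.py invocation as an argument list.

    Each entry of `overrides` is appended as "KEY VALUE", mirroring
    the commands shown above. Hypothetical helper for illustration.
    """
    cmd = ["python", "tools/run_net.py", "--cfg", cfg_path]
    for key, value in overrides.items():
        cmd += [key, str(value)]
    return cmd


finetune = {
    "NUM_GPUS": 1,  # placeholder for num_gpus
    "OUTPUT_DIR": "/path/to/output_dir",
    "EPICKITCHENS.AUDIO_DATA_FILE": "/path/to/EPIC-KITCHENS-100_audio.hdf5",
    "EPICKITCHENS.ANNOTATIONS_DIR": "/path/to/annotations",
    "TRAIN.CHECKPOINT_FILE_PATH": "/path/to/VGG-Sound/pretrained/model",
}
# Training from scratch: the same overrides without the checkpoint key.
scratch = {k: v for k, v in finetune.items() if k != "TRAIN.CHECKPOINT_FILE_PATH"}

cfg = "configs/EPIC-KITCHENS/SLOWFAST_R50.yaml"
print(shlex.join(build_command(cfg, finetune)))
print(shlex.join(build_command(cfg, scratch)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting and line-continuation issues.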
You can also train the individual streams. For example, to train the Slow stream, run:
python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOW_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model
To validate the model on EPIC-KITCHENS-100, run:
python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth
To obtain scores on the EPIC-KITCHENS-100 test set, run:
python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth \
EPICKITCHENS.TEST_LIST EPIC_100_test_timestamps.pkl EPICKITCHENS.TEST_SPLIT test
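The validation and test-set commands share most of their overrides. The sketch below collects the evaluation-specific pairs in one place, deriving the checkpoint path from the experiment directory in the same layout the commands use. `eval_overrides` is a hypothetical helper, not part of this repository; the keys and values are copied from the commands above.

```python
from pathlib import Path


def eval_overrides(experiment_dir, test_set=False):
    """Return the overrides that switch run_net.py from training to evaluation.

    Hypothetical helper; keys and values are copied from the commands above.
    """
    checkpoint = Path(experiment_dir) / "checkpoints" / "checkpoint_best.pyth"
    overrides = {
        "OUTPUT_DIR": str(experiment_dir),
        "TRAIN.ENABLE": "False",
        "TEST.ENABLE": "True",
        "TEST.CHECKPOINT_FILE_PATH": str(checkpoint),
    }
    if test_set:
        # Extra pairs used only when scoring the test set.
        overrides["EPICKITCHENS.TEST_LIST"] = "EPIC_100_test_timestamps.pkl"
        overrides["EPICKITCHENS.TEST_SPLIT"] = "test"
    return overrides


validation_run = eval_overrides("/path/to/experiment_dir")
test_set_run = eval_overrides("/path/to/experiment_dir", test_set=True)
# Keys present only in the test-set run.
print(sorted(set(test_set_run) - set(validation_run)))
```

The dataset paths (EPICKITCHENS.AUDIO_DATA_FILE, EPICKITCHENS.ANNOTATIONS_DIR) and NUM_GPUS still need to be supplied alongside these pairs, as in the full commands.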
To train the model on VGG-Sound, run:
python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations
To validate the model on VGG-Sound, run:
python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth
The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.