- Introduction
- Benchmark
- Dataset Preparation
- Pretrain Weights
- How to test
- How to train
- Dependencies
- Citation
- License
This is the repository for An Efficient Software for Building Lip Reading Models Without Pains. It provides a deep lip reading pipeline together with pre-trained models and training settings. We evaluate the pipeline on the LRW and LRW-1000 datasets and obtain 88.4% and 56.0% accuracy, respectively. These results are comparable to, and in some cases surpass, current state-of-the-art results. In particular, 56.0% is the current state-of-the-art result on LRW-1000.
Year | Method | LRW | LRW-1000 |
---|---|---|---|
2017 | Chung et al. | 61.1% | 25.7% |
2017 | Stafylakis et al. | 83.5% | 38.2% |
2018 | Stafylakis et al. | 88.8% | - |
2019 | Yang et al. | - | 38.19% |
2019 | Wang et al. | 83.3% | 36.9% |
2019 | Weng et al. | 84.1% | - |
2020 | Luo et al. | 83.5% | 38.7% |
2020 | Zhao et al. | 84.4% | 38.7% |
2020 | Zhang et al. | 85.0% | 45.2% |
2020 | Martinez et al. | 85.3% | 41.4% |
2020 | Ma et al. | 87.7% | 43.2% |
2020 | ResNet18 + BiGRU (Baseline + Cosine LR) | 85.0% | 47.1% |
2020 | ResNet18 + BiGRU (Baseline with word boundary + Cosine LR) | 87.5% | 55.0% |
2020 | Our Method | 86.2% | 48.3% |
2020 | Our Method (with word boundary) | 88.4% | 56.0% |
- Download the LRW and LRW-1000 datasets and link `lrw_mp4` and `LRW1000_Public` in the root of this repository:
ln -s PATH_TO_DATA/lrw_mp4 .
ln -s PATH_TO_DATA/LRW1000_Public .
- Run `scripts/prepare_lrw.py` and `scripts/prepare_lrw1000.py` to generate training samples for the LRW and LRW-1000 datasets, respectively:
python scripts/prepare_lrw.py
python scripts/prepare_lrw1000.py
The mouth videos, labels, and word boundary information are saved in `.pkl` format. Image sequences are packed as JPEG into the `.pkl` files and decoded with PyTurboJPEG. If you want to use your own dataset, you may need to modify the `utils/dataset.py` file.
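As a rough illustration of the storage scheme described above, the sketch below packs a sequence of already-encoded JPEG frames, a label, and boundary information into one file and decodes the frames on load. The field names (`video`, `label`, `boundary`), the helper names, and the use of the standard `pickle` module are assumptions made for this example; the actual layout is defined by the preparation scripts and `utils/dataset.py`.

```python
import pickle


def pack_sample(path, jpeg_frames, label, boundary):
    """Store one sample; frames stay as compressed JPEG bytes (hypothetical layout)."""
    sample = {
        "video": list(jpeg_frames),   # one bytes object per frame
        "label": int(label),          # word class index
        "boundary": list(boundary),   # word boundary information
    }
    with open(path, "wb") as f:
        pickle.dump(sample, f)


def load_sample(path, decode):
    """Load one sample and decode each JPEG buffer with the given callable."""
    with open(path, "rb") as f:
        sample = pickle.load(f)
    sample["video"] = [decode(buf) for buf in sample["video"]]
    return sample
```

With PyTurboJPEG installed, `TurboJPEG().decode` could be passed as `decode`. Keeping frames compressed until load time keeps the files small, at the cost of decoding work in the data loader.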
We provide pretrained weights on the LRW and LRW-1000 datasets for evaluation. For smaller datasets, the pretrained weights can provide a good starting point for feature extraction, fine-tuning, and so on.
Link of pretrained weights: Baidu Yun (code: ivgl)
If you cannot access the provided links, please email dalu.feng@vipl.ict.ac.cn or fengdalu@gmail.com.
To test our provided weights, download them and place them in the root of this repository.
For example, to test the baseline on the LRW dataset:
python main_visual.py \
--gpus='0' \
--lr=0.0 \
--batch_size=128 \
--num_workers=8 \
--max_epoch=120 \
--test=True \
--save_prefix='checkpoints/lrw-baseline/' \
--n_class=500 \
--dataset='lrw' \
--border=False \
--mixup=False \
--label_smooth=False \
--se=False \
--weights='checkpoints/lrw-cosine-lr-acc-0.85080.pt'
To test our model on the LRW-1000 dataset:
python main_visual.py \
--gpus='0' \
--lr=0.0 \
--batch_size=128 \
--num_workers=8 \
--max_epoch=120 \
--test=True \
--save_prefix='checkpoints/lrw-1000-final/' \
--n_class=1000 \
--dataset='lrw1000' \
--border=True \
--mixup=False \
--label_smooth=False \
--se=True \
--weights='checkpoints/lrw1000-border-se-mixup-label-smooth-cosine-lr-wd-1e-4-acc-0.56023.pt'
For example, to train the LRW baseline:
python main_visual.py \
--gpus='0,1,2,3' \
--lr=3e-4 \
--batch_size=400 \
--num_workers=8 \
--max_epoch=120 \
--test=False \
--save_prefix='checkpoints/lrw-baseline/' \
--n_class=500 \
--dataset='lrw' \
--border=False \
--mixup=False \
--label_smooth=False \
--se=False
Optional arguments:
- `gpus`: the GPU IDs used for training
- `lr`: the learning rate. By default, the code automatically applies the Linear Scale Rule (e.g., lr=3e-4 for 4 GPUs x 32 videos/GPU and lr=1.2e-3 for 8 GPUs x 128 videos/GPU). We recommend lr=3e-4 for 32 videos/GPU when training from scratch. Adjust the learning rate to match your setting.
- `batch_size`: the batch size
- `num_workers`: the number of processes used for data loading
- `max_epoch`: the maximum number of training epochs
- `test`: test mode. In this mode, the program runs a single test pass and exits.
- `weights` (optional): the path to pre-trained weights. If this option is specified, the model loads the pre-trained weights from the given location.
- `save_prefix`: the save prefix for model parameters
- `n_class`: the total number of word classes
- `dataset`: the dataset used for training and testing; only `lrw` and `lrw1000` are supported.
- `border`: use the word boundary indicator variable for training and testing
- `mixup`: use mixup in training
- `label_smooth`: use label smoothing in training
- `se`: use the SE module in ResNet-18
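The two learning-rate examples given for `lr` are consistent with scaling the base rate in proportion to the number of videos per GPU, relative to a reference of 32. The sketch below reproduces that arithmetic. It is an illustration under that assumption, not the code in `main_visual.py`; the function name `scaled_lr` and the convention that `batch_size` is the total across all GPUs are hypothetical.

```python
def scaled_lr(base_lr, batch_size, num_gpus, ref_per_gpu=32):
    """Scale base_lr linearly with videos per GPU (assumed rule, reference = 32)."""
    per_gpu = batch_size / num_gpus          # videos processed on each GPU
    return base_lr * per_gpu / ref_per_gpu   # linear scaling w.r.t. the reference


# 4 GPUs x 32 videos/GPU: the base rate is used unchanged.
lr_small = scaled_lr(3e-4, batch_size=4 * 32, num_gpus=4)

# 8 GPUs x 128 videos/GPU: four times the reference per-GPU batch.
lr_large = scaled_lr(3e-4, batch_size=8 * 128, num_gpus=8)

print(lr_small, lr_large)
```

Under this assumption, the per-GPU batch size rather than the GPU count determines the effective rate, which is why both examples start from the same base of 3e-4.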
More training details and settings can be found in our paper. We plan to include more pretrained models in the future.
- PyTorch 1.6
- opencv-python
- TurboJPEG and PyTurboJPEG
If you find this code useful in your research, please consider citing the following papers:
@inproceedings{feng2021efficient,
title={An Efficient Software for Building LIP Reading Models Without Pains},
author={Feng, Dalu and Yang, Shuang and Shan, Shiguang},
booktitle={2021 IEEE International Conference on Multimedia \& Expo Workshops (ICMEW)},
pages={1--2},
year={2021},
organization={IEEE}
}
@article{feng2020learn,
author = "Feng, Dalu and Yang, Shuang and Shan, Shiguang and Chen, Xilin",
title = "Learn an Effective Lip Reading Model without Pains",
journal = "arXiv preprint arXiv:2011.07557",
year = "2020",
}
The MIT License