This repo implements {FOMAML, Reptile, multi-task} pretraining interfaces for end-to-end ASR and provides a well-organized experiment flow.

Note: this repo is aligned with the paper, but applies meta learning to a cross-accent setting instead, since the corpus used in the paper's cross-language setting (IARPA-BABEL) is not free.
## Requirements

- torch==1.4.1
- numpy==1.18.3
- sentencepiece==0.1.85 (for subword units)
- comet-ml==3.1.6 (you should register using an NTU email)
- editdistance==0.5.3
- tqdm-logger==0.3.0 (repo)
- torchexp==0.1.0 (repo)
- torch_optimizer==0.0.1a11 (repo)
## Data

Download the corpus:
- If you are a member of the NTU speech lab, ask me.
- If not, please download the corpus from the Mozilla Common Voice project. I'll add the pre-processing steps after I finish my master's thesis lol. Basically, I used the espnet recipe for pre-processing (modified a little bit to add the ACCENT label), then used a script to pack the kaldi-format files into one numpy file (with memmap, to save RAM usage); a sketch of that packing step is below.
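For reference, here is a minimal sketch of that packing idea, assuming kaldiio (an espnet dependency) for reading the kaldi-format files; the function name and output layout are hypothetical, not the repo's actual script:

```python
# Hypothetical packing sketch (not the exact script used in this repo): read
# kaldi-format features with kaldiio and pack them into a single float32
# memmap, plus an index of (offset, n_frames) per utterance.
import numpy as np
import kaldiio

def pack_feats(scp_path, out_prefix):
    feats = kaldiio.load_scp(scp_path)                  # lazy dict: utt_id -> ndarray
    shapes = {k: feats[k].shape for k in feats.keys()}  # first pass: sizes only
    total = sum(s[0] for s in shapes.values())
    dim = next(iter(shapes.values()))[1]
    mm = np.memmap(f"{out_prefix}.dat", dtype="float32",
                   mode="w+", shape=(total, dim))
    index, offset = {}, 0
    for k, (n, _) in shapes.items():                    # second pass: copy frames
        mm[offset:offset + n] = feats[k]
        index[k] = (offset, n)
        offset += n
    mm.flush()
    np.save(f"{out_prefix}_index.npy", index)           # utt_id -> (offset, n_frames)
```

At training time the memmap can be opened read-only and sliced per utterance via the saved index, so only the needed frames are paged into RAM.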
## Setup

- Modify `COMET_PROJECT_NAME` and `COMET_WORKSPACE` in `src/marcos.py` to your own setting, e.g.:
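A minimal example, assuming the two names are plain module-level constants in `src/marcos.py` (the placeholder values below are mine):

```python
# src/marcos.py (excerpt; illustrative values, replace with your own)
COMET_PROJECT_NAME = "my-asr-project"   # your comet.ml project name
COMET_WORKSPACE = "my-workspace"        # your comet.ml workspace
```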
## Usage

- One experiment: (pretraining) -> training (`train.py`) -> testing/decoding (`train.py --test`) -> scoring (`score.sh`)
- Each trainer is instantiated with a different interface, which decides its training/pretraining behavior; the sketch below shows the general pattern.
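The idea in one minimal sketch, with hypothetical names (not the repo's exact classes):

```python
# Illustrative sketch of the pattern (hypothetical names): the trainer
# delegates its per-batch behavior to the interface it was constructed with,
# so mono training and pretraining can share one trainer.
import torch.nn.functional as F

class MonoInterface:
    def run(self, trainer, batch):               # plain supervised step
        loss = trainer.run_batch(batch)
        trainer.opt.zero_grad()
        loss.backward()
        trainer.opt.step()
        return loss.item()

class Trainer:
    def __init__(self, model, opt, interface):
        self.model, self.opt, self.interface = model, opt, interface

    def run_batch(self, batch):                  # model-specific forward/loss
        x, y = batch
        return F.cross_entropy(self.model(x), y)

    def step(self, batch):                       # behavior depends on interface
        return self.interface.run(self, batch)
```

A pretraining interface would implement the same `run` hook with a meta or multi-task update instead of a plain supervised step.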
- `train.py`:
  - mono-accent training (includes training from scratch or fine-tuning a pretrained model)
  - testing (aka decoding): add the `--test` flag
- `pretrain.py`: multi-accent pretraining (through meta learning or multi-task learning)
- Execute `run_foo.sh` (it will call `foo_full_exp.sh` to conduct one complete experiment) to conduct large-scale experiments on battleship.
- `score.sh` will call `translate.py` with the proper environment variables to execute sctk and evaluate the error rate.

## Directory structure

- `data/`: soft link to the data
- `config/`: config yamls
- `testing-logs/`: (you can also modify the name in `src/marcos.py`)
  - `pretrain`: stores info when executing `pretrain.py`
  - `evaluation`: stores info when executing `train.py`
  - `tensorboard`: as the title says (but NOTE that we use comet.ml to track the experiments, so most of the time we don't need this)
- main files:
  - `foo_trainer.py`: defines how to run one batch and some model-specific stuff; an `asr_model` is included inside
  - `tester.py`: defines how to decode (all models use the same tester)
  - `train_interface.py` and `mono_interface.py`: used in mono-accent training and fine-tuning
  - `pretrain_interface.py` and `fo_meta_interface.py`, `multi_interface.py`, ...: used in pretraining; see the sketch after this list
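A minimal sketch of what a first-order meta interface does conceptually, with hypothetical names (`fo_meta_step`, `loss_fn`, ...) rather than this repo's exact API: adapt a copy of the model on a support batch, then apply the query-loss gradient of the adapted copy to the shared initialization (first-order, so no second derivatives):

```python
# Hypothetical first-order meta step (FOMAML-style); names and details are
# illustrative, not this repo's exact interface.
import copy
import torch

def fo_meta_step(model, meta_opt, task_batches, loss_fn,
                 inner_lr=1e-3, inner_steps=1):
    meta_opt.zero_grad()
    for support, query in task_batches:          # one (support, query) pair per accent
        fast = copy.deepcopy(model)              # task-specific copy of the init
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):             # inner-loop adaptation
            inner_opt.zero_grad()
            loss_fn(fast, support).backward()
            inner_opt.step()
        inner_opt.zero_grad()                    # clear inner grads before the outer pass
        loss_fn(fast, query).backward()          # first-order: grads at the adapted weights
        for p, fp in zip(model.parameters(), fast.parameters()):
            p.grad = fp.grad.clone() if p.grad is None else p.grad + fp.grad
    meta_opt.step()                              # apply summed task gradients to the init
```

Reptile replaces the query-loss gradient with the parameter difference between the initialization and the adapted copy, and a multi-task interface skips the inner loop entirely.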
- `modules/`: defines some components of the model (but a little bit deprecated now 😅)
- `io/`:
  - `data_loader`: used in mono-accent training
  - `data_container`: used in multi-accent training; see the sketch after this item
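A minimal sketch of the idea behind a multi-accent container, with hypothetical names (the actual class in `io/` differs): keep one loader per accent so the pretraining interfaces can sample a task batch on demand.

```python
# Hypothetical multi-accent data container (names are illustrative).
import random

class DataContainer:
    """Holds one DataLoader per accent and samples task batches for pretraining."""
    def __init__(self, loaders):                 # loaders: dict accent -> DataLoader
        self.loaders = loaders
        self.iters = {a: iter(dl) for a, dl in loaders.items()}

    def sample(self, accent=None):
        a = accent if accent is not None else random.choice(list(self.loaders))
        try:
            return a, next(self.iters[a])
        except StopIteration:                    # restart an exhausted accent
            self.iters[a] = iter(self.loaders[a])
            return a, next(self.iters[a])
```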
- `model/`:
  - transformer (NOTE: `transformer_pytorch` is what we use)
  - blstm (vgg-blstm fully-connected ctc model used in MetaASR)
- `monitor/`:
  - `dashboard.py` (comet.ml), `tb_dashboard` (tensorboard)
  - `logger.py`: tqdm_logger logging
  - `metric.py`: how to calculate the error rate; see the sketch after this list
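A minimal sketch of the usual computation, using the `editdistance` package listed in the requirements; the function name and normalization here are illustrative, not necessarily what `metric.py` does:

```python
# Illustrative error-rate computation with the editdistance package; the real
# metric.py may differ in details (e.g. token handling).
import editdistance

def error_rate(hyp_tokens, ref_tokens):
    """WER for word lists, CER for character lists: edits / reference length."""
    return editdistance.eval(hyp_tokens, ref_tokens) / max(len(ref_tokens), 1)

print(error_rate("hello word".split(), "hello world".split()))  # 0.5
```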
Feel free to use/modify the code; any bug report or improvement suggestion will be appreciated. If you find this project helpful for your research, please consider citing our paper, thanks!

## Citation
```
@inproceedings{hsu2020meta,
  title={Meta learning for end-to-end low-resource speech recognition},
  author={Hsu, Jui-Yang and Chen, Yuan-Jui and Lee, Hung-yi},
  booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7844--7848},
  year={2020},
  organization={IEEE}
}
```