This project aims to perform automatic speech recognition for low-resource languages. To do so, we fine-tune wav2vec2-xls-r (Babu et al., 2021) on labeled speech data in several languages from the Mozilla Common Voice dataset (Ardila et al., 2020).
The scripts in this repo allow you to:
- fine-tune wav2vec2-xls-r on labeled speech data from one language
- fine-tune wav2vec2-xls-r jointly on two different languages (see the sketch after this list)
- test the fine-tuned model
- test the fine-tuned model incorporating an n-gram language model
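A minimal sketch of how the training data might be assembled for the bilingual setting (the dataset id, Common Voice version, and language codes are illustrative assumptions, not fixed by this repo): the two languages' training splits are simply concatenated into one training set.

```python
from datasets import Audio, concatenate_datasets, load_dataset

# Illustrative: load two Common Voice languages (may require accepting the
# dataset terms on the Hugging Face Hub) and merge them for joint fine-tuning.
it_train = load_dataset("mozilla-foundation/common_voice_11_0", "it", split="train")
gl_train = load_dataset("mozilla-foundation/common_voice_11_0", "gl", split="train")

bilingual_train = concatenate_datasets([it_train, gl_train]).shuffle(seed=42)

# XLS-R expects 16 kHz input audio.
bilingual_train = bilingual_train.cast_column("audio", Audio(sampling_rate=16_000))
```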
For our purposes we use characters as speech units. The tokenizers contain the character vocabularies for the following languages:
- Italian (it)
- Arabic (ar)
- Galician (gl)
- Romansh Vallader (rm-vallader)
To fine-tune wav2vec2-xls-r, the character vocabulary of the language's tokenizer defines the CTC output layer that is added on top of the pre-trained model.
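A minimal sketch of this setup with the Hugging Face transformers API; the vocabulary file name (`vocab_it.json`) and the checkpoint size (the 300M XLS-R variant) are illustrative assumptions, and the notebooks remain the authoritative reference.

```python
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# Character-level tokenizer built from the language's vocabulary file
# (file name is illustrative).
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab_it.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# The CTC head added on top of the pre-trained encoder is sized to the
# tokenizer's character vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional feature encoder frozen
```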
- To fine-tune the pre-trained model wav2vec2-xls-r on a single target language, see the Notebook "Notebook_fine_tuning_wav2vec2_xls_r"
- To create a bilingual model by fine-tuning the pre-trained model on two languages jointly, refer to the Notebook "Notebook_bilingual_fine_tuning"
- To compute inferences on the fine-tuned model use the Notebook "Notebook_inference.ipynb"
- the second part of that notebook explains how to compute inferences with an n-gram language model
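A minimal inference sketch (the checkpoint path and the Common Voice dataset id are illustrative assumptions; see the notebooks for the actual workflow), using greedy CTC decoding and, optionally, language-model decoding via Wav2Vec2ProcessorWithLM.

```python
import torch
from datasets import Audio, load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_dir = "./wav2vec2-xls-r-finetuned-it"  # illustrative path to a fine-tuned checkpoint
processor = Wav2Vec2Processor.from_pretrained(model_dir)
model = Wav2Vec2ForCTC.from_pretrained(model_dir).eval()

# Load one test utterance and resample it to the 16 kHz rate XLS-R expects.
test = load_dataset("mozilla-foundation/common_voice_11_0", "it", split="test[:1]")
test = test.cast_column("audio", Audio(sampling_rate=16_000))
audio = test[0]["audio"]["array"]

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: most likely character per frame, then collapse repeats/blanks.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])

# With an n-gram LM (pyctcdecode + kenlm installed and an LM bundled with the
# processor), decoding can instead go through Wav2Vec2ProcessorWithLM:
# from transformers import Wav2Vec2ProcessorWithLM
# processor_lm = Wav2Vec2ProcessorWithLM.from_pretrained(model_dir)
# print(processor_lm.batch_decode(logits.numpy()).text[0])
```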