Paper: Improving Voice Separation by Incorporating End-To-End Speech Recognition
ConvTasNet | Oracle(With Estimated Input) | Oracle(With Target Input) |
---|---|---|
Target1 | Target1 | Target1 |
Target2 | Target2 | Target2 |
Estimated1 | Estimated1 | Estimated1 |
Estimated2 | Estimated2 | Estimated2 |
Mixture | Mixture | Mixture |
ConvTasNet_Model
Oracle_Model
ASR_Model
Install the packages axel, youtube-dl, and parallel using the following command:
apt-get install axel youtube-dl parallel
Install the Python dependencies listed in requirements.txt:
pip install -r requirements.txt
Download the CSV files containing the YouTube IDs of the videos.
Run the shell script getDataset.sh in the preprocessing directory:
cd preprocessing
njobs=<num-parallel-download-threads> numdownload=<num-files-to-download> ./getDataset.sh <path-to-csv-file> <output-directory-mp3> <output-directory-wav>
For example:
njobs=20 numdownload=1000000 ./getDataset.sh avspeech_test.csv test_mp3 test_wav
njobs=20 numdownload=1000000 ./getDataset.sh avspeech_train.csv train_mp3 train_wav
If you do not want to download the entire dataset, set the numdownload variable to the number of audio files you want to download.
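Before choosing a value for numdownload, it can help to look at how many distinct videos the first rows of a CSV file cover. The sketch below is only an illustration and is independent of how getDataset.sh applies the limit. It assumes the YouTube ID is the first column of each row and that there is no header row; the helper name `unique_video_ids` and the sample rows are made up.

```python
import csv
import io


def unique_video_ids(csv_text, limit):
    """Return the distinct first-column values among the first `limit` rows."""
    seen = set()
    ordered = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if row_number >= limit:
            break
        # Skip blank lines; keep the first occurrence of each id.
        if row and row[0] not in seen:
            seen.add(row[0])
            ordered.append(row[0])
    return ordered


# Made-up rows in the assumed layout: id, start, end, x, y
sample = (
    "aaaaaaaaaaa,10.0,15.0,0.5,0.4\n"
    "aaaaaaaaaaa,20.0,26.0,0.5,0.4\n"
    "bbbbbbbbbbb,0.0,4.5,0.3,0.6\n"
)
ids = unique_video_ids(sample, limit=2)
```

For a real file, read its contents with `open(path).read()` and pass the text to the helper.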
Inside the ConvTasNet directory, set the following variables in config.py:
Set config.dataSetPath['train'] -> Absolute path of your train_wav folder
Set config.dataSetPath['test'] -> Absolute path of your test_wav folder
Set config.basePath -> '<Path-To-Store-Experiment-Data>/'+str(datetime.now())
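As a rough illustration, the relevant part of config.py might look like the sketch below. The directory locations are hypothetical placeholders, and the real config.py may contain other variables that are not shown here.

```python
from datetime import datetime

# Hypothetical locations; replace them with your own absolute paths.
dataSetPath = {
    'train': '/data/avspeech/train_wav',
    'test': '/data/avspeech/test_wav',
}

# Each run gets its own directory, named after the time the config is loaded.
# Note that str(datetime.now()) contains spaces and colons.
basePath = '/data/experiments/' + str(datetime.now())
```

Because basePath includes a timestamp, every run of main.py writes to a fresh experiment directory rather than overwriting an earlier one.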
Training:
cd ConvTasNet
python main.py train
Testing:
cd ConvTasNet
python main.py test --modelpath "Path to trained model"
Inside the ETESpeechRecognition directory, set the following variables in config.py:
Set config.path_to_download -> Absolute path where the LibriSpeech dataset should be downloaded
Set config.base_model_path -> Absolute path where the trained model should be saved
Set config.cache_dir -> Absolute path where the unigram model and other cached files should be saved
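The repository's scripts may or may not create these directories for you, so one option is to create them up front. The sketch below is an assumption about a convenient workflow, not part of the repository. It uses a temporary directory as a stand-in root, and the sub-directory names are hypothetical.

```python
import os
import tempfile

# Stand-in root for illustration; in practice choose a permanent absolute path.
root = tempfile.mkdtemp()

path_to_download = os.path.join(root, 'LibriSpeech')
base_model_path = os.path.join(root, 'models')
cache_dir = os.path.join(root, 'cache')

# Create each directory if it does not exist yet.
for directory in (path_to_download, base_model_path, cache_dir):
    os.makedirs(directory, exist_ok=True)
```

The three resulting paths are the values to assign to the corresponding variables in config.py.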
Download the dataset
cd ETESpeechRecognition
python downloadDataSet.py
Pre-process
python main.py genunigram
Training:
python main.py train
Testing:
python main.py test
Coming Soon
Coming Soon
CER | CTC Loss | Attention Loss | Avg Loss |
---|---|---|---|
0.5668 | 78.1625 | 49.1855 | 57.8786 |
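For readers unfamiliar with the CER column, the sketch below shows the common definition of character error rate: the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length. This is an assumption about the metric in general; the repository may compute or aggregate it differently. The function names are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, one row at a time."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalised by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition a lower CER is better, and the value can exceed 1 when the hypothesis is much longer than the reference.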
Method | SI-SNR |
---|---|
ConvTasNet | 9.699 |
Oracle | 13.483 |
Iterative | 10.781 |
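The SI-SNR (scale-invariant signal-to-noise ratio) values above are conventionally reported in dB, where higher is better. The sketch below follows the usual definition: remove the mean from both signals, project the estimate onto the target, and compare the energy of the projection with the energy of the residual. It is a plain-Python illustration, not the repository's implementation; the function name and the `eps` stabiliser are assumptions.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    if len(estimate) != len(target) or not target:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(target)

    # Remove the mean so that a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target to get the "signal" component.
    dot = sum(e * t for e, t in zip(est, tgt))
    target_energy = sum(t * t for t in tgt) + eps
    scale = dot / target_energy
    signal = [scale * t for t in tgt]

    # Whatever is left after the projection counts as noise.
    noise = [e - s for e, s in zip(est, signal)]
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(x * x for x in noise) + eps
    return 10 * math.log10(signal_energy / noise_energy + eps)
```

Because of the projection step, multiplying the estimate by a non-zero constant leaves the score essentially unchanged, which is what makes the measure scale-invariant.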
The code for downloading the AVSpeech dataset is adapted from the repository https://github.com/changil/avspeech-downloader, modified to download only mp3 audio and extended with some additional features.
The code for training the ASR model is adapted from the repository https://github.com/mayank-git-hub/ETE-Speech-Recognition.