- Download or clone the code repository
- Download the pre-trained weights here
- Create a folder named weights and extract the contents of the downloaded folder into it
- Install the required packages using
pip install -r requirements.txt
- Run the command
python main.py --source samples/test_sample.wav --target samples/trump10.wav
Here, --source is the audio file to be converted, and --target is a sample of the target voice.
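
Before running the conversion, it can help to confirm that the setup steps above actually took effect. The following is a minimal sketch, not part of the repository: the helper names `missing_prerequisites` and `build_conversion_command` are hypothetical, and it assumes the weights folder sits directly under the repository root and that `main.py` is run from that root.

```python
import sys
from pathlib import Path


def missing_prerequisites(repo_root, source, target):
    """Return a list of problems; an empty list means the setup looks ready."""
    root = Path(repo_root)
    problems = []
    weights = root / "weights"
    # Assumption: the extracted weights live directly under <repo_root>/weights.
    if not weights.is_dir() or not any(weights.iterdir()):
        problems.append(f"weights folder missing or empty: {weights}")
    # Both audio files are resolved relative to the repository root.
    for label, rel in (("source", source), ("target", target)):
        path = root / rel
        if not path.is_file():
            problems.append(f"{label} audio not found: {path}")
    return problems


def build_conversion_command(source, target):
    """Build the argument list for the main.py invocation shown above."""
    return [sys.executable, "main.py", "--source", str(source), "--target", str(target)]


# Only prints the command; nothing is executed here.
cmd = build_conversion_command("samples/test_sample.wav", "samples/trump10.wav")
print(" ".join(cmd))
```

Once `missing_prerequisites` returns an empty list, the command can be passed to `subprocess.run(cmd, cwd=repo_root, check=True)` to launch the conversion.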
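The model is trained on the LibriSpeech ASR corpus. Download the dataset from here, then run the training steps below in order, replacing <datasets_root> with the folder that contains the dataset.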
- Encoder training
python encoder_preprocess.py <datasets_root>
python encoder_train.py my_run <datasets_root>/SV2TTS/encoder
- Synthesizer training
python synthesizer_preprocess_audio.py <datasets_root>
python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer
python synthesizer_train.py my_run <datasets_root>/SV2TTS/synthesizer
- Vocoder training
python vocoder_preprocess.py <datasets_root>
python vocoder_train.py my_run <datasets_root>
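
The seven training commands above can also be generated from a single dataset path, which avoids retyping `<datasets_root>` for each step. This is a sketch under assumptions rather than a script shipped with the repository: the function name `training_commands` is hypothetical, and the script names, the `my_run` run identifier, and the `SV2TTS` subfolders are taken from the commands listed above.

```python
import sys
from pathlib import Path


def training_commands(datasets_root, run_id="my_run"):
    """Return the training commands in the order listed above."""
    root = Path(datasets_root)
    sv2tts = root / "SV2TTS"  # subfolder that appears in the commands above
    py = sys.executable
    return [
        # Encoder
        [py, "encoder_preprocess.py", str(root)],
        [py, "encoder_train.py", run_id, str(sv2tts / "encoder")],
        # Synthesizer
        [py, "synthesizer_preprocess_audio.py", str(root)],
        [py, "synthesizer_preprocess_embeds.py", str(sv2tts / "synthesizer")],
        [py, "synthesizer_train.py", run_id, str(sv2tts / "synthesizer")],
        # Vocoder
        [py, "vocoder_preprocess.py", str(root)],
        [py, "vocoder_train.py", run_id, str(root)],
    ]


# Print the commands (without the interpreter path) for review; nothing is run.
for step in training_commands("datasets"):
    print(" ".join(step[1:]))
```

Each entry is an argument list suitable for `subprocess.run(step, check=True)`, which raises an error if a step exits with a non-zero status, so a failed preprocessing step would not silently feed into the next stage.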