
Speech synthesis with conditioning on a very small dataset, using NVIDIA's Tacotron2 and WaveGlow models with PyTorch.


isabelleysseric/voice-cloning


Voice Cloning




Note: The Voice_cloning_Training_with_Tacotron2_and_WaveGlow.ipynb notebook is meant to be run in Google Colab. Once in Colab, upload the data_cleaned.zip dataset to the current folder, /content/. After installing Tacotron2, replace the files in /content/TTS-TT2/filelists/ with the files of the same name from this repository. The rest of the code unzips the dataset and places it in the new folder /content/TTS-TT2/wavs/. The program then asks you to load your transcription file; give it the list.txt file.
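For readers who want to see what the setup above amounts to, here is a minimal sketch of the two file operations: extracting the dataset into a wavs folder and copying the filelists into a filelists folder. It is not the notebook's own code. The function name `prepare_inputs` is hypothetical, and the sketch assumes the archive holds the audio files at its top level, which the README does not state.

```python
import shutil
import zipfile
from pathlib import Path


def prepare_inputs(zip_path, filelist_paths, tts_root):
    """Unzip the dataset into <tts_root>/wavs and copy filelists into <tts_root>/filelists.

    Hypothetical helper: the notebook performs these steps itself.
    """
    tts_root = Path(tts_root)
    wavs_dir = tts_root / "wavs"
    filelists_dir = tts_root / "filelists"
    wavs_dir.mkdir(parents=True, exist_ok=True)
    filelists_dir.mkdir(parents=True, exist_ok=True)

    # Extract every member of the archive into the wavs folder
    # (assumes the audio files sit at the top level of the zip).
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(wavs_dir)

    # Copy each filelist, overwriting any file of the same name.
    for src in filelist_paths:
        shutil.copy(src, filelists_dir / Path(src).name)

    return wavs_dir, filelists_dir
```

In Colab, this would be called with `/content/data_cleaned.zip` as the archive, the filelists from this repository as `filelist_paths`, and `/content/TTS-TT2` as `tts_root`.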

The files in the input folder are the inputs to the speech synthesis model. They are also found at the root of the project: the wav files correspond to the zip file data_cleaned.zip, and the list.txt, ljs_audio_text_test_filelist.txt, ljs_audio_text_train_filelist.txt and ljs_audio_text_val_filelist.txt files sit alongside it. The files in the output folder are the results produced by the model during and after training.
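Because the filelists and the audio files have to agree, a quick consistency check before training can save a failed run. The sketch below assumes each filelist line has the form `audio_path|transcription`, as in the filelists shipped with NVIDIA's Tacotron2 repository; the files in this project may differ. The function name `missing_audio` is hypothetical.

```python
from pathlib import Path


def missing_audio(filelist_path, wavs_dir):
    """Return the audio file names referenced in a filelist but absent from wavs_dir.

    Assumes one `audio_path|transcription` entry per line.
    """
    wavs_dir = Path(wavs_dir)
    missing = []
    for line in Path(filelist_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Keep only the part before the first "|" and compare by file name,
        # so the directory prefix used in the filelist does not matter.
        audio_ref = line.split("|", 1)[0].strip()
        name = Path(audio_ref).name
        if not (wavs_dir / name).exists():
            missing.append(name)
    return missing
```

An empty list means every entry in the filelist has a matching file in the wavs folder.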

TREE:

input

  • filelists

    • list.txt
    • ljs_audio_text_test_filelists.txt
    • ljs_audio_text_train_filelists.txt
    • ljs_audio_text_val_filelists.txt
  • wavs

    • 1.npy
    • 1.wav
    • ...
    • 60.npy
    • 60.wav

output

  • audio

    • model_BS_6_0.00003_350epoch_0_original_audio.wav

    • model_BS_6_0.00003_350epoch_0_predicted_audio.wav

    • ...

    • model_BS_6_0.00003_350epoch_20_original_audio.wav

    • model_BS_6_0.00003_350epoch_20_predicted_audio.wav

    • model_BS_6_0.00003_350signals_epoch_0.png

    • ...

    • model_BS_6_0.00003_350signals_epoch_20.png

  • images

    • model_BS_6_0.00003_350_Alignment_Epoch_0_Iteration_9_Validation_Loss_1.7767614126205444.png
    • ...
    • model_BS_6_0.00003_350_Alignment_Epoch_20_Iteration_189_Validation_Loss_1.0240533351898193.png
  • logs

    • events.out.tfevents.1703405636.c8a2ca7defbc.1806.11
  • loss

    • model_BS_6_0.00003_350loss_curve_epoch_0.png
    • ...
    • model_BS_6_0.00003_350loss_curve_epoch_22.png
  • spectrogram

    • model_BS_6_0.00003_350spectrograms_epoch_0.png
    • ...
    • model_BS_6_0.00003_350spectrograms_epoch_20.png

Voice_cloning_Training_with_Tacotron2_and_WaveGlow.ipynb
MLSP Presentation_Clonage_de_la_voix.pdf
MLSP_Rapport_Clonage_de_la_voix.pdf
README.md
data_cleaned.zip
list.txt
ljs_audio_text_test_filelist.txt
ljs_audio_text_train_filelist.txt
ljs_audio_text_val_filelist.txt
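
The alignment images in output/images carry the epoch, iteration and validation loss in their file names, so the loss history can be recovered from the names alone. The following is a minimal sketch based only on the two file names shown in the tree above; the pattern and the function name `parse_alignment_name` are assumptions, and names that deviate from that pattern (for example a loss written in scientific notation) return `None`.

```python
import re

# Pattern inferred from the alignment file names listed in the tree.
ALIGNMENT_NAME = re.compile(
    r"_Alignment_Epoch_(?P<epoch>\d+)_Iteration_(?P<iteration>\d+)"
    r"_Validation_Loss_(?P<loss>\d+(?:\.\d+)?)\.png$"
)


def parse_alignment_name(filename):
    """Return (epoch, iteration, validation_loss), or None if the name does not match."""
    match = ALIGNMENT_NAME.search(filename)
    if match is None:
        return None
    return int(match["epoch"]), int(match["iteration"]), float(match["loss"])
```

Applied to every file in output/images and sorted by epoch, the resulting tuples give the validation loss over the course of training.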
