Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"
- Set `data_path` in `hparams.py` to the LJSpeech folder
- Set `teacher_dir` in `hparams.py` to the data directory where the alignments and mel-spectrogram targets are saved
- Put the checkpoint of the pre-trained Transformer-TTS in place (the weights of its embedding/encoder layers are reused); see the sketches below this list
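The following is a minimal sketch of what the relevant entries in `hparams.py` might look like. The paths are placeholders, and `teacher_checkpoint` is a hypothetical name for the Transformer-TTS checkpoint setting, not necessarily the one used in this repository.

```python
# Sketch of the relevant hparams.py entries (paths are placeholders; the real
# file contains many more hyperparameters).
data_path = '/path/to/LJSpeech-1.1'        # LJSpeech folder (wavs/ + metadata.csv)
teacher_dir = '/path/to/teacher_outputs'   # precomputed alignments and mel-spectrogram targets
# Hypothetical setting: location of the pre-trained Transformer-TTS checkpoint;
# its embedding/encoder weights are reused when training FastSpeech.
teacher_checkpoint = '/path/to/transformer_tts.pt'
```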
Then train the model with `python train.py`.
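As a rough illustration of the third setup step, the snippet below sketches how the embedding/encoder weights of the pre-trained Transformer-TTS checkpoint could be copied into the FastSpeech model before training. It assumes PyTorch; the function name, checkpoint layout, and parameter-name prefixes are assumptions, so adapt them to the actual model definitions in this repository.

```python
import torch

def load_teacher_encoder(fastspeech_model, checkpoint_path):
    """Copy matching embedding/encoder weights from a Transformer-TTS checkpoint.

    Sketch only: the key prefixes and checkpoint layout are assumptions.
    """
    ckpt = torch.load(checkpoint_path, map_location='cpu')
    teacher_state = ckpt.get('model', ckpt)   # some checkpoints nest weights under 'model'
    own_state = fastspeech_model.state_dict()
    copied = []
    for name, tensor in teacher_state.items():
        # Reuse only embedding/encoder parameters whose names and shapes match.
        if (name.startswith('embedding') or name.startswith('encoder')) \
                and name in own_state and own_state[name].shape == tensor.shape:
            own_state[name].copy_(tensor)
            copied.append(name)
    return copied
```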
The size of the training dataset is different because a Transformer-TTS trained with phonemes shows more diagonal attention, which changes which utterances yield usable alignments.
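One plausible way such a selection could work is to score how diagonal each utterance's teacher attention is and keep only sufficiently diagonal ones. The scoring rule and threshold below are illustrative assumptions, not values taken from this repository.

```python
import numpy as np

def diagonality_score(attn):
    """attn: (decoder_steps, encoder_steps) attention weights for one utterance."""
    T, N = attn.shape
    expected = np.linspace(0, N - 1, num=T)   # encoder position of a perfect diagonal
    actual = attn.argmax(axis=1)              # encoder position actually attended to
    # 1.0 means perfectly diagonal; lower means the alignment wanders off the diagonal.
    return 1.0 - np.mean(np.abs(actual - expected)) / N

def keep_sample(attn, threshold=0.9):
    # Illustrative threshold; keep utterances whose teacher attention is close to diagonal.
    return diagonality_score(attn) >= threshold
```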
You can listen to the audio samples here