TransformerTTS

Overview

This repository is a PyTorch implementation of Transformer-TTS, a neural speech synthesis model built on the Transformer network. It is based on ChoiHkk's Transformer-TTS code and was trained on the LJSpeech dataset. The notebook can be run in a Google Colab environment.


Model architecture

Figure: Transformer-TTS model architecture
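
For orientation, below is a minimal PyTorch sketch of the encoder-decoder structure shown in the figure. The module names, dimensions, and the use of nn.Transformer are illustrative assumptions rather than this repository's exact code; the Transformer-TTS architecture additionally uses scaled positional encodings and a convolutional post-net, which are omitted here for brevity.

```python
import torch.nn as nn

class TransformerTTSSketch(nn.Module):
    """Illustrative skeleton: text tokens in, mel frames and stop-token logits out."""

    def __init__(self, vocab_size=256, d_model=256, n_mels=80, n_heads=4, n_layers=3):
        super().__init__()
        # Encoder input: character embedding (the real model adds positional encoding)
        self.embedding = nn.Embedding(vocab_size, d_model)
        # Decoder pre-net: compresses previous mel frames before they enter the decoder
        self.mel_prenet = nn.Sequential(
            nn.Linear(n_mels, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model), nn.ReLU(),
        )
        # Standard encoder-decoder Transformer from PyTorch
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            batch_first=True,
        )
        # Output heads: mel-spectrogram frames and stop-token logits
        self.mel_linear = nn.Linear(d_model, n_mels)
        self.stop_linear = nn.Linear(d_model, 1)

    def forward(self, text, mel):
        # text: (batch, text_len) token ids; mel: (batch, mel_len, n_mels)
        src = self.embedding(text)
        tgt = self.mel_prenet(mel)
        # Causal mask so each mel frame attends only to earlier frames
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.mel_linear(out), self.stop_linear(out)
```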


Dataset

The dataset used is LJSpeech, an English single-speaker speech dataset. In the Jupyter notebook it is loaded through the torchaudio package, so no separate download step is required. The audio preprocessing was implemented by referring to Kyubyong Park's tacotron audio preprocessing.
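
The snippet below is a minimal sketch of pulling LJSpeech in through torchaudio and turning an utterance into a log-mel spectrogram. The mel parameters (22050 Hz, n_fft=1024, hop_length=256, 80 mel bins) are assumptions for illustration; the notebook's actual values follow the tacotron preprocessing mentioned above.

```python
import torch
import torchaudio

# LJSPEECH downloads and caches the corpus on first use, so no separate download step is needed.
dataset = torchaudio.datasets.LJSPEECH(root="./data", download=True)

# Mel-spectrogram extraction; LJSpeech audio is sampled at 22.05 kHz.
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80
)

waveform, sample_rate, transcript, normalized_transcript = dataset[0]
mel = mel_transform(waveform)                     # (channels, n_mels, frames)
log_mel = torch.log(torch.clamp(mel, min=1e-5))   # log compression, as in tacotron-style pipelines
print(log_mel.shape, normalized_transcript)
```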


Result

Training was run for 10 epochs with a batch size of 16 on the 13,100 utterances in the dataset. The result is shown as a gif comparing the predicted mel spectrogram with the ground-truth mel spectrogram every 100 steps.

Figure: training result (predicted vs. ground-truth mel spectrograms during training)
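
Below is a minimal sketch of how one frame of such a gif could be produced, i.e. saving the predicted and ground-truth mel spectrograms side by side every 100 steps with matplotlib. The function and variable names (save_comparison, pred_mel, true_mel, step) are placeholders, not the repository's actual logging code.

```python
import matplotlib.pyplot as plt

def save_comparison(pred_mel, true_mel, step):
    """pred_mel, true_mel: (n_mels, frames) torch tensors for a single utterance."""
    fig, axes = plt.subplots(2, 1, figsize=(10, 6))
    axes[0].imshow(pred_mel.detach().cpu().numpy(), aspect="auto", origin="lower")
    axes[0].set_title(f"Predicted mel spectrogram (step {step})")
    axes[1].imshow(true_mel.detach().cpu().numpy(), aspect="auto", origin="lower")
    axes[1].set_title("Ground-truth mel spectrogram")
    fig.tight_layout()
    fig.savefig(f"mel_step_{step:06d}.png")   # frames can later be stitched into a gif
    plt.close(fig)

# Inside the training loop, something along these lines:
# if step % 100 == 0:
#     save_comparison(pred[0].T, mel[0].T, step)
```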


Dependency

torch                            2.3.0+cu121
torchaudio                       2.3.0+cu121
librosa                          0.10.2.post1
numpy                            1.25.2
scipy                            1.11.4
python                           3.10.12
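
A quick way to check that a Colab runtime matches the versions listed above:

```python
import sys
import torch, torchaudio, librosa, numpy, scipy

print("python     ", sys.version.split()[0])    # expected 3.10.12
print("torch      ", torch.__version__)         # expected 2.3.0+cu121
print("torchaudio ", torchaudio.__version__)    # expected 2.3.0+cu121
print("librosa    ", librosa.__version__)       # expected 0.10.2.post1
print("numpy      ", numpy.__version__)         # expected 1.25.2
print("scipy      ", scipy.__version__)         # expected 1.11.4
```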

References

Neural Speech Synthesis with Transformer Network (the Transformer-TTS paper)
ChoiHkk's Transformer-TTS implementation
Kyubyong Park's tacotron audio preprocessing
The LJ Speech Dataset
