The result obtained by eval_model or synthesis is much worse than the one obtained during training #201
Comments
What datasets and presets are you using?
Chinese datasets with 61 speakers, and a preset I modified based on deepvoice3_vctk.json.
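For reference, a minimal sketch of what such a preset adaptation might look like. The keys `n_speakers` and `frontend` are real deepvoice3_pytorch hyperparameters, but the output file name and the exact set of edits here are assumptions based on this thread:

```python
# A minimal sketch (not from the repo) of adapting the VCTK preset for a
# 61-speaker Chinese corpus. The output file name is made up for illustration.
import json

with open("presets/deepvoice3_vctk.json") as f:
    preset = json.load(f)

preset["n_speakers"] = 61  # the Chinese corpus has 61 speakers, not VCTK's 108
preset["frontend"] = "en"  # pinyin transcripts tokenized with the English frontend

with open("presets/deepvoice3_chinese_61spk.json", "w") as f:
    json.dump(preset, f, indent=2)
```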
Which frontend did you select?
I converted the transcript to pinyin, so I selected the en frontend. I think the bad result may be because the number of epochs is not enough.
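For anyone following along, a minimal sketch of that kind of conversion using the pypinyin package; the thread does not say which converter or tone style was actually used, so both are assumptions here:

```python
# A minimal sketch of converting a Chinese transcript to pinyin so the
# English ("en") frontend can handle it. pypinyin is an assumption; the
# original poster's tool is not stated in the thread.
from pypinyin import lazy_pinyin, Style

def to_pinyin(text: str) -> str:
    # Style.TONE3 appends tone digits (e.g. "ni3 hao3"), keeping the
    # transcript ASCII-only so the English frontend can tokenize it.
    return " ".join(lazy_pinyin(text, style=Style.TONE3))

print(to_pinyin("你好世界"))  # -> "ni3 hao3 shi4 jie4"
```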
Please let me know how well it goes with that batch size.
Same problem here. I am using the MAGICDATA dataset with 1,016 speakers; training for 1,500,000 to 2,000,000 steps gives good results during the training process, but inference with these two models produces bad speech.
When I generated audio from the checkpoint at 32,000 steps, the output was pure noise, and the alignment plots are always empty (images attached). How can I get results close to the normal-sounding audio obtained during training?
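A common explanation for this training/synthesis gap in attention-based TTS is teacher forcing; a minimal sketch (not deepvoice3_pytorch's actual decoder, all names hypothetical) of the difference:

```python
# A minimal sketch of why teacher-forced training audio can sound far better
# than free-running synthesis: at inference the decoder feeds back its own
# predictions, so a broken attention alignment makes errors compound into noise.
import torch

def decode(decoder, text_enc, target_mel=None, max_frames=800):
    # `decoder` is a hypothetical one-step function:
    # (prev_frame, text_enc) -> next mel frame
    prev = torch.zeros(1, 1, 80)  # assumed 80-band mel "go" frame
    steps = target_mel.size(1) if target_mel is not None else max_frames
    frames = []
    for t in range(steps):
        out = decoder(prev, text_enc)
        frames.append(out)
        # Training (teacher forcing): next input is the ground-truth frame.
        # Synthesis (free running): next input is the model's own output.
        prev = target_mel[:, t:t+1] if target_mel is not None else out
    return torch.cat(frames, dim=1)
```

If the attention alignment plot is empty at inference, the free-running loop above never latches onto the text, which is consistent with pure-noise output even when teacher-forced training samples sound fine.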