The result obtained by eval_model or synthesis is much worse than the one obtained during training #201
Comments
What datasets and presets are you using?
Chinese datasets with 61 speakers, and a preset I modified based on deepvoice3_vctk.json.
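For reference, a minimal sketch of what such a preset adaptation might look like. The keys `n_speakers` and `frontend` are real deepvoice3_pytorch hyperparameters, but the output file name and the exact set of edits here are assumptions based on this thread:

```python
# A minimal sketch (not from the repo) of adapting the VCTK preset for a
# 61-speaker Chinese corpus. The output file name is made up for illustration.
import json

with open("presets/deepvoice3_vctk.json") as f:
    preset = json.load(f)

preset["n_speakers"] = 61  # the Chinese corpus has 61 speakers, not VCTK's 108
preset["frontend"] = "en"  # pinyin transcripts tokenized with the English frontend

with open("presets/deepvoice3_chinese_61spk.json", "w") as f:
    json.dump(preset, f, indent=2)
```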
Which frontend did you select?
I converted the transcript to pinyin, so I selected the en frontend. I think the bad result may be because the number of epochs is not enough.
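For anyone following along, a minimal sketch of that kind of conversion using the pypinyin package; the thread does not say which converter or tone style was actually used, so both are assumptions here:

```python
# A minimal sketch of converting a Chinese transcript to pinyin so the
# English ("en") frontend can handle it. pypinyin is an assumption; the
# original poster's tool is not stated in the thread.
from pypinyin import lazy_pinyin, Style

def to_pinyin(text: str) -> str:
    # Style.TONE3 appends tone digits (e.g. "ni3 hao3"), keeping the
    # transcript ASCII-only so the English frontend can tokenize it.
    return " ".join(lazy_pinyin(text, style=Style.TONE3))

print(to_pinyin("你好世界"))  # -> "ni3 hao3 shi4 jie4"
```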
Please let me know how well it goes with that batch size.
Same problem here. I am using the MAGICDATA dataset with 1,016 speakers; training for 1,500,000 to 2,000,000 steps gives good results during the training process, but inference with these two models produces bad speech.
When I generated audio from the checkpoint at 32,000 steps, the output was pure noise, and the alignment plots are always empty (images attached). How can I get results close to the normal-sounding audio obtained during training?
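A common explanation for this training/synthesis gap in attention-based TTS is teacher forcing; a minimal sketch (not deepvoice3_pytorch's actual decoder, all names hypothetical) of the difference:

```python
# A minimal sketch of why teacher-forced training audio can sound far better
# than free-running synthesis: at inference the decoder feeds back its own
# predictions, so a broken attention alignment makes errors compound into noise.
import torch

def decode(decoder, text_enc, target_mel=None, max_frames=800):
    # `decoder` is a hypothetical one-step function:
    # (prev_frame, text_enc) -> next mel frame
    prev = torch.zeros(1, 1, 80)  # assumed 80-band mel "go" frame
    steps = target_mel.size(1) if target_mel is not None else max_frames
    frames = []
    for t in range(steps):
        out = decoder(prev, text_enc)
        frames.append(out)
        # Training (teacher forcing): next input is the ground-truth frame.
        # Synthesis (free running): next input is the model's own output.
        prev = target_mel[:, t:t+1] if target_mel is not None else out
    return torch.cat(frames, dim=1)
```

If the attention alignment plot is empty at inference, the free-running loop above never latches onto the text, which is consistent with pure-noise output even when teacher-forced training samples sound fine.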