Multi-speaker support for vocoder finetune? #1268

jerryuhoo · 2022-01-05T02:05:51Z

jerryuhoo
Jan 5, 2022

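Before finetuning, the predicted mels have to be generated first, but it looks like only the single-speaker baker dataset is supported at the moment. To support multiple speakers, would I need to add a --speaker-dict argument and pass spk_num in this line of code?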

PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/gen_gta_mel.py

Lines 45 to 46 in 36c9eaa

    
model = FastSpeech2(
    idim=vocab_size, odim=odim, **fastspeech2_config["model"])

And add spk_id in this line:

PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/gen_gta_mel.py

Line 99 in 36c9eaa

mel = fastspeech2_inference(phone_ids, durations=durations)

Is there anything else that needs to be changed? Also, why doesn't fastspeech2_inference need pitch and energy passed in? Does it use the default of 1?

Answered by jerryuhoo

Jan 6, 2022

Submitted PR #1278


yt605155624 · 2022-01-05T02:31:54Z

yt605155624
Jan 5, 2022
Collaborator

I don't think anything else needs to be changed.
Pitch and energy use the defaults.
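
The change discussed above comes down to reading the speaker table given via --speaker-dict, passing its size as spk_num when the model is built, and passing a per-utterance spk_id at inference. Below is a minimal sketch of the bookkeeping part only. It assumes the speaker dict is a text file with one `speaker_name id` pair per line; `load_speaker_dict` and the speaker names are hypothetical, and the model calls appear only as comments because their exact signatures are not confirmed in this thread.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_speaker_dict(path: Path) -> Dict[str, int]:
    """Parse lines of the assumed form '<speaker_name> <integer id>'."""
    spk_map = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, idx = line.split()
        spk_map[name] = int(idx)
    return spk_map


with tempfile.TemporaryDirectory() as tmp:
    dict_path = Path(tmp) / "speaker_id_map.txt"  # hypothetical file name
    dict_path.write_text("SSB0005 0\nSSB0009 1\n", encoding="utf-8")
    spk_map = load_speaker_dict(dict_path)

spk_num = len(spk_map)       # would be passed when constructing the model
spk_id = spk_map["SSB0009"]  # would be passed per utterance at inference

# Hypothetical wiring, mirroring the two lines quoted in the question:
#   model = FastSpeech2(idim=vocab_size, odim=odim, spk_num=spk_num, **config["model"])
#   mel = fastspeech2_inference(phone_ids, durations=durations, spk_id=spk_id)
print(spk_num, spk_id)
```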

2 replies

jerryuhoo Jan 5, 2022
Author

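I found a problem: link_wav.py requires the train, dev and test file names in the dump_finetune and dump folders to match. gen_gta_mel.py assigns files to train, dev and test by reading duration.txt and taking 9800, 100 and 100 of them (using Baker as the example) for the three folders, whereas preprocess splits them based on this line: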

wav_files = sorted(list((wav_dir / speaker).rglob("*.wav")))

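I don't know whether the resulting order is the same for the single-speaker baker dataset, but it doesn't seem workable for a multi-speaker dataset: preprocess takes 5 files from each speaker's folder for dev and 5 for test (using aishell3 as the example), and that can't be reproduced by just reading duration.txt in order. Is there a way to make the files in dump_finetune and dump match?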
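
To make the mismatch concrete, here is a small sketch contrasting the two ways of splitting. The per-speaker rule (hold out the last few sorted files of each speaker) is an assumption modelled on the description above, not a copy of the preprocess script; the function names and toy file names are made up for illustration.

```python
from typing import Dict, List, Tuple

Split = Tuple[List[str], List[str], List[str]]


def split_per_speaker(files_by_spk: Dict[str, List[str]], n_held: int) -> Split:
    """Hold out n_held dev and n_held test files from every speaker."""
    if n_held < 1:
        raise ValueError("n_held must be at least 1")
    train, dev, test = [], [], []
    for spk in sorted(files_by_spk):
        files = sorted(files_by_spk[spk])  # sorted() makes the order reproducible
        train += files[:-2 * n_held]
        dev += files[-2 * n_held:-n_held]
        test += files[-n_held:]
    return train, dev, test


def split_by_count(utt_ids: List[str], n_train: int, n_dev: int) -> Split:
    """Slice one flat list by position, as when reading duration.txt in order."""
    return (utt_ids[:n_train],
            utt_ids[n_train:n_train + n_dev],
            utt_ids[n_train + n_dev:])


files_by_spk = {
    "spkA": [f"spkA_{i:02d}" for i in range(6)],
    "spkB": [f"spkB_{i:02d}" for i in range(6)],
}
per_spk = split_per_speaker(files_by_spk, n_held=1)
flat = sorted(u for files in files_by_spk.values() for u in files)
by_count = split_by_count(flat, n_train=8, n_dev=2)

print(per_spk[1])   # dev set: one file from each speaker
print(by_count[1])  # dev set: two files from the tail of the flat list
```

With the same number of dev files, the two rules pick different utterances, which is why the file names in the two dump folders would stop matching.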

rogerle Mar 13, 2023

When using TTS streaming I'd like to adjust the pitch. Where should I set that?

yt605155624 · 2022-01-05T13:00:44Z

yt605155624
Jan 5, 2022
Collaborator

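The sort order should be deterministic; you can run an experiment to check.
The current gen gta mel code indeed can't be used directly for multiple speakers. preprocess has the train/eval/test_wav_files lists; could you find a way to add those lists to gen gta mel as well, record the spk information alongside them, and read that data structure when saving the wav?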

0 replies

jerryuhoo · 2022-01-06T01:18:32Z

jerryuhoo
Jan 6, 2022
Author

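How about this: in preprocess, save train/dev/test_wav_files to a file, one line per file in the form "filename train/dev/test". Then, when gen_gta_mel reads each line of duration.txt, look the file name up in that table to find whether it belongs to train/dev/test, and save the output to the corresponding path. What do you think?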
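
The proposal above could be sketched roughly as follows. The helper names, the table file name and the flat `<split>/<utt_id>.npy` output layout are illustrative assumptions, not the code that was later submitted.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Tuple


def write_split_table(path: Path, rows: Iterable[Tuple[str, str]]) -> None:
    """Write one '<utt_id> <split>' line per utterance (done once, in preprocess)."""
    with path.open("w", encoding="utf-8") as f:
        for utt_id, split in rows:
            f.write(f"{utt_id} {split}\n")


def read_split_table(path: Path) -> Dict[str, str]:
    """Load the table back as {utt_id: split} for lookups while generating mels."""
    table = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                utt_id, split = line.split()
                table[utt_id] = split
    return table


def output_path(dump_root: Path, table: Dict[str, str], utt_id: str) -> Path:
    """Pick the sub-folder for a generated mel from the recorded split."""
    return dump_root / table[utt_id] / f"{utt_id}.npy"


with tempfile.TemporaryDirectory() as tmp:
    table_path = Path(tmp) / "split.txt"
    write_split_table(
        table_path,
        [("000001", "train"), ("009901", "dev"), ("010000", "test")],
    )
    table = read_split_table(table_path)

print(output_path(Path("dump_finetune"), table, "009901"))
```

Because the split is recorded per file name rather than inferred from position, this works the same way for single-speaker and multi-speaker datasets.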

0 replies

yt605155624 · 2022-01-06T01:33:14Z

yt605155624
Jan 6, 2022
Collaborator

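Alternatively, instead of saving it during preprocess, you could read the original dataset again in gen gta mel, sort it, and split it the same way preprocess does. That way preprocess doesn't have to be rerun, but this implementation requires the original data to be available…
Or you could just read the dump folder directly? Then neither preprocess nor the original data is needed.
As for how to store it, line by line or as a dict keyed by train/dev/test, that's up to your preference.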
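
A rough sketch of the "read the dump folder" option, using the dict-keyed-by-split storage form. It assumes each split folder contains one `.npy` file per utterance whose stem is the utterance id; real dump files may carry a suffix that would need stripping first. `scan_dump` and `invert` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

SPLITS = ("train", "dev", "test")


def scan_dump(dump_root: Path, suffix: str = ".npy") -> Dict[str, List[str]]:
    """Return {split: sorted utterance ids} from files found under each split folder."""
    return {
        split: sorted({p.stem for p in (dump_root / split).rglob(f"*{suffix}")})
        for split in SPLITS
    }


def invert(splits: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn {split: [utt_id, ...]} into {utt_id: split} for per-utterance lookups."""
    return {utt_id: split for split, ids in splits.items() for utt_id in ids}


with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "dump"
    # Build a toy dump tree so the example runs on its own.
    for split, utt in [("train", "000001"), ("dev", "009901"), ("test", "010000")]:
        (dump / split / "raw").mkdir(parents=True)
        (dump / split / "raw" / f"{utt}.npy").touch()
    splits = scan_dump(dump)

print(json.dumps(splits))         # the dict form can be saved as JSON
print(invert(splits)["009901"])   # look up the split of one utterance
```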

0 replies

jerryuhoo · 2022-01-06T06:30:28Z

jerryuhoo
Jan 6, 2022
Author

Submitted PR #1278

0 replies

jerryuhoo · 2022-01-10T02:34:22Z

jerryuhoo
Jan 10, 2022
Author

samples.zip

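Finetuning with fastspeech2 reduced the metallic sound a little, but the overall audio quality seems worse than before finetuning, and the speech is somewhat less fluent. I've uploaded some samples for comparison. Is there a way to fix this?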

0 replies

yt605155624 · 2022-01-10T02:48:09Z

yt605155624
Jan 10, 2022
Collaborator

If the acoustic model is speedyspeech, you should finetune with speedyspeech.

5 replies

jerryuhoo Jan 10, 2022
Author

My current finetune uses the fastspeech2 model. I'll also do a speedyspeech finetune later, but I haven't started on that yet.

jerryuhoo Jan 10, 2022
Author

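Does finetune use the previously trained hifigan model? Or does it train from scratch, just replacing the ground-truth mel spectrograms with the predicted ones? It looks to me like training starts over?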

yt605155624 Jan 10, 2022
Collaborator

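Yes, it needs to continue training from the trained hifigan model. You have to put the previously trained model in the corresponding path and specify that path in records.jsonl. It works like resuming after an interrupted run: the trainer reads the model parameters from the last line of records.jsonl and continues from there. The code currently has no step that copies the model... The step count in the yaml is doubled because our current trainer reads the step count of the pretrained model, and if it equals the value in the yaml, no further training happens (so in effect it trains for 1,000,000 steps and then finetunes for another 1,000,000 steps, rather than finetuning for 2,000,000 steps). You can also try tuning the hyperparameters in finetune.yaml.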
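
The resume behaviour described above can be illustrated with a short sketch. The record field names (`path`, `iteration`), the checkpoint file names and both helper functions are assumptions for illustration; they are not taken from the trainer's source.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def latest_record(records_file: Path) -> Optional[dict]:
    """Return the last non-empty JSON line of records.jsonl, or None if there is none."""
    if not records_file.exists():
        return None
    lines = [ln for ln in records_file.read_text(encoding="utf-8").splitlines()
             if ln.strip()]
    return json.loads(lines[-1]) if lines else None


def remaining_steps(train_max_steps: int, record: Optional[dict]) -> int:
    """Steps still to run once the step count stored in the checkpoint is restored."""
    done = record["iteration"] if record else 0
    return max(train_max_steps - done, 0)


with tempfile.TemporaryDirectory() as tmp:
    records = Path(tmp) / "records.jsonl"
    rows = [
        {"path": "checkpoints/snapshot_iter_500000.pdz", "iteration": 500000},
        {"path": "checkpoints/snapshot_iter_1000000.pdz", "iteration": 1000000},
    ]
    records.write_text("".join(json.dumps(r) + "\n" for r in rows), encoding="utf-8")
    last = latest_record(records)

print(last["path"])
print(remaining_steps(1_000_000, last))  # same limit as pretraining: nothing left to train
print(remaining_steps(2_000_000, last))  # doubled limit: 1,000,000 finetuning steps
```

If records.jsonl is missing from the finetune output folder, `latest_record` returns None and the full step budget is spent training from scratch, which matches the symptom reported earlier in this thread.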

jerryuhoo Jan 10, 2022
Author

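Oh, then the problem was probably caused by my training from scratch. I saw that finetune.sh simply creates a new finetune folder, while my previous model was in the default folder. So a copy step probably needs to be added to finetune.sh.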

yt605155624 Jan 10, 2022
Collaborator

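I'm not sure how to write the copy step conveniently either, since we don't know which model the user wants to finetune from. I never tuned the hyperparameters in the mb melgan finetune.yaml; at the time it was only meant to give users an example of generating gta mel.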

yt605155624 · 2022-01-10T03:15:06Z

yt605155624
Jan 10, 2022
Collaborator

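The acoustic model used for finetuning should be the same one used for synthesis, because the mels produced by speedyspeech and fastspeech2 may have different distributions.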

1 reply

jerryuhoo Jan 10, 2022
Author

Yes, I know. I'm doing the fastspeech2 finetune right now; the acoustic model used for training hifigan and the one used for inference are both fastspeech2.

jerryuhoo · 2022-01-11T06:26:06Z

jerryuhoo
Jan 11, 2022
Author

PaddleSpeech/examples/csmsc/voc5/finetune.sh

Line 24 in fb238d8

python3 link_wav.py \

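When I used it today, this line raised an error: link_wav.py could not be found. Maybe it can't be written this way? export PATH doesn't seem to locate .py files, only executables. Should it be changed to this?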

python3 ${MAIN_ROOT}/utils/link_wav.py \

1 reply

yt605155624 Jan 11, 2022
Collaborator

PaddleSpeech/examples/csmsc/voc5/finetune.sh

Line 24 in fb238d8

python3 link_wav.py \

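When I used it today, this line raised an error: link_wav.py could not be found. Maybe it can't be written this way? export PATH doesn't seem to locate .py files, only executables. Should it be changed to this?
python3 ${MAIN_ROOT}/utils/link_wav.py \

Yes, that should indeed be changed.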
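
The underlying reason is that PATH is consulted only when a command name is looked up; a file name passed as an argument to python3 is resolved relative to the current directory. Here is a small Python illustration of the difference on a POSIX system. The temporary `utils/` layout stands in for `${MAIN_ROOT}/utils`, and `script_path` is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path


def script_path(main_root: str, name: str) -> Path:
    """Build the explicit path that would be handed to the interpreter."""
    return Path(main_root) / "utils" / name


with tempfile.TemporaryDirectory() as main_root:
    utils = Path(main_root) / "utils"
    utils.mkdir()
    # A plain script file without the executable bit set.
    (utils / "link_wav.py").write_text("print('linked')\n", encoding="utf-8")

    # A PATH-style search only reports files that are executable ...
    found_on_path = shutil.which("link_wav.py", path=str(utils))
    # ... while an explicit path works regardless of PATH or permissions.
    explicit = script_path(main_root, "link_wav.py")
    explicit_exists = explicit.is_file()

print(found_on_path, explicit_exists)
```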
