Some questions about ERNIE-SAT #2291
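After looking at the architecture diagram, the ERNIE-SAT approach seems somewhat similar to BERT? The input here is masked audio plus text. However, I searched online and could not find a related paper, so may I ask roughly when the related material will be filled in? It looks quite interesting. Thanks also for the work on this!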
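To make the BERT comparison concrete, here is a minimal sketch of span masking applied to speech frames that are paired with a text sequence. It is not taken from the ERNIE-SAT or A3T code: the function names `mask_span` and `masked_l1`, the toy frame values, the phoneme ids, and the choice of an L1 loss over the masked frames are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Frames = List[List[float]]  # one inner list of mel bins per speech frame


def mask_span(frames: Frames, start: int, length: int,
              mask_value: float = 0.0) -> Tuple[Frames, List[bool]]:
    """Return a copy of `frames` with frames [start, start + length) overwritten
    by `mask_value`, plus a per-frame flag marking which frames were masked."""
    if length <= 0 or start < 0 or start + length > len(frames):
        raise ValueError("span must be non-empty and lie inside the utterance")
    is_masked = [start <= i < start + length for i in range(len(frames))]
    masked = [[mask_value] * len(f) if m else list(f)
              for f, m in zip(frames, is_masked)]
    return masked, is_masked


def masked_l1(pred: Frames, target: Frames, is_masked: List[bool]) -> float:
    """Mean absolute error computed over the masked frames only, in the same
    spirit as BERT scoring only the masked tokens (an assumed loss choice)."""
    errors = [abs(p - t)
              for pf, tf, m in zip(pred, target, is_masked) if m
              for p, t in zip(pf, tf)]
    return sum(errors) / len(errors)


# Toy utterance: 6 frames with 3 mel bins each (hypothetical values).
mel: Frames = [[float(i + j) for j in range(3)] for i in range(6)]
phoneme_ids = [3, 7, 7, 1]  # hypothetical text-side input, left unmasked

masked_mel, is_masked = mask_span(mel, start=2, length=2)

# The model would see the text together with the partially masked speech.
model_input: Dict[str, object] = {"text": phoneme_ids, "speech": masked_mel}
```

A model trained this way would be asked to reconstruct frames 2 and 3 from the surrounding speech and the text; how the real system aligns text with frames and chooses spans is not covered by this sketch.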
Replies: 4 comments 1 reply
I came across a large multimodal model on GitHub, from overseas, that is pieced together from several components: https://github.com/neonbjb/tortoise-tts
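The first draft of the ERNIE-SAT paper is expected to be finished in early September. For the previous version of the paper, see the one from Baidu Research USA: A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing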
@sixyang The ERNIE-SAT example is now complete; you are welcome to try it out.
@sixyang The ERNIE-SAT paper is now on arXiv: ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
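At the end of September, PaddleSpeech will release a related livestream course. example/ernie_sat contains only inference code, adapted from https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-sat
example/(vctk)(aishell3)(aishell3_vctk)/ernie_sat already provide training code; the inference code is under development (the old version used HTK for alignment, the new version uses MFA). Follow #2287 for updates.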