Mask token in fnlp/bart-large-chinese? #10

Open
ScottishFold007 opened this issue Dec 8, 2022 · 3 comments

Comments

@ScottishFold007

Hello! While processing Chinese pretraining data, I noticed that the existing mask is ''. For fnlp/bart-base-chinese and fnlp/bart-large-chinese, shouldn't this mask be '[MASK]'?

@beyondguo
Owner

Hi, the mask token I use is [MASK]. Which part of the code caused the confusion?
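A minimal sketch of how the mask-token question above could be checked and patched locally. It assumes the checkpoint's tokenizer loads through `AutoTokenizer` from the `transformers` package; the helper names `swap_mask_token` and `tokenizer_mask_token` are hypothetical, and the `<mask>` placeholder in the usage note is only an assumed example, since the token in the original post is not shown.

```python
def swap_mask_token(text: str, old_mask: str, new_mask: str = "[MASK]") -> str:
    """Replace every occurrence of old_mask in text with new_mask."""
    if not old_mask:
        # str.replace with an empty pattern would insert new_mask between every character.
        raise ValueError("old_mask must be a non-empty string")
    return text.replace(old_mask, new_mask)


def tokenizer_mask_token(model_name: str = "fnlp/bart-base-chinese") -> str:
    """Return the mask token reported by the checkpoint's tokenizer.

    Needs the `transformers` package and network access (or a local cache).
    """
    # Imported lazily so swap_mask_token works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_name).mask_token
```

For example, `swap_mask_token("今天<mask>很好", "<mask>")` returns `"今天[MASK]很好"`. Comparing the placeholder in the preprocessed data against `tokenizer_mask_token()` before training is one way to catch this kind of mismatch early.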

@ScottishFold007
Author

> Hi, the mask token I use is [MASK]. Which part of the code caused the confusion?

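That was in the phrase-extraction class, where it is the default for jieba. I've changed it, so it's not a big deal.
By the way, what configuration did you use to train this base Chinese model, and how long did it take? I'm currently running on 90 million samples with 8×V100 32 GB, and it is very slow. As for text length, I set the maximum source length to 60 and the maximum target length to 410.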

@beyondguo
Owner

I trained on 10 million samples, on 8×A100 80 GB; it took roughly 3 days.
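As a rough back-of-the-envelope sketch, the figures in this thread can be scaled linearly to estimate wall-clock time for a larger dataset. This assumes the same number of epochs, the same sequence lengths, and the same batch configuration, none of which are stated in the thread. The function name `scale_training_days` and the `relative_speed` parameter are hypothetical, and the 0.5 used below is purely illustrative, not a measured V100-to-A100 ratio.

```python
def scale_training_days(
    ref_samples: int,
    ref_days: float,
    target_samples: int,
    relative_speed: float = 1.0,
) -> float:
    """Linearly scale a reference training time to a different dataset size.

    relative_speed is the target hardware's throughput as a fraction of the
    reference hardware's throughput (1.0 means identical speed).
    """
    if ref_samples <= 0 or relative_speed <= 0:
        raise ValueError("ref_samples and relative_speed must be positive")
    return ref_days * target_samples / ref_samples / relative_speed


# Reference point reported in the thread: 10 million samples in about 3 days.
same_hardware = scale_training_days(10_000_000, 3, 90_000_000)
# Hypothetical: target hardware running at half the reference throughput.
half_speed = scale_training_days(10_000_000, 3, 90_000_000, relative_speed=0.5)
print(same_hardware, half_speed)
```

Under these assumptions, 90 million samples would take about 27 days at the reference throughput, and longer on slower hardware, which is consistent with the run feeling very slow.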
