Inference performance problem after fine-tuning #44

Open
daydayup-zyn opened this issue Oct 13, 2023 · 1 comment


@daydayup-zyn

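After fine-tuning, I merged the weights and quantized the model to int4. Running inference directly on this new model is noticeably slower than on the official int4 model.
However, if I replace the official pytorch_model.bin with the fine-tuned pytorch_model.bin and then run inference, the speed is about the same as the official model.
[image]
What is causing this? Do other files in the new model also need to be modified?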

@daydayup-zyn
Author

Replacing quantization.py with the one from the official int4 model also improves inference performance.
[image]
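For anyone hitting the same thing, a quick way to see which auxiliary files differ between the two model directories is to compare file hashes. The following is only a minimal sketch: the function name `diff_model_dirs` and the directory arguments are hypothetical, and it assumes both models are plain local folders. It skips the weight files (which are expected to differ after fine-tuning) and reports everything else, such as quantization.py.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def diff_model_dirs(official_dir, finetuned_dir,
                    skip_suffixes=(".bin", ".safetensors")):
    """Compare the top-level files of two model directories (hypothetical helper).

    Weight files (matched by skip_suffixes) are ignored because they are
    expected to differ after fine-tuning. Returns a dict with three sorted
    lists of file names: only_official, only_finetuned, different.
    """
    official = {p.name: p for p in Path(official_dir).iterdir() if p.is_file()}
    finetuned = {p.name: p for p in Path(finetuned_dir).iterdir() if p.is_file()}
    report = {"only_official": [], "only_finetuned": [], "different": []}
    for name in sorted(official.keys() | finetuned.keys()):
        if name.endswith(skip_suffixes):
            continue  # weights differ by design, not interesting here
        if name not in finetuned:
            report["only_official"].append(name)
        elif name not in official:
            report["only_finetuned"].append(name)
        elif file_digest(official[name]) != file_digest(finetuned[name]):
            report["different"].append(name)
    return report
```

Calling `diff_model_dirs("path/to/official-int4", "path/to/finetuned-int4")` (both paths are placeholders) returns the names of the non-weight files that differ. If quantization.py shows up under `different`, that would be consistent with the observation above, though the sketch itself says nothing about why one version is slower.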
