After fine-tuning, I merged the weights and quantized the model to int4. Running inference directly on this new model is noticeably slower than on the official int4 model. However, if I instead take the official model and replace only its pytorch_model.bin with the fine-tuned pytorch_model.bin, inference speed is about the same as the official model. What is causing this? Do I also need to modify other files in the new model?
Replace your quantization.py with the one from the official int4 model; inference performance will improve as well.
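The suggestion above amounts to copying one file between checkpoint directories. A minimal sketch, assuming hypothetical local paths (`./official-int4` for the official int4 checkpoint and `./merged-int4` for your fine-tuned, merged, quantized model; adjust both to your actual directories):

```shell
# Hypothetical checkpoint directories; replace with your real paths.
OFFICIAL_INT4=./official-int4   # official int4 checkpoint
MERGED_INT4=./merged-int4       # fine-tuned, merged, quantized checkpoint

# Back up the quantization code produced by the merge, then overwrite it
# with the official version, which contains the faster kernels.
cp "$MERGED_INT4/quantization.py" "$MERGED_INT4/quantization.py.bak"
cp "$OFFICIAL_INT4/quantization.py" "$MERGED_INT4/quantization.py"
```

This works because checkpoints that load with custom code pick up `quantization.py` from the model directory at load time, so only the copy in the merged checkpoint's directory matters; the weights in pytorch_model.bin are unaffected.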