I only have P100 and V100 GPUs, which don't support INT8. What should I do to quantize BERT to FP16?

Thanks in advance!
Our framework, for the time being, only simulates quantized inference: the quantized GEMM operations are still performed with FP32 arithmetic on integer values. We don't have quantization-aware training for FP16 implemented.

What you can do is train BERT regularly in FP32 and then convert the model to FP16 by calling the `half()` method on the model.
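
A minimal sketch of that suggestion, assuming a Hugging Face `transformers` BERT checkpoint (the library and model name are illustrative, not part of the original answer): train or fine-tune in FP32 as usual, then cast the trained model to FP16 with PyTorch's `half()` before running inference.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Illustrative checkpoint; substitute the directory of your own FP32 fine-tuned model.
model_name = "bert-base-uncased"

tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)  # loaded in FP32

# Cast all parameters and buffers to FP16 (post-training cast, no quantization-aware training).
model = model.half().eval().to("cuda")

inputs = tokenizer("FP16 inference example", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.dtype)  # torch.float16
```

On V100 this plain FP16 cast also lets inference use the Tensor Cores; note it is simple post-training casting, not quantization-aware training.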
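
For context on the "simulates quantized inference" point, here is a toy sketch (not the repository's actual code) of what a fake-quantized GEMM looks like: values are rounded onto a symmetric INT8 grid, but the matrix multiplication itself still runs in FP32.

```python
import torch

def quantize_sym(x: torch.Tensor, num_bits: int = 8):
    """Symmetric fake quantization: return an integer-valued FP32 tensor and its scale."""
    qmax = 2 ** (num_bits - 1) - 1                     # 127 for INT8
    scale = x.abs().max() / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax)      # integers, stored as FP32
    return q, scale

# Quantize activations and weights, but run the GEMM as an ordinary FP32 matmul.
a, s_a = quantize_sym(torch.randn(4, 8))
w, s_w = quantize_sym(torch.randn(8, 16))
out = (a @ w) * s_a * s_w                              # dequantize the result
```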