I only have P100 and V100 GPUs, which don't support INT8. What should I do to quantize BERT to FP16?

Thanks in advance!
Our framework, for the time being, only simulates quantized inference: the quantized GEMM operations are still performed with FP32 arithmetic on integer values. We don't have quantization-aware training for FP16 implemented.

What you can do is train BERT regularly in FP32 and then convert the model to FP16 by calling the `half()` method on the model.
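
A minimal sketch of that suggestion, assuming a Hugging Face `transformers` BERT checkpoint (the library and model name are illustrative, not part of the original answer): train or fine-tune in FP32 as usual, then cast the trained model to FP16 with PyTorch's `half()` before running inference.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Illustrative checkpoint; substitute the directory of your own FP32 fine-tuned model.
model_name = "bert-base-uncased"

tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)  # loaded in FP32

# Cast all parameters and buffers to FP16 (post-training cast, no quantization-aware training).
model = model.half().eval().to("cuda")

inputs = tokenizer("FP16 inference example", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.dtype)  # torch.float16
```

On V100 this plain FP16 cast also lets inference use the Tensor Cores; note it is simple post-training casting, not quantization-aware training.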
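
For context on the "simulates quantized inference" point, here is a toy sketch (not the repository's actual code) of what a fake-quantized GEMM looks like: values are rounded onto a symmetric INT8 grid, but the matrix multiplication itself still runs in FP32.

```python
import torch

def quantize_sym(x: torch.Tensor, num_bits: int = 8):
    """Symmetric fake quantization: return an integer-valued FP32 tensor and its scale."""
    qmax = 2 ** (num_bits - 1) - 1                     # 127 for INT8
    scale = x.abs().max() / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax)      # integers, stored as FP32
    return q, scale

# Quantize activations and weights, but run the GEMM as an ordinary FP32 matmul.
a, s_a = quantize_sym(torch.randn(4, 8))
w, s_w = quantize_sym(torch.randn(8, 16))
out = (a @ w) * s_a * s_w                              # dequantize the result
```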