[Performance] fp16 support and performance #22242
Labels
performance, platform:mobile
Describe the issue
FP16 model inference is slower than FP32. Does FP16 inference require additional configuration, or is converting the model to FP16 sufficient?
To reproduce
1. Convert the ONNX model from FP32 to FP16 using onnxmltools (see the first sketch below).
2. Run inference with the ONNX Runtime C++ library, converting input and output data between FP32 and FP16 (see the second sketch below).
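A minimal sketch of step 1, assuming the model file names (`model_fp32.onnx`, `model_fp16.onnx`) are placeholders and using onnxmltools' float16 converter:

```python
import onnx
from onnxmltools.utils.float16_converter import convert_float_to_float16

# Load the FP32 model (model_fp32.onnx is a placeholder path).
model_fp32 = onnx.load("model_fp32.onnx")

# Rewrite float32 initializers and tensor types to float16.
model_fp16 = convert_float_to_float16(model_fp32)

onnx.save(model_fp16, "model_fp16.onnx")
```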
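Step 2 was done through the C++ API; the Python sketch below only illustrates the equivalent flow of casting FP32 data to FP16 before inference and back afterwards. The input name `input` and the shape are hypothetical stand-ins for the real model's I/O:

```python
import numpy as np
import onnxruntime as ort

# Session over the converted model on the default CPU execution provider.
sess = ort.InferenceSession("model_fp16.onnx",
                            providers=["CPUExecutionProvider"])

# Cast FP32 input data to FP16 before feeding it to the model.
x_fp32 = np.random.rand(1, 3, 224, 224).astype(np.float32)
x_fp16 = x_fp32.astype(np.float16)

outputs = sess.run(None, {"input": x_fp16})

# Cast the FP16 output back to FP32 for downstream use.
y_fp32 = outputs[0].astype(np.float32)
```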
Urgency
No response
Platform
Android
OS Version
34
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
C++
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes