`NEDeconvolutionLayer` `run()` with f16 tensors takes more time than `NEDeconvolutionLayer` `run()` with f32 tensors.
On Ampere, the f32 version takes ~66 milliseconds and the f16 version ~80 milliseconds.
@morgolock thank you for the patch, it works for me as well.
However, the difference between f32 and f16 is not as large for me as it is for you: I get 65-67 ms on f32 and 60-62 ms on f16.
What machine did you use to get the results you shared above?
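When comparing timings like these across machines, it may help to report a median over repeated runs after a short warm-up. The following is only a minimal sketch of that idea: `median_ms` is a hypothetical helper, not part of ACL, and it assumes the work to be measured can be wrapped in a callable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper (not part of ACL): calls `fn` `warmup` times untimed,
// then `iterations` times timed, and returns the median wall-clock time in
// milliseconds (the upper median when `iterations` is even).
double median_ms(const std::function<void()> &fn, int warmup, int iterations)
{
    assert(iterations > 0);

    // Warm-up runs let caches and lazily initialised state settle.
    for (int i = 0; i < warmup; ++i)
    {
        fn();
    }

    std::vector<double> samples;
    samples.reserve(iterations);
    for (int i = 0; i < iterations; ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }

    // nth_element moves the median sample to the middle position.
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Calling `median_ms([&] { layer.run(); }, 5, 50)` once for the f32 case and once for the f16 case would give two directly comparable numbers. Here `layer` stands for an already configured `NEDeconvolutionLayer`, and the warm-up and iteration counts are arbitrary.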
ACL build command:
Reproducer build command:
Reproducer run commands:
The first command uses f32 tensors, the second uses f16 tensors.
Reproducer: