@morgolock I double-checked the issue description, and I don't think I can provide a standalone ACL reproducer.
Perhaps this issue needs to be reviewed from the oneDNN integration point of view, since oneDNN calls 2 convolution primitives in the fp16 case and only 1 primitive in the fp32 case.
So it's probably not an ACL issue, but an issue with the ACL integration into oneDNN.
Should we ask Milos to take a look at this?
ACL 24.07
ACL build command:
benchdnn build command:
Reproducer commands:
The NHWC layout recommended by ACL is used in the reproducer. taskset is used to force single-thread mode and avoid threading issues.
The 1st command (f16 convolution) gives 0.267766 ms, the 2nd one (f32 convolution) gives 0.273554 ms on Ampere.
I'd expect better f16 convolution performance.
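To put the two timings in proportion, here is a minimal Python sketch that uses only the two numbers reported in this issue. The variable names are illustrative and not part of benchdnn or ACL.

```python
# Timings reported in this issue, in milliseconds
# (single thread, NHWC layout, Ampere).
f16_ms = 0.267766
f32_ms = 0.273554

# Speedup of f16 over f32: values above 1.0 mean f16 is faster.
speedup = f32_ms / f16_ms

# Time saved by f16, as a percentage of the f32 time.
saved_pct = (f32_ms - f16_ms) / f32_ms * 100

print(f"f16 speedup over f32: {speedup:.3f}x")
print(f"time saved by f16: {saved_pct:.1f}%")
```

By these numbers f16 is only about 2% faster than f32, so the two runs are nearly identical in cost.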
If the reproducer command is called with DNNL_VERBOSE=1, then we observe 2 convolutions in the f16 case:
and 1 convolution in the fp32 case:
The purpose of the 2nd convolution in the f16 case is not clear (moreover, it's an f32 convolution). It's probably an issue with the ACL integration in oneDNN rather than an ACL issue.