Add conv fp16 kernel in xnnpack EP #22301

Open · mszhanyi wants to merge 16 commits into main
Conversation

@mszhanyi (Contributor) commented Oct 3, 2024

Description

Add FP16 kernels for Conv and ConvTranspose

Motivation and Context

@mszhanyi marked this pull request as draft October 3, 2024 13:13
@mszhanyi marked this pull request as ready for review October 3, 2024 14:28
Resolved review threads (outdated):
onnxruntime/core/providers/xnnpack/detail/utils.cc
onnxruntime/core/providers/xnnpack/nn/conv.cc
Comment on lines 159 to 160
const float output_min = -65504.0;
const float output_max = 65504.0;
Contributor:
Where are these values coming from? I would have expected we use something based on foutput_min/foutput_max so any clip parameters (from a fusion of two nodes) are honoured.

Contributor Author (mszhanyi):
It's calculated from the FP16 format: 1 sign bit, 5 exponent bits, and 10 mantissa bits.
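
For reference, here is a minimal sketch (an editor's addition, not part of the PR) of how 65504 follows from that layout:

```cpp
// Largest finite FP16 value: the maximum significand (2 - 2^-10) scaled by the
// largest normal exponent (2^15).
#include <cmath>
#include <cstdio>

int main() {
  const double fp16_max = (2.0 - std::pow(2.0, -10)) * std::pow(2.0, 15);
  std::printf("%.0f\n", fp16_max);  // prints 65504
  return 0;
}
```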

@mszhanyi (Contributor Author) commented Oct 7, 2024
I just checked that TensorFlow uses 65504 directly:

      # Note 65504. is the max float16 value.
      if scores.dtype is dtypes.float16:
        scores -= 65504. * math_ops.cast(padding_mask, dtype=scores.dtype)

https://github.com/tensorflow/tensorflow/blob/47dc9d146e99f5180906d8bd1b0c0291fa947d23/tensorflow/python/keras/layers/dense_attention.py#L126

Contributor Author (mszhanyi):

And it looks like we can't get the FP16 max/min values from std::numeric_limits the way we do for u8/s8.
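
To illustrate the contrast (an editor's sketch with assumed constant names, not code from the PR): uint8_t/int8_t are built-in types with std::numeric_limits specializations, while there is no standard half-precision type before C++23's std::float16_t, so the FP16 bounds end up written out directly:

```cpp
#include <cstdint>
#include <limits>

// Built-in integer types expose their range through numeric_limits...
constexpr int kU8Max = std::numeric_limits<uint8_t>::max();    // 255
constexpr int kS8Min = std::numeric_limits<int8_t>::lowest();  // -128

// ...but there is no standard half type here, so the FP16 finite range is hard-coded.
constexpr float kFp16Min = -65504.0f;
constexpr float kFp16Max = 65504.0f;
```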

Contributor Author (mszhanyi):

Updated to:
const auto output_min = clip_min_max ? onnxruntime::math::floatToHalf(clip_min_max->first) : -65504.0;
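
A minimal sketch of that idea (an editor's addition; it assumes clip_min_max is an optional pair of floats and is not necessarily the PR's final code):

```cpp
// Honour fused Clip/Relu bounds when present; otherwise fall back to the
// FP16 finite range.
const float output_min = clip_min_max ? clip_min_max->first : -65504.0f;
const float output_max = clip_min_max ? clip_min_max->second : 65504.0f;
```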

Resolved review threads (outdated):
onnxruntime/core/providers/xnnpack/nn/conv_transpose.cc
onnxruntime/test/providers/checkers.cc
@mszhanyi marked this pull request as draft October 7, 2024 03:50
@mszhanyi marked this pull request as ready for review October 7, 2024 04:02