
Rework memcpy transformer to support WebGPU EP being added #22329

Draft
wants to merge 7 commits into base: main

Conversation

skottmckay
Contributor

Description

Rework the memcpy transformer to simplify it and to support an additional GPU-based EP.

  • Treat nodes as being GPU-based or not, and change the provider/non-provider naming to GPU/CPU.
    • This makes the code easier to understand and supports additional GPU-based EPs.
  • Detect incompatible GPU EPs (a sketch of both points follows this list).
    • The implementation doesn't handle copies between incompatible GPUs (e.g. NVIDIA <-> AMD).
    • This isn't an expected use case: the GPU EPs (CUDA/TensorRT vs. ROCm/MIGraphX vs. WebGPU) have fairly comprehensive supported operators, so you wouldn't enable multiple of them at the same time.
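
To illustrate the two points above, here is a minimal sketch of classifying a node's EP as GPU or CPU and rejecting incompatible GPU EP combinations. This is not the PR's code: the helper names, the exact EP grouping, and the EP type strings are assumptions for illustration only.

// Sketch only: GPU/CPU classification and the incompatible-GPU check described above.
// Helper names and the EP grouping are assumptions, not code from the PR.
#include <set>
#include <string>
#include <vector>

// Simplification: treat anything that is not the CPU EP as "GPU based" for memcpy purposes.
bool IsGpuBasedEp(const std::string& ep_type) {
  return ep_type != "CPUExecutionProvider";
}

// Map a GPU EP to a device "family" so that CUDA/TensorRT, ROCm/MIGraphX and WebGPU
// are detected as mutually incompatible, as the description above explains.
std::string GpuFamily(const std::string& ep_type) {
  if (ep_type == "CUDAExecutionProvider" || ep_type == "TensorrtExecutionProvider") return "cuda";
  if (ep_type == "ROCMExecutionProvider" || ep_type == "MIGraphXExecutionProvider") return "rocm";
  if (ep_type == "WebGpuExecutionProvider") return "webgpu";
  return "other";
}

// True if more than one GPU family is enabled at the same time.
bool HasIncompatibleGpuEps(const std::vector<std::string>& enabled_eps) {
  std::set<std::string> families;
  for (const auto& ep : enabled_eps) {
    if (IsGpuBasedEp(ep)) {
      families.insert(GpuFamily(ep));
    }
  }
  return families.size() > 1;
}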

Miscellaneous:

  • Improve const-ness and remove most const_casts.
  • Remove the unnecessary 'onnxruntime::' prefix.
  • Update two tests to assign a control flow node to an EP.
    • Test with the control flow node assigned to the CPU and CUDA EPs.

Easier to review with whitespace diffs hidden.


Motivation and Context

Fix CI failures when WebGPU and CUDA EPs are enabled in the same build.

EPs are either GPU or non-GPU. Insert device copies when transitioning between the two (a sketch of this rule follows below).

Disallow incompatible GPU EPs (no reason to support)

Improve const-ness.
Fix tests where node in main graph wasn't assigned to an EP
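
As a rough illustration of the "insert device copies when transitioning" rule above: a memcpy transformer conceptually inserts a copy node only on edges that cross the GPU/CPU boundary. The sketch below is not the PR's implementation; the Edge type and helper are stand-ins, though MemcpyToHost/MemcpyFromHost are the op names ONNX Runtime uses for device copies.

// Simplified sketch of the transition rule: a copy is only needed on GPU<->CPU edges.
// The Edge struct and helpers are illustrative only.
#include <string>

struct Edge {
  std::string producer_ep;  // EP type assigned to the node producing the value
  std::string consumer_ep;  // EP type assigned to the node consuming the value
};

// Simplification: anything that is not the CPU EP is treated as GPU based.
bool IsGpuBasedEp(const std::string& ep_type) {
  return ep_type != "CPUExecutionProvider";
}

// Returns the copy op to insert on this edge, or an empty string if no copy is needed.
std::string CopyOpForEdge(const Edge& e) {
  const bool producer_gpu = IsGpuBasedEp(e.producer_ep);
  const bool consumer_gpu = IsGpuBasedEp(e.consumer_ep);
  if (producer_gpu && !consumer_gpu) return "MemcpyToHost";    // GPU output feeding a CPU node
  if (!producer_gpu && consumer_gpu) return "MemcpyFromHost";  // CPU output feeding a GPU node
  return "";  // both ends are on the same side of the device boundary: no copy
}
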
@@ -16,16 +16,16 @@ using namespace ONNX_NAMESPACE;
namespace onnxruntime {
namespace test {

-typedef std::vector<onnxruntime::NodeArg*> ArgMap;
+typedef std::vector<NodeArg*> ArgMap;
skottmckay (Contributor, Author):

Most diffs here are from removing the unnecessary onnxruntime:: prefix.

Two tests are updated to assign the 'If' node in the main graph to an EP. The existing test is unchanged apart from the assignment; it is moved into a lambda so it can run with the 'If' node assigned to the CPU and CUDA EPs.

  • lines 195 and 350 are where the existing test was moved into a lambda
  • lines 258 and 400 are where the assignment of the 'If' node occurs

ORT_ENFORCE(!incompatible_gpu_eps, "Mixing CUDA/TensorRT, ROCm/MIGraphX, and WebGPU is not supported.");

for (auto& provider : provider_types_) {
if (utils::ProviderIsCpuBased(provider) == false) {
skottmckay (Contributor, Author):

A key aspect when reviewing is that the transformer only runs for GPU-based EPs.

skottmckay marked this pull request as draft on October 7, 2024 at 03:03.
bool operator!=(const ConstIterator& other) const noexcept { return current_ != other.current_; }
bool operator==(const ConstIterator& rhs) const noexcept { return current_ == rhs.current_; }
bool operator!=(const ConstIterator& rhs) const noexcept { return current_ != rhs.current_; }
size_t operator-(const ConstIterator& rhs) const noexcept { return current_ - rhs.current_; }
Contributor:

should it be ptrdiff_t?
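
For context on the question above (this is general C++ iterator convention, not something established by this PR): the difference of two iterators is conventionally the signed type std::ptrdiff_t, because a size_t result silently wraps when the left-hand iterator precedes the right-hand one.

// General C++ note, not code from this PR: the difference of two iterators is
// conventionally signed (std::ptrdiff_t) so that "earlier - later" stays meaningful
// instead of wrapping around as an unsigned size_t would.
#include <cstddef>
#include <cstdio>

int main() {
  const int data[4] = {0, 1, 2, 3};
  const int* first = data;
  const int* last = data + 4;

  std::ptrdiff_t forward = last - first;   // 4
  std::ptrdiff_t backward = first - last;  // -4, valid because the type is signed

  // Storing the same negative difference in size_t wraps to a huge positive value.
  std::size_t wrapped = static_cast<std::size_t>(first - last);

  std::printf("%td %td %zu\n", forward, backward, wrapped);
  return 0;
}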
