
Rework memcpy transformer to support WebGPU EP being added #22329

Draft
wants to merge 7 commits into base: main

Conversation

skottmckay
Contributor

Description

Rework the memcpy transformer to simplify it and to support an additional GPU-based EP.

  • Treat nodes as being GPU-based or not, and change the provider/non-provider naming to GPU/CPU.
    • This makes the code easier to understand and supports additional GPU-based EPs.
  • Detect incompatible GPU EPs (a sketch of both points follows this list).
    • The implementation doesn't handle copies between incompatible GPUs (e.g. NVIDIA <-> AMD).
    • This isn't an expected use case: the GPU EPs (CUDA/TensorRT vs. ROCm/MIGraphX vs. WebGPU) have fairly comprehensive supported operators, so you wouldn't enable multiple of them at the same time.
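
To illustrate the two points above, here is a minimal sketch of classifying a node's EP as GPU or CPU and rejecting incompatible GPU EP combinations. This is not the PR's code: the helper names, the exact EP grouping, and the EP type strings are assumptions for illustration only.

// Sketch only: GPU/CPU classification and the incompatible-GPU check described above.
// Helper names and the EP grouping are assumptions, not code from the PR.
#include <set>
#include <string>
#include <vector>

// Simplification: treat anything that is not the CPU EP as "GPU based" for memcpy purposes.
bool IsGpuBasedEp(const std::string& ep_type) {
  return ep_type != "CPUExecutionProvider";
}

// Map a GPU EP to a device "family" so that CUDA/TensorRT, ROCm/MIGraphX and WebGPU
// are detected as mutually incompatible, as the description above explains.
std::string GpuFamily(const std::string& ep_type) {
  if (ep_type == "CUDAExecutionProvider" || ep_type == "TensorrtExecutionProvider") return "cuda";
  if (ep_type == "ROCMExecutionProvider" || ep_type == "MIGraphXExecutionProvider") return "rocm";
  if (ep_type == "WebGpuExecutionProvider") return "webgpu";
  return "other";
}

// True if more than one GPU family is enabled at the same time.
bool HasIncompatibleGpuEps(const std::vector<std::string>& enabled_eps) {
  std::set<std::string> families;
  for (const auto& ep : enabled_eps) {
    if (IsGpuBasedEp(ep)) {
      families.insert(GpuFamily(ep));
    }
  }
  return families.size() > 1;
}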

Miscellaneous:

  • Improve const-ness and remove most const_casts.
  • Remove the unnecessary 'onnxruntime::' prefix.
  • Update two tests to assign a control flow node to an EP.
    • Test with the control flow node assigned to the CPU and CUDA EPs.

Easier to review with whitespace diffs hidden.


Motivation and Context

Fix CI failures when WebGPU and CUDA EPs are enabled in the same build.

EPs are either GPU or non-GPU. Insert device copies when transitioning between the two (a sketch of this rule follows below).

Disallow incompatible GPU EPs (no reason to support)

Improve const-ness.
Fix tests where node in main graph wasn't assigned to an EP
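
As a rough illustration of the "insert device copies when transitioning" rule above: a memcpy transformer conceptually inserts a copy node only on edges that cross the GPU/CPU boundary. The sketch below is not the PR's implementation; the Edge type and helper are stand-ins, though MemcpyToHost/MemcpyFromHost are the op names ONNX Runtime uses for device copies.

// Simplified sketch of the transition rule: a copy is only needed on GPU<->CPU edges.
// The Edge struct and helpers are illustrative only.
#include <string>

struct Edge {
  std::string producer_ep;  // EP type assigned to the node producing the value
  std::string consumer_ep;  // EP type assigned to the node consuming the value
};

// Simplification: anything that is not the CPU EP is treated as GPU based.
bool IsGpuBasedEp(const std::string& ep_type) {
  return ep_type != "CPUExecutionProvider";
}

// Returns the copy op to insert on this edge, or an empty string if no copy is needed.
std::string CopyOpForEdge(const Edge& e) {
  const bool producer_gpu = IsGpuBasedEp(e.producer_ep);
  const bool consumer_gpu = IsGpuBasedEp(e.consumer_ep);
  if (producer_gpu && !consumer_gpu) return "MemcpyToHost";    // GPU output feeding a CPU node
  if (!producer_gpu && consumer_gpu) return "MemcpyFromHost";  // CPU output feeding a GPU node
  return "";  // both ends are on the same side of the device boundary: no copy
}
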
@@ -16,16 +16,16 @@ using namespace ONNX_NAMESPACE;
namespace onnxruntime {
namespace test {

-typedef std::vector<onnxruntime::NodeArg*> ArgMap;
+typedef std::vector<NodeArg*> ArgMap;
skottmckay (Contributor, Author):

Most diffs here are from removing the unnecessary onnxruntime:: prefix.

Two tests are updated to assign the 'If' node in the main graph to an EP. The existing test is unchanged apart from the assignment; it is moved into a lambda so it can run with the 'If' node assigned to the CPU and CUDA EPs.

  • lines 195 and 350 are where the existing test was moved into a lambda
  • lines 258 and 400 are where the assignment of the 'If' node occurs

ORT_ENFORCE(!incompatible_gpu_eps, "Mixing CUDA/TensorRT, ROCm/MIGraphX, and WebGPU is not supported.");

for (auto& provider : provider_types_) {
if (utils::ProviderIsCpuBased(provider) == false) {
skottmckay (Contributor, Author):

A key aspect when reviewing is that the transformer only runs for GPU-based EPs.

skottmckay marked this pull request as draft on October 7, 2024 at 03:03.
bool operator!=(const ConstIterator& other) const noexcept { return current_ != other.current_; }
bool operator==(const ConstIterator& rhs) const noexcept { return current_ == rhs.current_; }
bool operator!=(const ConstIterator& rhs) const noexcept { return current_ != rhs.current_; }
size_t operator-(const ConstIterator& rhs) const noexcept { return current_ - rhs.current_; }
Contributor:

should it be ptrdiff_t?
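
For context on the question above (this is general C++ iterator convention, not something established by this PR): the difference of two iterators is conventionally the signed type std::ptrdiff_t, because a size_t result silently wraps when the left-hand iterator precedes the right-hand one.

// General C++ note, not code from this PR: the difference of two iterators is
// conventionally signed (std::ptrdiff_t) so that "earlier - later" stays meaningful
// instead of wrapping around as an unsigned size_t would.
#include <cstddef>
#include <cstdio>

int main() {
  const int data[4] = {0, 1, 2, 3};
  const int* first = data;
  const int* last = data + 4;

  std::ptrdiff_t forward = last - first;   // 4
  std::ptrdiff_t backward = first - last;  // -4, valid because the type is signed

  // Storing the same negative difference in size_t wraps to a huge positive value.
  std::size_t wrapped = static_cast<std::size_t>(first - last);

  std::printf("%td %td %zu\n", forward, backward, wrapped);
  return 0;
}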
