
ORT GPU build #5622

Merged: 31 commits merged into alisw:master on Jan 22, 2025
Conversation

ChSonnabend
Collaborator

This is a draft PR to discuss possible changes to onnxruntime.sh for GPU builds on the EPNs and potentially CUDA (to be tested).

@ChSonnabend
Collaborator Author

Ping @davidrohr

@davidrohr (Contributor) left a comment

You should also include a few environment variables that we use in o2.sh and handle them analogously: https://github.com/alisw/alidist/blob/1916f6d88d42959097998d9481b517dc1c1ea84d/o2.sh#L191C9-L191C30

  • ALIBUILD_O2_FORCE_GPU
  • DISABLE_GPU
  • ALIBUILD_ENABLE_CUDA
  • ALIBUILD_ENABLE_HIP
  • ALIBUILD_O2_OVERRIDE_HIP_ARCHS
  • ALIBUILD_O2_OVERRIDE_CUDA_ARCHS

If ENABLE_CUDA or ENABLE_HIP is set, the build should fail if it cannot build CUDA/HIP.
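A minimal sketch of how onnxruntime.sh could honour the explicit-enable variables and fail hard when the requested toolchain is missing (an assumption about the intended behaviour, not the final recipe):

# Sketch only: if CUDA/HIP was explicitly requested, abort instead of silently building without it.
if [ -n "$ALIBUILD_ENABLE_CUDA" ] && ! command -v nvcc >/dev/null 2>&1; then
  echo "ALIBUILD_ENABLE_CUDA is set but nvcc was not found" >&2
  exit 1
fi
if [ -n "$ALIBUILD_ENABLE_HIP" ] && [ ! -x /opt/rocm/bin/hipcc ]; then
  echo "ALIBUILD_ENABLE_HIP is set but /opt/rocm/bin/hipcc was not found" >&2
  exit 1
fi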

onnxruntime.sh Outdated
"
elif command -v nvcc >/dev/null 2>&1; then
CUDA_VERSION=$(nvcc --version | grep "release" | awk '{print $NF}' | cut -d. -f1)
if [[ "$CUDA_VERSION" == "V11" ]]; then
Contributor

I think you can drop CUDA 11 and just assume >=12.
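A minimal sketch of such a check, assuming only the major release number matters (the parsing is not the PR's final code):

# Sketch only: require CUDA >= 12, fail otherwise.
CUDA_MAJOR=$(nvcc --version | sed -n 's/.*release \([0-9][0-9]*\)\..*/\1/p')
if [ "${CUDA_MAJOR:-0}" -lt 12 ]; then
  echo "CUDA >= 12 is required, found major version '${CUDA_MAJOR:-unknown}'" >&2
  exit 1
fi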

onnxruntime.sh Outdated
ORT_BUILD_FLAGS=""
case $ARCHITECTURE in
osx_*)
if [[ $ARCHITECTURE == *_x86-64 ]]; then
Contributor

I would drop printouts like these; they are mainly for debugging.

Collaborator Author

Yes, but I assume there is also a macOS build that targets the Mac GPU. I still need to dig around a bit; the if block could then be used to put the build flags in there. But yes, of course I will remove the printouts at the end.

onnxruntime.sh Outdated
fi
;;
*)
if command -v rocminfo >/dev/null 2>&1; then
Contributor

  • the ROCm version check is missing
  • It is not clear whether rocminfo is in the path. You should at least test /opt/rocm/bin/rocminfo. Also, MIGraphX is a separate ROCm package, i.e. rocminfo being present does not mean that MIGraphX is present. You should test explicitly for MIGraphX.
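A minimal sketch of those checks; the MIGraphX header location is an assumption, not the PR's final code:

# Sketch only: fall back to the default ROCm location and test for MIGraphX explicitly.
ROCMINFO="$(command -v rocminfo || echo /opt/rocm/bin/rocminfo)"
if [ -x "$ROCMINFO" ] && [ -d /opt/rocm/include/migraphx ]; then  # assumed MIGraphX header location
  ORT_MIGRAPHX_BUILD=1
else
  ORT_MIGRAPHX_BUILD=0
fi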

Collaborator Author

Good point, I'll check that again.

onnxruntime.sh Outdated
ORT_BUILD_FLAGS=" -Donnxruntime_USE_CUDA=ON \
-DCUDA_TOOLKIT_ROOT_DIR=$CUDA_ROOT \
-Donnxruntime_USE_CUDA_NHWC_OPS=ON \
-Donnxruntime_CUDA_USE_TENSORRT=ON \
Contributor

If you use TensorRT, do you then have to check whether it is explicitly installed? Or does it always come with the CUDA SDK?

Collaborator Author

It does not seem to come automatically (https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)... OK, I'll add a check for that as well.
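One way such a check might look; the library name and search paths are assumptions, not the PR's final code:

# Sketch only: enable TensorRT only if libnvinfer can be found in a few common locations.
if find /usr/local/cuda* /usr/lib64 /usr/lib/x86_64-linux-gnu -name "libnvinfer.so*" -print -quit 2>/dev/null | grep -q .; then
  ORT_TENSORRT_BUILD=1
else
  ORT_TENSORRT_BUILD=0
fi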

onnxruntime.sh Outdated
-Donnxruntime_USE_CUDA_NHWC_OPS=ON \
-Donnxruntime_CUDA_USE_TENSORRT=ON \
"
elif [[ "$CUDA_VERSION" == "V12" ]]; then
Contributor

What if ROCm and CUDA are both present? Can't we then build both?

@ChSonnabend (Collaborator Author) commented Sep 17, 2024

@ktf
Member

ktf commented Sep 17, 2024

Exactly...

…adding env-variables for GPU enabling during code execution. For al9_gpu container and simultaneous CUDA & ROCm build, this requires ChSonnabend/onnxruntime@6ffc40c
…e build with CUDA and ROCm fails due to a ROCm internal check for THRUST and CUB libraries, which are not in sync (file: /opt/rocm/include/thrust/system/cuda/config.h)
onnxruntime.sh Outdated
export ORT_MIGRAPHX_BUILD=0
fi
### TensorRT
if [ "$ORT_CUDA_BUILD" -eq 1 ] && [ $(find /opt/rocm* -name "libnvinfer*" -print -quit | wc -l 2>&1) -eq 1 ]; then
Contributor

Why do you search for TensorRT in /opt/rocm?

Collaborator Author

😅

onnxruntime.sh Outdated
-DCMAKE_CXX_FLAGS="$CXXFLAGS -Wno-unknown-warning -Wno-unknown-warning-option -Wno-error=unused-but-set-variable -Wno-error=deprecated" \
-DCMAKE_C_FLAGS="$CFLAGS -Wno-unknown-warning -Wno-unknown-warning-option -Wno-error=unused-but-set-variable -Wno-error=deprecated"
# Check ROCm build conditions
if { [ "$ALIBUILD_O2_FORCE_GPU" -ne 0 ] || [ "$ALIBUILD_ENABLE_HIP" -ne 0 ] || command -v rocminfo >/dev/null 2>&1; } && \
Contributor

Is it guaranteed that rocminfo is in the path? Perhaps try rocminfo or /opt/rocm/bin/rocminfo.
Also, perhaps it needs HIP, not rocminfo, so perhaps check for /opt/rocm/bin/hipcc?
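A minimal sketch of that fallback, assuming the standard /opt/rocm prefix:

# Sketch only: prefer hipcc from PATH, fall back to the default ROCm install location.
HIPCC="$(command -v hipcc || echo /opt/rocm/bin/hipcc)"
if [ -x "$HIPCC" ]; then
  HAVE_HIP=1
else
  HAVE_HIP=0
fi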

onnxruntime.sh Outdated
-Donnxruntime_CUDA_HOME=/usr/local/cuda \
-DCMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \
-D__HIP_PLATFORM_AMD__=1 \
-DCMAKE_HIP_ARCHITECTURES=gfx906,gfx908 \
Contributor

I don't understand this. You first set CMAKE_HIP_ARCHITECTURES, and then possibly override it in the next line? Why don't you expand ALIBUILD_O2_OVERRIDE_HIP_ARCHS to the defaults if it is empty, i.e. ${...:-default}?
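A minimal sketch of the suggested pattern; the default list gfx906,gfx908 is taken from the snippet under review:

# Sketch only: let the override expand to the defaults when it is unset or empty.
HIP_ARCHS="${ALIBUILD_O2_OVERRIDE_HIP_ARCHS:-gfx906,gfx908}"
ORT_BUILD_FLAGS="$ORT_BUILD_FLAGS -DCMAKE_HIP_ARCHITECTURES=$HIP_ARCHS"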

onnxruntime.sh Outdated
# Check CUDA build conditions
if { [ "$ALIBUILD_O2_FORCE_GPU" -ne 0 ] || [ "$ALIBUILD_ENABLE_CUDA" -ne 0 ] || command -v nvcc >/dev/null 2>&1; } && \
{ [ -z "$DISABLE_GPU" ] || [ "$DISABLE_GPU" -eq 0 ]; }; then
export ORT_CUDA_BUILD=1
Contributor

I would also set a default for ALIBUILD_O2_OVERRIDE_CUDA_ARCHS to the sm_86 or sm_89 architecture for now.
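Analogous to the HIP case, a minimal sketch with sm_86 as the assumed default (CMake's CMAKE_CUDA_ARCHITECTURES takes the bare compute capability):

# Sketch only: default the CUDA architecture override to sm_86.
CUDA_ARCHS="${ALIBUILD_O2_OVERRIDE_CUDA_ARCHS:-86}"
ORT_BUILD_FLAGS="$ORT_BUILD_FLAGS -DCMAKE_CUDA_ARCHITECTURES=$CUDA_ARCHS"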

@ChSonnabend ChSonnabend marked this pull request as ready for review November 22, 2024 21:43
@ChSonnabend ChSonnabend requested a review from a team as a code owner November 22, 2024 21:43
@davidrohr (Contributor) left a comment

if [ "$ALIBUILD_O2_FORCE_GPU" -eq 1 ]
will not work if the variable is not defined.
You can either do [ "0$FOO" == "01" ] or use the bash [[ syntax. Please try with all variables undefined.
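A minimal sketch of both workarounds; with the variable undefined, the plain [ "$VAR" -eq 1 ] test emits a "unary operator expected" error, while these forms do not:

# Sketch only: both forms behave sanely when ALIBUILD_O2_FORCE_GPU is undefined.
if [ "0$ALIBUILD_O2_FORCE_GPU" == "01" ]; then
  echo "GPU build forced (string comparison)"
fi
if [[ "$ALIBUILD_O2_FORCE_GPU" -eq 1 ]]; then
  echo "GPU build forced (bash [[ ]] test)"
fi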

@ChSonnabend ChSonnabend requested a review from a team as a code owner November 28, 2024 13:29
@ChSonnabend
Collaborator Author

ChSonnabend commented Dec 2, 2024

The macOS failures seem unrelated to this PR. It can be merged from my side if there are no objections (@ktf).

@ChSonnabend
Collaborator Author

The PR is ready from my side. The GPU build should be disabled on the EPNs due to missing ROCm packages and the AlmaLinux major version mismatch (8 instead of 9).

@singiamtel
Collaborator

@pzhristov just FYI, this would leave our ONNX version outside the tf2onnx compatibility range: https://github.com/onnx/tensorflow-onnx?tab=readme-ov-file#tf2onnx---convert-tensorflow-keras-tensorflowjs-and-tflite-models-to-onnx

@ChSonnabend
Collaborator Author

> @pzhristov just FYI, this would leave our ONNX version outside the tf2onnx compatibility range: https://github.com/onnx/tensorflow-onnx?tab=readme-ov-file#tf2onnx---convert-tensorflow-keras-tensorflowjs-and-tflite-models-to-onnx

Why actually? If you refer to the opset number: I think the ONNX opset number does not necessarily correlate with the ONNX version (I might be wrong here). Does it actually break something for the converter?

@ChSonnabend
Collaborator Author

The macOS-arm build failure seems unrelated. Can this PR be merged? @singiamtel @ktf

o2.sh Outdated
@@ -139,6 +139,12 @@ valid_defaults:
#!/bin/sh
export ROOTSYS=$ROOT_ROOT

source $ONNXRUNTIME_ROOT/etc/ort-init.sh
Member

This should be made optional via $ONNXRUNTIME_REVISION.
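A minimal sketch of such a guard, assuming a non-empty $ONNXRUNTIME_REVISION indicates that ONNXRuntime is part of the build:

# Sketch only: source the init script only when ONNXRuntime is actually built.
if [ -n "$ONNXRUNTIME_REVISION" ]; then
  source "$ONNXRUNTIME_ROOT/etc/ort-init.sh"
fi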

onnxruntime.sh Outdated
[[ -d /opt/rocm/include/rccl ]] && \
[[ -z "$ORT_ROCM_BUILD" ]] ) && \
([[ -z "$ALMA_LINUX_MAJOR_VERSION" ]] || [[ "$ALMA_LINUX_MAJOR_VERSION" -eq 9 ]]); then
export ORT_ROCM_BUILD=1
Member

Please fix the linter comments.

@ktf
Member

ktf commented Jan 17, 2025

It seems to me that a number of old comments / linter requests have not been taken into account. Please have a look.

If you are ready to integrate, can you change the name of the PR to "ORT GPU build" as well?

I might have other comments as well.

@ChSonnabend ChSonnabend changed the title Draft of ORT GPU build ORT GPU build Jan 17, 2025
@ChSonnabend
Collaborator Author

ChSonnabend commented Jan 22, 2025

The macOS failure seems unrelated. Can we merge? (@ktf)

@ktf ktf merged commit 54466f4 into alisw:master Jan 22, 2025
11 of 12 checks passed
ktf added a commit that referenced this pull request Jan 22, 2025
@ktf ktf mentioned this pull request Jan 22, 2025
ktf added a commit that referenced this pull request Jan 22, 2025