Rocm jaxlib v0.4.30 qa nccl maxnchannels #75

Closed

hsharsha
No description provided.

Rahul Batra and others added 30 commits July 11, 2024 20:06
Imported from GitHub PR openxla#15311

@xla-rotation
Copybara import of the project:

--
2c4cee2 by Chao Chen <cchen104@amd.com>:

unified memory for rocm

Merging this change closes openxla#15311

COPYBARA_INTEGRATE_REVIEW=openxla#15311 from ROCm:ci_rocm_unify_mem 2c4cee2
PiperOrigin-RevId: 657168704
…-copy

Let the other stream wait for the main stream before issuing memcpy d2h
Main changes include:
* Added support for fp8 matmul with fp8 and bf16 output data types.
* Added buffer comparators for fp8e4m3fnuz and fp8e5m2fnuz.
Replace "Navi" with corresponding public product names
…on unit tests

Imported from GitHub PR openxla#16938

This PR adds support for the NANOO FP8 data format in the collective communication unit tests.
- For context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR follow the GEMM unit test introduced in the following PR, so they can handle both OCP and NANOO FP8 formats:
openxla#10488
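
As a rough illustration of why the tests need to distinguish the two formats, here is a minimal Python sketch that decodes one byte as either OCP-style `e4m3fn` or the `e4m3fnuz` variant named in the commits below. It assumes that "NANOO FP8" refers to the `fnuz` encodings. The helper `decode_fp8_e4m3` is hypothetical and is not part of XLA, JAX, or RCCL.

```python
import math


def decode_fp8_e4m3(byte: int, fnuz: bool) -> float:
    """Decode one 8-bit value as e4m3fn (fnuz=False) or e4m3fnuz (fnuz=True).

    Hypothetical helper for illustration only.
    """
    sign = (byte >> 7) & 0x1
    exp = (byte >> 3) & 0xF
    man = byte & 0x7

    if fnuz:
        # fnuz: no negative zero; the bit pattern 0x80 is the single NaN.
        if byte == 0x80:
            return math.nan
        bias = 8
    else:
        # e4m3fn: no infinities; exponent and mantissa all ones is NaN.
        if exp == 0xF and man == 0x7:
            return math.nan
        bias = 7

    if exp == 0:
        # Subnormal: no implicit leading one.
        value = (man / 8.0) * 2.0 ** (1 - bias)
    else:
        value = (1.0 + man / 8.0) * 2.0 ** (exp - bias)
    return -value if sign else value


# The same byte decodes to different values in the two formats.
print(decode_fp8_e4m3(0x38, fnuz=False))  # 1.0
print(decode_fp8_e4m3(0x38, fnuz=True))   # 0.5
```

Because the exponent bias and the NaN encoding differ, the same bit pattern yields different values, which is why a test written for one format cannot simply be reused for the other without parameterizing the data type.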
Copybara import of the project:

--
0fc74cc by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418 by scxfjiang <sc.xfjiang@gmail.com>:

refactor collective comm e2e tests

--
8ecb6ec by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes openxla#16938

COPYBARA_INTEGRATE_REVIEW=openxla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af
PiperOrigin-RevId: 676615012
Add NANOO FP8 support for collective communication unit tests
[ROCm] Include clang-19 and clang-20 headers
hsharsha and others added 16 commits October 2, 2024 17:45
* reset blas stream used by gemm_algorithm_picker

* small refactoring

* fixing clang format

* fixing clang format

* fixing clang format

---------

Co-authored-by: Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>
…ble-triton

Add multigpu script and disable triton tests
[ROCm] Added include of hipblas.h in hipblaslt_wrapper.h
* PR openxla#14605: [ROCm] Switch on Triton feature for ROCm.

Imported from GitHub PR openxla#14605

Last in a series of commits to switch on Triton in XLA for ROCm.

This is a new version of:
openxla#13003

Changes in third_party/triton/temporary/amd_pr7.patch are already merged in:
triton-lang/triton#4238
Copybara import of the project:

--
c2ce7e0 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Switch on Triton feature for ROCm.

--
563b303 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Fixed an issue with test cases from ir_emitter_triton_test.cc

--
a4d2ad8 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Fixed an issue with gpu_compiler_test.cc

--
a1b9260 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Applied comments from code review.

--
c694a95 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Fixed failed tests because of openxla@19c11ba

--
7359619 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Fixed compilation issue with latest rebase.

--
82f58ce by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Skip SplitLHSInputOutputIsFused test in ir_emitter_triton_test.cc until the issue is fixed.

--
57e776b by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Triton-related changes are merged, so removed amd_pr7.patch

--
0d09d0e by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Applied comments from code review.

--
7b11147 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Applied comments from code review.

--
9e7e0c7 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Modified TestNoAutotuner test case.

Merging this change closes openxla#14605

COPYBARA_INTEGRATE_REVIEW=openxla#14605 from ROCm:rocm_triton_backend_8 9e7e0c7
PiperOrigin-RevId: 652449567

* Fixed test issues.
[ROCm] Fixed linker issues related to fp8 buffer_comparator functions
Passing amdgpu targets to the crosstool wrapper, which calls hipcc, can
restrict the generated kernels to a specific set of supported amdgpu
architectures.