Develop upstream sync 241210 #2783

Merged
cj401-amd merged 1,426 commits into develop-upstream from develop-upstream-sync-241210
Jan 7, 2025

cj401-amd

Weekly sync with upstream as of 12.10 (December 10).

Aliia Khasanova and others added 30 commits December 5, 2024 00:47
Adds a new boolean `xla_dump_hlo_unoptimized_snapshots` to the `DebugOptions` protobuf. When enabled, we'll dump an `HloUnoptimizedSnapshot` for each execution of an HLO module. This option only affects GPU targets for now.
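
A minimal sketch of how such an option might be switched on from Python. It assumes the new `DebugOptions` field is exposed through the `XLA_FLAGS` environment variable under the same name, as XLA debug options usually are; the helper `with_xla_flags` and the dump directory are hypothetical and not part of the commit.

```python
import os


def with_xla_flags(existing: str, **flags) -> str:
    """Append --name=value entries to an XLA_FLAGS string, keeping existing ones."""
    parts = existing.split()
    for name, value in flags.items():
        if isinstance(value, bool):
            # XLA boolean flags are written as true/false.
            value = "true" if value else "false"
        parts.append(f"--{name}={value}")
    return " ".join(parts)


# Assumption: the flag name mirrors the DebugOptions field name.
# "/tmp/xla_dump" is a placeholder dump directory.
os.environ["XLA_FLAGS"] = with_xla_flags(
    os.environ.get("XLA_FLAGS", ""),
    xla_dump_to="/tmp/xla_dump",
    xla_dump_hlo_unoptimized_snapshots=True,
)
```

The variable has to be set before the XLA-backed library is imported, since the flags are typically parsed once at startup.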

PiperOrigin-RevId: 703009410
PiperOrigin-RevId: 703011765
PiperOrigin-RevId: 703016890
PiperOrigin-RevId: 703025710
PiperOrigin-RevId: 703028747
PiperOrigin-RevId: 703031030
…en HostOffloadLegalize moves copies out of host-memory-only offloading.

PiperOrigin-RevId: 703033475
PiperOrigin-RevId: 703033719
… log

Imported from GitHub PR openxla/xla#19913

For exp, the error started occurring with commit openxla/xla@6e9eefe (originally introduced in openxla/xla@9b19353#diff-61ab646c9c3b8b0fc5ed1e9a62f535e9df5843adddd071250343f3bec48eacb6); for log, it started with commit openxla/xla@53d5338.

Trying to compile the following HLO module:
```
HloModule module

ENTRY main {
      p0 = bf16[4] parameter(0)
      ROOT exp = bf16[4] exp(p0)
}
```
would result in:
```
UNKNOWN: <unknown>:0: error: loc(callsite("wrapped_exponential" at "wrapped_exponential")): failed to legalize operation 'math.exp'
<unknown>:0: note: loc("wrapped_exponential"): called from
<unknown>:0: note: loc(callsite("wrapped_exponential" at "wrapped_exponential")): see current operation: %7 = "math.exp"(%6) <{fastmath = #arith.fastmath<afn>}> : (bf16) -> bf16
```

Copybara import of the project:

--
616c10b5308cb827c593a89455fea4b772d6e870 by Milica Makevic <Milica.Makevic@amd.com>:

Do not use fast approximation for exp and log for ROCm

--
3fa4914f90458a0285deb8801c5689421f945fe4 by Milica Makevic <Milica.Makevic@amd.com>:

Add unit test for log and exp lowering on ROCm

Merging this change closes tensorflow#19913

PiperOrigin-RevId: 703035402
…utions.

This extends the custom algorithm to cover 2D cases. For the transposed strided case, benchmarks show about 50 times lower wall time than the generic algorithm (415ms down to 9ms). Detailed results:

name                                      old cpu/op   new cpu/op   delta
BM_Conv2DStrided/process_time             35.2ms ± 9%  34.3ms ± 6%     ~     (p=0.690 n=5+5)
BM_Conv2DTransposedStrided/process_time    8.25s ± 8%   0.03s ± 3%  -99.62%  (p=0.008 n=5+5)

name                                      old time/op  new time/op  delta
BM_Conv2DStrided/process_time             3.06ms ±19%  2.88ms ± 6%     ~     (p=0.421 n=5+5)
BM_Conv2DTransposedStrided/process_time    415ms ±12%     9ms ± 4%  -97.93%  (p=0.008 n=5+5)
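
The speedup factor and the delta column can be rechecked from the rounded table values. This is only an arithmetic sketch with hypothetical helper names; the benchmark tool computes its deltas from unrounded measurements, so the percentages below differ slightly from the table.

```python
def speedup(old: float, new: float) -> float:
    """How many times faster the new measurement is than the old one."""
    return old / new


def percent_delta(old: float, new: float) -> float:
    """Relative change in percent; negative means the new value is lower."""
    return (new - old) / old * 100.0


# BM_Conv2DTransposedStrided, wall time: 415ms -> 9ms
print(round(speedup(415, 9), 1))        # 46.1
print(round(percent_delta(415, 9), 2))  # -97.83 (table reports -97.93%)

# BM_Conv2DTransposedStrided, CPU time: 8.25s -> 0.03s
print(round(speedup(8.25, 0.03)))       # 275
```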

Planned improvements of this algorithm:
- support feature_group_size > 1 (grouped convolution),
- parallel packing of the patches (second algorithm step),
- support the case with multiple input channels and output channels at the same time,
- explore input kernel rotation possibilities and their performance impact.

PiperOrigin-RevId: 703036601
Updates LLVM usage to match
[71ac1eb50955](llvm/llvm-project@71ac1eb50955)

PiperOrigin-RevId: 703048823
PiperOrigin-RevId: 703050199
…n in GEMM Rewriter

Imported from GitHub PR openxla/xla#20153

Removes collectives from the set of ops that can be exchanged with dequantization in the GEMM rewriter.
Copybara import of the project:

--
e2efa84143fe30c5c6b25132831a62707c2a8f75 by Philipp Hack <phack@nvidia.com>:

Removes collectives from the set of ops exchanged with dequantization in the GEMM rewriter.

Merging this change closes tensorflow#20153

PiperOrigin-RevId: 703051850
PiperOrigin-RevId: 703052110
…peration with oneDNN primitives

Imported from GitHub PR openxla/xla#18616

This PR refactors the code that fuses the add operation into matmul / convolution primitives. It removes the use of macros and the separate templatized handlers for the matmul and convolution cases.
Copybara import of the project:

--
68bcdf81a47fb0f753d837c034931094c5cd8017 by Akhil Goel <akhil.goel@intel.com>:

Refactor Add Handler

--
462890bb75f2fcea3fdc5966bfa7a2b8f94b255a by Akhil Goel <akhil.goel@intel.com>:

Address review comments

Merging this change closes tensorflow#18616

PiperOrigin-RevId: 703054496
PiperOrigin-RevId: 703063087
Add the dtypes for which a CUB kernel is unavailable to the log output.

PiperOrigin-RevId: 703067645
PiperOrigin-RevId: 703074301
`xla::Compiler` manages instances of `Compiler` that are registered statically.

`StreamExecutorGpuClient` used to get its `Compiler` instance during static initialization, which might fail if the `Compiler` instance gets registered later.

As a fix, we now get the needed `Compiler` instance during every compilation call.
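
The ordering problem described here can be illustrated with a small Python analogy. None of these names come from XLA: the registry, `EchoCompiler`, `EagerClient`, and `LazyClient` are hypothetical stand-ins that only show why looking the compiler up at call time tolerates late registration.

```python
_REGISTRY = {}  # platform name -> compiler factory


def register_compiler(platform, factory):
    _REGISTRY[platform] = factory


def get_compiler(platform):
    if platform not in _REGISTRY:
        raise LookupError(f"no compiler registered for {platform!r}")
    return _REGISTRY[platform]()


class EchoCompiler:
    """Placeholder compiler that just tags its input."""

    def compile(self, module):
        return f"compiled:{module}"


class EagerClient:
    """Resolves the compiler at construction time (the old behaviour)."""

    def __init__(self, platform):
        # Raises LookupError if registration has not happened yet.
        self._compiler = get_compiler(platform)


class LazyClient:
    """Resolves the compiler on every compile call (the fix)."""

    def __init__(self, platform):
        self._platform = platform

    def compile(self, module):
        return get_compiler(self._platform).compile(module)


client = LazyClient("gpu")              # built before any registration
register_compiler("gpu", EchoCompiler)  # registration arrives later
print(client.compile("module"))         # compiled:module
```

Constructing `EagerClient("gpu")` before the `register_compiler` call would have raised `LookupError`, which mirrors the failure mode the commit message describes.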

PiperOrigin-RevId: 703087406
Updates LLVM usage to match
[dd7a3d4d798e](llvm/llvm-project@dd7a3d4d798e)

PiperOrigin-RevId: 703100529
PiperOrigin-RevId: 703109195
@i-chaochen

Is this weekly sync free of problems now?

@i-chaochen i-chaochen left a comment

Have you followed the instructions at https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/SYNC_UPSTREAM.md
to do this check?

○ When all merge conflicts are resolved, run grep -rn "<<<<<<" to make sure no conflict markers remain in the source
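
For anyone who prefers to script this check, here is a rough Python equivalent of the grep. The function name is hypothetical, and unlike plain `grep -rn` this sketch skips the `.git` directory.

```python
import os


def find_conflict_markers(root, marker="<<<<<<"):
    """Return (path, line_number, line) for every line containing the marker."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    for lineno, line in enumerate(f, start=1):
                        if marker in line:
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                # Unreadable files (broken symlinks, permissions) are skipped.
                continue
    return hits
```

An empty result from `find_conflict_markers(".")` at the repository root corresponds to the grep printing nothing.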

@i-chaochen

It seems there are still 9 failed tests on gpu-pycpp. Are you able to reproduce them?

14:47:24  //tensorflow/compiler/tests:matrix_diag_ops_test_gpu                     FAILED in 374.3s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/compiler/tests/matrix_diag_ops_test_gpu/test.log
14:47:24  //tensorflow/examples/adding_an_op:cuda_op_test                          FAILED in 5.5s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/examples/adding_an_op/cuda_op_test/test.log
14:47:24  //tensorflow/examples/custom_ops_doc/multiplex_2:multiplex_2_test_gpu    FAILED in 5.9s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/examples/custom_ops_doc/multiplex_2/multiplex_2_test_gpu/test.log
14:47:24  //tensorflow/python/distribute/integration_test:saved_model_test_gpu     FAILED in 6.1s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/python/distribute/integration_test/saved_model_test_gpu/test.log
14:47:24  //tensorflow/python/ops/numpy_ops:np_interop_test_gpu                    FAILED in 5.9s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/python/ops/numpy_ops/np_interop_test_gpu/test.log
14:47:24  //tensorflow/python/ops/numpy_ops/integration_test:np_config_test_gpu    FAILED in 6.1s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/python/ops/numpy_ops/integration_test/np_config_test_gpu/test.log
14:47:24  //tensorflow/python/ops/numpy_ops/tests:np_einsum_test                   FAILED in 5.3s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/python/ops/numpy_ops/tests/np_einsum_test/test.log
14:47:24  //tensorflow/python/profiler:profiler_client_test_gpu                    FAILED in 3.0s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/python/profiler/profiler_client_test_gpu/test.log
14:47:24  //tensorflow/tools/pip_package:import_api_packages_test                  FAILED in 5.2s
14:47:24    /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/tensorflow/tools/pip_package/import_api_packages_test/test.log
14:47:24  
14:47:24  Executed 739 out of 739 tests: 730 tests pass and 9 fail locally.
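
To turn a log like this into a list of targets to rerun, something along these lines may help. It is a sketch that assumes summary lines have the shape seen in this thread (an optional timestamp, a `//target` label, an optional `(cached)`, then `PASSED` or `FAILED`); the function name is hypothetical.

```python
import re

# Label, optional "(cached)", then the status word.
_RESULT = re.compile(r"(//\S+)\s+(?:\(cached\)\s+)?(PASSED|FAILED)\b")


def failed_targets(log_lines):
    """Return the labels of all targets reported as FAILED, in log order."""
    failed = []
    for line in log_lines:
        match = _RESULT.search(line)
        if match and match.group(2) == "FAILED":
            failed.append(match.group(1))
    return failed
```

The resulting labels can be passed straight back to `bazel test` to reproduce only the failing targets.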

@cj401-amd
Author

INFO: Analyzed target //tensorflow/compiler/tests:matrix_diag_ops_test_gpu (658 packages loaded, 49514 targets configured).
INFO: Found 1 test target...
Target //tensorflow/compiler/tests:matrix_diag_ops_test_gpu up-to-date:
  bazel-bin/tensorflow/compiler/tests/matrix_diag_ops_test_gpu
INFO: Elapsed time: 4621.159s, Critical Path: 3560.52s
INFO: 20345 processes: 434 internal, 19911 local.
INFO: Build completed successfully, 20345 total actions
//tensorflow/compiler/tests:matrix_diag_ops_test_gpu                     PASSED in 628.6s

@i-chaochen i-chaochen left a comment

This SKIPs the subtests; it does not FIX them. Could you create an issue to track the skipped ones?

@pemeliya

Retest cpu-pycpp please

@pemeliya
Copy link

Rerun cpu-pycpp please

@pemeliya

Rebuild cpu-pycpp please

@mmakevic-amd mmakevic-amd force-pushed the develop-upstream-sync-241210 branch from 2277278 to b1e817d Compare December 30, 2024 04:24
…flow-upstream into develop-upstream-sync-241210
@cj401-amd
Author

cj401-amd commented Jan 3, 2025

//tensorflow/compiler/mlir/quantization/tensorflow/python:quantize_model_test FAILED in 49 out of 50 in 302.4s
//tensorflow/python/distribute/failure_handling:gce_failure_handler_test FAILED in 25 out of 32 in 301.5s
//tensorflow/python/data/experimental/kernel_tests/service:data_service_ops_test FAILED in 26 out of 32 in 73.0s
//tensorflow/python/data/kernel_tests:sample_from_datasets_test FAILED in 13 out of 24 in 99.3s
//tensorflow/python/data/kernel_tests:interleave_test FAILED in 9 out of 24 in 222.2s
//tensorflow/python/kernel_tests/nn_ops:embedding_ops_test_cpu FAILED in 13 out of 20 in 301.0s
//tensorflow/python/kernel_tests/math_ops:matmul_op_test_cpu FAILED in 19 out of 20 in 301.9s
//tensorflow/python/kernel_tests/math_ops:batch_matmul_op_test_cpu FAILED in 14 out of 20 in 301.0s
//tensorflow/python/kernel_tests/linalg:self_adjoint_eig_op_test_cpu FAILED in 14 out of 20 in 300.9s
//tensorflow/python/kernel_tests/linalg:qr_op_test_cpu FAILED in 20 out of 20 in 302.0s
//tensorflow/python/kernel_tests/linalg:normalize_op_test_cpu FAILED in 12 out of 20 in 301.3s
//tensorflow/python/kernel_tests/linalg:linalg_grad_test_cpu FAILED in 20 out of 20 in 301.4s
...

locally:

bazel --bazelrc=tensorflow/tools/tf_sig_build_dockerfiles/devel.usertools/cpu.bazelrc test \
 --config=sigbuild_local_cache \
 --verbose_failures \
 --disk_cache=/tf/cache \
 --config=pycpp \
 --config=rocm \
 --action_env=TF_PYTHON_VERSION=3.10 \
 --test_env=HIP_VISIBLE_DEVICES="" \
 --test_timeout=600 \
 --test_tag_filters=-no_cuda_on_cpu_tap,-no-gpu,-optimize.mlir.test,-requires-gpu-nvidia,-tpu,-v1only,-oss_serial,-no_windows,-no_oss \
 --local_test_jobs=256

//tensorflow/tools/graph_transforms:transform_graph_test        (cached) PASSED in 12.8s
//tensorflow/tools/graph_transforms:transform_utils_test        (cached) PASSED in 12.9s
//tensorflow/tools/graph_transforms:transforms_test             (cached) PASSED in 16.1s
//tensorflow/tools/pip_package:import_api_packages_test         (cached) PASSED in 5.9s
//tensorflow/tools/pip_package:prebuilt_wheel_import_api_packages_test (cached) PASSED in 0.2s
//tensorflow/tools/proto_splitter:util_test                     (cached) PASSED in 12.3s
//tensorflow/tools/proto_splitter/python:test_util_test         (cached) PASSED in 10.3s
//tensorflow/tools/proto_text:gen_proto_text_functions_lib_test (cached) PASSED in 2.6s
//tensorflow/tools/tensorflow_builder/compat_checker:compat_checker_test (cached) PASSED in 0.4s

Executed 0 out of 2126 tests: 2126 tests pass.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.

@i-chaochen
Copy link

Hi @cj401-ai, please create a task to record these skipped unit tests.

02686ea
256194f

@cj401-amd cj401-amd merged commit 03daf31 into develop-upstream Jan 7, 2025
5 checks passed