Develop upstream sync 241210 #2783
Adds a new boolean `xla_dump_hlo_unoptimized_snapshots` to the `DebugOptions` protobuf. When enabled, we'll dump an `HloUnoptimizedSnapshot` for each execution of an HLO module. This option only affects GPU targets for now. PiperOrigin-RevId: 703009410
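A minimal sketch of how this option might be switched on from Python, assuming it is exposed through the `XLA_FLAGS` environment variable under the same name as the proto field, as most `DebugOptions` fields are. The commit message does not confirm the flag spelling, and `with_xla_flag` is a hypothetical helper, not part of XLA.

```python
import os


def with_xla_flag(existing: str, flag: str) -> str:
    """Return `existing` with `flag` appended, replacing any earlier setting of the same flag."""
    name = flag.split("=", 1)[0]
    # Drop any previous occurrence of the same flag so the new value wins.
    kept = [p for p in existing.split() if p.split("=", 1)[0] != name]
    kept.append(flag)
    return " ".join(kept)


# Assumption: the flag name mirrors the DebugOptions field name.
os.environ["XLA_FLAGS"] = with_xla_flag(
    os.environ.get("XLA_FLAGS", ""),
    "--xla_dump_hlo_unoptimized_snapshots=true",
)
```

The variable has to be set before the XLA-backed library is first imported in the process, since flags are typically parsed once.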
PiperOrigin-RevId: 703011765
PiperOrigin-RevId: 703016890
PiperOrigin-RevId: 703016903
PiperOrigin-RevId: 703019838
PiperOrigin-RevId: 703025710
PiperOrigin-RevId: 703026703
PiperOrigin-RevId: 703028747
…en HostOffloadLegalize moves copies out of host-memory-only offloading. PiperOrigin-RevId: 703033475
PiperOrigin-RevId: 703033719
PiperOrigin-RevId: 703035110
… log

Imported from GitHub PR openxla/xla#19913

Error started occurring from this commit for exp openxla/xla@6e9eefe (originally introduced here openxla/xla@9b19353#diff-61ab646c9c3b8b0fc5ed1e9a62f535e9df5843adddd071250343f3bec48eacb6) and from this one openxla/xla@53d5338 for log.

Trying to compile the following HLO module:
```
HloModule module

ENTRY main {
  p0 = bf16[4] parameter(0)
  ROOT exp = bf16[4] exp(p0)
}
```
would result in:
```
UNKNOWN: <unknown>:0: error: loc(callsite("wrapped_exponential" at "wrapped_exponential")): failed to legalize operation 'math.exp'
<unknown>:0: note: loc("wrapped_exponential"): called from
<unknown>:0: note: loc(callsite("wrapped_exponential" at "wrapped_exponential")): see current operation: %7 = "math.exp"(%6) <{fastmath = #arith.fastmath<afn>}> : (bf16) -> bf16
```
Copybara import of the project:

-- 616c10b5308cb827c593a89455fea4b772d6e870 by Milica Makevic <Milica.Makevic@amd.com>:

Do not use fast approximation for exp and log for ROCm

-- 3fa4914f90458a0285deb8801c5689421f945fe4 by Milica Makevic <Milica.Makevic@amd.com>:

Add unit test for log and exp lowering on ROCm

Merging this change closes tensorflow#19913

PiperOrigin-RevId: 703035402
…utions.

This extends the custom algorithm to cover 2D cases. Benchmarks show about 50 times better performance than the generic algorithm, detailed results:

| name | old cpu/op | new cpu/op | delta |
| --- | --- | --- | --- |
| BM_Conv2DStrided/process_time | 35.2ms ± 9% | 34.3ms ± 6% | ~ (p=0.690 n=5+5) |
| BM_Conv2DTransposedStrided/process_time | 8.25s ± 8% | 0.03s ± 3% | -99.62% (p=0.008 n=5+5) |

| name | old time/op | new time/op | delta |
| --- | --- | --- | --- |
| BM_Conv2DStrided/process_time | 3.06ms ±19% | 2.88ms ± 6% | ~ (p=0.421 n=5+5) |
| BM_Conv2DTransposedStrided/process_time | 415ms ±12% | 9ms ± 4% | -97.93% (p=0.008 n=5+5) |

Planned improvements of this algorithm:
- support feature_group_size > 1 (grouped convolution),
- parallel packing of the patches (second algorithm step),
- support the case with multiple input channels and output channels at the same time,
- explore input kernel rotation possibilities & perf impact.

PiperOrigin-RevId: 703036601
Updates LLVM usage to match [71ac1eb50955](llvm/llvm-project@71ac1eb50955) PiperOrigin-RevId: 703048823
PiperOrigin-RevId: 703050199
…n in GEMM Rewriter

Imported from GitHub PR openxla/xla#20153

Removes collectives from the set of ops that can be exchanged with dequantization in the GEMM rewriter.

Copybara import of the project:

-- e2efa84143fe30c5c6b25132831a62707c2a8f75 by Philipp Hack <phack@nvidia.com>:

Removes collectives from the set of ops exchanged with dequantization in the GEMM rewriter.

Merging this change closes tensorflow#20153

PiperOrigin-RevId: 703051850
PiperOrigin-RevId: 703052110
…peration with oneDNN primitives

Imported from GitHub PR openxla/xla#18616

This PR refactors the code that fuses add operation to matmul / convolution primitives. It removes usage of macros and separate templatized handlers for matmul and convolution cases.

Copybara import of the project:

-- 68bcdf81a47fb0f753d837c034931094c5cd8017 by Akhil Goel <akhil.goel@intel.com>:

Refactor Add Handler

-- 462890bb75f2fcea3fdc5966bfa7a2b8f94b255a by Akhil Goel <akhil.goel@intel.com>:

Address review comments

Merging this change closes tensorflow#18616

PiperOrigin-RevId: 703054496
PiperOrigin-RevId: 703063087
Add the dtypes for which a CUB kernel is unavailable to the log output. PiperOrigin-RevId: 703067645
PiperOrigin-RevId: 703074301
PiperOrigin-RevId: 703076882
…dent ops. PiperOrigin-RevId: 703082625
`xla::Compiler` manages instances of `Compiler` that are registered statically. `StreamExecutorGpuClient` used to get its `Compiler` instance during static initialization which might fail if the `Compiler` instance gets registered later. As a fix we will get the needed `Compiler` instance during every compilation call. PiperOrigin-RevId: 703087406
PiperOrigin-RevId: 703091758
PiperOrigin-RevId: 703092539
Updates LLVM usage to match [dd7a3d4d798e](llvm/llvm-project@dd7a3d4d798e) PiperOrigin-RevId: 703100529
Is this weekly sync free of problems now?
Have you followed the instructions at https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/SYNC_UPSTREAM.md
to do this check?
○ When all merge conflicts are resolved, do a `grep -rn "<<<<<<"` to make sure no diff symbols remain in the source.
There still seem to be 9 failed tests on gpu-pycpp. Are you able to reproduce them?
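For reference, the conflict-marker check quoted above can also be scripted. The following is a rough Python equivalent of the `grep -rn "<<<<<<"` step, offered as a sketch only; `find_conflict_markers` is a hypothetical helper and is not part of SYNC_UPSTREAM.md.

```python
import os


def find_conflict_markers(root, marker="<<<<<<"):
    """Return (path, line_number) pairs for every line under `root` containing `marker`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git metadata; only the working tree matters for this check.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if marker in line:
                            hits.append((path, lineno))
            except OSError:
                # Unreadable files (broken symlinks, permissions) are skipped.
                continue
    return hits


if __name__ == "__main__":
    for path, lineno in find_conflict_markers("."):
        print(f"{path}:{lineno}")
```

An empty result means no leftover markers were found. As with the grep, a source file that legitimately contains the marker string will also be reported.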
This skips the subtests rather than fixing them. Could you create an issue to track the skipped ones?
Retest cpu-pycpp please
Rerun cpu-pycpp please
Rebuild cpu-pycpp please
Branch updated from 2277278 to b1e817d
…flow-upstream into develop-upstream-sync-241210
//tensorflow/compiler/mlir/quantization/tensorflow/python:quantize_model_test FAILED in 49 out of 50 in 302.4s locally:
Weekly sync with upstream as of December 10.