Numerous hipblaslt related fixes #2772

pemeliya · 2024-11-27T21:22:59Z

Fixes a multi-threading/multi-GPU issue in the gpublaslt matmul_op kernel.
Improves caching in the gpublas_lt_matmul thunk.
Adds #define LEGACY_HIPBLAS_DIRECT to the hipblaslt wrapper to avoid potential problems with newer hipblaslt versions.

i-chaochen

Thanks for the work and comments!

i-chaochen · 2024-11-28T00:27:56Z

tensorflow/core/kernels/matmul_util.cc

+  static std::deque< BlasLtMatmulPlanCache > meta(8);
+  absl::MutexLock lock(&m);
+  size_t dev_id = stream->parent()->device_ordinal();
+  if (dev_id >= meta.size()) meta.resize(dev_id + 1);


If this runs on multiple nodes and each node has at least 8 GPUs, I think we should use meta.resize(dev_id + 8) instead?
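For context on the question above, here is a minimal, self-contained sketch of the per-device cache lookup pattern in the hunk. It is not the code from this PR: `PlanCache` and `GetPlanCache` are hypothetical names, and `std::mutex` stands in for `absl::Mutex` so the example compiles without Abseil. It also assumes the device ordinal is a process-local index (0..N-1 on each node), in which case growing the table to `dev_id + 1` is already enough to make the lookup valid.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for BlasLtMatmulPlanCache: maps a plan key to a plan id.
struct PlanCache {
  std::unordered_map<std::string, int> plans;
};

// Returns the cache slot for a device ordinal, growing the table on demand.
// std::deque is used because growing it at the end does not invalidate
// references to existing elements, so a reference returned earlier stays
// valid after a later resize (std::vector would not guarantee this).
PlanCache& GetPlanCache(std::size_t dev_id) {
  static std::mutex m;                   // guards the table itself
  static std::deque<PlanCache> meta(8);  // pre-sized for 8 devices
  std::lock_guard<std::mutex> lock(m);
  if (dev_id >= meta.size()) meta.resize(dev_id + 1);
  // Assumption: each PlanCache does its own locking for concurrent
  // access to its contents; that part is not shown here.
  return meta[dev_id];
}
```

Under these assumptions, `GetPlanCache(0)` and `GetPlanCache(20)` return distinct slots, and the reference obtained for device 0 remains valid after the table grows to hold device 20. Resizing to `dev_id + 8` would only pre-allocate extra slots; it would not change correctness.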

third_party/xla/xla/service/gpu/runtime/gpublas_lt_matmul_thunk.cc

jayfurmanek · 2024-12-03T03:29:18Z

We need this for 2.18 as well.

pemeliya requested review from i-chaochen, jayfurmanek and draganmladjenovic November 27, 2024 21:22

i-chaochen reviewed Nov 28, 2024

numerous hipblaslt related fixes

f229ec4

pemeliya force-pushed the r2.17-rocm-enhanced-hipblaslt-fixes branch from 7cd6ea8 to f229ec4 Compare November 29, 2024 20:36

i-chaochen approved these changes Dec 2, 2024

removed LEGACY define

394b32e

pemeliya merged commit e49be7f into r2.17-rocm-enhanced Dec 4, 2024
3 of 6 checks passed
