Gloo in PyTorch for GPU tensor collective communication #312

YukeWang96 · 2021-10-14T02:13:20Z

For the Gloo backend in PyTorch distributed (documented at https://pytorch.org/docs/stable/distributed.html), will the following code get the performance benefits that CUDA-aware MPI provides (e.g., direct GPU-to-GPU transfers over PCIe that bypass the CPU)?

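import torch.distributed as dist
# Assumes the default process group is already initialized and gpu_tensor_a is a CUDA tensor.
group = dist.new_group([0, 1], backend="gloo")
dist.all_reduce(gpu_tensor_a, op=dist.ReduceOp.SUM, group=group)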
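
One way to check this empirically is to time the same call on CPU and GPU tensors. The following is a minimal sketch, not a definitive benchmark: `timed_all_reduce`, the port `29512`, and the buffer size are hypothetical choices, and using `rank` as the GPU index assumes a single node. It makes no claim about how Gloo moves GPU data internally; it only measures the wall-clock time per `all_reduce`.

```python
import os
import time

import torch
import torch.distributed as dist


def timed_all_reduce(rank, world_size, device=None, numel=1 << 20, iters=10):
    """Return mean seconds per all_reduce over a Gloo group (hypothetical helper)."""
    # torchrun sets these already; the defaults are assumptions for local runs.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29512")
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
    try:
        if device is None:
            # Assumption: single node, so the global rank doubles as the GPU index.
            has_gpu = torch.cuda.is_available() and torch.cuda.device_count() > rank
            device = f"cuda:{rank}" if has_gpu else "cpu"
        device = torch.device(device)
        group = dist.new_group(list(range(world_size)), backend="gloo")

        # Correctness check (also a warm-up): summing ones yields world_size.
        check = torch.ones(4, device=device)
        dist.all_reduce(check, op=dist.ReduceOp.SUM, group=group)
        assert torch.all(check == world_size)

        buf = torch.ones(numel, device=device)
        start = time.perf_counter()
        for _ in range(iters):
            dist.all_reduce(buf, op=dist.ReduceOp.SUM, group=group)
        if device.type == "cuda":
            torch.cuda.synchronize(device)
        return (time.perf_counter() - start) / iters
    finally:
        dist.destroy_process_group()
```

Under a launcher such as `torchrun --nproc_per_node=2`, each process would call `timed_all_reduce(int(os.environ["RANK"]), int(os.environ["WORLD_SIZE"]))`, once with `device="cpu"` and once with the default, and compare the two timings. Repeating the run with a different backend would give a baseline for the comparison the question asks about.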
