Try to set |
numactl -N 0 ib_read_lat -d mlx5_0 --use_cuda=0 -a -c RC
bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec]
2 1000 3.25 9.45 3.31 3.32
4 1000 3.22 8.11 3.32 3.32
8 1000 3.25 3.43 3.32 3.32
16 1000 3.27 3.41 3.34 3.33
32 1000 3.24 3.42 3.32 3.32
64 1000 3.26 3.47 3.33 3.33
128 1000 3.27 3.47 3.35 3.34
256 1000 3.30 3.49 3.37 3.38
512 1000 3.30 3.65 3.39 3.39
1024 1000 3.35 3.76 3.43 3.43
2048 1000 3.41 3.67 3.49 3.49
4096 1000 3.51 3.72 3.61 3.61
8192 1000 3.64 6.33 3.72 3.73
16384 1000 3.98 4.26 4.05 4.05
32768 1000 4.31 4.53 4.42 4.41
65536 1000 5.09 5.24 5.16 5.15
131072 1000 7.28 7.59 7.35 7.34
262144 1000 10.18 10.34 10.27 10.26
524288 1000 16.01 16.19 16.09 16.09
1048576 1000 27.67 33.20 27.75 27.76
2097152 1000 51.00 51.17 51.07 51.07
4194304 1000 97.63 98.23 97.71 97.71
mpirun -x UCX_TLS=rc,cuda -x UCX_ZCOPY_THRESH=auto -x UCX_RNDV_THRESH=auto -x UCX_MEMTYPE_CACHE=n numactl -N 0 osu_latency -d CUDA D D
Size Latency (us)
1 8.75
2 8.91
4 9.11
8 9.28
16 9.50
32 9.67
64 9.83
128 9.84
256 9.94
512 9.66
1024 9.46
2048 9.59
4096 9.98
8192 10.54
16384 17.85
32768 32.04
65536 7.15
131072 8.51
262144 11.72
524288 17.04
1048576 27.72
2097152 48.90
4194304 91.22
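With UCX (osu_latency), small-message latency is much worse than the ib_read_lat baseline: about 8.75 to 10 us versus about 3.3 to 3.6 us for messages up to 4 KB. Large-message latency is about the same (27.72 us versus 27.75 us at 1 MB).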
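To put numbers on the gap, here is a minimal Python sketch. The values are copied from a few rows of the two tables above (the t_typical column for ib_read_lat, the Latency column for osu_latency). The function name `latency_gap` is only illustrative. The two benchmarks do not measure exactly the same operation (RDMA read versus MPI ping-pong), so the ratios are a rough comparison rather than a precise overhead figure.

```python
# Latency in microseconds, copied from selected rows of the two runs above.
# ib_read_lat: t_typical[usec] column; osu_latency: Latency (us) column.
ib_read_lat = {
    2: 3.31, 4: 3.32, 8: 3.32, 4096: 3.61, 16384: 4.05,
    32768: 4.42, 65536: 5.16, 1048576: 27.75, 4194304: 97.71,
}
osu_latency = {
    2: 8.91, 4: 9.11, 8: 9.28, 4096: 9.98, 16384: 17.85,
    32768: 32.04, 65536: 7.15, 1048576: 27.72, 4194304: 91.22,
}


def latency_gap(baseline, measured):
    """Return {size: (extra_us, ratio)} for message sizes present in both tables."""
    common_sizes = sorted(baseline.keys() & measured.keys())
    return {
        size: (
            round(measured[size] - baseline[size], 2),  # absolute difference in us
            round(measured[size] / baseline[size], 2),  # measured / baseline
        )
        for size in common_sizes
    }


gaps = latency_gap(ib_read_lat, osu_latency)
for size, (extra_us, ratio) in gaps.items():
    print(f"{size:>8} B  {extra_us:+7.2f} us  x{ratio:.2f}")
```

On these rows the ratio is roughly 2.7x to 2.8x up to 4 KB, peaks at 32 KB (32.04 us versus 4.42 us), and is close to 1x from 1 MB upward.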
Are there any parameters that can improve the small-message latency of osu_latency?
I am using HPC-X v2.17.1 with UCX 1.16.
Thank you.