Thanks for this great work! I was trying to follow the training procedure, but training seems to get stuck at the very beginning. It keeps showing the following output for more than ten minutes and does not proceed any further:
`Training for 25000 kimg...
tick 0 kimg 0.0 time 1m 20s sec/tick 5.5 sec/kimg 1372.65 maintenance 74.8 cpumem 5.39 gpumem 21.01 reserved 22.00 augment 0.000`
By the way, I was trying to resume from "afhqcats512-128.pkl". Could anybody give me some advice on how to move forward?
Hi, I can't pinpoint the problem from the information provided. However, if you are training with multiple GPUs, you might want to try setting the environment variable NCCL_P2P_DISABLE=1.
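In case it helps, here is a minimal sketch of one way to apply that suggestion from Python: copy the current environment, set `NCCL_P2P_DISABLE=1`, and launch the training command as a child process. The helper names `env_with_p2p_disabled` and `run_with_p2p_disabled` are made up for illustration, and the command in the `__main__` block is only a stand-in that prints the variable; replace it with whatever training command you are actually running. Whether this fixes the hang depends on your setup, so treat it as something to try rather than a confirmed fix.

```python
import os
import subprocess
import sys


def env_with_p2p_disabled(base_env=None):
    """Return a copy of the environment with NCCL_P2P_DISABLE set to "1".

    The copy leaves the parent process environment untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_P2P_DISABLE"] = "1"
    return env


def run_with_p2p_disabled(cmd):
    """Run cmd (a list of arguments) with NCCL peer-to-peer transfers disabled.

    Returns the exit code of the child process.
    """
    result = subprocess.run(cmd, env=env_with_p2p_disabled(), check=False)
    return result.returncode


if __name__ == "__main__":
    # Stand-in command (an assumption, not the real training entry point):
    # it only prints the variable to show that the child process sees it.
    stand_in = [
        sys.executable,
        "-c",
        "import os; print(os.environ['NCCL_P2P_DISABLE'])",
    ]
    run_with_p2p_disabled(stand_in)
```

The variable has to be in the environment before the training process initializes NCCL, which is why the sketch sets it for the child process at launch instead of changing it after training has started.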