When I run the training script, I run into an instance of 'std::runtime_error':
what(): NCCL Error 1: unhandled cuda error
./run.sh
This happens every time in the Evaluation step of the train.py script - after the 'convert squad examples to features' step completes successfully and right after 'Evaluating: 0%' is printed.
I have made sure torch can pick up the cuda info:
print(torch.cuda.is_available())
True
This is a very low-level issue. Unfortunately, "NCCL Error 1: unhandled cuda error" only says that a CUDA call inside NCCL failed, without saying which call or why. I can only suggest updating your drivers or checking whether there is a more detailed error log, but even then this is most likely a CUDA or hardware issue.
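To get a more detailed log, one option is to rerun with NCCL's own logging turned on. This is a minimal sketch, assuming the ./run.sh launcher from the report passes its environment on to train.py. The helper names nccl_debug_env and run_with_nccl_debug are hypothetical; NCCL_DEBUG and CUDA_LAUNCH_BLOCKING are the standard NCCL and CUDA environment variables.

```python
import os
import subprocess


def nccl_debug_env(base=None):
    """Return a copy of the environment with verbose NCCL/CUDA diagnostics on.

    Hypothetical helper: `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    # Ask NCCL to log its initialization and the failing call.
    env["NCCL_DEBUG"] = "INFO"
    # Make CUDA kernel launches synchronous so errors surface at the call site.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_nccl_debug(cmd):
    """Run `cmd` (a list of arguments) with the debug environment.

    Returns the exit code of the child process.
    """
    return subprocess.run(cmd, env=nccl_debug_env()).returncode


# Example, assuming ./run.sh exists in the current directory:
# run_with_nccl_debug(["./run.sh"])
```

The extra NCCL lines printed just before the crash usually name the CUDA call that failed, which helps narrow this down to a driver, CUDA version, or hardware problem.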