@BoyuanJiang, hey, I think we do support this, but it's not very well documented. In particular, I think that if you press Ctrl + C once, the model should checkpoint. Let me know if that doesn't work.
Yes, it holds for 30 seconds and then all ranks are killed. But I am not sure which line of code saves the latest state to a checkpoint during those 30 seconds?
🚀 Feature Request
Can I save the latest checkpoint when training crashes or when I press Ctrl+C?
Motivation
[Optional] Implementation
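A minimal, library-agnostic sketch of one way this could work, not this project's actual code: a signal handler only records that a stop was requested, and the training loop saves at the next safe point, while a `finally` block covers crashes. `GracefulCheckpointer`, `train`, and `step_fn` are hypothetical names, and the JSON file stands in for a real checkpoint format. In a multi-rank run, the save would have to finish within the grace period (30 seconds in this thread) before the ranks are killed.

```python
import json
import os
import signal
import tempfile


class GracefulCheckpointer:
    """Hypothetical helper: remembers a stop request and writes checkpoints."""

    def __init__(self, path):
        self.path = path
        self.stop_requested = False

    def install(self):
        # Must be called from the main thread; signal.signal requires it.
        signal.signal(signal.SIGINT, self._on_signal)   # Ctrl+C
        signal.signal(signal.SIGTERM, self._on_signal)  # polite kill

    def _on_signal(self, signum, frame):
        # Only set a flag here; saving happens at a safe point in the loop.
        self.stop_requested = True

    def save(self, state):
        # Write to a temp file, then rename, so an interrupted write
        # never leaves a half-written checkpoint at self.path.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, self.path)


def train(checkpointer, total_steps, step_fn=None):
    """Toy loop: stops early on a stop request, always saves on exit."""
    state = {"step": 0}
    try:
        while state["step"] < total_steps:
            if step_fn is not None:
                step_fn(state)  # stand-in for one real training step
            state["step"] += 1
            if checkpointer.stop_requested:
                break
    finally:
        # Runs on normal exit, on a stop request, and on an exception.
        checkpointer.save(state)
    return state
```

To use it, call `install()` once in the main process before `train(...)`. Note that a handler like this cannot catch `SIGKILL`, so it only helps while the process is still being asked to stop, not after it is forcibly killed.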
Additional context