Thank you very much for your contributions; they have been a great help to my current research. While training with this framework, I observed a sudden jump in GPU memory usage during validation after the first epoch: in my case, training the first epoch used about 10 GB, and validation after that epoch used 17 GB, which did not change thereafter. I suspect that 10 GB was held by training and 7 GB by validation, and that the memory allocated for training was not released during validation inference, producing the jump. Since I haven't studied the source code thoroughly, I'm not sure whether this behavior is necessary for the framework, but it can waste memory: under this training/validation scheme, a 24 GB GPU effectively behaves like a 17 GB GPU. I also found that the training script does not seem to support multi-GPU training, which could partly work around the memory shortage.
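If it helps, here is a minimal sketch of the kind of workaround I tried, assuming a standard PyTorch setup (the names `model`, `val_loader`, and `validate_with_freed_cache` are my own placeholders, not from this framework). It releases the caching allocator's unused blocks and disables autograd before the validation pass:

```python
# Hypothetical sketch: `model` and `val_loader` are placeholders, not
# identifiers from this repository.
import gc
import torch


def validate_with_freed_cache(model, val_loader, device="cuda"):
    """Run validation after releasing training-time GPU allocations.

    gc.collect() drops lingering Python references to training tensors,
    and torch.cuda.empty_cache() returns the allocator's cached (unused)
    blocks to the driver, so the validation peak does not stack on top
    of the training peak. torch.no_grad() also skips autograd buffers
    during inference.
    """
    gc.collect()               # free Python-side references first
    torch.cuda.empty_cache()   # then release cached GPU memory (no-op on CPU)
    model.eval()
    with torch.no_grad():
        for batch in val_loader:
            model(batch.to(device))
```

Note that `torch.cuda.empty_cache()` does not free tensors that are still referenced (e.g. optimizer state), so this only reclaims memory the allocator is merely caching; whether that covers the 7 GB gap here would need testing against the framework's actual training loop.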