
Question about memory scaling during training #16

Answered by ilyes319
rees-c asked this question in Q&A

Hi @rees-c,
Sorry for the long delay in replying; the MACE GitHub repository would be a more suitable place for your question.
The batch size affects both memory consumption and the training dynamics.
During training, MACE can fit about 1,000 nodes on a single A100 GPU. However, we rarely go above a batch size of 64 per GPU, because we see accuracy degrade beyond that.
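For a rough sense of how those two limits interact, here is a minimal sketch that takes the ~1,000-node and 64-per-GPU figures above at face value and assumes each node in the graph is an atom. `suggest_batch_size` is a hypothetical helper for illustration, not part of the MACE API:

```python
# Hypothetical helper (not a MACE function): pick a per-GPU batch size from
# the two limits mentioned above. The numbers are rough rules of thumb.

def suggest_batch_size(avg_nodes_per_graph: int,
                       node_budget: int = 1000,   # ~nodes fitting on one A100 during training
                       accuracy_cap: int = 64) -> int:  # >~64 per GPU was seen to hurt accuracy
    """Return a per-GPU batch size bounded by memory and accuracy limits."""
    memory_limit = max(1, node_budget // avg_nodes_per_graph)
    return min(memory_limit, accuracy_cap)

# Example: structures averaging 50 atoms -> memory allows ~20 graphs per batch,
# which is already below the 64-per-GPU accuracy cap.
print(suggest_batch_size(50))  # -> 20
```

In practice the memory limit often binds first for large structures, while the accuracy cap binds for small molecules; either way, the smaller of the two is the sensible choice.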

Category: Q&A
2 participants

This discussion was converted from issue #3 on October 09, 2024 09:21.