
Question about memory scaling during training #16

Answered by ilyes319
rees-c asked this question in Q&A

Hi @rees-c,
Sorry for the long delay in replying; the MACE GitHub repository would be a more suitable place for your question.
The batch size affects both memory consumption and the training dynamics.
During training, MACE can fit about 1,000 nodes on a single A100 GPU. However, we rarely go above a batch size of 64 per GPU, because we see accuracy degrade beyond that.
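For a rough sense of how those two limits interact, here is a minimal sketch that takes the ~1,000-node and 64-per-GPU figures above at face value and assumes each node in the graph is an atom. `suggest_batch_size` is a hypothetical helper for illustration, not part of the MACE API:

```python
# Hypothetical helper (not a MACE function): pick a per-GPU batch size from
# the two limits mentioned above. The numbers are rough rules of thumb.

def suggest_batch_size(avg_nodes_per_graph: int,
                       node_budget: int = 1000,   # ~nodes fitting on one A100 during training
                       accuracy_cap: int = 64) -> int:  # >~64 per GPU was seen to hurt accuracy
    """Return a per-GPU batch size bounded by memory and accuracy limits."""
    memory_limit = max(1, node_budget // avg_nodes_per_graph)
    return min(memory_limit, accuracy_cap)

# Example: structures averaging 50 atoms -> memory allows ~20 graphs per batch,
# which is already below the 64-per-GPU accuracy cap.
print(suggest_batch_size(50))  # -> 20
```

In practice the memory limit often binds first for large structures, while the accuracy cap binds for small molecules; either way, the smaller of the two is the sensible choice.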

Category: Q&A
2 participants

This discussion was converted from issue #3 on October 09, 2024 09:21.