Ways to reduce memory use #96

I'm trying to train equivariant transformer models on a GPU with 12 GB of memory. I can train small to medium sized models, but if I make it too large (for example, 6 layers with embedding dimension 96), CUDA runs out of device memory. Is there anything I can do to reduce the memory requirements? I already tried reducing the batch size but it didn't help.

Comments
and of course batch size, what is your current batch size for training?
16 bit floats?
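If that means mixed precision, it does roughly halve the activation memory. A minimal sketch, assuming a plain PyTorch training loop rather than the actual torchmd-net trainer (the tiny MLP, batch tensors, and sizes below are placeholders):

```python
import torch

# Toy stand-in for the real network; only the autocast / GradScaler pattern matters here.
model = torch.nn.Sequential(
    torch.nn.Linear(96, 128), torch.nn.SiLU(), torch.nn.Linear(128, 1)
).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so fp16 gradients do not underflow

x = torch.randn(100, 96, device="cuda")  # placeholder batch
y = torch.randn(100, 1, device="cuda")

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass (and stored activations) largely in fp16
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()    # backward runs on the scaled loss
    scaler.step(optimizer)
    scaler.update()
```

If the training goes through a PyTorch Lightning Trainer, its precision flag (e.g. precision=16) should achieve the same thing without touching the loop.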
I'm using cutoff 10. I've found that 5 is far too short: it can't reproduce energies for molecules larger than about 40 atoms, and it has no chance at all on intermolecular interactions.

Batch size seems to have very little effect on memory use. With 5 layers I can use batch size 100. Add a sixth layer and it runs out of memory even if I reduce it to 1.
With a cutoff of 10 Å, what are you using for max_num_neighbors?
That seems odd. Are you sure you are changing batch_size and inference_batch_size?
80
I was changing only batch_size, not inference_batch_size. If I reduce both of them to 32 then I can get it to run. Thanks!
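In case it is useful to anyone else hitting this: PyTorch's memory counters make it easy to see what one forward/backward pass costs at a given batch size, so you can sanity-check both loaders before a long run. A toy sketch (the small MLP is only a placeholder, not the equivariant transformer):

```python
import torch

def peak_memory_mb(batch_size: int, dim: int = 96) -> float:
    """Peak CUDA memory (MB) for one forward/backward pass of a toy model."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = torch.nn.Sequential(
        torch.nn.Linear(dim, dim), torch.nn.SiLU(), torch.nn.Linear(dim, 1)
    ).cuda()
    x = torch.randn(batch_size, dim, device="cuda")
    model(x).sum().backward()  # forward + backward, so activations and gradients are counted
    return torch.cuda.max_memory_allocated() / 2**20

for bs in (32, 64, 100):
    print(bs, f"{peak_memory_mb(bs):.1f} MB")
```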
Are you using the latest code that errors out if it gets above 80? With a 10 Å cutoff it seems possible.
There's no problem with higher values than 80 (except, of course, running out of memory). 100 also works.
With higher values it works, of course, but there can be more than 80 atoms within 10 Å.
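For example, a quick check like the one below (a plain NumPy sketch; positions stands in for one sample's coordinates in Å) reports the densest neighborhood in a structure, which is what max_num_neighbors has to cover:

```python
import numpy as np

def max_neighbors(positions, cutoff=10.0):
    """Largest number of atoms within `cutoff` (in Å) of any single atom."""
    pos = np.asarray(positions, dtype=float)              # shape (N, 3)
    dists = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    counts = (dists < cutoff).sum(axis=1) - 1             # exclude the atom itself
    return int(counts.max())

# Placeholder data: a random 60-atom blob in a 15 Å box, just so the snippet runs.
print(max_neighbors(np.random.rand(60, 3) * 15.0))
```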
It depends on the particular samples. Does the value of max_num_neighbors only apply during training?
It always applies, not only during training. The argument determines the maximum number of neighbors collected by the neighbor list algorithm. You can overwrite it when you load a model checkpoint though, for example to set it to a higher number for inference.
That's good to know. If I want to override it, would I just add the argument when loading the checkpoint?
I think that should work, but better make sure it actually overwrites it. I'd recommend using the …
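If it helps, something along these lines is roughly what is meant (a sketch only: it assumes torchmd-net's load_model helper accepts keyword overrides for the hyperparameters stored in the checkpoint, and the checkpoint filename is a placeholder). Inspecting the loaded model afterwards is a cheap way to confirm the override really took effect:

```python
from torchmdnet.models.model import load_model

# Sketch: the checkpoint filename is a placeholder, and max_num_neighbors is
# assumed to be forwarded as a keyword override of the value stored in the checkpoint.
model = load_model("epoch=123-val_loss=0.01.ckpt", max_num_neighbors=128)

# Print the module tree (or the relevant neighbor-list submodule) and check that the
# new value actually replaced the one the model was trained with.
print(model)
```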
Cutoff 5 and 5 interaction layers seem to be optimal, so see if that fits. Maybe Raimondas' optimizations could help?
Cutoff 5 doesn't work for anything except very small molecules. Above about 40 atoms, it's essential to have a longer cutoff or you get very large errors. I'm hoping that once we add explicit terms for Coulomb and dispersion, that will allow using a shorter cutoff for the neural network.