Ways to reduce memory use #96

I'm trying to train equivariant transformer models on a GPU with 12 GB of memory. I can train small to medium sized models, but if I make it too large (for example, 6 layers with embedding dimension 96), CUDA runs out of device memory. Is there anything I can do to reduce the memory requirements? I already tried reducing the batch size but it didn't help.

Comments
and of course batch size, what is your current batch size for training?
16 bit floats?
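If that means mixed precision, it does roughly halve the activation memory. A minimal sketch, assuming a plain PyTorch training loop rather than the actual torchmd-net trainer (the tiny MLP, batch tensors, and sizes below are placeholders):

```python
import torch

# Toy stand-in for the real network; only the autocast / GradScaler pattern matters here.
model = torch.nn.Sequential(
    torch.nn.Linear(96, 128), torch.nn.SiLU(), torch.nn.Linear(128, 1)
).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so fp16 gradients do not underflow

x = torch.randn(100, 96, device="cuda")  # placeholder batch
y = torch.randn(100, 1, device="cuda")

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass (and stored activations) largely in fp16
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()    # backward runs on the scaled loss
    scaler.step(optimizer)
    scaler.update()
```

If the training goes through a PyTorch Lightning Trainer, its precision flag (e.g. precision=16) should achieve the same thing without touching the loop.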
I'm using cutoff 10. I've found that 5 is far too short: it can't reproduce energies for molecules larger than about 40 atoms, and it has no chance at all on intermolecular interactions.

Batch size seems to have very little effect on memory use. With 5 layers I can use batch size 100. Add a sixth layer and it runs out of memory even if I reduce it to 1.
With a cutoff of 10 Å, what are you using for max_num_neighbors?
That seems odd. Are you sure you are changing batch_size and inference_batch_size?
80
I was changing only batch_size, not inference_batch_size. If I reduce both of them to 32 then I can get it to run. Thanks!
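In case it is useful to anyone else hitting this: PyTorch's memory counters make it easy to see what one forward/backward pass costs at a given batch size, so you can sanity-check both loaders before a long run. A toy sketch (the small MLP is only a placeholder, not the equivariant transformer):

```python
import torch

def peak_memory_mb(batch_size: int, dim: int = 96) -> float:
    """Peak CUDA memory (MB) for one forward/backward pass of a toy model."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = torch.nn.Sequential(
        torch.nn.Linear(dim, dim), torch.nn.SiLU(), torch.nn.Linear(dim, 1)
    ).cuda()
    x = torch.randn(batch_size, dim, device="cuda")
    model(x).sum().backward()  # forward + backward, so activations and gradients are counted
    return torch.cuda.max_memory_allocated() / 2**20

for bs in (32, 64, 100):
    print(bs, f"{peak_memory_mb(bs):.1f} MB")
```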
Are you using the latest code that errors out if it gets above 80? With a 10 Å cutoff it seems possible.
There's no problem with higher values than 80 (except, of course, running out of memory). 100 also works.
With higher values it works, of course, but there can be more than 80 atoms within 10 Å.
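For example, a quick check like the one below (a plain NumPy sketch; positions stands in for one sample's coordinates in Å) reports the densest neighborhood in a structure, which is what max_num_neighbors has to cover:

```python
import numpy as np

def max_neighbors(positions, cutoff=10.0):
    """Largest number of atoms within `cutoff` (in Å) of any single atom."""
    pos = np.asarray(positions, dtype=float)              # shape (N, 3)
    dists = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    counts = (dists < cutoff).sum(axis=1) - 1             # exclude the atom itself
    return int(counts.max())

# Placeholder data: a random 60-atom blob in a 15 Å box, just so the snippet runs.
print(max_neighbors(np.random.rand(60, 3) * 15.0))
```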
It depends on the particular samples. Does the value of max_num_neighbors only apply during training?
It always applies, not only during training. The argument determines the maximum number of neighbors collected by the neighbor list algorithm. You can overwrite it when you load a model checkpoint though, for example to set it to a higher number for inference.
That's good to know. If I want to override it, would I just add the argument when loading the checkpoint?
I think that should work, but better make sure it actually overwrites it. I'd recommend using the …
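If it helps, something along these lines is roughly what is meant (a sketch only: it assumes torchmd-net's load_model helper accepts keyword overrides for the hyperparameters stored in the checkpoint, and the checkpoint filename is a placeholder). Inspecting the loaded model afterwards is a cheap way to confirm the override really took effect:

```python
from torchmdnet.models.model import load_model

# Sketch: the checkpoint filename is a placeholder, and max_num_neighbors is
# assumed to be forwarded as a keyword override of the value stored in the checkpoint.
model = load_model("epoch=123-val_loss=0.01.ckpt", max_num_neighbors=128)

# Print the module tree (or the relevant neighbor-list submodule) and check that the
# new value actually replaced the one the model was trained with.
print(model)
```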
Cutoff 5 and 5 interaction layers seem to be optimal, so see if that fits. Maybe Raimondas' optimizations could help?
Cutoff 5 doesn't work for anything except very small molecules. Above about 40 atoms, it's essential to have a longer cutoff or you get very large errors. I'm hoping that once we add explicit terms for Coulomb and dispersion, that will allow using a shorter cutoff for the neural network.