how to postpone filter init till it's running #242
Comments
I'm trying:
and then inside the filter only the first item gets hit by this init. Edit: hmm, this approach seems to hang, so I got to:
and then nothing happens. I set the reader to limit=10, so it should be really fast. It must be something pickle-related. Do you by chance have an example of a working filter that uses a GPU given by the srun task?
Update: if I run the same job with a local executor it works fine, but with Slurm it hangs on the first sample, so it must be a pickle-related issue. When I scancel the job, it shows the buffered-up part
Can you share the full class so I can try to reproduce the issue?
Yes, of course
Or perhaps, if you have an example of a filter that runs something on CUDA, that could help too.
Is there a plan for another way of passing the jobs instead of pickle? The hanging happens because of I came up with the following workaround, creating my own post-un-pickle-init via
So it appears that currently I can't instantiate a model on a GPU, because the filter object is created by the launcher, which either doesn't have a GPU or, even if it has one, most likely has the wrong one, since each task needs its own dedicated GPU(s).
Is it possible to add a second init, a user init that runs on the actual job?
The filter task is simple - instantiate a model on a gpu and then run filter using it - of course we don't want model to be re-instantiated on every filter call.
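To make the request concrete, here is a minimal sketch of the pattern in plain Python, not tied to the framework's actual API. `LazyModelFilter` and `load_model` are hypothetical names, and the "model" is a trivial stand-in so the snippet runs without a GPU. The object stays cheap to pickle in the launcher, and the expensive load happens on the first `filter` call in whichever process unpickles it.

```python
import pickle


def load_model(device):
    # Hypothetical stand-in for an expensive load (e.g. putting weights on a GPU).
    load_model.calls += 1
    return lambda text: len(text) > 3


load_model.calls = 0


class LazyModelFilter:
    """Hypothetical filter: cheap to construct and pickle, builds its model on first use."""

    def __init__(self, device="cuda"):
        self.device = device
        self._model = None  # nothing heavy is created in the launcher process

    @property
    def model(self):
        # The "second init": runs once, in the process that actually filters.
        if self._model is None:
            self._model = load_model(self.device)
        return self._model

    def __getstate__(self):
        # Never ship a loaded model through pickle; the worker rebuilds it.
        state = self.__dict__.copy()
        state["_model"] = None
        return state

    def filter(self, text):
        return self.model(text)


# Simulate the launcher pickling the filter and a worker unpickling it.
worker_filter = pickle.loads(pickle.dumps(LazyModelFilter()))
results = [worker_filter.filter(t) for t in ["a", "hello", "datatrove"]]
```

In this sketch the model is built exactly once per unpickled object, no matter how many items pass through `filter`.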
Needing to `import torch` inside the `filter` is super-weird as well, but I get that it's due to pickle. Perhaps we can have two inits: one for the framework and another for the user. Then, when a job is launched, the first thing the framework runs is the user-defined `init`, if any, and then it proceeds normally.
I guess I will try to overcome this meanwhile using `@functools.cache` or something similar.
Thank you!
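A rough sketch of what the `@functools.cache` idea could look like, assuming Python 3.9+ (where `functools.cache` was added). `get_model`, `keep`, and `LOAD_LOG` are hypothetical names, and the dict is a stand-in for a real model; a real version would `import torch` inside `get_model` and load weights onto `device`.

```python
import functools

LOAD_LOG = []  # records each real "load", just to show the cache works


@functools.cache
def get_model(device):
    # Runs once per distinct `device` per process; later calls return the cached object.
    LOAD_LOG.append(device)
    return {"device": device, "min_len": 4}  # hypothetical stand-in for a model


def keep(text, device="cuda:0"):
    # The filter body asks for the model on every call, but only the first call pays for it.
    model = get_model(device)
    return len(text) >= model["min_len"]


kept = [t for t in ["a", "hello", "hi", "datatrove"] if keep(t)]
```

Because the cache lives in module state of the running process rather than on the pickled filter object, each worker would build its own model after it starts, which is the behaviour asked for above.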
tag: @guipenedo