tracking issue: standby/preemptible jobs #5739
Comments
I think that a submission flag would work as long as the drawbacks that you noted could be overcome. Generally we allow 'standby' jobs to be exempt from other queue limits and allow all users to access them. So we would also want the preemptible flag to be visible to the priority plugin, so that the plugin can avoid counting those jobs against queue limits. I think that would provide the same benefits as the queue implementation, at least for how we use standby / preemption. That said, there are a number of use cases that can be solved by overlapping queues (exempt / expedite, whole cluster DATs), so that could be considered a benefit of that approach. Exempt / expedite could probably all be done through accounting / the priority plugin. We should probably talk more about DATs, where we want to be able to let a user run on all nodes of a cluster that we've split into multiple queues.
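As a rough illustration of the limit exemption described above, here is a minimal sketch of a per-user queue-limit check that skips jobs carrying a preemptible flag. This is not the Flux priority plugin API: `Job`, `count_against_limit`, `may_submit`, and the `preemptible` field are all hypothetical names used only to show the counting rule.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    queue: str
    preemptible: bool = False  # hypothetical submission flag, not a real Flux attribute


def count_against_limit(jobs, user, queue):
    """Count a user's jobs in a queue, ignoring preemptible ones."""
    return sum(
        1
        for j in jobs
        if j.user == user and j.queue == queue and not j.preemptible
    )


def may_submit(jobs, new_job, max_jobs_per_user):
    """Preemptible jobs bypass the per-user queue limit; others are checked."""
    if new_job.preemptible:
        return True
    return count_against_limit(jobs, new_job.user, new_job.queue) < max_jobs_per_user
```

Under this rule a user who has already reached the limit with regular jobs could still submit standby work, and existing standby jobs would never block a regular submission.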
This idea was discussed again in a meeting recently. The preemptible flag still seems to be the solution of choice, but this will require an update to the resource acquisition protocol. I've opened flux-framework/rfc#423.
Over in the flux team on Teams, one of the users on Tuolumne had an interesting idea around standby / preemption, which would be to allow users to specify a minimum duration for their jobs:
From @ryanday36's list in #5165:
In some offline discussion, it was proposed that we could add a preemptible (or similar) job submission flag for this purpose. Drawbacks to this approach:
flux jobs output
Most of those can be easily overcome if a submission flag is the correct approach.
Alternate solutions include: