job-exec: add total time waited for a job in drain message for unkillable processes #6376

Open
grondo opened this issue Oct 16, 2024 · 0 comments


grondo commented Oct 16, 2024

Problem: The job-exec module drains nodes with what it considers "unkillable" processes after max-kill-count attempts have been made to terminate the job shell. However, it is difficult for admins to determine how long that actually took, because the module uses an exponential backoff, capped at 300s, between attempts to kill the job shell.

Consider including the total time waited before draining the node in the drain message, for reference.
