Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

other than notarization poller, scriptworkers don't respect max run time #720

Open
bhearsum opened this issue May 11, 2023 · 6 comments
Open

Comments

@bhearsum
Copy link
Contributor

We've found a few examples of tasks going well over their max run time recently. For example, this task had a max run time of 600 seconds (10 minutes). The first attempt ran for 21 minutes, after which it was killed with signal -15. The second run ran for 16 minutes and completed successfully.

It's possible the first one was killed for going over the max run time -- but even if that's the case, that's far too late.

@jcristau
Copy link
Contributor

I think the SIGTERM after 20-ish minutes is more likely to be from going over the termination grace period in k8s, FWIW.

@bhearsum
Copy link
Contributor Author

It looks like maxRunTime is defined in specific script schemas, eg: https://github.com/mozilla-releng/scriptworker-scripts/blob/master/beetmoverscript/src/beetmoverscript/data/beetmover_task_schema.json#L25

However, I don't see anything in them nor in scriptworker itself that even reads it, let alone tries to respect it...

@jcristau
Copy link
Contributor

Damn... I remember adding support for the notarization_poller in bug 1716793... I guess this didn't come up in other scriptworkers because they tend to be fairly quick anyway and eventually get killed by k8s so nobody ever noticed?
Good find in any case.

@bhearsum
Copy link
Contributor Author

In fact, no other schemas even mention it! Notarization poller does seem to support it though, even though it's not in its schema: https://github.com/mozilla-releng/scriptworker-scripts/blob/master/notarization_poller/src/notarization_poller/task.py#L65

I'm going to move this issue to scriptworker-scripts, since that seems to be where this sort of thing is implemented.

@bhearsum bhearsum transferred this issue from mozilla-releng/scriptworker May 11, 2023
@bhearsum bhearsum changed the title scriptworker is not (always?) respecting max run time other than notarization poller, scriptworkers don't respect max run time May 11, 2023
@jcristau
Copy link
Contributor

scriptworker has a global task_max_timeout setting, and will kill the task if it exceeds that duration (currently 1h).

One thing we could do is remove all mentions of maxRunTime and just set task_max_timeout appropriately per pool instead of it being always the same?
If we do want more granularity, or the ability to ride the trains, then we might be able to change scriptworker's run_task to have something like timeout = min(context.config["task_max_timeout"], context.task.get("payload", {}).get("maxRunTime", context.config["task_max_timeout"]))?

@bhearsum
Copy link
Contributor Author

The main case I can think of where we'd want to be able to specify max run time per task is in beetmover (where most beetmover jobs are usually very quick, but push to releases and maybe some maven stuff can be much lengthier). I think most other scriptworker types probably have fairly consistent expected run times across all tasks though.

With that said, I'd also be okay just removing maxRunTime from the payloads if it's easier...I don't think the one case above is important enough to stress over too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants