bulk_update of jobs fails intermittently #383
I think this issue stems from the timer middleware, since this part of the traceback comes from our own code:
This middleware only logs request times, which was handy for optimizing queries; otherwise it can be removed without any downside. I believe this is a known issue that has been fixed in more recent versions of FastAPI (and Starlette, the ASGI framework it is built on):
I would suggest trying a couple of things:
To do a container software update, I would update the versions in
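If the request timing from the timer middleware is worth keeping, one option (a sketch, not something proposed in this thread, and untested against this bug) is a plain ASGI middleware. Unlike Starlette's `BaseHTTPMiddleware` or FastAPI's `@app.middleware("http")` decorator, it does not buffer or re-stream the response; it just times the downstream call. Whether Balsam's existing timer uses either of those is an assumption, and the names `RequestTimerMiddleware` and the `request_timer` logger are hypothetical.

```python
import logging
import time

# Hypothetical logger name; configure it at INFO level to see the timings.
logger = logging.getLogger("request_timer")


class RequestTimerMiddleware:
    """Pure ASGI middleware that logs how long each HTTP request takes."""

    def __init__(self, app):
        # Starlette/FastAPI's add_middleware() passes the wrapped app as `app`.
        self.app = app

    async def __call__(self, scope, receive, send):
        # Pass websocket and lifespan traffic straight through, untimed.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()
        try:
            # The response messages flow directly to `send`; nothing is wrapped.
            await self.app(scope, receive, send)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s %s took %.1f ms", scope["method"], scope["path"], elapsed_ms)
```

It would be registered with `app.add_middleware(RequestTimerMiddleware)`. Because the `try`/`finally` wraps the call, a timing line is logged even when the handler raises.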
Several users have run into this. Bulk updates of job states sometimes fail, for example when advancing jobs from STAGED_IN to PREPROCESSED or from RUN_DONE to POSTPROCESSED. I haven't been able to figure out what triggers the failure, but once a site hits it, the failure persists for every job at that site. Restarting the site and/or the server does not help. The only workaround I've found is to update jobs individually, which is tedious, and the site typically keeps failing for new jobs as well.
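The one-at-a-time workaround can be sketched as a fallback: try the bulk request once, and if it raises, save each job separately, as the report does with `job.save()`. This is a minimal illustration only; `FakeJob`, `advance_jobs`, and the `bulk_update` callable are hypothetical stand-ins, not Balsam's client API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FakeJob:
    """Hypothetical stand-in for a Balsam job: only `state` and `save()` are modeled."""
    id: int
    state: str
    server_state: str = ""  # what the simulated server last stored

    def save(self) -> None:
        # Stands in for job.save(): push this single job's state to the server.
        self.server_state = self.state


def advance_jobs(
    jobs: List[FakeJob],
    new_state: str,
    bulk_update: Callable[[List[FakeJob]], None],
) -> str:
    """Move every job to new_state with one bulk request; if that request
    raises, fall back to saving each job individually. Returns the path taken."""
    for job in jobs:
        job.state = new_state
    try:
        bulk_update(jobs)
        return "bulk"
    except Exception:
        # Workaround from the report: one request per job.
        for job in jobs:
            job.save()
        return "individual"
```

Catching a broad `Exception` keeps the sketch short; real code would catch only the specific error the client raises once it gives up on the bulk request.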
On the client side, logs contain this error:
The client keeps retrying but never succeeds, and increasing the number of retries does not help. The only way to resolve it is to change the job states one at a time with `job.save()`.
On the server side, logs contain this error:
I'm not sure what the server-side error indicates, but it's the only thing that appears in `server-balsam.log`: the same error, over and over. Perhaps @masalim2 has some ideas?