As I understand it, the only place where the unit status can transition to AssignmentState.COMPLETED is in mephisto/data_model/unit.py in get_status(). This method needs to be called to keep the agent status and unit status in sync. However, when checking if a worker can work on another assignment, this is probably not called. Normally this is not an issue because the operator regularly (indirectly) calls this here:
async def _track_and_kill_runs(self):
    ...
    if not tracked_run.force_shutdown:
        if tracked_run.task_launcher.finished_generators is False:
            # If the run can still generate assignments, it's
            # definitely not done
            continue
        task_run = tracked_run.task_run
        task_run.update_completion_progress(
            task_launcher=tracked_run.task_launcher
        )
        if not task_run.get_is_completed():  # <- here
            continue
    ...
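However, when using a long-running generator, a unit can finish before the generator does. In that case the check above hits the first continue on every pass, get_is_completed() is never reached, and the unit status is never synced.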
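To make the stale-status problem concrete, here is a toy model of it. This is not Mephisto code: ToyAgent, ToyUnit, db_status and count_active_units are hypothetical names made up for illustration, and the status strings are placeholders. It only shows how a check that reads the stored status can disagree with the agent until get_status() is called.

```python
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent; not the Mephisto class."""
    status: str = "in_task"


class ToyUnit:
    """Hypothetical unit whose stored status only changes in get_status()."""

    def __init__(self, agent: ToyAgent) -> None:
        self.agent = agent
        self.db_status = "assigned"  # what a direct DB query would return

    def get_status(self) -> str:
        # The only place where the stored status catches up with the agent.
        if self.agent.status == "completed":
            self.db_status = "completed"
        return self.db_status


def count_active_units(units: list) -> int:
    """Counts units by stored status only, without calling get_status()."""
    return sum(1 for unit in units if unit.db_status != "completed")


agent = ToyAgent()
unit = ToyUnit(agent)

agent.status = "completed"           # the worker finishes the unit
stale = count_active_units([unit])   # still counted as active
unit.get_status()                    # the sync finally happens
fresh = count_active_units([unit])   # no longer counted
```

Between the agent finishing and the next get_status() call, the worker still appears to hold an active unit, which is the window a long-running generator keeps open.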
If you want to try this, on this branch I replaced the static_task_data with the following generator:
def slow_generator():
    import time
    i = 0
    while True:
        i += 1
        time.sleep(3)
        yield {"text": f"This is assignment {i}"}
One quick fix would be to simply move the emptiness check of the generator below get_is_completed() (see #1039). But a cleaner solution would probably be to either
- always sync the states when accessing the assignment status (i.e. also when checking if workers can take on more units), or
- directly set the unit status when the agent status changes.
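A rough sketch of the two cleaner options, again as a toy model rather than actual Mephisto code. Unit, Agent, sync_status, update_status, push_on_change and the two counting helpers are all hypothetical names, assumed only for illustration.

```python
class Unit:
    """Hypothetical unit with a stored status and a sync method."""

    def __init__(self) -> None:
        self.db_status = "assigned"
        self.agent = None

    def sync_status(self) -> str:
        # Copy a finished agent's state into the stored unit status.
        if self.agent is not None and self.agent.status == "completed":
            self.db_status = "completed"
        return self.db_status


class Agent:
    """Hypothetical agent that can optionally push changes to its unit."""

    def __init__(self, unit: Unit, push_on_change: bool = False) -> None:
        self.unit = unit
        unit.agent = self
        self.status = "in_task"
        self.push_on_change = push_on_change

    def update_status(self, new_status: str) -> None:
        self.status = new_status
        if self.push_on_change:
            # Option 2: set the unit status directly when the agent changes.
            self.unit.sync_status()


def active_units_raw(units: list) -> int:
    """Reads the stored status without syncing (the current behaviour)."""
    return sum(1 for unit in units if unit.db_status != "completed")


def active_units_synced(units: list) -> int:
    """Option 1: sync every unit before its status is read."""
    return sum(1 for unit in units if unit.sync_status() != "completed")


# Option 1: sync on read
u1 = Unit()
a1 = Agent(u1)
a1.update_status("completed")
raw_before = active_units_raw([u1])   # stored status is still stale
synced = active_units_synced([u1])    # syncing on read fixes the count

# Option 2: push on change
u2 = Unit()
a2 = Agent(u2, push_on_change=True)
a2.update_status("completed")
raw_pushed = active_units_raw([u2])   # already up to date without a sync
```

Option 1 costs a sync on every read but needs no change on the agent side; option 2 keeps reads cheap but requires every agent status change to go through one code path.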
EDIT: this only happens for screening units!
Thanks for all the details here @PReithofer - this is an interesting bug you've stumbled on, as the limitations on workers' active tasks require information in the MephistoDB to be up-to-date, but indeed only a get_status() call actually pushes that status update.
Your quick fix in #1039 seems OK to me, though it definitely brings up questions on how we manage state syncing correctly. Indeed I've left this up to the _track_and_kill_runs thread, but it may be time to rethink that, as we haven't used the generator pattern extensively.