-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Freezing During Shutdown: Exception Handling in finalize_visit_id Causes Controller to Halt #1068
Comments
Hey, the function gets called outside of the handler, during shutdown. See here OpenWPM/openwpm/storage/storage_controller.py Lines 253 to 261 in 553b1f3
I didn't expect storage providers to throw exceptions, but if this an issue you are encountering please submit a PR to catch errors that might occur. |
What is currently confusing me is that there should be no concurrent access to this list, because we make sure the server is fully shut down before we call However, from your description we enter the This points to a deeper logic bug, so while your fix in #1070 will prevent the error, it would also put the same visit id into the completion queue twice which would break other stuff. |
As I can't reproduce your issue, could you add something like self.logger.error(traceback.format_stack(file=sys.stdout, limit=2)) just before the |
in the logs we have 2 times Awaiting all tasks for visit_id 1780613490525049
additional logs which might be useful
|
and at the top of the logs i have
and after this exception i can not run any manager.execute_command_sequence |
but since the function returns None if visit_id not in self.store_record_tasks:
self.logger.error(
"visit_id %d, is already awaited, skipping...",
visit_id,
)
return None it doesn't go inside completion_tokens, which means it also doesn't go inside completion_queue on self.completion_queue.put((visit_id, False)) OpenWPM/openwpm/storage/storage_controller.py Lines 258 to 267 in 553b1f3
|
🚀 thank you so much for discovering this bug! t = await self.finalize_visit_id(visit_id, success=False)
- if t is not None:
- completion_tokens[visit_id] = t
+ completion_tokens[visit_id] = t Would be the first part of the fix, that puts items in the completion queue during shutdown, even when they don't have a completion token. (We can As for the original issue, I now think the correct patch is + store_record_tasks = self.store_record_tasks[visit_id]
+ del self.store_record_tasks[visit_id]
+ for task in store_record_tasks:
- for task in self.store_record_tasks[visit_id]:
await task
- del self.store_record_tasks[visit_id] This way when we enter the function the second time after yielding in |
Hey @MohammadMahdiJavid, I deployed the fix I described as PR #1073. Could you test on the latest master branch and confirm everything is fixed? |
Hi, Apologies for the delay in response, each experiment takes a few days to finish although i messed up by forgetting to put some visit ids into i'm gonna test the deployed changes and come back again if anything goes wrong just a quick note for enhancement |
Hi,
When the storage controller receives a shutdown signal, it waits for all tasks associated with each visit ID. However, during the shutdown process, if an exception is thrown while processing records in finalize_visit_id, the controller freezes and doesn't continue.
Why does it freeze? Are there any ways to overcome this issue? In the case of an exception, shouldn't it log the error and continue?
Exception (KeyError) is thrown at
OpenWPM/openwpm/storage/storage_controller.py
Line 219 in 553b1f3
which is being called from
OpenWPM/openwpm/storage/storage_controller.py
Line 258 in 553b1f3
and at the end causes an infinite loop at
OpenWPM/openwpm/task_manager.py
Lines 394 to 402 in 553b1f3
because it is waiting for the visit_id to appear in the completion queue.
in my logs i have already that the visit_id is awaited
storage_controller - INFO - Awaited all tasks for visit_id 5095731113063583 while finalizing
which indicates, it has already been finalized
OpenWPM/openwpm/storage/storage_controller.py
Lines 220 to 222 in 553b1f3
but 3 lines below i have the above exception which says that it didn't find the visit id which makes sense since it is finalized 3 lines up and deleted afterwards
The text was updated successfully, but these errors were encountered: