-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch and optimize large deletions #971
Conversation
# We do this one-uid-at-a-time because issuing many deletes within a | ||
# single database transaction is problematic. But loading many | ||
# objects into a session and then frequently calling commit() is also | ||
# bad, because expiring objects and checking for revisions is O(number | ||
# of objects in session), resulting in quadratic runtimes. | ||
# Performance could perhaps be additionally improved by choosing a | ||
# sane balance, e.g., operating on 10 or 100 uids or something at once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment says it all, it was here from the very start. We no longer load as many objects to session as we limited that in previous PR.
def update_message_after_uid_deletion( | ||
*, account_id: int, message_id: int, category_type: CategoryType | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just extracted to separate function, but git diff is useless here.
@@ -37,7 +37,7 @@ def test_messages_deleted_asynchronously( | |||
db.session, | |||
) | |||
assert "label" in [cat.display_name for cat in message.categories] | |||
remove_deleted_uids(default_account.id, folder.id, [msg_uid]) | |||
remove_deleted_uids(default_account.id, folder.id, {msg_uid}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are sets in practice everywhere in real code
def get_categories(self, category_type: CategoryType) -> set[Category]: | ||
""" | ||
Returns the categories this uid belongs to. | ||
|
||
Args: | ||
category_type: The category type for the account this uid belongs to. | ||
""" | ||
if category_type == "label": | ||
return {label.category for label in self.labels} | {self.folder.category} | ||
elif category_type == "folder": | ||
return {self.folder.category} | ||
else: | ||
assert_never(category_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we were unconditionally try to load labels, even if the account does not support labels (only Gmail supports them), this was just stupid and I parametrized this on category_type to avoid this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally this would be a perfect place to use match and case but I need to update our lint tooling to support that syntax and there were too many things to change to make it work on this PR.
These paths are already well tested. |
This reverts commit 229e428.
After #969 we got RAM and CPU usage under control.
The other problem with this code is that it can be painfully slow, especially with large deletions.
I know we have at least on escalation because of the slowness of this code where customer is using automation to move large amounts of emails between folders.
There are two problems here: