Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch and optimize large deletions #971

Merged
merged 1 commit into from
Nov 7, 2024

Conversation

squeaky-pl
Copy link
Contributor

@squeaky-pl squeaky-pl commented Nov 7, 2024

After #969 we got RAM and CPU usage under control.

The other problem with this code is that it can be painfully slow, especially with large deletions.

I know we have at least on escalation because of the slowness of this code where customer is using automation to move large amounts of emails between folders.

There are two problems here:

  • For non conforming IMAP servers that can list one message under many uids we were deleting uids one by one and then updating a message, for such cases it's thousand of times much faster to delete all the uids in one go and then update message.
  • We had N+1 queries when recalculating email categories

Comment on lines -235 to -241
# We do this one-uid-at-a-time because issuing many deletes within a
# single database transaction is problematic. But loading many
# objects into a session and then frequently calling commit() is also
# bad, because expiring objects and checking for revisions is O(number
# of objects in session), resulting in quadratic runtimes.
# Performance could perhaps be additionally improved by choosing a
# sane balance, e.g., operating on 10 or 100 uids or something at once.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says it all, it was here from the very start. We no longer load as many objects to session as we limited that in previous PR.

Comment on lines +309 to +311
def update_message_after_uid_deletion(
*, account_id: int, message_id: int, category_type: CategoryType
) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just extracted to separate function, but git diff is useless here.

@@ -37,7 +37,7 @@ def test_messages_deleted_asynchronously(
db.session,
)
assert "label" in [cat.display_name for cat in message.categories]
remove_deleted_uids(default_account.id, folder.id, [msg_uid])
remove_deleted_uids(default_account.id, folder.id, {msg_uid})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are sets in practice everywhere in real code

Comment on lines +212 to +224
def get_categories(self, category_type: CategoryType) -> set[Category]:
"""
Returns the categories this uid belongs to.

Args:
category_type: The category type for the account this uid belongs to.
"""
if category_type == "label":
return {label.category for label in self.labels} | {self.folder.category}
elif category_type == "folder":
return {self.folder.category}
else:
assert_never(category_type)
Copy link
Contributor Author

@squeaky-pl squeaky-pl Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we were unconditionally try to load labels, even if the account does not support labels (only Gmail supports them), this was just stupid and I parametrized this on category_type to avoid this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally this would be a perfect place to use match and case but I need to update our lint tooling to support that syntax and there were too many things to change to make it work on this PR.

@squeaky-pl
Copy link
Contributor Author

These paths are already well tested.

@squeaky-pl squeaky-pl merged commit 229e428 into master Nov 7, 2024
3 checks passed
squeaky-pl added a commit that referenced this pull request Nov 12, 2024
squeaky-pl added a commit that referenced this pull request Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants