Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss UpdateMany time complexity #316

Open
ChinoCribioli opened this issue Sep 2, 2024 · 0 comments
Open

Discuss UpdateMany time complexity #316

ChinoCribioli opened this issue Sep 2, 2024 · 0 comments
Assignees
Labels
documentation 📖 Improvements or additions to documentation good first issue Good for newcomers

Comments

@ChinoCribioli
Copy link
Contributor

Regarding the implementation of the updateMany method as a result of issue #117, I wrote down an analysis of the worst-case time complexity of the method. I attach it to this description to open a discussion about it.

Using updateMany to update m leaves is more efficient than using the update method m times because it prevents updating middle nodes several times, which would happen when updating leaves with common ancestors. The worst-case time complexity of the naive approach of calling 'update' m times is O(m*log(n)) where n is the number of leaves of the tree, while this algorithm offers a complexity of O(m*(log(n)-log(m)) + m). This is a slight improvement, mostly when m ~ n. This makes sense because if we update a lot of leaves, most of them will have common ancestors.

The proof for this would be as follows: If we have a list of m updates in a tree of depth d ~ log(n), let k be the smallest integer such that m <= 2^k. In the worst case, the paths of the m updates will pass through m different nodes of the level k of the tree. In other words, the m leaves to update have all different ancestors up to the level k (which has 2^k nodes). In this scenario, the number of nodes of level k or higher that need to be updated is exactly m*(d-k), and the number of nodes of level k-1 or lower to update is at least m/2 (because there are at least m/2 different ancestors of a set of m nodes) and at most 2*m (because there are not even 2^k of nodes among all those levels, and 2^k < 2m). Therefore, the number of nodes to update is at least m*(d-k) + m/2 and at most m*(d-k) + 2m. This gives us that the number of nodes to update, which is the most expensive operation of the method, is O(m*(log(n)-log(m)) + m) since k ~ log(m). If we consider the worst case of m, which is m = n, the complexity is O(n). This is mainly because each node will be modified at most once, and the total number of nodes of the tree is <= 2*n.

@cedoor cedoor added documentation 📖 Improvements or additions to documentation good first issue Good for newcomers labels Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation 📖 Improvements or additions to documentation good first issue Good for newcomers
Projects
Status: 🏗 In Progress
Development

No branches or pull requests

2 participants